# AI Video Dubbing API

A FastAPI backend for an AI-powered video dubbing tool that allows content creators to upload short-form videos, transcribe audio, translate to different languages, clone voices, and generate dubbed videos with lip-sync.

## Features

- 🔐 **Authentication**: JWT-based user registration and login
- 👤 **User Profiles**: Complete profile management with settings
- 📁 **Video Upload**: Upload MP4/MOV files to Amazon S3 (max 200MB)
- 🧠 **Transcription**: Audio transcription using OpenAI Whisper API
- 🌍 **Translation**: Text translation using GPT-4 API
- 🗣️ **Voice Cloning**: Voice synthesis using ElevenLabs API
- 🎥 **Video Processing**: Audio replacement and video processing with ffmpeg

## Tech Stack

- **FastAPI** - Modern, fast web framework
- **SQLite** - Database with SQLAlchemy ORM
- **Amazon S3** - File storage
- **OpenAI Whisper** - Audio transcription
- **GPT-4** - Text translation
- **ElevenLabs** - Voice cloning and synthesis
- **ffmpeg** - Video/audio processing

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Set Environment Variables

Create a `.env` file in the root directory with the following variables:

```env
# Authentication
SECRET_KEY=your-secret-key-change-this-in-production

# Google OAuth Configuration
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/google/callback

# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-s3-bucket-name

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key

# ElevenLabs Configuration
ELEVENLABS_API_KEY=your-elevenlabs-api-key
```

### 3. Run Database Migrations

The database will be automatically created when you start the application. The SQLite database will be stored at `/app/storage/db/db.sqlite`.

### 4. Start the Application

```bash
python main.py
```

Or with uvicorn:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at:

- **API**: http://localhost:8000
- **Documentation**: http://localhost:8000/docs
- **Alternative Docs**: http://localhost:8000/redoc
- **Health Check**: http://localhost:8000/health

## API Endpoints

### Authentication

- `POST /auth/register` - User registration with email/password
- `POST /auth/login` - User login with email/password
- `GET /auth/google/oauth-url` - Get Google OAuth URL for frontend
- `POST /auth/google/login-with-token` - Login/signup with Google ID token
- `POST /auth/google/login-with-code` - Login/signup with Google authorization code

### Profile Management

- `GET /profile/` - Get user profile
- `PUT /profile/` - Update profile information
- `PUT /profile/password` - Update password
- `PUT /profile/email` - Update email address
- `DELETE /profile/` - Delete user account

### Video Management

- `POST /videos/upload` - Upload video with language settings
- `GET /videos/` - Get user's videos
- `GET /videos/{video_id}` - Get specific video details

### Processing Pipeline

- `POST /transcription/{video_id}` - Start audio transcription
- `GET /transcription/{video_id}` - Get transcription results
- `POST /translation/{video_id}` - Start text translation
- `GET /translation/{video_id}` - Get translation results
- `POST /voice/clone/{video_id}` - Start voice cloning and audio generation
- `GET /voice/{video_id}` - Get dubbed audio results
- `POST /process/{video_id}` - Start final video processing
- `GET /process/{video_id}` - Get processed video results

### Results

- `GET /process/results/{video_id}` - Get complete processing results

## Google OAuth Setup

### 1. Create Google OAuth Application

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select existing one
3. Enable the Google+ API
4. Go to "Credentials" → "Create Credentials" → "OAuth 2.0 Client IDs"
5. Choose "Web application"
6. Add authorized redirect URIs:
   - `http://localhost:3000/auth/google/callback` (for development)
   - Your production callback URL

### 2. Configure Environment Variables

Add these to your `.env` file:

```env
GOOGLE_CLIENT_ID=your-google-oauth-client-id
GOOGLE_CLIENT_SECRET=your-google-oauth-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/google/callback
```

### 3. Frontend Integration

**Option 1: Direct Token Method**

```javascript
// Use Google's JavaScript library to get ID token
const response = await fetch('/auth/google/login-with-token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ id_token: googleIdToken })
});
```

**Option 2: Authorization Code Method**

```javascript
// Redirect user to Google OAuth URL, then exchange code
const oauthUrl = await fetch('/auth/google/oauth-url').then(r => r.json());
// Redirect to oauthUrl.oauth_url

// On callback, exchange code:
const response = await fetch('/auth/google/login-with-code', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    code: authorizationCode,
    redirect_uri: 'http://localhost:3000/auth/google/callback'
  })
});
```

## Workflow

1. **Register/Login** (Email/Password or Google OAuth) to get JWT token
2. **Upload Video** with source and target languages
3. **Transcribe** the audio from the video
4. **Translate** the transcribed text
5. **Clone Voice** and generate dubbed audio
6. **Process Video** to replace original audio with dubbed audio
7. **Download** the final dubbed video

## Environment Variables Reference

| Variable | Description | Required |
|----------|-------------|----------|
| `SECRET_KEY` | JWT secret key for authentication | Yes |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID | No* |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret | No* |
| `GOOGLE_REDIRECT_URI` | Google OAuth redirect URI | No* |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 | Yes |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 | Yes |
| `AWS_REGION` | AWS region (default: us-east-1) | No |
| `S3_BUCKET_NAME` | S3 bucket name for file storage | Yes |
| `OPENAI_API_KEY` | OpenAI API key for Whisper and GPT-4 | Yes |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for voice cloning | Yes |

*Required only if Google OAuth is enabled

## File Storage Structure

Files are stored in S3 with the following structure:

```
/videos/{uuid}.mp4            - Original uploaded videos
/dubbed_audio/{uuid}.mp3      - Generated dubbed audio files
/processed_videos/{uuid}.mp4  - Final processed videos
```

## Database Schema

- **users**: User accounts with email/password
- **videos**: Video metadata and processing status
- **transcriptions**: Audio transcriptions
- **translations**: Translated text
- **dubbed_audios**: Generated audio files
- **dubbed_videos**: Final processed videos

## Status Tracking

Videos have the following status values:

- `uploaded` - Video uploaded successfully
- `transcribing` - Audio transcription in progress
- `transcribed` - Transcription completed
- `translating` - Text translation in progress
- `translated` - Translation completed
- `voice_cloning` - Voice cloning and audio generation in progress
- `voice_cloned` - Dubbed audio generated
- `processing_video` - Final video processing in progress
- `completed` - All processing completed
- `*_failed` - Various failure states

## Development

### Code Linting

```bash
ruff check . --fix
```

### Project Structure

```
├── main.py                 # FastAPI application entry point
├── requirements.txt        # Python dependencies
├── alembic.ini             # Database migration configuration
├── app/
│   ├── db/                 # Database configuration
│   ├── models/             # SQLAlchemy models
│   ├── routes/             # API endpoints
│   ├── services/           # Business logic and external API integrations
│   └── utils/              # Utility functions (auth, etc.)
└── alembic/
    └── versions/           # Database migration files
```
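
### Example: Choosing the Next Pipeline Step

The status values under Status Tracking and the order given under Workflow suggest that a client can decide which endpoint to call next from a video's current status. The following is a minimal sketch of that idea, not the project's own code. The status names and endpoint paths come from this README; the `NEXT_STEP` mapping, the `next_action` helper, and the assumption that `video_id` can be formatted into the path as a string are all hypothetical.

```python
from typing import Optional

# Assumption: each "finished" status leads to the next POST endpoint in the
# order described under Workflow. Paths are taken from the API Endpoints list.
NEXT_STEP = {
    "uploaded": "/transcription/{video_id}",
    "transcribed": "/translation/{video_id}",
    "translated": "/voice/clone/{video_id}",
    "voice_cloned": "/process/{video_id}",
}

# Statuses that mean a step is still running, so the client should wait.
IN_PROGRESS = {"transcribing", "translating", "voice_cloning", "processing_video"}


def next_action(status: str, video_id: str) -> Optional[str]:
    """Return the POST path to call next, or None if there is nothing to start.

    Raises RuntimeError for any `*_failed` status and ValueError for a status
    that is not listed in this README.
    """
    if status.endswith("_failed"):
        raise RuntimeError(f"pipeline failed with status {status!r}")
    if status in IN_PROGRESS or status == "completed":
        return None
    template = NEXT_STEP.get(status)
    if template is None:
        raise ValueError(f"unknown status {status!r}")
    return template.format(video_id=video_id)


if __name__ == "__main__":
    # Hypothetical video ID, used only for illustration.
    print(next_action("uploaded", "42"))
```

A client could poll `GET /videos/{video_id}`, pass the returned status to `next_action`, and send a `POST` to the resulting path with its JWT attached. How the status field is named in the response, and how the token is sent, are not specified here and would need to be checked against the interactive docs at `/docs`.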