# AI Video Dubbing API

A FastAPI backend for an AI-powered video dubbing tool that allows content creators to upload short-form videos, transcribe audio, translate to different languages, clone voices, and generate dubbed videos with lip-sync.

## Features

- 🔐 **Authentication**: Google OAuth integration for secure login
- 👤 **User Profiles**: Complete profile management with settings
- 📁 **Video Upload**: Upload MP4/MOV files to Amazon S3 (max 200MB)
- 🔍 **Auto Language Detection**: Automatic detection of spoken language using Whisper
- 📝 **Editable Transcripts**: View and edit transcriptions before translation
- 🧠 **Transcription**: Audio transcription using the OpenAI Whisper API
- 🌍 **Translation**: Text translation using the GPT-4 API
- 🗣️ **Voice Cloning**: Voice synthesis using the ElevenLabs API
- 🎥 **Video Processing**: Audio replacement and video processing with ffmpeg
- 🐳 **Docker Support**: Full containerization with Docker and Docker Compose

## Tech Stack

- **FastAPI** - Modern, fast web framework
- **SQLite** - Database with SQLAlchemy ORM
- **Amazon S3** - File storage
- **OpenAI Whisper** - Audio transcription
- **GPT-4** - Text translation
- **ElevenLabs** - Voice cloning and synthesis
- **ffmpeg** - Video/audio processing

## Quick Start

### Option 1: Docker (Recommended)

1. **Copy the environment file**:
   ```bash
   cp .env.example .env
   ```
2. **Configure environment variables** in `.env`:
   - Add your OpenAI API key
   - Configure AWS S3 credentials
   - Set up Google OAuth credentials
3. **Run with Docker Compose**:
   ```bash
   docker-compose up -d
   ```

The API will be available at:

- **API**: http://localhost:8000
- **Documentation**: http://localhost:8000/docs
- **Health Check**: http://localhost:8000/health

### Option 2: Local Development

1. **Install dependencies**:
   ```bash
   pip install -r requirements.txt
   ```
2. **Configure the environment**:
   ```bash
   cp .env.example .env
   # Edit .env with your configuration
   ```
3. **Start the application**:
   ```bash
   uvicorn main:app --host 0.0.0.0 --port 8000 --reload
   ```
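Uploads are limited to MP4/MOV files of at most 200MB, so a client can fail fast before calling the upload endpoint. A minimal pre-check sketch (the helper name and error messages are illustrative, not part of the API):

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp4", ".mov"}   # formats accepted for upload
MAX_UPLOAD_BYTES = 200 * 1024 * 1024    # 200MB upload limit

def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of validation errors; an empty list means the file looks uploadable."""
    errors = []
    suffix = Path(filename).suffix
    if suffix.lower() not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported format: {suffix or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        errors.append(f"file too large: {size_bytes} bytes (max {MAX_UPLOAD_BYTES})")
    return errors
```

The server enforces the same limits; this just avoids a wasted round-trip for obviously invalid files.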
## API Endpoints

### Authentication (Google OAuth Only)

- `GET /auth/google/oauth-url` - Get Google OAuth URL for frontend
- `POST /auth/google/login-with-token` - Login/signup with Google ID token
- `POST /auth/google/login-with-code` - Login/signup with Google authorization code

### Profile Management

- `GET /profile/` - Get user profile
- `PUT /profile/` - Update profile information
- `PUT /profile/password` - Update password
- `PUT /profile/email` - Update email address
- `DELETE /profile/` - Delete user account

### Video Management & Language Detection

- `POST /videos/upload` - Upload video with auto language detection
- `GET /videos/` - Get user's videos
- `GET /videos/{video_id}` - Get specific video details
- `GET /videos/{video_id}/language` - Get detected video language

### Transcription & Editable Transcripts

- `POST /transcription/{video_id}` - Start audio transcription
- `GET /transcription/{video_id}` - Get transcription results
- `GET /transcription/{video_id}/editable` - Get editable transcript
- `PUT /transcription/{video_id}/editable` - Update edited transcript

### Translation Pipeline (Uses Edited Transcripts)

- `POST /translation/{video_id}` - Start text translation (uses edited transcript if available)
- `GET /translation/{video_id}` - Get translation results

### Voice Cloning & Video Processing

- `POST /voice/clone/{video_id}` - Start voice cloning and audio generation
- `GET /voice/{video_id}` - Get dubbed audio results
- `POST /process/{video_id}` - Start final video processing
- `GET /process/{video_id}` - Get processed video results
- `GET /process/results/{video_id}` - Get complete processing results
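The processing endpoints are invoked in a fixed order to take one uploaded video through the whole dubbing flow. A sketch of that sequence as ordered `(method, path)` pairs (the helper name is illustrative; the paths come from the endpoint list above):

```python
def dubbing_pipeline_calls(video_id: str) -> list[tuple[str, str]]:
    """Ordered (method, path) calls to take one uploaded video through dubbing."""
    return [
        ("POST", f"/transcription/{video_id}"),    # transcribe the audio
        ("POST", f"/translation/{video_id}"),      # translate the (edited) transcript
        ("POST", f"/voice/clone/{video_id}"),      # clone voice, generate dubbed audio
        ("POST", f"/process/{video_id}"),          # replace audio in the video
        ("GET",  f"/process/results/{video_id}"),  # fetch the complete results
    ]
```

Each `POST` starts an asynchronous step, so a real client would poll the matching `GET` endpoint (or the video's status) between steps rather than firing these back-to-back.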
## Google OAuth Setup

### 1. Create a Google OAuth Application

1. Go to the [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Enable the Google+ API
4. Go to "Credentials" → "Create Credentials" → "OAuth 2.0 Client IDs"
5. Choose "Web application"
6. Add authorized redirect URIs:
   - `http://localhost:3000/auth/google/callback` (for development)
   - Your production callback URL

### 2. Configure Environment Variables

Add these to your `.env` file:

```env
GOOGLE_CLIENT_ID=your-google-oauth-client-id
GOOGLE_CLIENT_SECRET=your-google-oauth-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/google/callback
```

### 3. Frontend Integration

**Option 1: Direct Token Method**

```javascript
// Use Google's JavaScript library to get an ID token
const response = await fetch('/auth/google/login-with-token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ id_token: googleIdToken })
});
```

**Option 2: Authorization Code Method**

```javascript
// Redirect the user to the Google OAuth URL, then exchange the code
const oauthUrl = await fetch('/auth/google/oauth-url').then(r => r.json());
// Redirect to oauthUrl.oauth_url
// On callback, exchange the code:
const response = await fetch('/auth/google/login-with-code', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    code: authorizationCode,
    redirect_uri: 'http://localhost:3000/auth/google/callback'
  })
});
```

## Docker Setup

### Building and Running

```bash
# Build and start the application
docker-compose up -d

# View logs
docker-compose logs -f api

# Stop the application
docker-compose down

# Rebuild after code changes
docker-compose up --build -d
```

### Environment Variables

The application requires the following environment variables (copy from `.env.example`):

- `OPENAI_API_KEY` - Required for transcription and translation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `S3_BUCKET_NAME` - Required for video storage
- `GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET` - Required for authentication
- Other optional configuration variables
### Storage

The Docker setup includes a persistent volume for:

- SQLite database (`/app/storage/db/`)
- Local file storage (`/app/storage/`)

## Workflow

1. **Login** with Google OAuth to get an authentication token
2. **Upload Video** - automatic language detection occurs during upload
3. **Transcribe** the audio from the video
4. **Edit Transcript** (optional) - review and correct the transcription
5. **Translate** the edited/original transcript
6. **Clone Voice** and generate dubbed audio
7. **Process Video** to replace the original audio with the dubbed audio
8. **Download** the final dubbed video

## Environment Variables Reference

| Variable | Description | Required |
|----------|-------------|----------|
| `OPENAI_API_KEY` | OpenAI API key for Whisper and GPT-4 | Yes |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 | Yes |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 | Yes |
| `AWS_REGION` | AWS region (default: us-east-1) | No |
| `S3_BUCKET_NAME` | S3 bucket name for file storage | Yes |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID | Yes |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret | Yes |
| `GOOGLE_REDIRECT_URI` | Google OAuth redirect URI | Yes |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for voice cloning | Yes |
| `DEBUG` | Enable debug mode (default: false) | No |
| `LOG_LEVEL` | Logging level (default: info) | No |

## File Storage Structure

Files are stored in S3 with the following structure:

```
/videos/{uuid}.mp4            - Original uploaded videos
/dubbed_audio/{uuid}.mp3      - Generated dubbed audio files
/processed_videos/{uuid}.mp4  - Final processed videos
```

## Database Schema

- **users**: User accounts with email/password
- **videos**: Video metadata and processing status
- **transcriptions**: Audio transcriptions
- **translations**: Translated text
- **dubbed_audios**: Generated audio files
- **dubbed_videos**: Final processed videos
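The tables map one-to-one onto the pipeline stages. A minimal `sqlite3` sketch of two central tables, to make the relationships concrete (column names are illustrative; the real models live in `app/models/` as SQLAlchemy classes and are managed by Alembic migrations):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE videos (
        id       TEXT PRIMARY KEY,  -- uuid, also used in S3 object keys
        user_id  TEXT NOT NULL,     -- owning user
        status   TEXT NOT NULL      -- processing status, e.g. 'uploaded'
    );
    CREATE TABLE transcriptions (
        id       TEXT PRIMARY KEY,
        video_id TEXT NOT NULL REFERENCES videos(id),
        text     TEXT NOT NULL      -- editable transcript text
    );
""")
conn.execute("INSERT INTO videos VALUES ('v1', 'u1', 'uploaded')")
status = conn.execute("SELECT status FROM videos WHERE id = 'v1'").fetchone()[0]
```

The remaining tables (`translations`, `dubbed_audios`, `dubbed_videos`) follow the same pattern: each references a video and stores the output of one pipeline stage.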
## Status Tracking

Videos have the following status values:

- `uploaded` - Video uploaded successfully
- `transcribing` - Audio transcription in progress
- `transcribed` - Transcription completed
- `translating` - Text translation in progress
- `translated` - Translation completed
- `voice_cloning` - Voice cloning and audio generation in progress
- `voice_cloned` - Dubbed audio generated
- `processing_video` - Final video processing in progress
- `completed` - All processing completed
- `*_failed` - Various failure states

## Development

### Code Linting

```bash
ruff check . --fix
```

### Project Structure

```
├── main.py              # FastAPI application entry point
├── requirements.txt     # Python dependencies
├── alembic.ini          # Database migration configuration
├── app/
│   ├── db/              # Database configuration
│   ├── models/          # SQLAlchemy models
│   ├── routes/          # API endpoints
│   ├── services/        # Business logic and external API integrations
│   └── utils/           # Utility functions (auth, etc.)
└── alembic/
    └── versions/        # Database migration files
```
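Clients polling a video can treat the status values listed under Status Tracking uniformly: polling stops once a terminal state is reached, and any `*_failed` value counts as terminal. A small illustrative helper (not part of the API surface):

```python
def is_terminal(status: str) -> bool:
    """True once polling can stop: processing finished or a stage failed."""
    return status == "completed" or status.endswith("_failed")

# The happy path walks through these states in order:
HAPPY_PATH = [
    "uploaded", "transcribing", "transcribed",
    "translating", "translated",
    "voice_cloning", "voice_cloned",
    "processing_video", "completed",
]
```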