AI Video Dubbing API
A FastAPI backend for an AI-powered video dubbing tool that allows content creators to upload short-form videos, transcribe audio, translate to different languages, clone voices, and generate dubbed videos with lip-sync.
Features
- 🔐 Authentication: Google OAuth integration for secure login
- 👤 User Profiles: Complete profile management with settings
- 📁 Video Upload: Upload MP4/MOV files to Amazon S3 (max 200MB)
- 🔍 Auto Language Detection: Automatic detection of the spoken language using Whisper
- 📝 Editable Transcripts: View and edit transcriptions before translation
- 🧠 Transcription: Audio transcription using the OpenAI Whisper API
- 🌍 Translation: Text translation using the GPT-4 API
- 🗣️ Voice Cloning: Voice synthesis using the ElevenLabs API
- 🎥 Video Processing: Audio replacement and video processing with ffmpeg
- 🐳 Docker Support: Full containerization with Docker and Docker Compose
Tech Stack
- FastAPI - Modern, fast web framework
- SQLite - Database with SQLAlchemy ORM
- Amazon S3 - File storage
- OpenAI Whisper - Audio transcription
- GPT-4 - Text translation
- ElevenLabs - Voice cloning and synthesis
- ffmpeg - Video/audio processing
Quick Start
Option 1: Docker (Recommended)
- Copy the environment file:
  cp .env.example .env
- Configure the environment variables in .env:
  - Add your OpenAI API key
  - Add your ElevenLabs API key
  - Configure AWS S3 credentials
  - Set up Google OAuth credentials
- Run with Docker Compose:
  docker-compose up -d
The API will be available at:
- API: http://localhost:8000
- Documentation: http://localhost:8000/docs
- Health Check: http://localhost:8000/health
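For scripts or CI jobs that start the container and then call the API, it helps to wait until the health check answers. The following is a minimal sketch that assumes only that /health returns HTTP 200 once the API is up; the response body format is not documented here, so it is not inspected. The `opener` parameter is a hypothetical hook that exists only so the function can be exercised without a live server.

```python
import time
import urllib.error
import urllib.request
from typing import Callable


def wait_for_health(
    url: str = "http://localhost:8000/health",
    timeout_s: float = 60.0,
    interval_s: float = 2.0,
    opener: Callable = urllib.request.urlopen,
) -> bool:
    """Poll the health endpoint until it returns HTTP 200 or the timeout expires."""
    deadline = time.monotonic() + timeout_s
    while True:
        try:
            with opener(url, timeout=5) as resp:
                if resp.status == 200:
                    return True
        except (urllib.error.URLError, ConnectionError, TimeoutError):
            pass  # container not ready yet (refused, reset, or non-2xx); retry
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval_s)
```

Typical use after `docker-compose up -d`: `if not wait_for_health(): raise SystemExit("API did not become healthy")`.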
Option 2: Local Development
- Install dependencies:
  pip install -r requirements.txt
- Configure the environment:
  cp .env.example .env # Edit .env with your configuration
- Start the application:
  uvicorn main:app --host 0.0.0.0 --port 8000 --reload
API Endpoints
Authentication (Google OAuth Only)
- GET /auth/google/oauth-url - Get the Google OAuth URL for the frontend
- POST /auth/google/login-with-token - Log in or sign up with a Google ID token
- POST /auth/google/login-with-code - Log in or sign up with a Google authorization code
Profile Management
- GET /profile/ - Get the user profile
- PUT /profile/ - Update profile information
- PUT /profile/password - Update password
- PUT /profile/email - Update email address
- DELETE /profile/ - Delete the user account
Video Management & Language Detection
- POST /videos/upload - Upload a video (the spoken language is detected automatically)
- GET /videos/ - List the current user's videos
- GET /videos/{video_id} - Get details for a specific video
- GET /videos/{video_id}/language - Get the detected language of a video
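A client can reject files that POST /videos/upload will not accept before spending time on the transfer. This is a minimal sketch that assumes "200MB" means 200 × 1024 × 1024 bytes and that the check is by file extension only; the server's exact rules (MIME sniffing, decimal megabytes) are not documented here.

```python
from pathlib import Path

# Assumption: "200MB" interpreted as mebibytes; the server may define the limit differently.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
ALLOWED_EXTENSIONS = {".mp4", ".mov"}


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return human-readable problems; an empty list means the file looks acceptable."""
    problems = []
    ext = Path(filename).suffix.lower()  # case-insensitive: "clip.MOV" is accepted
    if ext not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported extension {ext or '(none)'}; expected .mp4 or .mov")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is {size_bytes} bytes; limit is {MAX_UPLOAD_BYTES}")
    return problems
```

For a local file, call it as `check_upload(path.name, path.stat().st_size)`.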
Transcription & Editable Transcripts
- POST /transcription/{video_id} - Start audio transcription
- GET /transcription/{video_id} - Get transcription results
- GET /transcription/{video_id}/editable - Get the editable transcript
- PUT /transcription/{video_id}/editable - Save the edited transcript
Translation Pipeline (Uses Edited Transcripts)
- POST /translation/{video_id} - Start text translation (uses the edited transcript if one exists)
- GET /translation/{video_id} - Get translation results
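The translation endpoint prefers the edited transcript and otherwise falls back to the original transcription. That selection rule can be sketched as follows; treating a blank or whitespace-only edit as "no edit" is an assumption of this sketch, not documented behaviour.

```python
from typing import Optional


def text_to_translate(original: str, edited: Optional[str]) -> str:
    """Prefer the user's edited transcript; fall back to the raw Whisper transcription."""
    # Assumption: an empty or whitespace-only edit counts as "no edit".
    if edited is not None and edited.strip():
        return edited
    return original
```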
Voice Cloning & Video Processing
- POST /voice/clone/{video_id} - Start voice cloning and dubbed audio generation
- GET /voice/{video_id} - Get dubbed audio results
- POST /process/{video_id} - Start final video processing
- GET /process/{video_id} - Get processed video results
- GET /process/results/{video_id} - Get complete processing results
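Each POST endpoint above starts work that finishes later, so a client typically polls until the video reaches the status it is waiting for. A minimal sketch: `get_status` is a hypothetical caller-supplied function (for example, one that calls GET /videos/{video_id} and reads a status field whose name is not documented here). It waits for a target status rather than "anything not in progress", because right after a POST the video may still report its previous status.

```python
import time
from typing import Callable, Collection


def wait_for_status(
    get_status: Callable[[], str],
    targets: Collection[str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
) -> str:
    """Poll until the status is one of `targets` or any *_failed value; return it."""
    deadline = time.monotonic() + timeout_s
    while True:
        status = get_status()
        if status in targets or status.endswith("_failed"):
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(f"still {status!r} after {timeout_s} seconds")
        time.sleep(interval_s)
```

For example, after starting transcription: `wait_for_status(fetch_status, {"transcribed"})`, where `fetch_status` is your own (hypothetical) wrapper around the video details endpoint.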
Google OAuth Setup
1. Create Google OAuth Application
- Go to Google Cloud Console
- Create a new project or select existing one
- Configure the OAuth consent screen (the Google+ API has been shut down and is not needed for sign-in)
- Go to "Credentials" → "Create Credentials" → "OAuth 2.0 Client IDs"
- Choose "Web application"
- Add authorized redirect URIs:
  - http://localhost:3000/auth/google/callback (for development)
  - Your production callback URL
2. Configure Environment Variables
Add these to your .env file:
GOOGLE_CLIENT_ID=your-google-oauth-client-id
GOOGLE_CLIENT_SECRET=your-google-oauth-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/google/callback
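GET /auth/google/oauth-url returns the URL the frontend redirects to. How this backend builds that URL is not shown here; the sketch below assembles a standard Google authorization-code URL from the same two settings, using Google's OAuth 2.0 authorization endpoint. The `scope` value and the random `state` parameter are assumptions of the sketch.

```python
import secrets
from typing import Optional
from urllib.parse import urlencode

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_google_oauth_url(client_id: str, redirect_uri: str, state: Optional[str] = None) -> str:
    """Return an authorization-code URL; redirect_uri must be registered in Cloud Console."""
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,
        "response_type": "code",
        "scope": "openid email profile",  # assumption: basic sign-in scopes
        # CSRF protection: store this value and compare it on the callback.
        "state": state or secrets.token_urlsafe(16),
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}"
```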
3. Frontend Integration
Option 1: Direct Token Method
// Obtain googleIdToken (an ID token) from Google's JavaScript sign-in library, then send it to the API
const response = await fetch('/auth/google/login-with-token', {
method: 'POST',
headers: { 'Content-Type': 'application/json' },
body: JSON.stringify({ id_token: googleIdToken })
});
Option 2: Authorization Code Method
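// Step 1: fetch the Google OAuth URL from the API and redirect the user to it
const oauthUrl = await fetch('/auth/google/oauth-url').then(r => r.json());
window.location.href = oauthUrl.oauth_url;

// Step 2 (on the callback page): read the authorization code from the query string
const authorizationCode = new URLSearchParams(window.location.search).get('code');

// Step 3: exchange the code; redirect_uri must be the same URI used for the OAuth URL
const response = await fetch('/auth/google/login-with-code', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    code: authorizationCode,
    redirect_uri: 'http://localhost:3000/auth/google/callback'
  })
});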
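Non-browser clients (test scripts, CLIs) can call the same endpoint as Option 1 with Python's standard library. This minimal sketch only builds the request, so it can be inspected or sent with `urllib.request.urlopen`. The base URL is the local default from Quick Start; the shape of the JSON response (for example, the name of the returned token field) is not documented here, so the sketch does not parse it.

```python
import json
import urllib.request


def google_token_login_request(
    id_token: str, base_url: str = "http://localhost:8000"
) -> urllib.request.Request:
    """Build the POST /auth/google/login-with-token request used in Option 1."""
    body = json.dumps({"id_token": id_token}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/auth/google/login-with-token",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

To send it: `with urllib.request.urlopen(google_token_login_request(token)) as resp: payload = json.load(resp)`.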
Docker Setup
Building and Running
# Build and start the application
docker-compose up -d
# View logs
docker-compose logs -f api
# Stop the application
docker-compose down
# Rebuild after code changes
docker-compose up --build -d
Environment Variables
The application requires the following environment variables (copy them from .env.example):
- OPENAI_API_KEY - Required for transcription and translation
- AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME - Required for video storage
- GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, GOOGLE_REDIRECT_URI - Required for authentication
- ELEVENLABS_API_KEY - Required for voice cloning
- Other optional configuration variables (AWS_REGION, DEBUG, LOG_LEVEL)
Storage
The Docker setup includes a persistent volume for:
- SQLite database (/app/storage/db/)
- Local file storage (/app/storage/)
Workflow
- Log in with Google OAuth to get an authentication token
- Upload Video - Automatic language detection occurs during upload
- Transcribe the audio from the video
- Edit Transcript (optional) - Review and correct the transcription
- Translate the edited/original transcript
- Clone Voice and generate dubbed audio
- Process Video to replace original audio with dubbed audio
- Download the final dubbed video
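The workflow above maps onto the endpoints listed under API Endpoints. This minimal sketch lists the calls, in order, for one already uploaded video (POST /videos/upload comes first and yields the video ID). It produces only the plan as (method, path) pairs; authentication, request bodies, and waiting between stages are left out because they are not specified here.

```python
def dubbing_plan(video_id: str, edit_transcript: bool = False) -> list[tuple[str, str]]:
    """Return (method, path) pairs for one video, following the documented workflow order."""
    steps = [
        ("POST", f"/transcription/{video_id}"),
        ("GET", f"/transcription/{video_id}"),
    ]
    if edit_transcript:
        # Optional review step: fetch the editable transcript, then save corrections.
        steps += [
            ("GET", f"/transcription/{video_id}/editable"),
            ("PUT", f"/transcription/{video_id}/editable"),
        ]
    steps += [
        ("POST", f"/translation/{video_id}"),
        ("POST", f"/voice/clone/{video_id}"),
        ("POST", f"/process/{video_id}"),
        ("GET", f"/process/results/{video_id}"),
    ]
    return steps
```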
Environment Variables Reference
Variable | Description | Required |
---|---|---|
OPENAI_API_KEY | OpenAI API key for Whisper and GPT-4 | Yes |
AWS_ACCESS_KEY_ID | AWS access key for S3 | Yes |
AWS_SECRET_ACCESS_KEY | AWS secret key for S3 | Yes |
AWS_REGION | AWS region (default: us-east-1) | No |
S3_BUCKET_NAME | S3 bucket name for file storage | Yes |
GOOGLE_CLIENT_ID | Google OAuth client ID | Yes |
GOOGLE_CLIENT_SECRET | Google OAuth client secret | Yes |
GOOGLE_REDIRECT_URI | Google OAuth redirect URI | Yes |
ELEVENLABS_API_KEY | ElevenLabs API key for voice cloning | Yes |
DEBUG | Enable debug mode (default: false) | No |
LOG_LEVEL | Logging level (default: info) | No |
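A startup check can fail fast when a required variable is missing. This minimal sketch is based only on the table above; how the application itself reads its configuration is not shown here, and the accepted spellings for DEBUG are an assumption.

```python
import os
from typing import Mapping, Optional

REQUIRED = (
    "OPENAI_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET_NAME",
    "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_REDIRECT_URI", "ELEVENLABS_API_KEY",
)


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, object]:
    """Validate required variables and apply the defaults listed in the table."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    settings: dict[str, object] = {name: env[name] for name in REQUIRED}
    settings["AWS_REGION"] = env.get("AWS_REGION", "us-east-1")
    # Assumption: DEBUG accepts common truthy spellings; the real parser may differ.
    settings["DEBUG"] = env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"}
    settings["LOG_LEVEL"] = env.get("LOG_LEVEL", "info")
    return settings
```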
File Storage Structure
Files are stored in S3 with the following structure:
/videos/{uuid}.mp4 - Original uploaded videos
/dubbed_audio/{uuid}.mp3 - Generated dubbed audio files
/processed_videos/{uuid}.mp4 - Final processed videos
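Object keys that follow this layout can be generated with a small helper. In this minimal sketch, the leading slash in the documented paths is dropped because S3 keys conventionally have none; whether the service stores keys with or without it is not stated, so treat that as an assumption. The `kind` labels are hypothetical.

```python
import uuid
from typing import Optional

# Hypothetical labels mapped to the prefix and extension from the documented layout.
_LAYOUT = {
    "video": ("videos", "mp4"),
    "dubbed_audio": ("dubbed_audio", "mp3"),
    "processed_video": ("processed_videos", "mp4"),
}


def s3_key(kind: str, file_id: Optional[str] = None) -> str:
    """Return a key such as 'videos/<uuid>.mp4'; a fresh UUID4 is used if none is given."""
    prefix, ext = _LAYOUT[kind]
    file_id = file_id or str(uuid.uuid4())
    return f"{prefix}/{file_id}.{ext}"
```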
Database Schema
- users: User accounts with email/password
- videos: Video metadata and processing status
- transcriptions: Audio transcriptions
- translations: Translated text
- dubbed_audios: Generated audio files
- dubbed_videos: Final processed videos
Status Tracking
Videos have the following status values:
- uploaded - Video uploaded successfully
- transcribing - Audio transcription in progress
- transcribed - Transcription completed
- translating - Text translation in progress
- translated - Translation completed
- voice_cloning - Voice cloning and audio generation in progress
- voice_cloned - Dubbed audio generated
- processing_video - Final video processing in progress
- completed - All processing completed
- *_failed - Various failure states
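For a client-side progress indicator, the non-failure statuses can be treated as an ordered sequence. This minimal sketch assumes statuses arrive as the plain strings listed above and that any value ending in `_failed` is a failure; the specific failure names are not documented here.

```python
from enum import Enum


class VideoStatus(str, Enum):
    """Non-failure status values, in the order listed above."""
    UPLOADED = "uploaded"
    TRANSCRIBING = "transcribing"
    TRANSCRIBED = "transcribed"
    TRANSLATING = "translating"
    TRANSLATED = "translated"
    VOICE_CLONING = "voice_cloning"
    VOICE_CLONED = "voice_cloned"
    PROCESSING_VIDEO = "processing_video"
    COMPLETED = "completed"


_ORDER = list(VideoStatus)  # definition order


def progress(status: str) -> float:
    """Fraction of the pipeline reached (0.0 to 1.0); ValueError for failures or unknown values."""
    if status.endswith("_failed"):
        raise ValueError(f"pipeline failed: {status}")
    position = _ORDER.index(VideoStatus(status))  # VideoStatus() raises ValueError if unknown
    return position / (len(_ORDER) - 1)
```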
Development
Code Linting
ruff check . --fix
Project Structure
├── main.py # FastAPI application entry point
├── requirements.txt # Python dependencies
├── alembic.ini # Database migration configuration
├── app/
│ ├── db/ # Database configuration
│ ├── models/ # SQLAlchemy models
│ ├── routes/ # API endpoints
│ ├── services/ # Business logic and external API integrations
│ └── utils/ # Utility functions (auth, etc.)
└── alembic/
└── versions/ # Database migration files