AI Video Dubbing API

A FastAPI backend for an AI-powered video dubbing tool. Content creators can upload short-form videos, transcribe the audio, translate the transcript into other languages, clone voices, and generate dubbed videos with lip-sync.

Features

  • 🔐 Authentication: Google OAuth integration for secure login
  • 👤 User Profiles: Complete profile management with settings
  • 📁 Video Upload: Upload MP4/MOV files to Amazon S3 (max 200MB)
  • 🔍 Auto Language Detection: Automatic detection of spoken language using Whisper
  • 📝 Editable Transcripts: View and edit transcriptions before translation
  • 🧠 Transcription: Audio transcription using OpenAI Whisper API
  • 🌍 Translation: Text translation using GPT-4 API
  • 🗣️ Voice Cloning: Voice synthesis using ElevenLabs API
  • 🎥 Video Processing: Audio replacement and video processing with ffmpeg
  • 🐳 Docker Support: Full containerization with Docker and Docker Compose

Tech Stack

  • FastAPI - Modern, fast web framework
  • SQLite - Database with SQLAlchemy ORM
  • Amazon S3 - File storage
  • OpenAI Whisper - Audio transcription
  • GPT-4 - Text translation
  • ElevenLabs - Voice cloning and synthesis
  • ffmpeg - Video/audio processing

Quick Start

Option 1: Docker Compose

  1. Copy environment file:

    cp .env.example .env
    
  2. Configure environment variables in .env:

    • Add your OpenAI and ElevenLabs API keys
    • Configure AWS S3 credentials
    • Set up Google OAuth credentials
  3. Run with Docker Compose:

    docker-compose up -d
    

The API will be available at:

Option 2: Local Development

  1. Install Dependencies:

    pip install -r requirements.txt
    
  2. Configure Environment:

    cp .env.example .env
    # Edit .env with your configuration
    
  3. Start the Application:

    uvicorn main:app --host 0.0.0.0 --port 8000 --reload
    

API Endpoints

Authentication (Google OAuth Only)

  • GET /auth/google/oauth-url - Get Google OAuth URL for frontend
  • POST /auth/google/login-with-token - Login/signup with Google ID token
  • POST /auth/google/login-with-code - Login/signup with Google authorization code

Profile Management

  • GET /profile/ - Get user profile
  • PUT /profile/ - Update profile information
  • PUT /profile/password - Update password
  • PUT /profile/email - Update email address
  • DELETE /profile/ - Delete user account

Video Management & Language Detection

  • POST /videos/upload - Upload video with auto language detection
  • GET /videos/ - Get user's videos
  • GET /videos/{video_id} - Get specific video details
  • GET /videos/{video_id}/language - Get detected video language
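
The upload constraints listed under Features (MP4/MOV only, 200MB maximum) could be checked before a file is sent to S3. The sketch below is an illustration, not the project's actual code: the function name validate_upload, the reading of "200MB" as 200 MiB, and the extension-based type check are all assumptions.

```python
from pathlib import Path

# Assumption: "200MB" is read as 200 MiB; the real service may use 200 * 10**6.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
ALLOWED_EXTENSIONS = {".mp4", ".mov"}


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 200MB limit")
    return problems
```

Returning a list of problems rather than raising on the first one lets a route report every issue in a single error response.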

Transcription & Editable Transcripts

  • POST /transcription/{video_id} - Start audio transcription
  • GET /transcription/{video_id} - Get transcription results
  • GET /transcription/{video_id}/editable - Get editable transcript
  • PUT /transcription/{video_id}/editable - Update edited transcript

Translation Pipeline (Uses Edited Transcripts)

  • POST /translation/{video_id} - Start text translation (uses edited transcript if available)
  • GET /translation/{video_id} - Get translation results
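
The "uses edited transcript if available" rule can be pictured as a simple fallback. This is a minimal sketch under assumptions: the Transcript class and its field names are hypothetical, and treating a whitespace-only edit as "no edit" is a design choice made here, not something the API documents.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Transcript:
    # Hypothetical field names; the real transcriptions table may differ.
    original_text: str
    edited_text: Optional[str] = None


def text_for_translation(transcript: Transcript) -> str:
    """Prefer the user's edited transcript; fall back to the original transcription."""
    edited = (transcript.edited_text or "").strip()
    return edited if edited else transcript.original_text
```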

Voice Cloning & Video Processing

  • POST /voice/clone/{video_id} - Start voice cloning and audio generation
  • GET /voice/{video_id} - Get dubbed audio results
  • POST /process/{video_id} - Start final video processing
  • GET /process/{video_id} - Get processed video results
  • GET /process/results/{video_id} - Get complete processing results
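
The final processing step replaces the original audio track with the dubbed one using ffmpeg. The sketch below shows one common way to build such a command. It is an assumption about how the service might call ffmpeg, not its actual invocation; the function name is hypothetical, and it covers audio replacement only, not lip-sync.

```python
def build_audio_replace_command(
    video_path: str, dubbed_audio_path: str, output_path: str
) -> list[str]:
    """Build an ffmpeg argument list that swaps the audio track without re-encoding video."""
    return [
        "ffmpeg",
        "-y",                     # overwrite the output file if it exists
        "-i", video_path,         # input 0: original video
        "-i", dubbed_audio_path,  # input 1: dubbed audio
        "-map", "0:v:0",          # take the video stream from input 0
        "-map", "1:a:0",          # take the audio stream from input 1
        "-c:v", "copy",           # copy the video stream as-is
        "-c:a", "aac",            # encode audio as AAC for the MP4 container
        "-shortest",              # stop at the end of the shorter stream
        output_path,
    ]
```

The list can be passed to subprocess.run(command, check=True); passing a list instead of a shell string avoids quoting problems with file names.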

Google OAuth Setup

1. Create Google OAuth Application

  1. Go to Google Cloud Console
  2. Create a new project or select existing one
  3. Configure the OAuth consent screen (the Google+ API has been shut down and does not need to be enabled for Google sign-in)
  4. Go to "Credentials" → "Create Credentials" → "OAuth client ID"
  5. Choose "Web application"
  6. Add authorized redirect URIs:
    • http://localhost:3000/auth/google/callback (for development)
    • Your production callback URL

2. Configure Environment Variables

Add these to your .env file:

GOOGLE_CLIENT_ID=your-google-oauth-client-id
GOOGLE_CLIENT_SECRET=your-google-oauth-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/google/callback

3. Frontend Integration

Option 1: Direct Token Method

// Use Google's JavaScript library to get ID token
const response = await fetch('/auth/google/login-with-token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ id_token: googleIdToken })
});

Option 2: Authorization Code Method

// Redirect user to Google OAuth URL, then exchange code
const oauthUrl = await fetch('/auth/google/oauth-url').then(r => r.json());
// Redirect to oauthUrl.oauth_url
// On callback, exchange code:
const response = await fetch('/auth/google/login-with-code', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ 
    code: authorizationCode,
    redirect_uri: 'http://localhost:3000/auth/google/callback'
  })
});
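
For scripts or backend tests, the same code exchange can be prepared from Python. This is a sketch using only the standard library. The base URL is an assumption (port 8000 comes from the uvicorn command above), the function name is hypothetical, and the response is not parsed here because its shape is not documented in this README.

```python
import json
import urllib.request


def build_login_with_code_request(
    base_url: str, code: str, redirect_uri: str
) -> urllib.request.Request:
    """Prepare (but do not send) the POST that exchanges a Google authorization code."""
    payload = json.dumps({"code": code, "redirect_uri": redirect_uri}).encode("utf-8")
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/auth/google/login-with-code",
        data=payload,
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

Send the prepared request with urllib.request.urlopen(request) once the API is running.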

Docker Setup

Building and Running

# Build and start the application
docker-compose up -d

# View logs
docker-compose logs -f api

# Stop the application
docker-compose down

# Rebuild after code changes
docker-compose up --build -d

Environment Variables

The application requires the following environment variables (copy from .env.example):

  • OPENAI_API_KEY - Required for transcription and translation
  • ELEVENLABS_API_KEY - Required for voice cloning
  • AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME - Required for video storage
  • GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, GOOGLE_REDIRECT_URI - Required for authentication
  • AWS_REGION, DEBUG, LOG_LEVEL - Optional (see Environment Variables Reference)

Storage

The Docker setup includes a persistent volume for:

  • SQLite database (/app/storage/db/)
  • Local file storage (/app/storage/)

Workflow

  1. Login with Google OAuth to get authentication token
  2. Upload Video - Automatic language detection occurs during upload
  3. Transcribe the audio from the video
  4. Edit Transcript (optional) - Review and correct the transcription
  5. Translate the edited/original transcript
  6. Clone Voice and generate dubbed audio
  7. Process Video to replace original audio with dubbed audio
  8. Download the final dubbed video
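
The workflow above maps onto the documented endpoints and status values roughly as follows. This sketch only pairs each stage's start endpoint with the status that marks it as finished. The Stage type and function name are hypothetical, and how a client reads the current status (for example by polling GET /videos/{video_id}) is an assumption, since response schemas are not documented here.

```python
from typing import NamedTuple


class Stage(NamedTuple):
    name: str
    start_path: str   # POST this path to start the stage
    done_status: str  # video status that signals the stage has finished


def pipeline_stages(video_id: str) -> list[Stage]:
    """Return the processing stages for one uploaded video, in workflow order."""
    return [
        Stage("transcribe", f"/transcription/{video_id}", "transcribed"),
        Stage("translate", f"/translation/{video_id}", "translated"),
        Stage("clone_voice", f"/voice/clone/{video_id}", "voice_cloned"),
        Stage("process_video", f"/process/{video_id}", "completed"),
    ]
```

The optional transcript edit (PUT /transcription/{video_id}/editable) fits between the first two stages, and GET /process/results/{video_id} returns the complete results once the last stage is done.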

Environment Variables Reference

| Variable              | Description                          | Required |
|-----------------------|--------------------------------------|----------|
| OPENAI_API_KEY        | OpenAI API key for Whisper and GPT-4 | Yes      |
| AWS_ACCESS_KEY_ID     | AWS access key for S3                | Yes      |
| AWS_SECRET_ACCESS_KEY | AWS secret key for S3                | Yes      |
| AWS_REGION            | AWS region (default: us-east-1)      | No       |
| S3_BUCKET_NAME        | S3 bucket name for file storage      | Yes      |
| GOOGLE_CLIENT_ID      | Google OAuth client ID               | Yes      |
| GOOGLE_CLIENT_SECRET  | Google OAuth client secret           | Yes      |
| GOOGLE_REDIRECT_URI   | Google OAuth redirect URI            | Yes      |
| ELEVENLABS_API_KEY    | ElevenLabs API key for voice cloning | Yes      |
| DEBUG                 | Enable debug mode (default: false)   | No       |
| LOG_LEVEL             | Logging level (default: info)        | No       |
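
A startup check can catch a missing required variable before the first request fails. The sketch below takes the variable names and defaults from the table above; the function names are hypothetical, and treating an empty string as "missing" is an assumption.

```python
import os
from typing import Mapping

REQUIRED_VARS = (
    "OPENAI_API_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "GOOGLE_CLIENT_ID",
    "GOOGLE_CLIENT_SECRET",
    "GOOGLE_REDIRECT_URI",
    "ELEVENLABS_API_KEY",
)
OPTIONAL_DEFAULTS = {"AWS_REGION": "us-east-1", "DEBUG": "false", "LOG_LEVEL": "info"}


def missing_required(env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the required variables that are unset or empty."""
    return [name for name in REQUIRED_VARS if not env.get(name)]


def optional_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return optional settings, falling back to the documented defaults."""
    return {name: env.get(name) or default for name, default in OPTIONAL_DEFAULTS.items()}
```

Calling missing_required() with no arguments inspects the real process environment; passing a plain dict makes the check easy to unit test.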

File Storage Structure

Files are stored in S3 with the following structure:

/videos/{uuid}.mp4            - Original uploaded videos
/dubbed_audio/{uuid}.mp3      - Generated dubbed audio files
/processed_videos/{uuid}.mp4  - Final processed videos
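
The layout above can be expressed as a small key builder. This is a sketch under assumptions: the function name is hypothetical, the leading "/" is dropped because S3 object keys are usually written without one, and using one shared UUID for all three objects is a guess that the README does not confirm.

```python
import uuid


def storage_keys(video_uuid: str) -> dict[str, str]:
    """Map each stored artifact to its S3 object key for the given UUID."""
    return {
        "video": f"videos/{video_uuid}.mp4",
        "dubbed_audio": f"dubbed_audio/{video_uuid}.mp3",
        "processed_video": f"processed_videos/{video_uuid}.mp4",
    }


def new_storage_keys() -> dict[str, str]:
    """Generate keys for a fresh upload using a random UUID."""
    return storage_keys(str(uuid.uuid4()))
```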

Database Schema

  • users: User accounts with email/password
  • videos: Video metadata and processing status
  • transcriptions: Audio transcriptions
  • translations: Translated text
  • dubbed_audios: Generated audio files
  • dubbed_videos: Final processed videos

Status Tracking

Videos have the following status values:

  • uploaded - Video uploaded successfully
  • transcribing - Audio transcription in progress
  • transcribed - Transcription completed
  • translating - Text translation in progress
  • translated - Translation completed
  • voice_cloning - Voice cloning and audio generation in progress
  • voice_cloned - Dubbed audio generated
  • processing_video - Final video processing in progress
  • completed - All processing completed
  • *_failed - Various failure states
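
Client code that polls a video's status may find it convenient to classify these values. The sketch below uses the status strings listed above; the enum and helper names are hypothetical. Because the exact failure names are not listed, failures are matched by the "_failed" suffix only.

```python
from enum import Enum


class VideoStatus(str, Enum):
    UPLOADED = "uploaded"
    TRANSCRIBING = "transcribing"
    TRANSCRIBED = "transcribed"
    TRANSLATING = "translating"
    TRANSLATED = "translated"
    VOICE_CLONING = "voice_cloning"
    VOICE_CLONED = "voice_cloned"
    PROCESSING_VIDEO = "processing_video"
    COMPLETED = "completed"


IN_PROGRESS = {"transcribing", "translating", "voice_cloning", "processing_video"}


def is_failed(status: str) -> bool:
    """True for any of the '*_failed' states, whatever their exact names are."""
    return status.endswith("_failed")


def is_in_progress(status: str) -> bool:
    """True while a stage is running and the client should keep polling."""
    return status in IN_PROGRESS


def is_terminal(status: str) -> bool:
    """True when no further status change is expected without a new request."""
    return status == VideoStatus.COMPLETED.value or is_failed(status)
```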

Development

Code Linting

ruff check . --fix

Project Structure

├── main.py                 # FastAPI application entry point
├── requirements.txt        # Python dependencies
├── alembic.ini            # Database migration configuration
├── app/
│   ├── db/                # Database configuration
│   ├── models/            # SQLAlchemy models
│   ├── routes/            # API endpoints
│   ├── services/          # Business logic and external API integrations
│   └── utils/             # Utility functions (auth, etc.)
└── alembic/
    └── versions/          # Database migration files