A FastAPI backend for an AI-powered video dubbing tool. It lets content creators upload short-form videos, transcribe the audio, translate it into other languages, clone voices, and generate dubbed videos with lip-sync.

Features

🔐 Authentication: Google OAuth integration for secure login
👤 User Profiles: Complete profile management with settings
📁 Video Upload: Upload MP4/MOV files to Amazon S3 (max 200MB)
🔍 Auto Language Detection: Automatic detection of spoken language using Whisper
📝 Editable Transcripts: View and edit transcriptions before translation
🧠 Transcription: Audio transcription using OpenAI Whisper API
🌍 Translation: Text translation using GPT-4 API
🗣️ Voice Cloning: Voice synthesis using ElevenLabs API
🎥 Video Processing: Audio replacement and video processing with ffmpeg
🐳 Docker Support: Full containerization with Docker and Docker Compose

Tech Stack

FastAPI - Modern, fast web framework
SQLite - Database with SQLAlchemy ORM
Amazon S3 - File storage
OpenAI Whisper - Audio transcription
GPT-4 - Text translation
ElevenLabs - Voice cloning and synthesis
ffmpeg - Video/audio processing

Quick Start

Option 1: Docker (Recommended)

Copy environment file:
```
cp .env.example .env
```
Configure environment variables in .env:
- Add your OpenAI API key
- Add your ElevenLabs API key
- Configure AWS S3 credentials
- Set up Google OAuth credentials
Run with Docker Compose:
```
docker-compose up -d
```

The API will be available at:

API: http://localhost:8000
Documentation: http://localhost:8000/docs
Health Check: http://localhost:8000/health
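
Once the containers are up, a quick way to confirm the API is reachable is to query the health endpoint from a script. The sketch below assumes that `/health` answers with HTTP 200 when the service is ready; the helper name `is_healthy` and its injectable `opener` parameter are illustrative and not part of the project.

```python
import urllib.error
import urllib.request


def is_healthy(base_url="http://localhost:8000", opener=urllib.request.urlopen, timeout=5):
    """Return True if GET {base_url}/health answers with HTTP 200.

    Assumption: the health endpoint signals readiness with a 200 status.
    `opener` is injectable so the check can be exercised without a server.
    """
    url = base_url.rstrip("/") + "/health"
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (urllib.error.URLError, OSError):
        # Connection refused, timeout, or a non-2xx reply raised by urlopen.
        return False
```

Calling `is_healthy()` with no arguments checks the default local address shown above.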

Option 2: Local Development

Install Dependencies:
```
pip install -r requirements.txt
```

Configure Environment:

cp .env.example .env
# Edit .env with your configuration

Start the Application:

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

API Endpoints

Authentication (Google OAuth Only)

GET /auth/google/oauth-url - Get Google OAuth URL for frontend
POST /auth/google/login-with-token - Login/signup with Google ID token
POST /auth/google/login-with-code - Login/signup with Google authorization code

Profile Management

GET /profile/ - Get user profile
PUT /profile/ - Update profile information
PUT /profile/password - Update password
PUT /profile/email - Update email address
DELETE /profile/ - Delete user account

Video Management & Language Detection

POST /videos/upload - Upload video with auto language detection
GET /videos/ - Get user's videos
GET /videos/{video_id} - Get specific video details
GET /videos/{video_id}/language - Get detected video language
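
The upload limits listed under Features (MP4/MOV only, 200MB maximum) can be checked on the client before a file is sent to `POST /videos/upload`. This is a minimal sketch: `validate_upload` is a hypothetical helper, the check looks only at the file extension, and it reads "200MB" as 200 MiB, which may differ from how the server counts.

```python
from pathlib import Path

ALLOWED_EXTENSIONS = {".mp4", ".mov"}
# Assumption: "200MB" is read as 200 MiB; the server may count differently.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {extension or '(none)'}")
    if size_bytes <= 0:
        problems.append("file is empty")
    elif size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 200MB limit")
    return problems
```

A pre-check like this only saves a round trip; the server remains the authority on what it accepts.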

Transcription & Editable Transcripts

POST /transcription/{video_id} - Start audio transcription
GET /transcription/{video_id} - Get transcription results
GET /transcription/{video_id}/editable - Get editable transcript
PUT /transcription/{video_id}/editable - Update edited transcript

Translation Pipeline (Uses Edited Transcripts)

POST /translation/{video_id} - Start text translation (uses edited transcript if available)
GET /translation/{video_id} - Get translation results
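
The "uses edited transcript if available" rule amounts to a simple fallback. The sketch below shows one way to express it; the function name is hypothetical, and treating a blank edited transcript as "not available" is an assumption rather than documented behaviour.

```python
from typing import Optional


def transcript_for_translation(original: str, edited: Optional[str]) -> str:
    """Prefer the user-edited transcript; fall back to the original one.

    Assumption: an edited transcript that is missing or blank counts as
    "not available".
    """
    if edited is not None and edited.strip():
        return edited
    return original
```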

Voice Cloning & Video Processing

POST /voice/clone/{video_id} - Start voice cloning and audio generation
GET /voice/{video_id} - Get dubbed audio results
POST /process/{video_id} - Start final video processing
GET /process/{video_id} - Get processed video results
GET /process/results/{video_id} - Get complete processing results
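
The final processing step replaces the original audio track with the dubbed one using ffmpeg. The sketch below shows one common way to build such a command from Python. It is not necessarily the invocation this project uses: the function names are hypothetical, re-encoding the audio to AAC is an assumption, and lip-sync is out of scope here.

```python
import subprocess


def build_replace_audio_command(video_path, audio_path, output_path):
    """Build an ffmpeg command that swaps a video's audio track.

    -map 0:v:0  take the video stream from the first input
    -map 1:a:0  take the audio stream from the second input
    -c:v copy   copy the video stream without re-encoding
    -shortest   stop at the end of the shorter stream
    """
    return [
        "ffmpeg", "-y",            # -y overwrites the output file if it exists
        "-i", str(video_path),
        "-i", str(audio_path),
        "-map", "0:v:0",
        "-map", "1:a:0",
        "-c:v", "copy",
        "-c:a", "aac",             # assumption: AAC audio for the MP4 container
        "-shortest",
        str(output_path),
    ]


def replace_audio(video_path, audio_path, output_path):
    """Run the command; raises CalledProcessError if ffmpeg exits non-zero."""
    command = build_replace_audio_command(video_path, audio_path, output_path)
    subprocess.run(command, check=True)
```

Keeping the command construction separate from the `subprocess.run` call makes the arguments easy to inspect or log before ffmpeg is launched.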

Google OAuth Setup

1. Create Google OAuth Application

Go to Google Cloud Console
Create a new project or select existing one
Enable the Google People API (the Google+ API named in older guides has been shut down)
Go to "Credentials" → "Create Credentials" → "OAuth 2.0 Client IDs"
Choose "Web application"
Add authorized redirect URIs:
- http://localhost:3000/auth/google/callback (for development)
- Your production callback URL

2. Configure Environment Variables

Add these to your .env file:

GOOGLE_CLIENT_ID=your-google-oauth-client-id
GOOGLE_CLIENT_SECRET=your-google-oauth-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/google/callback

3. Frontend Integration

Option 1: Direct Token Method

// Use Google's JavaScript library to get ID token
const response = await fetch('/auth/google/login-with-token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ id_token: googleIdToken })
});

Option 2: Authorization Code Method

// Redirect user to Google OAuth URL, then exchange code
const oauthUrl = await fetch('/auth/google/oauth-url').then(r => r.json());
// Redirect to oauthUrl.oauth_url
// On callback, exchange code:
const response = await fetch('/auth/google/login-with-code', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ 
    code: authorizationCode,
    redirect_uri: 'http://localhost:3000/auth/google/callback'
  })
});
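
For backend tests or scripts, the same two login calls can be prepared from Python. The sketch below only builds the request objects, using the endpoint paths and JSON fields shown in the JavaScript examples; `build_login_request` is a hypothetical helper, and the response format is not covered because this README does not document it.

```python
import json
import urllib.request


def build_login_request(base_url, *, id_token=None, code=None, redirect_uri=None):
    """Build (but do not send) a POST request for one of the two login endpoints."""
    if id_token is not None:
        path = "/auth/google/login-with-token"
        payload = {"id_token": id_token}
    elif code is not None:
        path = "/auth/google/login-with-code"
        payload = {"code": code, "redirect_uri": redirect_uri}
    else:
        raise ValueError("pass either id_token or code")
    return urllib.request.Request(
        base_url.rstrip("/") + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```

The returned object can be sent with `urllib.request.urlopen(request)` once the API is running.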

Docker Setup

Building and Running

# Build and start the application
docker-compose up -d

# View logs
docker-compose logs -f api

# Stop the application
docker-compose down

# Rebuild after code changes
docker-compose up --build -d

Environment Variables

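The application reads its configuration from environment variables (copy .env.example as a starting point). The required ones are:

OPENAI_API_KEY - Required for transcription and translation
AWS_ACCESS_KEY_ID, AWS_SECRET_ACCESS_KEY, S3_BUCKET_NAME - Required for video storage
GOOGLE_CLIENT_ID, GOOGLE_CLIENT_SECRET, GOOGLE_REDIRECT_URI - Required for authentication
ELEVENLABS_API_KEY - Required for voice cloning
Optional variables (AWS_REGION, DEBUG, LOG_LEVEL) are listed in the Environment Variables Reference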

Storage

The Docker setup includes a persistent volume for:

SQLite database (/app/storage/db/)
Local file storage (/app/storage/)

Workflow

Login with Google OAuth to get authentication token
Upload Video - Automatic language detection occurs during upload
Transcribe the audio from the video
Edit Transcript (optional) - Review and correct the transcription
Translate the edited/original transcript
Clone Voice and generate dubbed audio
Process Video to replace original audio with dubbed audio
Download the final dubbed video
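
The steps that follow the upload map onto the endpoints listed under API Endpoints. The sketch below spells out that mapping as an ordered list of calls for one video. `workflow_steps` is a hypothetical helper, the optional transcript-editing step is included, and status polling between steps is left out.

```python
def workflow_steps(video_id: str) -> list[tuple[str, str]]:
    """Return the (method, path) calls for dubbing one uploaded video, in order.

    Paths are taken from the API Endpoints section. Login and upload happen
    earlier, because the video ID is only known once the upload has finished.
    """
    return [
        ("POST", f"/transcription/{video_id}"),          # transcribe the audio
        ("GET", f"/transcription/{video_id}/editable"),  # fetch transcript for review
        ("PUT", f"/transcription/{video_id}/editable"),  # save corrections (optional)
        ("POST", f"/translation/{video_id}"),            # translate the transcript
        ("POST", f"/voice/clone/{video_id}"),            # clone voice, generate audio
        ("POST", f"/process/{video_id}"),                # replace the audio track
        ("GET", f"/process/results/{video_id}"),         # fetch the final results
    ]
```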

Environment Variables Reference

| Variable | Description | Required |
| --- | --- | --- |
| `OPENAI_API_KEY` | OpenAI API key for Whisper and GPT-4 | Yes |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 | Yes |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 | Yes |
| `AWS_REGION` | AWS region (default: us-east-1) | No |
| `S3_BUCKET_NAME` | S3 bucket name for file storage | Yes |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID | Yes |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret | Yes |
| `GOOGLE_REDIRECT_URI` | Google OAuth redirect URI | Yes |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for voice cloning | Yes |
| `DEBUG` | Enable debug mode (default: false) | No |
| `LOG_LEVEL` | Logging level (default: info) | No |
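
A startup check that fails fast on missing variables can save debugging time later in the pipeline. The sketch below validates the required variables from the table above and applies the documented defaults. The `Settings` class and `load_settings` function are hypothetical, and the set of strings accepted as "true" for `DEBUG` is an assumption.

```python
import os
from dataclasses import dataclass

REQUIRED_VARS = (
    "OPENAI_API_KEY", "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME", "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET",
    "GOOGLE_REDIRECT_URI", "ELEVENLABS_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    values: dict      # the required variables, keyed by name
    aws_region: str
    debug: bool
    log_level: str


def load_settings(env=None) -> Settings:
    """Validate required variables and apply the documented defaults."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        values={name: env[name] for name in REQUIRED_VARS},
        aws_region=env.get("AWS_REGION", "us-east-1"),
        # Assumption: "1", "true", or "yes" (any case) enable debug mode.
        debug=env.get("DEBUG", "false").strip().lower() in {"1", "true", "yes"},
        log_level=env.get("LOG_LEVEL", "info"),
    )
```

Passing a plain dictionary as `env` makes the loader easy to exercise in tests without touching the real environment.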

File Storage Structure

Files are stored in S3 with the following structure:

/videos/{uuid}.mp4        - Original uploaded videos
/dubbed_audio/{uuid}.mp3  - Generated dubbed audio files
/processed_videos/{uuid}.mp4 - Final processed videos
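
The layout above can be captured in a small helper that derives all object keys for one video. This is a sketch under two assumptions that the README does not confirm: the same UUID is reused across the three prefixes, and keys are written without the leading slash, as is usual for S3 object keys. `storage_keys` is a hypothetical name.

```python
import uuid


def storage_keys(file_id=None):
    """Return the three object keys used for one video.

    Assumption: one UUID is shared by the original video, the dubbed audio,
    and the processed video. A fresh UUID is generated when none is given.
    """
    file_id = str(file_id or uuid.uuid4())
    return {
        "video": f"videos/{file_id}.mp4",
        "dubbed_audio": f"dubbed_audio/{file_id}.mp3",
        "processed_video": f"processed_videos/{file_id}.mp4",
    }
```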

Database Schema

users: User accounts with email/password
videos: Video metadata and processing status
transcriptions: Audio transcriptions
translations: Translated text
dubbed_audios: Generated audio files
dubbed_videos: Final processed videos

Status Tracking

Videos have the following status values:

uploaded - Video uploaded successfully
transcribing - Audio transcription in progress
transcribed - Transcription completed
translating - Text translation in progress
translated - Translation completed
voice_cloning - Voice cloning and audio generation in progress
voice_cloned - Dubbed audio generated
processing_video - Final video processing in progress
completed - All processing completed
*_failed - Various failure states
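
A client that polls a video's status needs to know when to stop and how far along the pipeline is. The sketch below models the statuses listed above as an enum in pipeline order. The names `VideoStatus`, `is_failed`, `is_terminal`, and `progress` are hypothetical, and because the individual failure states are not enumerated here, only the `_failed` suffix is relied on.

```python
from enum import Enum


class VideoStatus(str, Enum):
    UPLOADED = "uploaded"
    TRANSCRIBING = "transcribing"
    TRANSCRIBED = "transcribed"
    TRANSLATING = "translating"
    TRANSLATED = "translated"
    VOICE_CLONING = "voice_cloning"
    VOICE_CLONED = "voice_cloned"
    PROCESSING_VIDEO = "processing_video"
    COMPLETED = "completed"


# Members are declared in pipeline order, so list position doubles as progress.
PIPELINE = list(VideoStatus)


def is_failed(status: str) -> bool:
    """True for any failure state; only the `_failed` suffix is assumed."""
    return status.endswith("_failed")


def is_terminal(status: str) -> bool:
    """True once polling can stop: the video completed or a step failed."""
    return status == VideoStatus.COMPLETED.value or is_failed(status)


def progress(status: str) -> float:
    """Fraction of the pipeline done, from 0.0 (uploaded) to 1.0 (completed).

    Raises ValueError for failure states and unknown status strings.
    """
    return PIPELINE.index(VideoStatus(status)) / (len(PIPELINE) - 1)
```

A polling loop would typically call `is_terminal` on each response and use `progress` to drive a progress bar.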

Development

Code Linting

ruff check . --fix

Project Structure

├── main.py                 # FastAPI application entry point
├── requirements.txt        # Python dependencies
├── alembic.ini            # Database migration configuration
├── app/
│   ├── db/                # Database configuration
│   ├── models/            # SQLAlchemy models
│   ├── routes/            # API endpoints
│   ├── services/          # Business logic and external API integrations
│   └── utils/             # Utility functions (auth, etc.)
└── alembic/
    └── versions/          # Database migration files