Automated Action 92e4d992b2 Implement complete AI video dubbing backend with FastAPI

Features:
- JWT authentication with user registration and login
- Video upload to Amazon S3 with file validation (200MB limit)
- Audio transcription using OpenAI Whisper API
- Text translation using GPT-4 API
- Voice cloning and audio synthesis using ElevenLabs API
- Video processing with ffmpeg for audio replacement
- Complete SQLite database with proper models and migrations
- Background task processing for long-running operations
- Health endpoint and comprehensive API documentation

Tech stack:
- FastAPI with SQLAlchemy ORM
- SQLite database with Alembic migrations
- Amazon S3 for file storage
- OpenAI APIs for transcription and translation
- ElevenLabs API for voice cloning
- ffmpeg for video processing
- JWT authentication with bcrypt password hashing

2025-06-24 17:56:12 +00:00

5.4 KiB


AI Video Dubbing API

A FastAPI backend for an AI-powered video dubbing tool. Content creators can upload short-form videos, transcribe the audio, translate the transcript into other languages, clone voices, and generate dubbed videos with lip-sync.

Features

🔐 Authentication: JWT-based user registration and login
📁 Video Upload: Upload MP4/MOV files to Amazon S3 (max 200MB)
🧠 Transcription: Audio transcription using OpenAI Whisper API
🌍 Translation: Text translation using GPT-4 API
🗣️ Voice Cloning: Voice synthesis using ElevenLabs API
🎥 Video Processing: Audio replacement and video processing with ffmpeg

Tech Stack

FastAPI - Modern, fast web framework
SQLite - Database with SQLAlchemy ORM
Amazon S3 - File storage
OpenAI Whisper - Audio transcription
GPT-4 - Text translation
ElevenLabs - Voice cloning and synthesis
ffmpeg - Video/audio processing

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Set Environment Variables

Create a .env file in the root directory with the following variables:

# Authentication
SECRET_KEY=your-secret-key-change-this-in-production

# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-s3-bucket-name

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key

# ElevenLabs Configuration
ELEVENLABS_API_KEY=your-elevenlabs-api-key

3. Run Database Migrations

The SQLite database is created automatically when you start the application and is stored at /app/storage/db/db.sqlite.

4. Start the Application

python main.py

Or with uvicorn:

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

The API will be available at:

API: http://localhost:8000
Documentation: http://localhost:8000/docs
Alternative Docs: http://localhost:8000/redoc
Health Check: http://localhost:8000/health

API Endpoints

Authentication

POST /auth/register - User registration
POST /auth/login - User login
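
To make the JWT part of authentication concrete, here is a minimal sketch of issuing and verifying an HS256 token with only the Python standard library. It is an illustration, not this project's code: the function names `create_token` and `verify_token`, the `sub` and `exp` claims, and the one-hour lifetime are all assumptions, and the real backend most likely relies on a JWT library signed with `SECRET_KEY`.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    """Base64url-encode without padding, as JWTs require."""
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    """Decode base64url, restoring the padding that JWTs strip."""
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def _sign(signing_input: bytes, secret: str) -> bytes:
    """HMAC-SHA256 over the encoded header and payload."""
    return hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()


def create_token(subject: str, secret: str, ttl_seconds: int = 3600) -> str:
    """Build an HS256-signed JWT carrying hypothetical `sub` and `exp` claims."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = _sign(signing_input.encode("ascii"), secret)
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str) -> dict:
    """Return the claims if the signature matches and the token is unexpired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    expected = _sign(f"{header_b64}.{payload_b64}".encode("ascii"), secret)
    # compare_digest avoids leaking how many bytes matched via timing.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    claims = json.loads(_b64url_decode(payload_b64))
    if claims["exp"] < time.time():
        raise ValueError("token expired")
    return claims
```

In this sketch a login handler would call `create_token(user_email, secret)` and protected routes would call `verify_token` on the token the client sends back.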

Video Management

POST /videos/upload - Upload video with language settings
GET /videos/ - Get user's videos
GET /videos/{video_id} - Get specific video details
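
The upload endpoint accepts MP4/MOV files up to 200MB. A check along those lines could look like the sketch below. The function name `validate_upload` is hypothetical, and reading "200MB" as 200 MiB is an assumption, since the README does not say which unit the backend uses.

```python
from pathlib import PurePosixPath

MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumption: "200MB" read as 200 MiB
ALLOWED_EXTENSIONS = {".mp4", ".mov"}


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError unless the file is a non-empty MP4/MOV within the limit."""
    extension = PurePosixPath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or 'none'}")
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 200MB limit")
```

This only inspects the file name and size. Whether the actual service also checks the MIME type or the container contents is not stated.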

Processing Pipeline

POST /transcription/{video_id} - Start audio transcription
GET /transcription/{video_id} - Get transcription results
POST /translation/{video_id} - Start text translation
GET /translation/{video_id} - Get translation results
POST /voice/clone/{video_id} - Start voice cloning and audio generation
GET /voice/{video_id} - Get dubbed audio results
POST /process/{video_id} - Start final video processing
GET /process/{video_id} - Get processed video results
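
The final processing step replaces the original audio track with the dubbed one using ffmpeg. The README does not show the exact invocation, so the following is one plausible sketch: it copies the video stream unchanged and muxes in the new audio. The function names and the choice of AAC for the audio codec are assumptions.

```python
import subprocess
from typing import List


def build_replace_audio_command(video_path: str, audio_path: str, output_path: str) -> List[str]:
    """Return an ffmpeg argument list that swaps the video's audio track."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it already exists
        "-i", video_path,   # input 0: original video
        "-i", audio_path,   # input 1: dubbed audio
        "-map", "0:v:0",    # take the first video stream from input 0
        "-map", "1:a:0",    # take the first audio stream from input 1
        "-c:v", "copy",     # copy the video stream without re-encoding
        "-c:a", "aac",      # assumption: encode the dubbed audio as AAC
        "-shortest",        # stop at the end of the shorter stream
        output_path,
    ]


def replace_audio(video_path: str, audio_path: str, output_path: str) -> None:
    """Run ffmpeg and raise CalledProcessError if it exits with an error."""
    command = build_replace_audio_command(video_path, audio_path, output_path)
    subprocess.run(command, check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.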

Results

GET /process/results/{video_id} - Get complete processing results

Workflow

1. Register or log in to get a JWT token
2. Upload a video with its source and target languages
3. Transcribe the audio from the video
4. Translate the transcribed text
5. Clone the voice and generate the dubbed audio
6. Process the video to replace the original audio with the dubbed audio
7. Download the final dubbed video
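
The workflow above maps onto the processing endpoints in a fixed order. The sketch below lists that order for one video and walks it with a caller-supplied `send` function, so it stays independent of any HTTP library. The `Bearer` authorization scheme, the function names, and treating `video_id` as a string are assumptions. Because each POST only starts a background task, a real client would poll the matching GET until that stage finishes; the README does not document the response format, so polling is left out.

```python
from typing import Callable, Dict, List, Tuple

Step = Tuple[str, str]  # (HTTP method, path)
Send = Callable[[str, str, Dict[str, str]], object]


def pipeline_steps(video_id: str) -> List[Step]:
    """Return the processing endpoints for one video, in workflow order."""
    return [
        ("POST", f"/transcription/{video_id}"),
        ("GET", f"/transcription/{video_id}"),
        ("POST", f"/translation/{video_id}"),
        ("GET", f"/translation/{video_id}"),
        ("POST", f"/voice/clone/{video_id}"),
        ("GET", f"/voice/{video_id}"),
        ("POST", f"/process/{video_id}"),
        ("GET", f"/process/{video_id}"),
        ("GET", f"/process/results/{video_id}"),
    ]


def auth_headers(token: str) -> Dict[str, str]:
    """Assumption: the JWT is sent as a Bearer token."""
    return {"Authorization": f"Bearer {token}"}


def run_pipeline(video_id: str, token: str, send: Send) -> List[object]:
    """Call each endpoint once, in order, and collect whatever `send` returns."""
    headers = auth_headers(token)
    return [send(method, path, headers) for method, path in pipeline_steps(video_id)]
```

Here `send` could wrap any HTTP client that prefixes the base URL, for example http://localhost:8000.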

Environment Variables Reference

| Variable | Description | Required |
|----------|-------------|----------|
| `SECRET_KEY` | JWT secret key for authentication | Yes |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 | Yes |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 | Yes |
| `AWS_REGION` | AWS region (default: us-east-1) | No |
| `S3_BUCKET_NAME` | S3 bucket name for file storage | Yes |
| `OPENAI_API_KEY` | OpenAI API key for Whisper and GPT-4 | Yes |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for voice cloning | Yes |
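
As an illustration of how these variables might be read at startup, the sketch below fails fast when a required variable is missing and applies the documented default for `AWS_REGION`. The `Settings` class and `load_settings` function are hypothetical names. The sketch reads from the process environment only; loading the `.env` file itself (for example with a dotenv package) is assumed to happen elsewhere.

```python
import os
from dataclasses import dataclass
from typing import Mapping

REQUIRED_VARIABLES = (
    "SECRET_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    secret_key: str
    aws_access_key_id: str
    aws_secret_access_key: str
    aws_region: str
    s3_bucket_name: str
    openai_api_key: str
    elevenlabs_api_key: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Build Settings from the environment, rejecting missing required values."""
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        secret_key=env["SECRET_KEY"],
        aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        aws_region=env.get("AWS_REGION", "us-east-1"),  # optional, documented default
        s3_bucket_name=env["S3_BUCKET_NAME"],
        openai_api_key=env["OPENAI_API_KEY"],
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
    )
```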

File Storage Structure

Files are stored in S3 with the following structure:

/videos/{uuid}.mp4        - Original uploaded videos
/dubbed_audio/{uuid}.mp3  - Generated dubbed audio files
/processed_videos/{uuid}.mp4 - Final processed videos
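
A small helper can generate object keys that follow this layout. In the sketch below, the function name `storage_keys`, the reuse of one UUID across all three keys, and the omission of the leading slash (S3 keys are usually written without one) are assumptions, not details confirmed by the README.

```python
import uuid
from typing import Dict, Optional


def storage_keys(file_id: Optional[str] = None) -> Dict[str, str]:
    """Return the three S3 object keys for one upload, generating a UUID if needed."""
    file_id = file_id or str(uuid.uuid4())
    return {
        "video": f"videos/{file_id}.mp4",
        "dubbed_audio": f"dubbed_audio/{file_id}.mp3",
        "processed_video": f"processed_videos/{file_id}.mp4",
    }
```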

Database Schema

users: User accounts with email/password
videos: Video metadata and processing status
transcriptions: Audio transcriptions
translations: Translated text
dubbed_audios: Generated audio files
dubbed_videos: Final processed videos
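
To show how these tables could relate to each other, here is a rough sketch using the standard library `sqlite3` module instead of the project's SQLAlchemy models, to keep it dependency-free. Only the table names come from this README. Every column, key, and constraint below is a hypothetical placeholder.

```python
import sqlite3

# Table names follow the README; all columns are illustrative assumptions.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    password_hash TEXT NOT NULL
);
CREATE TABLE videos (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    status TEXT NOT NULL DEFAULT 'uploaded'
);
CREATE TABLE transcriptions (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    text TEXT NOT NULL
);
CREATE TABLE translations (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    text TEXT NOT NULL
);
CREATE TABLE dubbed_audios (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    s3_key TEXT NOT NULL
);
CREATE TABLE dubbed_videos (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    s3_key TEXT NOT NULL
);
"""


def create_schema(connection: sqlite3.Connection) -> None:
    """Create all six tables on the given connection."""
    connection.executescript(SCHEMA)
```

Calling `create_schema(sqlite3.connect(":memory:"))` builds the tables in a throwaway in-memory database, which is convenient for experimenting.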

Status Tracking

Videos have the following status values:

uploaded - Video uploaded successfully
transcribing - Audio transcription in progress
transcribed - Transcription completed
translating - Text translation in progress
translated - Translation completed
voice_cloning - Voice cloning and audio generation in progress
voice_cloned - Dubbed audio generated
processing_video - Final video processing in progress
completed - All processing completed
*_failed - Various failure states
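
The success-path statuses form a simple linear progression, which the sketch below models as an enum with a helper that returns the next stage. The names `VideoStatus` and `next_status` are hypothetical. The failure states are left out because this README does not spell out their exact names.

```python
from enum import Enum
from typing import Optional


class VideoStatus(str, Enum):
    """Success-path statuses, declared in pipeline order."""
    UPLOADED = "uploaded"
    TRANSCRIBING = "transcribing"
    TRANSCRIBED = "transcribed"
    TRANSLATING = "translating"
    TRANSLATED = "translated"
    VOICE_CLONING = "voice_cloning"
    VOICE_CLONED = "voice_cloned"
    PROCESSING_VIDEO = "processing_video"
    COMPLETED = "completed"


# Enum iteration preserves declaration order, so this list is the pipeline order.
_PIPELINE_ORDER = list(VideoStatus)


def next_status(current: VideoStatus) -> Optional[VideoStatus]:
    """Return the status that follows `current`, or None once completed."""
    index = _PIPELINE_ORDER.index(current)
    if index + 1 < len(_PIPELINE_ORDER):
        return _PIPELINE_ORDER[index + 1]
    return None
```

Deriving from `str` lets a stored value such as `"translated"` be converted back with `VideoStatus("translated")`.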

Development

Code Linting

ruff check . --fix

Project Structure

├── main.py                 # FastAPI application entry point
├── requirements.txt        # Python dependencies
├── alembic.ini            # Database migration configuration
├── app/
│   ├── db/                # Database configuration
│   ├── models/            # SQLAlchemy models
│   ├── routes/            # API endpoints
│   ├── services/          # Business logic and external API integrations
│   └── utils/             # Utility functions (auth, etc.)
└── alembic/
    └── versions/          # Database migration files
