
AI Video Dubbing API
A FastAPI backend for an AI-powered video dubbing tool that allows content creators to upload short-form videos, transcribe audio, translate to different languages, clone voices, and generate dubbed videos with lip-sync.
Features
- 🔐 Authentication: JWT-based user registration and login
- 📁 Video Upload: Upload MP4/MOV files to Amazon S3 (max 200MB)
- 🧠 Transcription: Audio transcription using OpenAI Whisper API
- 🌍 Translation: Text translation using GPT-4 API
- 🗣️ Voice Cloning: Voice synthesis using ElevenLabs API
- 🎥 Video Processing: Audio replacement and video processing with ffmpeg
Tech Stack
- FastAPI - Modern, fast web framework
- SQLite - Database with SQLAlchemy ORM
- Amazon S3 - File storage
- OpenAI Whisper - Audio transcription
- GPT-4 - Text translation
- ElevenLabs - Voice cloning and synthesis
- ffmpeg - Video/audio processing
Quick Start
1. Install Dependencies
pip install -r requirements.txt
2. Set Environment Variables
Create a .env file in the root directory with the following variables:
# Authentication
SECRET_KEY=your-secret-key-change-this-in-production
# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-s3-bucket-name
# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key
# ElevenLabs Configuration
ELEVENLABS_API_KEY=your-elevenlabs-api-key
3. Run Database Migrations
The database will be automatically created when you start the application. The SQLite database will be stored at /app/storage/db/db.sqlite.
4. Start the Application
python main.py
Or with uvicorn:
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
The API will be available at:
- API: http://localhost:8000
- Documentation: http://localhost:8000/docs
- Alternative Docs: http://localhost:8000/redoc
- Health Check: http://localhost:8000/health
API Endpoints
Authentication
- POST /auth/register - User registration
- POST /auth/login - User login
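The README does not say which JWT library the project uses. To show what a token issued at login might carry, here is a minimal standard-library sketch of HS256 signing and verification. The function names and the `sub`/`exp` claims are assumptions for illustration, not the project's actual code; the secret would come from SECRET_KEY.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_token(subject: str, secret: str, ttl_seconds: int = 3600) -> str:
    """Build a signed token of the form header.payload.signature."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256
    ).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    signing_input, _, signature = token.rpartition(".")
    expected = hmac.new(
        secret.encode("utf-8"), signing_input.encode("ascii", "replace"), hashlib.sha256
    ).digest()
    # Constant-time comparison avoids leaking how many characters matched.
    if not hmac.compare_digest(_b64url(expected), signature):
        raise ValueError("invalid signature")
    payload = json.loads(_b64url_decode(signing_input.split(".")[1]))
    if payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload
```

In practice a maintained JWT library is the safer choice; the sketch only shows why a leaked or default SECRET_KEY lets anyone mint valid tokens.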
Video Management
- POST /videos/upload - Upload video with language settings
- GET /videos/ - Get user's videos
- GET /videos/{video_id} - Get specific video details
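The upload endpoint accepts MP4/MOV files up to 200MB. The check below is a rough sketch of that rule, not the project's validation code: `validate_upload` is a hypothetical helper, and reading "200MB" as 200 * 1024 * 1024 bytes is an assumption.

```python
from pathlib import Path

# Assumption: the README's "200MB" is interpreted as 200 MiB.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
ALLOWED_EXTENSIONS = {".mp4", ".mov"}


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError if the file breaks the type or size rules."""
    extension = Path(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or 'none'}")
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 200MB limit")
```

Checking only the extension is a weak test; a real service would likely also inspect the content type or the container header.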
Processing Pipeline
- POST /transcription/{video_id} - Start audio transcription
- GET /transcription/{video_id} - Get transcription results
- POST /translation/{video_id} - Start text translation
- GET /translation/{video_id} - Get translation results
- POST /voice/clone/{video_id} - Start voice cloning and audio generation
- GET /voice/{video_id} - Get dubbed audio results
- POST /process/{video_id} - Start final video processing
- GET /process/{video_id} - Get processed video results
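The final processing stage replaces the original audio track using ffmpeg. The README does not show the command the service runs, so the following is only one common way to do it: copy the video stream untouched and take the audio from a second input. `build_replace_audio_cmd` and `replace_audio` are hypothetical names.

```python
import subprocess


def build_replace_audio_cmd(video_path: str, audio_path: str, output_path: str) -> list[str]:
    """Return an ffmpeg argument list that swaps in a new audio track."""
    return [
        "ffmpeg",
        "-y",              # overwrite the output file if it exists
        "-i", video_path,  # input 0: original video
        "-i", audio_path,  # input 1: dubbed audio
        "-map", "0:v:0",   # keep the first video stream of input 0
        "-map", "1:a:0",   # use the first audio stream of input 1
        "-c:v", "copy",    # do not re-encode the video
        "-c:a", "aac",     # encode the audio for the MP4 container
        "-shortest",       # stop at the end of the shorter stream
        output_path,
    ]


def replace_audio(video_path: str, audio_path: str, output_path: str) -> None:
    # Requires ffmpeg on PATH; raises CalledProcessError on a non-zero exit.
    subprocess.run(build_replace_audio_cmd(video_path, audio_path, output_path), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with user-supplied filenames.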
Results
- GET /process/results/{video_id} - Get complete processing results
Workflow
- Register/Login to get JWT token
- Upload Video with source and target languages
- Transcribe the audio from the video
- Translate the transcribed text
- Clone Voice and generate dubbed audio
- Process Video to replace original audio with dubbed audio
- Download the final dubbed video
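The steps after upload map onto the endpoints listed earlier. As a sketch, the helper below lists the requests a client would make in order for one video; `dubbing_steps` and `auth_header` are hypothetical, and sending the JWT as a `Bearer` token in the `Authorization` header is an assumption, since the README does not specify the scheme.

```python
def auth_header(token: str) -> dict[str, str]:
    # Assumption: the API expects the JWT as a Bearer token.
    return {"Authorization": f"Bearer {token}"}


def dubbing_steps(video_id: str) -> list[tuple[str, str]]:
    """Ordered (method, path) pairs for processing one uploaded video."""
    return [
        ("POST", f"/transcription/{video_id}"),    # transcribe the audio
        ("POST", f"/translation/{video_id}"),      # translate the transcript
        ("POST", f"/voice/clone/{video_id}"),      # clone voice, generate dubbed audio
        ("POST", f"/process/{video_id}"),          # replace the audio in the video
        ("GET", f"/process/results/{video_id}"),   # fetch the complete results
    ]
```

Because each POST only starts a stage, a client would presumably poll the matching GET endpoint, or the video's status, before moving to the next step.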
Environment Variables Reference
Variable | Description | Required |
---|---|---|
SECRET_KEY | JWT secret key for authentication | Yes |
AWS_ACCESS_KEY_ID | AWS access key for S3 | Yes |
AWS_SECRET_ACCESS_KEY | AWS secret key for S3 | Yes |
AWS_REGION | AWS region (default: us-east-1) | No |
S3_BUCKET_NAME | S3 bucket name for file storage | Yes |
OPENAI_API_KEY | OpenAI API key for Whisper and GPT-4 | Yes |
ELEVENLABS_API_KEY | ElevenLabs API key for voice cloning | Yes |
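One way to enforce the table above at startup is to fail fast when a required variable is missing and fall back to us-east-1 for AWS_REGION. This is a sketch under that assumption; the `Settings` class and `load_settings` function are illustrative names, not the project's configuration module.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = (
    "SECRET_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    secret_key: str
    aws_access_key_id: str
    aws_secret_access_key: str
    aws_region: str
    s3_bucket_name: str
    openai_api_key: str
    elevenlabs_api_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from env (defaults to os.environ), failing on missing values."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        secret_key=env["SECRET_KEY"],
        aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        aws_region=env.get("AWS_REGION", "us-east-1"),  # optional, documented default
        s3_bucket_name=env["S3_BUCKET_NAME"],
        openai_api_key=env["OPENAI_API_KEY"],
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
    )
```

Accepting a mapping instead of reading `os.environ` directly keeps the loader easy to test without touching the real environment.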
File Storage Structure
Files are stored in S3 with the following structure:
/videos/{uuid}.mp4 - Original uploaded videos
/dubbed_audio/{uuid}.mp3 - Generated dubbed audio files
/processed_videos/{uuid}.mp4 - Final processed videos
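The layout above can be expressed as a small key builder. This is an illustrative sketch: `storage_key` is a hypothetical helper, the leading slash is dropped because S3 object keys are conventionally written without one, and whether the service reuses one UUID across the three files is not stated in the README.

```python
import uuid

# Prefix and extension for each kind of stored file, taken from the layout above.
_LAYOUT = {
    "video": ("videos", "mp4"),
    "dubbed_audio": ("dubbed_audio", "mp3"),
    "processed_video": ("processed_videos", "mp4"),
}


def storage_key(kind: str, file_id: uuid.UUID) -> str:
    """Return the object key for a file of the given kind."""
    try:
        prefix, extension = _LAYOUT[kind]
    except KeyError:
        raise ValueError(f"unknown file kind: {kind}") from None
    return f"{prefix}/{file_id}.{extension}"
```

A new upload would typically call `storage_key("video", uuid.uuid4())` so that user-supplied filenames never appear in the bucket.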
Database Schema
- users: User accounts with email/password
- videos: Video metadata and processing status
- transcriptions: Audio transcriptions
- translations: Translated text
- dubbed_audios: Generated audio files
- dubbed_videos: Final processed videos
Status Tracking
Videos have the following status values:
- uploaded - Video uploaded successfully
- transcribing - Audio transcription in progress
- transcribed - Transcription completed
- translating - Text translation in progress
- translated - Translation completed
- voice_cloning - Voice cloning and audio generation in progress
- voice_cloned - Dubbed audio generated
- processing_video - Final video processing in progress
- completed - All processing completed
- *_failed - Various failure states
Development
Code Linting
ruff check . --fix
Project Structure
├── main.py              # FastAPI application entry point
├── requirements.txt     # Python dependencies
├── alembic.ini          # Database migration configuration
├── app/
│   ├── db/              # Database configuration
│   ├── models/          # SQLAlchemy models
│   ├── routes/          # API endpoints
│   ├── services/        # Business logic and external API integrations
│   └── utils/           # Utility functions (auth, etc.)
└── alembic/
    └── versions/        # Database migration files