
Changes:
- Enhanced migration 002 with column existence checks to prevent duplicate column errors
- Added comprehensive migration 003 that syncs database with current model state
- Modified application startup to avoid conflicts between create_all() and Alembic
- Added proper table/column existence checking in migrations
- Improved migration safety for production environments
- Removed automatic table creation when database already exists (relies on migrations)
This resolves the 'duplicate column name' error by ensuring migrations check for existing columns before attempting to add them.
AI Video Dubbing API
A FastAPI backend for an AI-powered video dubbing tool that allows content creators to upload short-form videos, transcribe audio, translate to different languages, clone voices, and generate dubbed videos with lip-sync.
Features
- 🔐 Authentication: JWT-based user registration and login
- 👤 User Profiles: Complete profile management with settings
- 📁 Video Upload: Upload MP4/MOV files to Amazon S3 (max 200MB)
- 🧠 Transcription: Audio transcription using OpenAI Whisper API
- 🌍 Translation: Text translation using GPT-4 API
- 🗣️ Voice Cloning: Voice synthesis using ElevenLabs API
- 🎥 Video Processing: Audio replacement and video processing with ffmpeg
Tech Stack
- FastAPI - Modern, fast web framework
- SQLite - Database with SQLAlchemy ORM
- Amazon S3 - File storage
- OpenAI Whisper - Audio transcription
- GPT-4 - Text translation
- ElevenLabs - Voice cloning and synthesis
- ffmpeg - Video/audio processing
Quick Start
1. Install Dependencies
pip install -r requirements.txt
2. Set Environment Variables
Create a .env file in the root directory with the following variables:
# Authentication
SECRET_KEY=your-secret-key-change-this-in-production
# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-s3-bucket-name
# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key
# ElevenLabs Configuration
ELEVENLABS_API_KEY=your-elevenlabs-api-key
3. Run Database Migrations
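The database is created automatically the first time you start the application. If the database already exists, the application does not create tables on startup and relies on the Alembic migrations for schema changes. The SQLite database is stored at /app/storage/db/db.sqlite.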
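The migrations are described as checking for existing columns before adding them, which is what avoids the 'duplicate column name' error. The sketch below illustrates that idea with the standard-library sqlite3 module only; it is not the project's migration code, which presumably goes through Alembic. The helper names and the source_language column are hypothetical.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Return True if `table` already has a column named `column`."""
    # PRAGMA table_info yields one row per column; index 1 is the column name.
    # Identifiers are interpolated directly, so only pass trusted names.
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def add_column_if_missing(
    conn: sqlite3.Connection, table: str, column: str, ddl_type: str
) -> bool:
    """Add the column only when it is absent; return True if it was added."""
    if column_exists(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


# Demo against an in-memory database (hypothetical table layout).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE videos (id INTEGER PRIMARY KEY, status TEXT)")
first = add_column_if_missing(conn, "videos", "source_language", "TEXT")
second = add_column_if_missing(conn, "videos", "source_language", "TEXT")
```

Running the helper twice is safe: the second call finds the column and skips the ALTER TABLE instead of raising an error.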
4. Start the Application
python main.py
Or with uvicorn:
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
The API will be available at:
- API: http://localhost:8000
- Documentation: http://localhost:8000/docs
- Alternative Docs: http://localhost:8000/redoc
- Health Check: http://localhost:8000/health
API Endpoints
Authentication
- POST /auth/register - User registration
- POST /auth/login - User login
Profile Management
- GET /profile/ - Get user profile
- PUT /profile/ - Update profile information
- PUT /profile/password - Update password
- PUT /profile/email - Update email address
- DELETE /profile/ - Delete user account
Video Management
- POST /videos/upload - Upload video with language settings
- GET /videos/ - Get user's videos
- GET /videos/{video_id} - Get specific video details
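Uploads are limited to MP4/MOV files of at most 200MB. A client can check both limits before calling POST /videos/upload. The following is a minimal sketch of such a pre-check, not the server's actual validation; the function name is hypothetical, and it assumes "200MB" means 200 * 1024 * 1024 bytes.

```python
from pathlib import PurePosixPath

# Assumption: "200MB" is interpreted as 200 MiB.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
ALLOWED_EXTENSIONS = {".mp4", ".mov"}


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    errors = []
    # Compare extensions case-insensitively so "clip.MP4" is accepted.
    ext = PurePosixPath(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        errors.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        errors.append("file exceeds the 200MB limit")
    return errors
```

For example, validate_upload("clip.avi", 1024) reports the unsupported extension, while a small clip.MP4 passes with no errors.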
Processing Pipeline
- POST /transcription/{video_id} - Start audio transcription
- GET /transcription/{video_id} - Get transcription results
- POST /translation/{video_id} - Start text translation
- GET /translation/{video_id} - Get translation results
- POST /voice/clone/{video_id} - Start voice cloning and audio generation
- GET /voice/{video_id} - Get dubbed audio results
- POST /process/{video_id} - Start final video processing
- GET /process/{video_id} - Get processed video results
Results
- GET /process/results/{video_id} - Get complete processing results
Workflow
- Register/Login to get JWT token
- Upload Video with source and target languages
- Transcribe the audio from the video
- Translate the transcribed text
- Clone Voice and generate dubbed audio
- Process Video to replace original audio with dubbed audio
- Download the final dubbed video
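The steps above map onto the endpoints listed under Processing Pipeline and Results. The sketch below only builds the ordered requests a client would send after uploading a video; it makes no network calls. The function names are hypothetical, and sending the JWT as a Bearer token is an assumption, since the header format is not stated here.

```python
def auth_headers(token: str) -> dict[str, str]:
    """Build request headers for an authenticated call."""
    # Assumption: the JWT is passed as a standard Bearer token.
    return {"Authorization": f"Bearer {token}"}


def pipeline_steps(video_id: str) -> list[tuple[str, str]]:
    """Return (method, path) pairs for the post-upload workflow, in order."""
    return [
        ("POST", f"/transcription/{video_id}"),   # transcribe the audio
        ("POST", f"/translation/{video_id}"),     # translate the transcript
        ("POST", f"/voice/clone/{video_id}"),     # clone voice, generate dubbed audio
        ("POST", f"/process/{video_id}"),         # replace the original audio
        ("GET", f"/process/results/{video_id}"),  # fetch the complete results
    ]


def full_url(base_url: str, path: str) -> str:
    """Join the API base URL and an endpoint path without doubling slashes."""
    return base_url.rstrip("/") + path
```

Each POST starts a stage rather than returning its result, so a real client would likely check the matching GET endpoint or the video status before moving on to the next step.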
Environment Variables Reference
Variable | Description | Required |
---|---|---|
SECRET_KEY | JWT secret key for authentication | Yes |
AWS_ACCESS_KEY_ID | AWS access key for S3 | Yes |
AWS_SECRET_ACCESS_KEY | AWS secret key for S3 | Yes |
AWS_REGION | AWS region (default: us-east-1) | No |
S3_BUCKET_NAME | S3 bucket name for file storage | Yes |
OPENAI_API_KEY | OpenAI API key for Whisper and GPT-4 | Yes |
ELEVENLABS_API_KEY | ElevenLabs API key for voice cloning | Yes |
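Because most of these variables are required, it can help to fail fast at startup when one is missing. The sketch below shows one way to do that with the standard library; it is not the application's actual configuration code, and load_settings is a hypothetical name. It does not read the .env file itself and assumes the variables are already present in the environment.

```python
import os
from typing import Mapping, Optional

# Variables marked "Yes" in the table above.
REQUIRED_VARS = [
    "SECRET_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
]


def load_settings(env: Optional[Mapping[str, str]] = None) -> dict[str, str]:
    """Collect settings from the environment, raising if a required one is unset."""
    env = os.environ if env is None else env
    # Treat empty strings the same as missing values.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name] for name in REQUIRED_VARS}
    # AWS_REGION is optional and falls back to the documented default.
    settings["AWS_REGION"] = env.get("AWS_REGION", "us-east-1")
    return settings
```

Passing a plain dictionary instead of os.environ makes the check easy to exercise in tests.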
File Storage Structure
Files are stored in S3 with the following structure:
/videos/{uuid}.mp4 - Original uploaded videos
/dubbed_audio/{uuid}.mp3 - Generated dubbed audio files
/processed_videos/{uuid}.mp4 - Final processed videos
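The layout above can be expressed as a small key builder. This is a sketch under assumptions, not the project's storage code: the s3_key name and the kind labels are hypothetical, and the leading slash is dropped because S3 object keys are usually written without one.

```python
import uuid

# Prefix and extension for each kind of stored file, following the layout above.
KEY_TEMPLATES = {
    "video": "videos/{file_id}.mp4",
    "dubbed_audio": "dubbed_audio/{file_id}.mp3",
    "processed_video": "processed_videos/{file_id}.mp4",
}


def new_file_id() -> str:
    """Generate a random UUID string to use as the file identifier."""
    return str(uuid.uuid4())


def s3_key(kind: str, file_id: str) -> str:
    """Build the object key for a file of the given kind."""
    try:
        template = KEY_TEMPLATES[kind]
    except KeyError:
        raise ValueError(f"unknown file kind: {kind}") from None
    return template.format(file_id=file_id)
```

For instance, s3_key("dubbed_audio", "abc") yields dubbed_audio/abc.mp3.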
Database Schema
- users: User accounts with email/password
- videos: Video metadata and processing status
- transcriptions: Audio transcriptions
- translations: Translated text
- dubbed_audios: Generated audio files
- dubbed_videos: Final processed videos
Status Tracking
Videos have the following status values:
- uploaded - Video uploaded successfully
- transcribing - Audio transcription in progress
- transcribed - Transcription completed
- translating - Text translation in progress
- translated - Translation completed
- voice_cloning - Voice cloning and audio generation in progress
- voice_cloned - Dubbed audio generated
- processing_video - Final video processing in progress
- completed - All processing completed
- *_failed - Various failure states
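The status values form a linear progression with failure states branching off. A client polling a video could model that as below. This is an illustrative sketch with hypothetical helper names; the exact failure status names are not listed here, so failures are detected only by the _failed suffix.

```python
from typing import Optional

# Happy-path statuses in the order listed above.
HAPPY_PATH = [
    "uploaded",
    "transcribing",
    "transcribed",
    "translating",
    "translated",
    "voice_cloning",
    "voice_cloned",
    "processing_video",
    "completed",
]


def is_failed(status: str) -> bool:
    """Any status ending in '_failed' is treated as a failure state."""
    return status.endswith("_failed")


def is_terminal(status: str) -> bool:
    """Polling can stop once the video is completed or has failed."""
    return status == "completed" or is_failed(status)


def next_status(status: str) -> Optional[str]:
    """Return the following happy-path status, or None after 'completed'."""
    idx = HAPPY_PATH.index(status)  # raises ValueError for unknown statuses
    return HAPPY_PATH[idx + 1] if idx + 1 < len(HAPPY_PATH) else None
```

A polling loop would keep requesting GET /videos/{video_id} until is_terminal returns True for the reported status.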
Development
Code Linting
ruff check . --fix
Project Structure
├── main.py # FastAPI application entry point
├── requirements.txt # Python dependencies
├── alembic.ini # Database migration configuration
├── app/
│ ├── db/ # Database configuration
│ ├── models/ # SQLAlchemy models
│ ├── routes/ # API endpoints
│ ├── services/ # Business logic and external API integrations
│ └── utils/ # Utility functions (auth, etc.)
└── alembic/
    └── versions/ # Database migration files