
AI Video Dubbing API

A FastAPI backend for an AI-powered video dubbing tool. Content creators can upload short-form videos, transcribe the audio, translate the transcript into other languages, clone voices, and generate dubbed videos with lip-sync.

Features

  • 🔐 Authentication: JWT-based user registration and login
  • 👤 User Profiles: Complete profile management with settings
  • 📁 Video Upload: Upload MP4/MOV files to Amazon S3 (max 200MB)
  • 🧠 Transcription: Audio transcription using OpenAI Whisper API
  • 🌍 Translation: Text translation using GPT-4 API
  • 🗣️ Voice Cloning: Voice synthesis using ElevenLabs API
  • 🎥 Video Processing: Audio replacement and video processing with ffmpeg
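
The upload limits listed above (MP4/MOV only, max 200MB) could be checked before a file is sent to S3. The following is a minimal sketch, not the project's actual validation code: the function name `validate_upload` is hypothetical, and "200MB" is assumed to mean 200 * 1024 * 1024 bytes.

```python
from pathlib import Path

# Assumption: "200MB" is read as 200 * 1024 * 1024 bytes.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
ALLOWED_EXTENSIONS = {".mp4", ".mov"}


def validate_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file is acceptable."""
    problems = []
    # Compare the extension case-insensitively so "clip.MP4" is accepted.
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append("unsupported file type (expected MP4 or MOV)")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append("file exceeds the 200MB limit")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller report every issue in a single response.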

Tech Stack

  • FastAPI - Modern, fast web framework
  • SQLite - Database with SQLAlchemy ORM
  • Amazon S3 - File storage
  • OpenAI Whisper - Audio transcription
  • GPT-4 - Text translation
  • ElevenLabs - Voice cloning and synthesis
  • ffmpeg - Video/audio processing

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Set Environment Variables

Create a .env file in the root directory with the following variables:

# Authentication
SECRET_KEY=your-secret-key-change-this-in-production

# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-s3-bucket-name

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key

# ElevenLabs Configuration
ELEVENLABS_API_KEY=your-elevenlabs-api-key

3. Run Database Migrations

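The application creates the SQLite database automatically on first start and stores it at /app/storage/db/db.sqlite. When the database already exists, tables are not created automatically; the schema is managed by the Alembic migrations in alembic/versions.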
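
Because the project ships Alembic migrations (alembic/versions), a migration that adds a column can check for it first, so that running it against a database that already has the column does not fail with SQLite's "duplicate column name" error. The sketch below illustrates the idea with the standard-library sqlite3 module only. The helper names and the `avatar_url` column are hypothetical, and the project's actual migrations may perform this check differently.

```python
import sqlite3


def column_exists(conn: sqlite3.Connection, table: str, column: str) -> bool:
    """Look up a table's columns via PRAGMA table_info (row[1] is the column name)."""
    rows = conn.execute(f"PRAGMA table_info({table})").fetchall()
    return any(row[1] == column for row in rows)


def add_column_if_missing(conn: sqlite3.Connection, table: str,
                          column: str, ddl_type: str) -> bool:
    """Add the column only when it is absent; return True if it was added.

    Identifiers are interpolated into SQL, so they must come from trusted
    migration code, never from user input.
    """
    if column_exists(conn, table, column):
        return False
    conn.execute(f"ALTER TABLE {table} ADD COLUMN {column} {ddl_type}")
    return True


# Usage against a throwaway in-memory database (hypothetical schema).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT)")
first = add_column_if_missing(conn, "users", "avatar_url", "TEXT")
second = add_column_if_missing(conn, "users", "avatar_url", "TEXT")
```

Running the helper twice is safe: the second call finds the column and does nothing.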

4. Start the Application

python main.py

Or with uvicorn:

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

With the uvicorn command above, the API listens on port 8000.

API Endpoints

Authentication

  • POST /auth/register - User registration
  • POST /auth/login - User login

Profile Management

  • GET /profile/ - Get user profile
  • PUT /profile/ - Update profile information
  • PUT /profile/password - Update password
  • PUT /profile/email - Update email address
  • DELETE /profile/ - Delete user account

Video Management

  • POST /videos/upload - Upload video with language settings
  • GET /videos/ - Get user's videos
  • GET /videos/{video_id} - Get specific video details

Processing Pipeline

  • POST /transcription/{video_id} - Start audio transcription
  • GET /transcription/{video_id} - Get transcription results
  • POST /translation/{video_id} - Start text translation
  • GET /translation/{video_id} - Get translation results
  • POST /voice/clone/{video_id} - Start voice cloning and audio generation
  • GET /voice/{video_id} - Get dubbed audio results
  • POST /process/{video_id} - Start final video processing
  • GET /process/{video_id} - Get processed video results

Results

  • GET /process/results/{video_id} - Get complete processing results

Workflow

  1. Register/Login to get JWT token
  2. Upload Video with source and target languages
  3. Transcribe the audio from the video
  4. Translate the transcribed text
  5. Clone Voice and generate dubbed audio
  6. Process Video to replace original audio with dubbed audio
  7. Download the final dubbed video
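
The steps above map onto the endpoints listed under API Endpoints. The sketch below only builds the ordered list of requests for one video and does not perform any network calls. Request and response bodies are not documented here, so they are left out. The `Bearer` header scheme is an assumption (the README only states that authentication is JWT-based), and both function names are hypothetical.

```python
def auth_header(token: str) -> dict[str, str]:
    """Build the Authorization header. Assumption: the API expects 'Bearer <JWT>'."""
    return {"Authorization": f"Bearer {token}"}


def pipeline_requests(video_id: str) -> list[tuple[str, str]]:
    """Return the (method, path) pairs for workflow steps 3 to 7, in order."""
    return [
        ("POST", f"/transcription/{video_id}"),   # step 3: transcribe
        ("POST", f"/translation/{video_id}"),     # step 4: translate
        ("POST", f"/voice/clone/{video_id}"),     # step 5: clone voice, generate audio
        ("POST", f"/process/{video_id}"),         # step 6: replace the audio track
        ("GET", f"/process/results/{video_id}"),  # step 7: fetch the final results
    ]
```

A client would send each request with `auth_header(token)` attached and wait for a stage to finish before starting the next one.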

Environment Variables Reference

Variable               Description                           Required
SECRET_KEY             JWT secret key for authentication     Yes
AWS_ACCESS_KEY_ID      AWS access key for S3                 Yes
AWS_SECRET_ACCESS_KEY  AWS secret key for S3                 Yes
AWS_REGION             AWS region (default: us-east-1)       No
S3_BUCKET_NAME         S3 bucket name for file storage       Yes
OPENAI_API_KEY         OpenAI API key for Whisper and GPT-4  Yes
ELEVENLABS_API_KEY     ElevenLabs API key for voice cloning  Yes
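
A startup check can fail fast when a required variable is missing, instead of failing later inside an S3 or API call. This is a minimal sketch using only the standard library. The `load_settings` name is hypothetical and the project may load its configuration differently, for example through a .env loader.

```python
import os

# Variables marked "Yes" in the reference above.
REQUIRED_VARS = (
    "SECRET_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)
# Optional variables and their documented defaults.
DEFAULTS = {"AWS_REGION": "us-east-1"}


def load_settings(env=None) -> dict[str, str]:
    """Collect settings from the environment, raising if a required one is unset or empty."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError(
            "Missing required environment variables: " + ", ".join(missing)
        )
    settings = {name: env[name] for name in REQUIRED_VARS}
    for name, default in DEFAULTS.items():
        settings[name] = env.get(name) or default
    return settings
```

Passing a plain dict as `env` makes the check easy to exercise in tests without touching the real environment.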

File Storage Structure

Files are stored in S3 with the following structure:

/videos/{uuid}.mp4        - Original uploaded videos
/dubbed_audio/{uuid}.mp3  - Generated dubbed audio files
/processed_videos/{uuid}.mp4 - Final processed videos
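
The layout above can be captured in one small helper so that every part of the pipeline builds object keys the same way. This is a sketch under stated assumptions: the function and dictionary names are hypothetical, and the leading slash shown in the layout is omitted because S3 object keys conventionally do not start with one.

```python
import uuid

# Folder and file extension for each kind of stored object, per the layout above.
KEY_LAYOUT = {
    "video": ("videos", "mp4"),
    "dubbed_audio": ("dubbed_audio", "mp3"),
    "processed_video": ("processed_videos", "mp4"),
}


def new_file_id() -> str:
    """Generate a random UUID string to use as the file name."""
    return str(uuid.uuid4())


def s3_key(kind: str, file_id: str) -> str:
    """Build the S3 object key for a stored file; raises KeyError for unknown kinds."""
    folder, extension = KEY_LAYOUT[kind]
    return f"{folder}/{file_id}.{extension}"
```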

Database Schema

  • users: User accounts with email/password
  • videos: Video metadata and processing status
  • transcriptions: Audio transcriptions
  • translations: Translated text
  • dubbed_audios: Generated audio files
  • dubbed_videos: Final processed videos

Status Tracking

Videos have the following status values:

  • uploaded - Video uploaded successfully
  • transcribing - Audio transcription in progress
  • transcribed - Transcription completed
  • translating - Text translation in progress
  • translated - Translation completed
  • voice_cloning - Voice cloning and audio generation in progress
  • voice_cloned - Dubbed audio generated
  • processing_video - Final video processing in progress
  • completed - All processing completed
  • *_failed - Various failure states

Development

Code Linting

ruff check . --fix

Project Structure

├── main.py                 # FastAPI application entry point
├── requirements.txt        # Python dependencies
├── alembic.ini            # Database migration configuration
├── app/
│   ├── db/                # Database configuration
│   ├── models/            # SQLAlchemy models
│   ├── routes/            # API endpoints
│   ├── services/          # Business logic and external API integrations
│   └── utils/             # Utility functions (auth, etc.)
└── alembic/
    └── versions/          # Database migration files