Automated Action 92e4d992b2 Implement complete AI video dubbing backend with FastAPI
Features:
- JWT authentication with user registration and login
- Video upload to Amazon S3 with file validation (200MB limit)
- Audio transcription using OpenAI Whisper API
- Text translation using GPT-4 API
- Voice cloning and audio synthesis using ElevenLabs API
- Video processing with ffmpeg for audio replacement
- Complete SQLite database with proper models and migrations
- Background task processing for long-running operations
- Health endpoint and comprehensive API documentation

Tech stack:
- FastAPI with SQLAlchemy ORM
- SQLite database with Alembic migrations
- Amazon S3 for file storage
- OpenAI APIs for transcription and translation
- ElevenLabs API for voice cloning
- ffmpeg for video processing
- JWT authentication with bcrypt password hashing
2025-06-24 17:56:12 +00:00
2025-06-24 17:45:42 +00:00

AI Video Dubbing API

A FastAPI backend for an AI-powered video dubbing tool. Content creators upload short-form videos, and the API transcribes the audio, translates the transcript into a target language, clones the speaker's voice, and generates a dubbed video with lip-sync.

Features

  • 🔐 Authentication: JWT-based user registration and login
  • 📁 Video Upload: Upload MP4/MOV files to Amazon S3 (max 200MB)
  • 🧠 Transcription: Audio transcription using OpenAI Whisper API
  • 🌍 Translation: Text translation using GPT-4 API
  • 🗣️ Voice Cloning: Voice synthesis using ElevenLabs API
  • 🎥 Video Processing: Audio replacement and video processing with ffmpeg
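The upload constraints listed above (MP4/MOV only, 200MB maximum) could be enforced with a check along these lines. This is a minimal sketch, not the project's actual validation code: the function name `validate_upload`, the constants, and the reading of "200MB" as 200 MiB are all assumptions.

```python
from pathlib import PurePath

# Assumption: "200MB" is interpreted as 200 MiB; the project may use 200 * 10**6.
MAX_UPLOAD_BYTES = 200 * 1024 * 1024
ALLOWED_EXTENSIONS = {".mp4", ".mov"}


def validate_upload(filename: str, size_bytes: int) -> None:
    """Raise ValueError unless the file is a non-empty MP4/MOV within the size limit."""
    extension = PurePath(filename).suffix.lower()
    if extension not in ALLOWED_EXTENSIONS:
        raise ValueError(f"unsupported file type: {extension or 'none'}")
    if size_bytes <= 0:
        raise ValueError("file is empty")
    if size_bytes > MAX_UPLOAD_BYTES:
        raise ValueError("file exceeds the 200MB limit")
```

Checking only the extension is a weak guarantee. A stricter service would also inspect the container format, for example by probing the file with ffmpeg before accepting it.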

Tech Stack

  • FastAPI - Modern, fast web framework
  • SQLite - Database with SQLAlchemy ORM
  • Amazon S3 - File storage
  • OpenAI Whisper - Audio transcription
  • GPT-4 - Text translation
  • ElevenLabs - Voice cloning and synthesis
  • ffmpeg - Video/audio processing

Quick Start

1. Install Dependencies

pip install -r requirements.txt

2. Set Environment Variables

Create a .env file in the root directory with the following variables:

# Authentication
SECRET_KEY=your-secret-key-change-this-in-production

# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-s3-bucket-name

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key

# ElevenLabs Configuration
ELEVENLABS_API_KEY=your-elevenlabs-api-key

3. Run Database Migrations

No manual step is needed here: the database is created automatically when you start the application. The SQLite file is stored at /app/storage/db/db.sqlite.

4. Start the Application

python main.py

Or with uvicorn:

uvicorn main:app --host 0.0.0.0 --port 8000 --reload

With the uvicorn command above, the API listens on port 8000.

API Endpoints

Authentication

  • POST /auth/register - User registration
  • POST /auth/login - User login
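To illustrate what the login endpoint hands back, here is a standard-library sketch of issuing and verifying an HS256-signed JWT using `SECRET_KEY`. It is not the project's implementation, which most likely relies on a JWT library; the function names, the `sub` and `exp` claims, and the one-hour lifetime are assumptions.

```python
import base64
import hashlib
import hmac
import json
import time


def _b64url(data: bytes) -> str:
    # JWTs use URL-safe base64 without trailing "=" padding.
    return base64.urlsafe_b64encode(data).rstrip(b"=").decode("ascii")


def _b64url_decode(text: str) -> bytes:
    # Restore the padding that _b64url stripped.
    return base64.urlsafe_b64decode(text + "=" * (-len(text) % 4))


def create_token(subject: str, secret: str, ttl_seconds: int = 3600) -> str:
    """Build header.payload.signature, signed with HMAC-SHA256."""
    header = {"alg": "HS256", "typ": "JWT"}
    payload = {"sub": subject, "exp": int(time.time()) + ttl_seconds}
    signing_input = ".".join(
        _b64url(json.dumps(part, separators=(",", ":")).encode("utf-8"))
        for part in (header, payload)
    )
    signature = hmac.new(secret.encode("utf-8"), signing_input.encode("ascii"), hashlib.sha256).digest()
    return f"{signing_input}.{_b64url(signature)}"


def verify_token(token: str, secret: str) -> dict:
    """Return the payload if the signature is valid and the token has not expired."""
    parts = token.split(".")
    if len(parts) != 3:
        raise ValueError("malformed token")
    header_b64, payload_b64, signature_b64 = parts
    signing_input = f"{header_b64}.{payload_b64}".encode("utf-8")
    expected = hmac.new(secret.encode("utf-8"), signing_input, hashlib.sha256).digest()
    # compare_digest avoids leaking how many signature bytes matched.
    if not hmac.compare_digest(expected, _b64url_decode(signature_b64)):
        raise ValueError("bad signature")
    payload = json.loads(_b64url_decode(payload_b64))
    if payload["exp"] < time.time():
        raise ValueError("token expired")
    return payload
```

The sketch always verifies with HS256 and ignores the `alg` header, which sidesteps algorithm-confusion attacks. In production a maintained JWT library is the safer choice.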

Video Management

  • POST /videos/upload - Upload video with language settings
  • GET /videos/ - Get user's videos
  • GET /videos/{video_id} - Get specific video details

Processing Pipeline

  • POST /transcription/{video_id} - Start audio transcription
  • GET /transcription/{video_id} - Get transcription results
  • POST /translation/{video_id} - Start text translation
  • GET /translation/{video_id} - Get translation results
  • POST /voice/clone/{video_id} - Start voice cloning and audio generation
  • GET /voice/{video_id} - Get dubbed audio results
  • POST /process/{video_id} - Start final video processing
  • GET /process/{video_id} - Get processed video results
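The final processing step replaces the original audio track with the dubbed one using ffmpeg. One way to do that is sketched below. The helper names and the choice to copy the video stream and re-encode the audio as AAC are assumptions; the project's actual ffmpeg invocation is not shown in this README.

```python
import subprocess


def build_replace_audio_command(video_path: str, audio_path: str, output_path: str) -> list:
    """Return an ffmpeg argument list that swaps the video's audio track."""
    return [
        "ffmpeg",
        "-y",               # overwrite the output file if it exists
        "-i", video_path,   # input 0: original video
        "-i", audio_path,   # input 1: dubbed audio
        "-map", "0:v:0",    # take the first video stream from input 0
        "-map", "1:a:0",    # take the first audio stream from input 1
        "-c:v", "copy",     # keep the video stream as-is (no re-encode)
        "-c:a", "aac",      # encode the dubbed audio as AAC for the MP4 container
        "-shortest",        # stop at the end of the shorter stream
        output_path,
    ]


def replace_audio(video_path: str, audio_path: str, output_path: str) -> None:
    """Run ffmpeg; raises CalledProcessError if ffmpeg exits with an error."""
    subprocess.run(build_replace_audio_command(video_path, audio_path, output_path), check=True)
```

Passing the arguments as a list rather than a shell string avoids quoting problems with file names that contain spaces.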

Results

  • GET /process/results/{video_id} - Get complete processing results

Workflow

  1. Register/Login to get JWT token
  2. Upload Video with source and target languages
  3. Transcribe the audio from the video
  4. Translate the transcribed text
  5. Clone Voice and generate dubbed audio
  6. Process Video to replace original audio with dubbed audio
  7. Download the final dubbed video
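From a client's point of view, steps 3 to 7 translate into an ordered sequence of calls to the endpoints listed earlier. The sketch below only builds that sequence; it makes no HTTP requests. The helper name `pipeline_steps` is hypothetical, and the assumption that each GET is polled until its stage finishes comes from the background-processing design rather than from any documented contract.

```python
def pipeline_steps(video_id) -> list:
    """Return (method, path) pairs in the order a client would call them."""
    steps = []
    # Each stage is started with POST, then its result is fetched with GET.
    for start_path, result_path in [
        ("/transcription/{id}", "/transcription/{id}"),
        ("/translation/{id}", "/translation/{id}"),
        ("/voice/clone/{id}", "/voice/{id}"),
        ("/process/{id}", "/process/{id}"),
    ]:
        steps.append(("POST", start_path.format(id=video_id)))
        steps.append(("GET", result_path.format(id=video_id)))
    # Final call: the combined results for the whole pipeline.
    steps.append(("GET", f"/process/results/{video_id}"))
    return steps
```

A real client would send each request with the JWT obtained at login and wait for one stage to complete before starting the next.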

Environment Variables Reference

Variable               Description                           Required
SECRET_KEY             JWT secret key for authentication     Yes
AWS_ACCESS_KEY_ID      AWS access key for S3                 Yes
AWS_SECRET_ACCESS_KEY  AWS secret key for S3                 Yes
AWS_REGION             AWS region (default: us-east-1)       No
S3_BUCKET_NAME         S3 bucket name for file storage       Yes
OPENAI_API_KEY         OpenAI API key for Whisper and GPT-4  Yes
ELEVENLABS_API_KEY     ElevenLabs API key for voice cloning  Yes
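A startup check that fails fast when a required variable is missing could look like the following sketch. The `Settings` class and `load_settings` function are hypothetical names, not taken from the project. Only the variable names and the `us-east-1` default come from the reference above.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARIABLES = (
    "SECRET_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)


@dataclass(frozen=True)
class Settings:
    secret_key: str
    aws_access_key_id: str
    aws_secret_access_key: str
    aws_region: str
    s3_bucket_name: str
    openai_api_key: str
    elevenlabs_api_key: str


def load_settings(env: Optional[Mapping[str, str]] = None) -> Settings:
    """Read settings from the environment, reporting every missing variable at once."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARIABLES if not env.get(name)]
    if missing:
        raise RuntimeError("missing required environment variables: " + ", ".join(missing))
    return Settings(
        secret_key=env["SECRET_KEY"],
        aws_access_key_id=env["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=env["AWS_SECRET_ACCESS_KEY"],
        aws_region=env.get("AWS_REGION") or "us-east-1",  # the only optional variable
        s3_bucket_name=env["S3_BUCKET_NAME"],
        openai_api_key=env["OPENAI_API_KEY"],
        elevenlabs_api_key=env["ELEVENLABS_API_KEY"],
    )
```

Accepting an explicit mapping makes the loader easy to test without touching the real process environment.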

File Storage Structure

Files are stored in S3 with the following structure:

/videos/{uuid}.mp4        - Original uploaded videos
/dubbed_audio/{uuid}.mp3  - Generated dubbed audio files
/processed_videos/{uuid}.mp4 - Final processed videos
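Object keys following this layout could be generated as in the sketch below. The helper name `storage_key` is hypothetical. Two further assumptions: each file gets its own random UUID, and the leading slash shown in the listing is dropped because S3 object keys are conventionally written without one.

```python
import uuid
from typing import Optional

# Prefix and extension for each kind of stored file, taken from the layout above.
_LAYOUT = {
    "video": ("videos", "mp4"),
    "dubbed_audio": ("dubbed_audio", "mp3"),
    "processed_video": ("processed_videos", "mp4"),
}


def storage_key(kind: str, file_id: Optional[uuid.UUID] = None) -> str:
    """Return an S3 object key such as 'videos/<uuid>.mp4'."""
    if kind not in _LAYOUT:
        raise ValueError(f"unknown file kind: {kind}")
    prefix, extension = _LAYOUT[kind]
    file_id = file_id or uuid.uuid4()  # generate a fresh ID when none is supplied
    return f"{prefix}/{file_id}.{extension}"
```

Using a generated UUID instead of the uploaded file name avoids key collisions and keeps user-supplied names out of the bucket.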

Database Schema

  • users: User accounts with email/password
  • videos: Video metadata and processing status
  • transcriptions: Audio transcriptions
  • translations: Translated text
  • dubbed_audios: Generated audio files
  • dubbed_videos: Final processed videos
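The relationships between these tables can be pictured with a small in-memory SQLite sketch. Only the table names come from the list above. Every column is an assumption made for illustration, and the project itself defines its schema through SQLAlchemy models and Alembic migrations rather than raw SQL.

```python
import sqlite3

# Hypothetical columns: the README names the tables but not their fields.
SCHEMA = """
CREATE TABLE users (
    id INTEGER PRIMARY KEY,
    email TEXT NOT NULL UNIQUE,
    hashed_password TEXT NOT NULL
);
CREATE TABLE videos (
    id INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    status TEXT NOT NULL DEFAULT 'uploaded'
);
CREATE TABLE transcriptions (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    text TEXT NOT NULL
);
CREATE TABLE translations (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    text TEXT NOT NULL
);
CREATE TABLE dubbed_audios (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    s3_key TEXT NOT NULL
);
CREATE TABLE dubbed_videos (
    id INTEGER PRIMARY KEY,
    video_id INTEGER NOT NULL REFERENCES videos(id),
    s3_key TEXT NOT NULL
);
"""


def create_schema(connection: sqlite3.Connection) -> None:
    """Create all six tables on the given connection."""
    connection.executescript(SCHEMA)
```

Each per-stage table points back to `videos`, so a single video ID is enough to collect every intermediate result.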

Status Tracking

Videos have the following status values:

  • uploaded - Video uploaded successfully
  • transcribing - Audio transcription in progress
  • transcribed - Transcription completed
  • translating - Text translation in progress
  • translated - Translation completed
  • voice_cloning - Voice cloning and audio generation in progress
  • voice_cloned - Dubbed audio generated
  • processing_video - Final video processing in progress
  • completed - All processing completed
  • *_failed - Various failure states
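The success path through these statuses is linear, which a small enum makes explicit. This is a sketch, not the project's model code: `VideoStatus` and `next_status` are hypothetical names, and the failure states are left out because the README does not spell out their exact values.

```python
from enum import Enum
from typing import Optional


class VideoStatus(str, Enum):
    UPLOADED = "uploaded"
    TRANSCRIBING = "transcribing"
    TRANSCRIBED = "transcribed"
    TRANSLATING = "translating"
    TRANSLATED = "translated"
    VOICE_CLONING = "voice_cloning"
    VOICE_CLONED = "voice_cloned"
    PROCESSING_VIDEO = "processing_video"
    COMPLETED = "completed"


# Enum members iterate in definition order, which matches the pipeline order.
_ORDER = list(VideoStatus)


def next_status(current: VideoStatus) -> Optional[VideoStatus]:
    """Return the status that follows `current`, or None once completed."""
    position = _ORDER.index(current)
    return _ORDER[position + 1] if position + 1 < len(_ORDER) else None
```

Subclassing `str` lets the values be stored and compared as plain strings, so a status column in the database maps directly onto the enum.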

Development

Code Linting

ruff check . --fix

Project Structure

├── main.py                 # FastAPI application entry point
├── requirements.txt        # Python dependencies
├── alembic.ini            # Database migration configuration
├── app/
│   ├── db/                # Database configuration
│   ├── models/            # SQLAlchemy models
│   ├── routes/            # API endpoints
│   ├── services/          # Business logic and external API integrations
│   └── utils/             # Utility functions (auth, etc.)
└── alembic/
    └── versions/          # Database migration files