
# AI Video Dubbing API

A FastAPI backend for an AI-powered video dubbing tool. Content creators can upload short-form videos, transcribe the audio, translate the transcript into other languages, clone voices, and generate dubbed videos with lip-sync.

## Features

- 🔐 **Authentication**: JWT-based user registration and login
- 📁 **Video Upload**: Upload MP4/MOV files to Amazon S3 (max 200MB)
- 🧠 **Transcription**: Audio transcription using OpenAI Whisper API
- 🌍 **Translation**: Text translation using GPT-4 API
- 🗣️ **Voice Cloning**: Voice synthesis using ElevenLabs API
- 🎥 **Video Processing**: Audio replacement and video processing with ffmpeg
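
As a minimal sketch of the upload limits listed above, a client could check a file before sending it. The helper name `check_upload` is hypothetical, and reading "200MB" as 200 * 1024 * 1024 bytes is an assumption; the server's own validation may differ.

```python
from pathlib import Path

# Limits taken from the feature list; the exact byte count is an assumption.
ALLOWED_EXTENSIONS = {".mp4", ".mov"}
MAX_UPLOAD_BYTES = 200 * 1024 * 1024


def check_upload(filename: str, size_bytes: int) -> list[str]:
    """Return a list of problems; an empty list means the file looks acceptable."""
    problems = []
    if Path(filename).suffix.lower() not in ALLOWED_EXTENSIONS:
        problems.append(f"unsupported file type: {filename}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file is larger than {MAX_UPLOAD_BYTES} bytes")
    return problems
```

Checking on the client only saves a round trip; the API remains the authority on what it accepts.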

## Tech Stack

- **FastAPI** - Modern, fast web framework
- **SQLite** - Database with SQLAlchemy ORM
- **Amazon S3** - File storage
- **OpenAI Whisper** - Audio transcription
- **GPT-4** - Text translation
- **ElevenLabs** - Voice cloning and synthesis
- **ffmpeg** - Video/audio processing

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Set Environment Variables

Create a `.env` file in the root directory with the following variables:

```env
# Authentication
SECRET_KEY=your-secret-key-change-this-in-production

# AWS S3 Configuration
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-s3-bucket-name

# OpenAI Configuration
OPENAI_API_KEY=your-openai-api-key

# ElevenLabs Configuration
ELEVENLABS_API_KEY=your-elevenlabs-api-key
```

### 3. Run Database Migrations

No manual migration step is needed: the database is created automatically when you start the application. The SQLite file is stored at `/app/storage/db/db.sqlite`.

### 4. Start the Application

```bash
python main.py
```

Or with uvicorn:

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at:
- **API**: http://localhost:8000
- **Documentation**: http://localhost:8000/docs
- **Alternative Docs**: http://localhost:8000/redoc
- **Health Check**: http://localhost:8000/health

## API Endpoints

### Authentication
- `POST /auth/register` - User registration
- `POST /auth/login` - User login

### Video Management
- `POST /videos/upload` - Upload video with language settings
- `GET /videos/` - Get user's videos
- `GET /videos/{video_id}` - Get specific video details

### Processing Pipeline
- `POST /transcription/{video_id}` - Start audio transcription
- `GET /transcription/{video_id}` - Get transcription results
- `POST /translation/{video_id}` - Start text translation
- `GET /translation/{video_id}` - Get translation results
- `POST /voice/clone/{video_id}` - Start voice cloning and audio generation
- `GET /voice/{video_id}` - Get dubbed audio results
- `POST /process/{video_id}` - Start final video processing
- `GET /process/{video_id}` - Get processed video results

### Results
- `GET /process/results/{video_id}` - Get complete processing results
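
The endpoints above, apart from registration and login, presumably require the JWT returned by `POST /auth/login`. The sketch below builds, but does not send, an authenticated request using only the standard library. It assumes the token is passed as a Bearer token in the `Authorization` header, which this README does not confirm; `build_request` is a hypothetical helper name.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # local address from the Quick Start section


def build_request(method: str, path: str, token: str) -> urllib.request.Request:
    """Build (but do not send) a request that carries the JWT.

    Assumption: the API expects "Authorization: Bearer <token>".
    """
    return urllib.request.Request(
        url=f"{BASE_URL}{path}",
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


req = build_request("GET", "/videos/", token="<your-jwt>")
print(req.get_method(), req.full_url)
```

Passing the result to `urllib.request.urlopen` would send it. The interactive documentation at `/docs` shows the actual request and response schemas.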

## Workflow

1. **Register/Login** to get a JWT token
2. **Upload Video** with source and target languages
3. **Transcribe** the audio from the video
4. **Translate** the transcribed text
5. **Clone Voice** and generate dubbed audio
6. **Process Video** to replace original audio with dubbed audio
7. **Download** the final dubbed video
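
Steps 3 to 7 map onto the endpoints listed under API Endpoints. The sketch below returns those calls in order for one video. The function name `pipeline_calls` is hypothetical, request bodies are omitted because this README does not document them, and mapping step 7 to the results endpoint is an assumption.

```python
def pipeline_calls(video_id: str) -> list[tuple[str, str]]:
    """Return the (method, path) pairs for workflow steps 3 to 7, in order."""
    return [
        ("POST", f"/transcription/{video_id}"),    # 3. Transcribe
        ("POST", f"/translation/{video_id}"),      # 4. Translate
        ("POST", f"/voice/clone/{video_id}"),      # 5. Clone Voice
        ("POST", f"/process/{video_id}"),          # 6. Process Video
        ("GET", f"/process/results/{video_id}"),   # 7. Fetch results (assumed)
    ]


for method, path in pipeline_calls("123"):
    print(method, path)
```

Each `POST` only starts a stage, so a client would likely need to wait for that stage to finish, for example by polling the matching `GET` endpoint, before starting the next one.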

## Environment Variables Reference

| Variable | Description | Required |
|----------|-------------|----------|
| `SECRET_KEY` | JWT secret key for authentication | Yes |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 | Yes |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 | Yes |
| `AWS_REGION` | AWS region (default: us-east-1) | No |
| `S3_BUCKET_NAME` | S3 bucket name for file storage | Yes |
| `OPENAI_API_KEY` | OpenAI API key for Whisper and GPT-4 | Yes |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for voice cloning | Yes |

## File Storage Structure

Files are stored in S3 with the following structure:
```
/videos/{uuid}.mp4            - Original uploaded videos
/dubbed_audio/{uuid}.mp3      - Generated dubbed audio files
/processed_videos/{uuid}.mp4  - Final processed videos
```
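
A small helper can build object keys that follow this layout. This is a sketch under assumptions: `s3_key` and `LAYOUT` are hypothetical names, and the leading slash is dropped because S3 object keys are usually written without one. Whether the three files for one video share the same UUID is not stated here.

```python
import uuid

# Prefix and extension per file kind, copied from the layout above.
LAYOUT = {
    "video": ("videos", "mp4"),
    "dubbed_audio": ("dubbed_audio", "mp3"),
    "processed_video": ("processed_videos", "mp4"),
}


def s3_key(kind: str, file_id: str) -> str:
    """Build the object key for one file kind; raises KeyError for unknown kinds."""
    prefix, extension = LAYOUT[kind]
    return f"{prefix}/{file_id}.{extension}"


print(s3_key("dubbed_audio", str(uuid.uuid4())))
```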

## Database Schema

- **users**: User accounts with email/password
- **videos**: Video metadata and processing status
- **transcriptions**: Audio transcriptions
- **translations**: Translated text
- **dubbed_audios**: Generated audio files
- **dubbed_videos**: Final processed videos

## Status Tracking

Videos have the following status values:
- `uploaded` - Video uploaded successfully
- `transcribing` - Audio transcription in progress
- `transcribed` - Transcription completed
- `translating` - Text translation in progress
- `translated` - Translation completed
- `voice_cloning` - Voice cloning and audio generation in progress
- `voice_cloned` - Dubbed audio generated
- `processing_video` - Final video processing in progress
- `completed` - All processing completed
- `*_failed` - Various failure states
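
A client can use these status values to decide what to do next. The sketch below pairs each "ready" status with the endpoint that starts the following stage. The pairing is inferred from the Workflow and API Endpoints sections rather than taken from the server code, and `next_call` is a hypothetical helper name.

```python
from typing import Optional

# Assumed mapping from a finished stage to the call that starts the next one.
NEXT_CALL = {
    "uploaded": ("POST", "/transcription/{video_id}"),
    "transcribed": ("POST", "/translation/{video_id}"),
    "translated": ("POST", "/voice/clone/{video_id}"),
    "voice_cloned": ("POST", "/process/{video_id}"),
    "completed": ("GET", "/process/results/{video_id}"),
}


def next_call(status: str, video_id: str) -> Optional[tuple[str, str]]:
    """Return the next (method, path), or None if the client should wait or stop."""
    if status.endswith("_failed"):
        return None  # a stage failed; retry behaviour is not documented here
    template = NEXT_CALL.get(status)
    if template is None:
        return None  # an in-progress status such as "transcribing": poll again later
    method, path = template
    return method, path.format(video_id=video_id)
```

A polling loop would read the status from `GET /videos/{video_id}`, call `next_call`, and sleep between checks while the result is `None` for an in-progress status.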

## Development

### Code Linting
```bash
ruff check . --fix
```

### Project Structure
```
├── main.py                # FastAPI application entry point
├── requirements.txt       # Python dependencies
├── alembic.ini            # Database migration configuration
├── app/
│   ├── db/                # Database configuration
│   ├── models/            # SQLAlchemy models
│   ├── routes/            # API endpoints
│   ├── services/          # Business logic and external API integrations
│   └── utils/             # Utility functions (auth, etc.)
└── alembic/
    └── versions/          # Database migration files
```