# AI Video Dubbing API
A FastAPI backend for an AI-powered video dubbing tool that lets content creators upload short-form videos, transcribe the audio, translate it into other languages, clone voices, and generate dubbed videos with lip-sync.
## Features
- 🔐 **Authentication**: Google OAuth integration for secure login
- 👤 **User Profiles**: Complete profile management with settings
- 📁 **Video Upload**: Upload MP4/MOV files to Amazon S3 (max 200MB)
- 🔍 **Auto Language Detection**: Automatic detection of spoken language using Whisper
- 📝 **Editable Transcripts**: View and edit transcriptions before translation
- 🧠 **Transcription**: Audio transcription using OpenAI Whisper API
- 🌍 **Translation**: Text translation using GPT-4 API
- 🗣️ **Voice Cloning**: Voice synthesis using ElevenLabs API
- 🎥 **Video Processing**: Audio replacement and video processing with ffmpeg
- 🐳 **Docker Support**: Full containerization with Docker and Docker Compose
## Tech Stack
- **FastAPI** - Modern, fast web framework
- **SQLite** - Database with SQLAlchemy ORM
- **Amazon S3** - File storage
- **OpenAI Whisper** - Audio transcription
- **GPT-4** - Text translation
- **ElevenLabs** - Voice cloning and synthesis
- **ffmpeg** - Video/audio processing
## Quick Start
### Option 1: Docker (Recommended)
1. **Copy environment file**:
```bash
cp .env.example .env
```
2. **Configure environment variables** in `.env`:
- Add your OpenAI API key
- Configure AWS S3 credentials
- Set up Google OAuth credentials
3. **Run with Docker Compose**:
```bash
docker-compose up -d
```
The API will be available at:
- **API**: http://localhost:8000
- **Documentation**: http://localhost:8000/docs
- **Health Check**: http://localhost:8000/health
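To confirm the service is up, you can hit the health endpoint from any HTTP client. A minimal sketch using Python's `requests` (the exact response shape is an assumption; inspect it for your build):
```python
import requests

# Check the health endpoint exposed by the container (see URLs above).
resp = requests.get("http://localhost:8000/health", timeout=5)
resp.raise_for_status()
print(resp.json())
```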
### Option 2: Local Development
1. **Install Dependencies**:
```bash
pip install -r requirements.txt
```
2. **Configure Environment**:
```bash
cp .env.example .env
# Edit .env with your configuration
```
3. **Start the Application**:
```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```
## API Endpoints
### Authentication (Google OAuth Only)
- `GET /auth/google/oauth-url` - Get Google OAuth URL for frontend
- `POST /auth/google/login-with-token` - Login/signup with Google ID token
- `POST /auth/google/login-with-code` - Login/signup with Google authorization code
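For quick testing outside a browser, the token-based login above can be exercised directly. A minimal sketch with `requests`; the response is assumed to contain an access token, but the exact field name depends on the implementation, so check the `/docs` page for the schema:
```python
import requests

BASE = "http://localhost:8000"

# Obtain a Google ID token via Google's client library first (not shown here).
google_id_token = "<google-id-token>"  # placeholder

resp = requests.post(
    f"{BASE}/auth/google/login-with-token",
    json={"id_token": google_id_token},
)
resp.raise_for_status()
payload = resp.json()
# The token field name below is an assumption; verify against /docs.
access_token = payload.get("access_token") or payload.get("token")
```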
### Profile Management
- `GET /profile/` - Get user profile
- `PUT /profile/` - Update profile information
- `PUT /profile/password` - Update password
- `PUT /profile/email` - Update email address
- `DELETE /profile/` - Delete user account
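Profile routes expect the bearer token obtained at login. A hedged sketch; the profile field names are assumptions, so confirm them against `/docs`:
```python
import requests

BASE = "http://localhost:8000"
access_token = "<token-from-login>"  # obtained from the login step above
headers = {"Authorization": f"Bearer {access_token}"}

# Read the current profile, then update a field.
profile = requests.get(f"{BASE}/profile/", headers=headers)
profile.raise_for_status()
print(profile.json())

resp = requests.put(
    f"{BASE}/profile/",
    headers=headers,
    json={"name": "New Name"},  # field name is hypothetical
)
resp.raise_for_status()
```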
### Video Management & Language Detection
- `POST /videos/upload` - Upload video with auto language detection
- `GET /videos/` - Get user's videos
- `GET /videos/{video_id}` - Get specific video details
- `GET /videos/{video_id}/language` - Get detected video language
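Uploading is a multipart form POST. A sketch assuming the file field is named `file` and the response carries the new video's ID (both names are assumptions):
```python
import requests

BASE = "http://localhost:8000"
headers = {"Authorization": "Bearer <token>"}

with open("clip.mp4", "rb") as f:
    resp = requests.post(
        f"{BASE}/videos/upload",
        headers=headers,
        files={"file": ("clip.mp4", f, "video/mp4")},  # field name is an assumption
    )
resp.raise_for_status()
video_id = resp.json()["id"]  # key name is an assumption; check /docs

# Language detection runs during upload, so the result is available right away.
lang = requests.get(f"{BASE}/videos/{video_id}/language", headers=headers).json()
print(lang)
```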
### Transcription & Editable Transcripts
- `POST /transcription/{video_id}` - Start audio transcription
- `GET /transcription/{video_id}` - Get transcription results
- `GET /transcription/{video_id}/editable` - Get editable transcript
- `PUT /transcription/{video_id}/editable` - Update edited transcript
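The editable-transcript endpoints form a read-modify-write loop. A sketch under the assumption that the editable payload carries the transcript in a `text` field (field name is hypothetical):
```python
import requests

BASE = "http://localhost:8000"
headers = {"Authorization": "Bearer <token>"}
video_id = "<video-id>"

# Fetch the machine transcript for review.
draft = requests.get(
    f"{BASE}/transcription/{video_id}/editable", headers=headers
).json()

# Correct it locally, then write the edited version back.
draft["text"] = draft["text"].replace("teh", "the")  # 'text' key is an assumption
resp = requests.put(
    f"{BASE}/transcription/{video_id}/editable", headers=headers, json=draft
)
resp.raise_for_status()
```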
### Translation Pipeline (Uses Edited Transcripts)
- `POST /translation/{video_id}` - Start text translation (uses edited transcript if available)
- `GET /translation/{video_id}` - Get translation results
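Kicking off a translation is a single POST; the target-language parameter name below is an assumption:
```python
import requests

BASE = "http://localhost:8000"
headers = {"Authorization": "Bearer <token>"}
video_id = "<video-id>"

# Start translation; the edited transcript is picked up automatically if present.
resp = requests.post(
    f"{BASE}/translation/{video_id}",
    headers=headers,
    json={"target_language": "es"},  # parameter name is hypothetical
)
resp.raise_for_status()

print(requests.get(f"{BASE}/translation/{video_id}", headers=headers).json())
```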
### Voice Cloning & Video Processing
- `POST /voice/clone/{video_id}` - Start voice cloning and audio generation
- `GET /voice/{video_id}` - Get dubbed audio results
- `POST /process/{video_id}` - Start final video processing
- `GET /process/{video_id}` - Get processed video results
- `GET /process/results/{video_id}` - Get complete processing results
## Google OAuth Setup
### 1. Create Google OAuth Application
1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select existing one
3. Configure the OAuth consent screen
4. Go to "Credentials" → "Create Credentials" → "OAuth 2.0 Client IDs"
5. Choose "Web application"
6. Add authorized redirect URIs:
- `http://localhost:3000/auth/google/callback` (for development)
- Your production callback URL
### 2. Configure Environment Variables
Add these to your `.env` file:
```env
GOOGLE_CLIENT_ID=your-google-oauth-client-id
GOOGLE_CLIENT_SECRET=your-google-oauth-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/google/callback
```
### 3. Frontend Integration
**Option 1: Direct Token Method**
```javascript
// Use Google's JavaScript library to get ID token
const response = await fetch('/auth/google/login-with-token', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({ id_token: googleIdToken })
});
```
**Option 2: Authorization Code Method**
```javascript
// Redirect user to Google OAuth URL, then exchange code
const oauthUrl = await fetch('/auth/google/oauth-url').then(r => r.json());
// Redirect to oauthUrl.oauth_url
// On callback, exchange code:
const response = await fetch('/auth/google/login-with-code', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    code: authorizationCode,
    redirect_uri: 'http://localhost:3000/auth/google/callback'
  })
});
```
## Docker Setup
### Building and Running
```bash
# Build and start the application
docker-compose up -d
# View logs
docker-compose logs -f api
# Stop the application
docker-compose down
# Rebuild after code changes
docker-compose up --build -d
```
### Environment Variables
The application requires the following environment variables (copy from `.env.example`):
- `OPENAI_API_KEY` - Required for transcription and translation
- `AWS_ACCESS_KEY_ID`, `AWS_SECRET_ACCESS_KEY`, `S3_BUCKET_NAME` - Required for video storage
- `GOOGLE_CLIENT_ID`, `GOOGLE_CLIENT_SECRET` - Required for authentication
- Other optional configuration variables
### Storage
The Docker setup includes a persistent volume for:
- SQLite database (`/app/storage/db/`)
- Local file storage (`/app/storage/`)
## Workflow
1. **Login** with Google OAuth to get authentication token
2. **Upload Video** - Automatic language detection occurs during upload
3. **Transcribe** the audio from the video
4. **Edit Transcript** (optional) - Review and correct the transcription
5. **Translate** the edited/original transcript
6. **Clone Voice** and generate dubbed audio
7. **Process Video** to replace original audio with dubbed audio
8. **Download** the final dubbed video
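The whole pipeline can be scripted against the endpoints above. A compact, hedged sketch in Python (payload field names and the `status` key are assumptions; the status values are listed under Status Tracking below):
```python
import time
import requests

BASE = "http://localhost:8000"
HEADERS = {"Authorization": "Bearer <token>"}  # from the Google OAuth login
video_id = "<video-id>"                        # from the upload response

def wait_for(target: str, interval: float = 5.0) -> None:
    # Poll the video record until it reaches `target` or a *_failed state.
    # The 'status' key is an assumption; see Status Tracking for the values.
    while True:
        status = requests.get(f"{BASE}/videos/{video_id}", headers=HEADERS).json()["status"]
        if status == target:
            return
        if status.endswith("_failed"):
            raise RuntimeError(f"pipeline failed: {status}")
        time.sleep(interval)

requests.post(f"{BASE}/transcription/{video_id}", headers=HEADERS).raise_for_status()
wait_for("transcribed")
# Optionally review and edit the transcript here (step 4).
requests.post(f"{BASE}/translation/{video_id}", headers=HEADERS,
              json={"target_language": "es"}).raise_for_status()  # param name assumed
wait_for("translated")
requests.post(f"{BASE}/voice/clone/{video_id}", headers=HEADERS).raise_for_status()
wait_for("voice_cloned")
requests.post(f"{BASE}/process/{video_id}", headers=HEADERS).raise_for_status()
wait_for("completed")
print(requests.get(f"{BASE}/process/results/{video_id}", headers=HEADERS).json())
```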
## Environment Variables Reference
| Variable | Description | Required |
|----------|-------------|----------|
| `OPENAI_API_KEY` | OpenAI API key for Whisper and GPT-4 | Yes |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 | Yes |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 | Yes |
| `AWS_REGION` | AWS region (default: us-east-1) | No |
| `S3_BUCKET_NAME` | S3 bucket name for file storage | Yes |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID | Yes |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret | Yes |
| `GOOGLE_REDIRECT_URI` | Google OAuth redirect URI | Yes |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for voice cloning | Yes |
| `DEBUG` | Enable debug mode (default: false) | No |
| `LOG_LEVEL` | Logging level (default: info) | No |
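If you want to fail fast on missing configuration, a small startup check over the required variables from the table above can help (a sketch, not part of the codebase):
```python
import os

# Variables marked "Required: Yes" in the table above.
REQUIRED = [
    "OPENAI_API_KEY",
    "AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "S3_BUCKET_NAME",
    "GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_REDIRECT_URI",
    "ELEVENLABS_API_KEY",
]

missing = [name for name in REQUIRED if not os.environ.get(name)]
if missing:
    raise SystemExit(f"Missing required environment variables: {', '.join(missing)}")
```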
## File Storage Structure
Files are stored in S3 with the following structure:
```
/videos/{uuid}.mp4 - Original uploaded videos
/dubbed_audio/{uuid}.mp3 - Generated dubbed audio files
/processed_videos/{uuid}.mp4 - Final processed videos
```
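The key layout maps naturally onto UUID-based names. A hypothetical helper illustrating the scheme (not the project's actual code):
```python
import uuid

def s3_key(kind: str) -> str:
    # kind is one of "videos", "dubbed_audio", "processed_videos" (per the layout above).
    ext = "mp3" if kind == "dubbed_audio" else "mp4"
    return f"{kind}/{uuid.uuid4()}.{ext}"

print(s3_key("videos"))  # e.g. videos/3f1c9b1e-....mp4
```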
## Database Schema
- **users**: User accounts and profile information (Google OAuth sign-in)
- **videos**: Video metadata and processing status
- **transcriptions**: Audio transcriptions
- **translations**: Translated text
- **dubbed_audios**: Generated audio files
- **dubbed_videos**: Final processed videos
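As a rough picture of the `videos` table, a hypothetical SQLAlchemy sketch (column names are assumptions; the real models live under `app/models/`):
```python
from sqlalchemy import Column, Integer, String
from sqlalchemy.orm import declarative_base

Base = declarative_base()

class Video(Base):
    # Hypothetical sketch; see app/models/ for the real definition.
    __tablename__ = "videos"
    id = Column(Integer, primary_key=True)
    s3_key = Column(String, nullable=False)       # assumed column
    detected_language = Column(String)            # assumed column
    status = Column(String, default="uploaded")   # values listed under Status Tracking
```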
## Status Tracking
Videos have the following status values:
- `uploaded` - Video uploaded successfully
- `transcribing` - Audio transcription in progress
- `transcribed` - Transcription completed
- `translating` - Text translation in progress
- `translated` - Translation completed
- `voice_cloning` - Voice cloning and audio generation in progress
- `voice_cloned` - Dubbed audio generated
- `processing_video` - Final video processing in progress
- `completed` - All processing completed
- `*_failed` - Various failure states
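The `*_failed` convention makes terminal states easy to detect in client code, e.g.:
```python
def is_terminal(status: str) -> bool:
    # "completed" is the success terminal; any "*_failed" value is a failure terminal.
    return status == "completed" or status.endswith("_failed")

assert is_terminal("completed")
assert is_terminal("transcribing_failed")  # hypothetical example value
assert not is_terminal("translating")
```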
## Development
### Code Linting
```bash
ruff check . --fix
```
### Project Structure
```
├── main.py              # FastAPI application entry point
├── requirements.txt     # Python dependencies
├── alembic.ini          # Database migration configuration
├── app/
│   ├── db/              # Database configuration
│   ├── models/          # SQLAlchemy models
│   ├── routes/          # API endpoints
│   ├── services/        # Business logic and external API integrations
│   └── utils/           # Utility functions (auth, etc.)
└── alembic/
    └── versions/        # Database migration files
```