
# AI Video Dubbing API

A FastAPI backend for an AI-powered video dubbing tool. Content creators can upload short-form videos, transcribe the audio, translate it into other languages, clone the original voice, and generate dubbed videos with lip-sync.

## Features

- 🔐 **Authentication**: JWT-based registration and login with email/password or Google OAuth 2.0
- 👤 **User Profiles**: Profile management, including password and email updates and account deletion
- 📁 **Video Upload**: Upload MP4/MOV files (up to 200 MB) to Amazon S3
- 🧠 **Transcription**: Audio transcription with the OpenAI Whisper API
- 🌍 **Translation**: Text translation with the GPT-4 API
- 🗣️ **Voice Cloning**: Voice synthesis with the ElevenLabs API
- 🎥 **Video Processing**: Audio replacement and video processing with ffmpeg

## Tech Stack

- **FastAPI** - Modern, fast web framework
- **SQLite** - Database with SQLAlchemy ORM
- **Amazon S3** - File storage
- **OpenAI Whisper** - Audio transcription
- **GPT-4** - Text translation
- **ElevenLabs** - Voice cloning and synthesis
- **ffmpeg** - Video/audio processing

## Quick Start

### 1. Install Dependencies

```bash
pip install -r requirements.txt
```

### 2. Set Environment Variables

Create a `.env` file in the project root with the following variables:

```env
# Authentication
SECRET_KEY=your-secret-key-change-this-in-production

# Google OAuth configuration (optional; needed only for Google sign-in)
GOOGLE_CLIENT_ID=your-google-client-id
GOOGLE_CLIENT_SECRET=your-google-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/google/callback

# AWS S3 configuration
AWS_ACCESS_KEY_ID=your-aws-access-key
AWS_SECRET_ACCESS_KEY=your-aws-secret-key
AWS_REGION=us-east-1
S3_BUCKET_NAME=your-s3-bucket-name

# OpenAI configuration
OPENAI_API_KEY=your-openai-api-key

# ElevenLabs configuration
ELEVENLABS_API_KEY=your-elevenlabs-api-key
```

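It can help to fail fast when a required variable is missing. The sketch below checks a mapping such as `os.environ` against the variables marked "Yes" in the Environment Variables Reference. It treats the three `GOOGLE_*` variables as all-or-nothing. The helper `missing_env_vars` is hypothetical and not part of this codebase. It assumes the `.env` values have already been loaded into the process environment.

```python
from collections.abc import Mapping

# Variables marked "Yes" in the Environment Variables Reference table.
REQUIRED = (
    "SECRET_KEY",
    "AWS_ACCESS_KEY_ID",
    "AWS_SECRET_ACCESS_KEY",
    "S3_BUCKET_NAME",
    "OPENAI_API_KEY",
    "ELEVENLABS_API_KEY",
)

# Google OAuth variables: required only when Google sign-in is enabled.
GOOGLE_OAUTH = ("GOOGLE_CLIENT_ID", "GOOGLE_CLIENT_SECRET", "GOOGLE_REDIRECT_URI")


def missing_env_vars(env: Mapping[str, str]) -> list[str]:
    """Return the names of required variables that are unset or empty."""
    missing = [name for name in REQUIRED if not env.get(name)]
    # All-or-nothing: if any Google variable is set, all three are needed.
    if any(env.get(name) for name in GOOGLE_OAUTH):
        missing += [name for name in GOOGLE_OAUTH if not env.get(name)]
    return missing
```

At startup you could call `missing_env_vars(os.environ)` and stop with a clear error if the returned list is non-empty.
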
### 3. Run Database Migrations

The SQLite database is created automatically when the application starts and is stored at `/app/storage/db/db.sqlite`. Schema migrations live in `alembic/versions/` and are configured in `alembic.ini`.

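The default path `/app/storage/db/db.sqlite` suggests a container layout. When you run the app outside a container, the parent directory may not exist. SQLite does not create missing directories, so opening the database would fail with "unable to open database file".

The sketch below creates the directory up front using only the standard library. `open_sqlite` is a hypothetical helper, and the application's own startup code may already handle this.

```python
import sqlite3
import tempfile
from pathlib import Path


def open_sqlite(db_path: Path) -> sqlite3.Connection:
    """Create the parent directory if needed, then open (or create) the database file."""
    db_path.parent.mkdir(parents=True, exist_ok=True)
    return sqlite3.connect(db_path)


# Example: a temporary directory stands in for /app/storage/db during local testing.
with tempfile.TemporaryDirectory() as tmp:
    db_file = Path(tmp) / "storage" / "db" / "db.sqlite"
    conn = open_sqlite(db_file)
    created = db_file.exists()
    conn.close()
```
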
### 4. Start the Application

```bash
python main.py
```

Or run it with uvicorn directly (`--reload` restarts the server on code changes and is meant for development):

```bash
uvicorn main:app --host 0.0.0.0 --port 8000 --reload
```

The API will be available at:

- **API**: http://localhost:8000
- **Documentation (Swagger UI)**: http://localhost:8000/docs
- **Alternative Docs (ReDoc)**: http://localhost:8000/redoc
- **Health Check**: http://localhost:8000/health

## API Endpoints

### Authentication

- `POST /auth/register` - Register with email and password
- `POST /auth/login` - Log in with email and password
- `GET /auth/google/oauth-url` - Get the Google OAuth URL for the frontend
- `POST /auth/google/login-with-token` - Log in or sign up with a Google ID token
- `POST /auth/google/login-with-code` - Log in or sign up with a Google authorization code

After a successful Google authentication, the API issues its own JWT. On first login it creates a user account. If an account with the same email already exists, it links that account to Google instead.

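This README does not say how the JWT is sent on later requests. FastAPI apps commonly expect an `Authorization: Bearer <token>` header. Assuming that convention applies here, the sketch below builds an authenticated request to `GET /profile/` with the standard library. `BASE_URL` and `authed_request` are illustrative names, and the token is a placeholder.

```python
import urllib.request

BASE_URL = "http://localhost:8000"  # default host and port from the Quick Start


def authed_request(path: str, token: str, method: str = "GET") -> urllib.request.Request:
    """Build a request carrying the JWT as a Bearer token (assumed scheme)."""
    return urllib.request.Request(
        BASE_URL + path,
        method=method,
        headers={"Authorization": f"Bearer {token}"},
    )


req = authed_request("/profile/", token="<jwt-from-login>")
# To send it: urllib.request.urlopen(req), then json.load() the response body.
```
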
### Profile Management

- `GET /profile/` - Get the user profile
- `PUT /profile/` - Update profile information
- `PUT /profile/password` - Update password
- `PUT /profile/email` - Update email address
- `DELETE /profile/` - Delete the user account

### Video Management

- `POST /videos/upload` - Upload a video with source and target language settings
- `GET /videos/` - List the current user's videos
- `GET /videos/{video_id}` - Get details for a specific video

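Uploads are limited to MP4/MOV files of at most 200 MB, so a client can reject unsuitable files before sending them. The sketch below assumes the limit means 200 × 1024 × 1024 bytes and that the type check is by file extension. Both are assumptions, and the server's own validation remains authoritative. `check_upload` is a hypothetical helper.

```python
from pathlib import Path
from typing import Optional

ALLOWED_EXTENSIONS = {".mp4", ".mov"}
MAX_UPLOAD_BYTES = 200 * 1024 * 1024  # assumed interpretation of "200 MB"


def check_upload(filename: str, size_bytes: int) -> Optional[str]:
    """Return an error message, or None if the file passes both checks."""
    ext = Path(filename).suffix.lower()
    if ext not in ALLOWED_EXTENSIONS:
        return f"Unsupported file type {ext or '(none)'}; use MP4 or MOV."
    if size_bytes > MAX_UPLOAD_BYTES:
        return f"File is {size_bytes / 1024 / 1024:.1f} MB; the limit is 200 MB."
    return None
```
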
### Processing Pipeline

- `POST /transcription/{video_id}` - Start audio transcription
- `GET /transcription/{video_id}` - Get transcription results
- `POST /translation/{video_id}` - Start text translation
- `GET /translation/{video_id}` - Get translation results
- `POST /voice/clone/{video_id}` - Start voice cloning and audio generation
- `GET /voice/{video_id}` - Get dubbed audio results
- `POST /process/{video_id}` - Start final video processing
- `GET /process/{video_id}` - Get processed video results

### Results

- `GET /process/results/{video_id}` - Get complete processing results

## Google OAuth Setup

### 1. Create Google OAuth Application

1. Go to [Google Cloud Console](https://console.cloud.google.com/)
2. Create a new project or select an existing one
3. Configure the OAuth consent screen. No extra API needs to be enabled for sign-in, because the Google+ API has been shut down.
4. Go to "Credentials" → "Create Credentials" → "OAuth client ID"
5. Choose "Web application"
6. Add authorized redirect URIs:
   - `http://localhost:3000/auth/google/callback` (for development)
   - Your production callback URL
7. For the ID token flow, also add your frontend origin (for example `http://localhost:3000`) under "Authorized JavaScript origins"

### 2. Configure Environment Variables

Add these to your `.env` file:

```env
GOOGLE_CLIENT_ID=your-google-oauth-client-id
GOOGLE_CLIENT_SECRET=your-google-oauth-client-secret
GOOGLE_REDIRECT_URI=http://localhost:3000/auth/google/callback
```

`GOOGLE_REDIRECT_URI` must exactly match one of the authorized redirect URIs registered in step 1.

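`GET /auth/google/oauth-url` returns a URL that sends the user to Google's consent page. This README doesn't show how the backend builds that URL. A plausible sketch uses Google's documented authorization endpoint with the authorization-code parameters. The `openid email profile` scope, the random `state` value, and the name `build_google_oauth_url` are assumptions for illustration.

```python
import secrets
from urllib.parse import parse_qs, urlencode, urlparse

GOOGLE_AUTH_ENDPOINT = "https://accounts.google.com/o/oauth2/v2/auth"


def build_google_oauth_url(client_id: str, redirect_uri: str) -> tuple[str, str]:
    """Return (url, state); keep `state` and compare it on the callback (CSRF check)."""
    state = secrets.token_urlsafe(16)
    params = {
        "client_id": client_id,
        "redirect_uri": redirect_uri,  # must match GOOGLE_REDIRECT_URI exactly
        "response_type": "code",       # authorization code flow
        "scope": "openid email profile",
        "state": state,
    }
    return f"{GOOGLE_AUTH_ENDPOINT}?{urlencode(params)}", state


url, state = build_google_oauth_url(
    "your-google-oauth-client-id", "http://localhost:3000/auth/google/callback"
)
query = parse_qs(urlparse(url).query)  # decode the query string to inspect it
```
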
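### 3. Frontend Integration

**Option 1: Direct Token Method**

Load Google Identity Services (`https://accounts.google.com/gsi/client`) and show its sign-in button. Its callback receives the Google ID token, which you send to the API:

```javascript
// Google Identity Services calls this callback with the ID token as `credential`
google.accounts.id.initialize({
  client_id: 'your-google-oauth-client-id',
  callback: async ({ credential: googleIdToken }) => {
    const response = await fetch('/auth/google/login-with-token', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ id_token: googleIdToken })
    });
    const data = await response.json(); // includes the JWT issued by this API
  }
});
```
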
**Option 2: Authorization Code Method**

Run step 1 on your login page and step 2 on the page served at the redirect URI:

```javascript
// Step 1 (login page): fetch the Google OAuth URL and redirect the user to it
const { oauth_url } = await fetch('/auth/google/oauth-url').then(r => r.json());
window.location.href = oauth_url;

// Step 2 (callback page): read the code Google appended to the URL and exchange it
const authorizationCode = new URLSearchParams(window.location.search).get('code');
const response = await fetch('/auth/google/login-with-code', {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    code: authorizationCode,
    // Must match the redirect URI used to obtain the code
    redirect_uri: 'http://localhost:3000/auth/google/callback'
  })
});
```

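Google sign-in either creates a user on first login or links the Google identity to an existing account with the same email. In both cases it then issues the API's own JWT. The sketch below models that decision with an in-memory dictionary in place of the real SQLAlchemy `User` model. It assumes the ID token's signature and audience have already been verified.

The field names `google_id`, `is_google_user`, `email_verified`, and `profile_picture` follow the Google OAuth fields this project adds to users. `UserRecord` and `find_or_link_user` are hypothetical. Refusing to link on an unverified email and lower-casing emails are assumed safeguards, not confirmed behavior of this API.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class UserRecord:
    """Hypothetical stand-in for the app's SQLAlchemy User model."""

    email: str
    google_id: Optional[str] = None
    is_google_user: bool = False
    email_verified: bool = False
    profile_picture: Optional[str] = None


def find_or_link_user(users: dict, claims: dict) -> UserRecord:
    """Create or link a user from verified Google ID token claims.

    `claims` uses standard OpenID Connect names: sub, email, email_verified, picture.
    """
    if not claims.get("email_verified"):
        # Assumed safeguard: never link an account on an unverified email.
        raise ValueError("Google account email is not verified")
    email = claims["email"].lower()  # assumed case-insensitive email matching
    user = users.get(email)
    if user is None:
        # First Google login for this email: create a new account.
        user = UserRecord(email=email)
        users[email] = user
    # New or existing account: attach the Google identity and profile data.
    user.google_id = claims["sub"]
    user.is_google_user = True
    user.email_verified = True
    user.profile_picture = claims.get("picture", user.profile_picture)
    return user


# Example: one existing password account, then two Google logins.
users = {"ana@example.com": UserRecord(email="ana@example.com")}
linked = find_or_link_user(
    users, {"sub": "111", "email": "Ana@example.com", "email_verified": True}
)
created = find_or_link_user(
    users, {"sub": "222", "email": "bo@example.com", "email_verified": True}
)
```
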
## Workflow

1. **Register or log in** (email/password or Google OAuth) to get a JWT
2. **Upload a video** with source and target languages
3. **Transcribe** the audio from the video
4. **Translate** the transcribed text
5. **Clone the voice** and generate dubbed audio
6. **Process the video** to replace the original audio with the dubbed audio
7. **Download** the final dubbed video

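Steps 3 to 6 correspond to the `POST` endpoints under Processing Pipeline, and each step can start once the previous one has finished. The sketch below encodes that ordering as a lookup from a video's current status to the request that starts the next step. The status strings come from Status Tracking. Two things are illustrative assumptions: that each step requires exactly the preceding completed status, and the name `next_step`.

```python
from typing import Optional

# Status reached -> request (with a {video_id} placeholder) that starts the next step.
NEXT_STEP = {
    "uploaded": "POST /transcription/{video_id}",
    "transcribed": "POST /translation/{video_id}",
    "translated": "POST /voice/clone/{video_id}",
    "voice_cloned": "POST /process/{video_id}",
}


def next_step(status: str, video_id: str) -> Optional[str]:
    """Return the next request to make, or None if nothing can be started now."""
    template = NEXT_STEP.get(status)
    return template.format(video_id=video_id) if template else None
```

In-progress statuses such as `translating`, and the final `completed` status, return `None`: either a step is still running or nothing is left to start.
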
## Environment Variables Reference

| Variable | Description | Required |
|----------|-------------|----------|
| `SECRET_KEY` | JWT secret key for authentication | Yes |
| `GOOGLE_CLIENT_ID` | Google OAuth client ID | No\* |
| `GOOGLE_CLIENT_SECRET` | Google OAuth client secret | No\* |
| `GOOGLE_REDIRECT_URI` | Google OAuth redirect URI | No\* |
| `AWS_ACCESS_KEY_ID` | AWS access key for S3 | Yes |
| `AWS_SECRET_ACCESS_KEY` | AWS secret key for S3 | Yes |
| `AWS_REGION` | AWS region (default: us-east-1) | No |
| `S3_BUCKET_NAME` | S3 bucket name for file storage | Yes |
| `OPENAI_API_KEY` | OpenAI API key for Whisper and GPT-4 | Yes |
| `ELEVENLABS_API_KEY` | ElevenLabs API key for voice cloning | Yes |

\* Required only if Google OAuth is enabled.

## File Storage Structure

Files are stored in S3 with the following structure:

```
/videos/{uuid}.mp4 - Original uploaded videos
/dubbed_audio/{uuid}.mp3 - Generated dubbed audio files
/processed_videos/{uuid}.mp4 - Final processed videos
```

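The listing above writes paths with a leading slash. S3 object keys are plain strings, so `videos/x.mp4` and `/videos/x.mp4` are two different keys. The sketch below assumes the slash is only notation and omits it, and it names each file with a fresh UUID4. `KEY_PREFIXES` and `new_object_key` are hypothetical names.

```python
import uuid
from typing import Optional

# Prefix and file extension for each kind of stored file, per the layout above.
KEY_PREFIXES = {
    "original": ("videos", "mp4"),
    "dubbed_audio": ("dubbed_audio", "mp3"),
    "processed": ("processed_videos", "mp4"),
}


def new_object_key(kind: str, file_id: Optional[uuid.UUID] = None) -> str:
    """Return a key like 'videos/<uuid>.mp4' (leading slash omitted by assumption)."""
    prefix, ext = KEY_PREFIXES[kind]
    if file_id is None:
        file_id = uuid.uuid4()
    return f"{prefix}/{file_id}.{ext}"
```

With boto3, the resulting key would be passed to the S3 client's `upload_file(filename, bucket, key)`.
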
## Database Schema

- **users**: User accounts with email/password, plus the Google OAuth fields `google_id`, `is_google_user`, `email_verified`, and `profile_picture` (added in migration 004)
- **videos**: Video metadata and processing status
- **transcriptions**: Audio transcriptions
- **translations**: Translated text
- **dubbed_audios**: Generated audio files
- **dubbed_videos**: Final processed videos

## Status Tracking

Videos have the following status values:

- `uploaded` - Video uploaded successfully
- `transcribing` - Audio transcription in progress
- `transcribed` - Transcription completed
- `translating` - Text translation in progress
- `translated` - Translation completed
- `voice_cloning` - Voice cloning and audio generation in progress
- `voice_cloned` - Dubbed audio generated
- `processing_video` - Final video processing in progress
- `completed` - All processing completed
- `*_failed` - Failure states for the individual processing steps

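Each step runs in the background and reports progress through these statuses, so a client typically polls until the step finishes or fails.

In the minimal sketch below, `get_status` is any callable you supply that returns the video's current status. For example, it could read the video details from `GET /videos/{video_id}`; the exact response field is not documented here, so that wiring is left open. `wait_for_status`, the polling interval, and the timeout are assumptions.

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    target: str,
    interval: float = 2.0,
    timeout: float = 600.0,
) -> str:
    """Poll until the status equals `target`; raise on a *_failed status or timeout."""
    deadline = time.monotonic() + timeout
    while True:
        status = get_status()
        if status == target:
            return status
        if status.endswith("_failed"):
            raise RuntimeError(f"Processing failed with status {status!r}")
        if time.monotonic() >= deadline:
            raise TimeoutError(f"Still {status!r} after {timeout:.0f}s")
        time.sleep(interval)


# Example with a fake status source standing in for the API.
statuses = iter(["transcribing", "transcribing", "transcribed"])
result = wait_for_status(lambda: next(statuses), "transcribed", interval=0.01)
```
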
## Development

### Code Linting

```bash
ruff check . --fix
```

### Project Structure

```
├── main.py              # FastAPI application entry point
├── requirements.txt     # Python dependencies
├── alembic.ini          # Database migration configuration
├── app/
│   ├── db/              # Database configuration
│   ├── models/          # SQLAlchemy models
│   ├── routes/          # API endpoints
│   ├── services/        # Business logic and external API integrations
│   └── utils/           # Utility functions (auth, etc.)
└── alembic/
    └── versions/        # Database migration files
```