voiceclone-tts/README.md
2026-03-28 22:04:45 +09:00

# Voice Clone TTS
Type any text, hear it in your own voice. Runs fully offline.

![Screenshot](docs/assets/img/preview.png)

---
## Setup (first time only)
**1. Install system packages:**
```bash
sudo apt install portaudio19-dev python3-tk espeak-ng -y
```
**2. Install Python 3.10 via pyenv** (required on Debian to avoid an `lzma` bug):
```bash
curl https://pyenv.run | bash
echo 'export PYENV_ROOT="$HOME/.pyenv"' >> ~/.bashrc
echo '[[ -d $PYENV_ROOT/bin ]] && export PATH="$PYENV_ROOT/bin:$PATH"' >> ~/.bashrc
echo 'eval "$(pyenv init - bash)"' >> ~/.bashrc
source ~/.bashrc
pyenv install 3.10.14
pyenv local 3.10.14
```
**3. Create a virtual environment:**
```bash
~/.pyenv/versions/3.10.14/bin/python -m venv .venv
source .venv/bin/activate
```
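
Before moving on to the package install, it can help to confirm that the interpreter in the virtual environment looks the way steps 1 to 3 intend. The following is a minimal sketch, not part of the app: `check_environment` is a hypothetical helper, and the choice of checks (Python 3.10, an active venv, the compiled `_lzma` and `_tkinter` modules, `espeak-ng` on `PATH`) is an assumption about what matters here. Save it under any name and run it with the venv active; every line should report `OK`.

```python
import importlib.util
import shutil
import sys


def check_environment():
    """Return a dict mapping each check name to True or False."""
    return {
        # Step 2 installs Python 3.10.14.
        "python_3_10": sys.version_info[:2] == (3, 10),
        # Inside an active venv, sys.prefix differs from sys.base_prefix.
        "in_virtualenv": sys.prefix != sys.base_prefix,
        # The lzma module needs the compiled _lzma extension.
        "lzma_available": importlib.util.find_spec("_lzma") is not None,
        # The GUI needs the compiled _tkinter extension.
        "tkinter_available": importlib.util.find_spec("_tkinter") is not None,
        # espeak-ng is installed in step 1.
        "espeak_ng_on_path": shutil.which("espeak-ng") is not None,
    }


if __name__ == "__main__":
    for name, ok in check_environment().items():
        print(f"{'OK  ' if ok else 'FAIL'} {name}")
```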
**4. Install Python packages** (takes 15-30 min, downloads ~2GB):
```bash
pip install --upgrade pip
pip install --no-cache-dir "numpy==1.22.0"
pip install --no-cache-dir --resume-retries 20 TTS sounddevice scipy
pip install --no-cache-dir "transformers==4.40.0"
pip install --no-cache-dir "torch==2.1.0" "torchaudio==2.1.0"
```
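
Because the commands above pin several versions and a later `pip install` can silently change them, a quick check after installation may save some debugging. This is a sketch only: `PINNED` is copied from the commands above, and `find_mismatches` is a hypothetical helper built on the standard library's `importlib.metadata`. An empty result means every pin is satisfied.

```python
from importlib import metadata

# Pins copied from the pip commands above.
PINNED = {
    "numpy": "1.22.0",
    "transformers": "4.40.0",
    "torch": "2.1.0",
    "torchaudio": "2.1.0",
}


def find_mismatches(pinned, get_version=metadata.version):
    """Return {package: (expected, found)} for every pin that is not met.

    `found` is None when the package is not installed at all.
    """
    mismatches = {}
    for name, expected in pinned.items():
        try:
            found = get_version(name)
        except metadata.PackageNotFoundError:
            found = None
        # Ignore local build tags such as "+cpu" that some wheels carry.
        if found is None or found.split("+")[0] != expected:
            mismatches[name] = (expected, found)
    return mismatches


if __name__ == "__main__":
    for name, (expected, found) in find_mismatches(PINNED).items():
        print(f"{name}: expected {expected}, found {found}")
```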
---
## Running the app
Every time you want to use it:
```bash
source .venv/bin/activate
python voice_clone_tts.py
```
---
## How to use
1. Wait for **"Model ready"** in the top right *(first launch only: downloads ~2GB, takes 5-15 min)*
2. Click **Start Recording** → read the passage below for 30 seconds → **Stop Recording**
3. Type any text
4. Click **Generate & Play**

Your voice sample is saved as `my_voice_sample.wav` and is reused automatically on future runs.
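
For readers curious what the steps above amount to in code, here is a rough sketch of the reuse-then-generate flow. It is not the contents of `voice_clone_tts.py`. The helper names and `output.wav` are hypothetical, and the model name is an assumption: this README does not say which model the app loads, so the sketch uses Coqui TTS's XTTS v2 through the `TTS.api` interface of the `TTS` package installed earlier.

```python
from pathlib import Path

# Filename taken from this README.
SAMPLE_PATH = Path("my_voice_sample.wav")


def needs_recording(sample_path=SAMPLE_PATH):
    """Return True when there is no non-empty voice sample to reuse."""
    sample_path = Path(sample_path)
    return not (sample_path.is_file() and sample_path.stat().st_size > 0)


def synthesize(text, sample_path=SAMPLE_PATH, out_path="output.wav"):
    """Speak `text` in the voice from `sample_path`, writing a WAV file."""
    # Imported here because loading the library and model is slow.
    from TTS.api import TTS

    # Assumption: XTTS v2. The app may load a different model.
    tts = TTS("tts_models/multilingual/multi-dataset/xtts_v2")
    tts.tts_to_file(
        text=text,
        speaker_wav=str(sample_path),
        language="en",
        file_path=out_path,
    )
    return out_path


if __name__ == "__main__":
    if needs_recording():
        print("No voice sample yet: record one in the app first.")
    else:
        print("Wrote", synthesize("Hello from my cloned voice."))
```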
---
## Best text to record (Rainbow Passage)
> *"When the sunlight strikes raindrops in the air, they act as a prism and form a rainbow. The rainbow is a division of white light into many beautiful colors. These take the shape of a long round arch, with its path high above, and its two ends apparently beyond the horizon. There is, according to legend, a pot of gold at the end of the rainbow. The shape of a rainbow reminds me of a bridge. Like a bridge, a rainbow is wide in the middle and narrow at its ends."*

Read it **twice through** at your normal pace.
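
To check how long a recording actually came out, the standard library's `wave` module can read the file's length. This is a minimal sketch under two assumptions: the sample is a plain PCM WAV (the `wave` module rejects other encodings such as 32-bit float), and 30 seconds is treated as a minimum. `wav_duration_seconds` and `has_enough_audio` are hypothetical helpers, not functions from the app.

```python
import wave


def wav_duration_seconds(path):
    """Return the length of a PCM WAV file in seconds."""
    with wave.open(str(path), "rb") as wav:
        return wav.getnframes() / wav.getframerate()


def has_enough_audio(path, minimum=30.0):
    """Return True when the recording lasts at least `minimum` seconds."""
    return wav_duration_seconds(path) >= minimum


if __name__ == "__main__":
    seconds = wav_duration_seconds("my_voice_sample.wav")
    print(f"Sample length: {seconds:.1f} s")
```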
---
## Tips
- Record in a quiet room with no background noise
- Speak naturally — don't put on a "reading voice"
- 30 seconds of clean audio is the sweet spot
- Generation takes 10-30 seconds per sentence on CPU
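
The per-sentence figure in the last tip gives a rough way to estimate the wait for longer text. The sketch below simply multiplies the sentence count by that range. The sentence splitter is deliberately naive, and both function names are hypothetical.

```python
import re

# Per-sentence CPU generation time from the tip above, in seconds.
SECONDS_PER_SENTENCE = (10, 30)


def split_sentences(text):
    """Split on '.', '!' or '?' followed by whitespace (naive)."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [part for part in parts if part]


def estimate_generation_seconds(text):
    """Return a (low, high) estimate in seconds for generating `text`."""
    count = len(split_sentences(text))
    low, high = SECONDS_PER_SENTENCE
    return count * low, count * high
```

For example, `estimate_generation_seconds("Hello there. How are you? Fine!")` counts three sentences and returns `(30, 90)`, so a short paragraph can take a minute or more on CPU.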