diff --git a/docs/assets/openaudio.jpg b/docs/assets/openaudio.jpg
new file mode 100644
index 0000000..f23b7aa
Binary files /dev/null and b/docs/assets/openaudio.jpg differ
diff --git a/docs/assets/openaudio.png b/docs/assets/openaudio.png
new file mode 100644
index 0000000..80e300c
Binary files /dev/null and b/docs/assets/openaudio.png differ
diff --git a/docs/en/index.md b/docs/en/index.md
index 38db6db..d1baf4b 100644
--- a/docs/en/index.md
+++ b/docs/en/index.md
@@ -1,4 +1,14 @@
-# Introduction
+# OpenAudio (formerly Fish-Speech)
+
+
+
+
+
+

+
+
+
+
Advanced Text-to-Speech Model Series
-!!! warning
- We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
- This codebase is released under Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
+
Try it now: Fish Audio Playground |
Learn more: OpenAudio Website
-## Requirements
+
-- GPU Memory: 12GB (Inference)
-- System: Linux, Windows
+---
-## Setup
+!!! warning "Legal Notice"
+ We assume no responsibility for any illegal use of the codebase. Please refer to the local laws regarding DMCA (Digital Millennium Copyright Act) and other relevant laws in your area.
+
+ **License:** This codebase is released under Apache 2.0 license and all models are released under the CC-BY-NC-SA-4.0 license.
-First, we need to create a conda environment to install the packages.
+## **Introduction**
-```bash
+We are excited to announce that we have rebranded to **OpenAudio**, introducing a brand-new series of advanced Text-to-Speech models that builds upon the foundation of Fish-Speech with significant improvements and new capabilities.
-conda create -n fish-speech python=3.12
-conda activate fish-speech
+**OpenAudio-S1-mini**: [Video](To Be Uploaded); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
-pip install sudo apt-get install portaudio19-dev # For pyaudio
-pip install -e . # This will download all rest packages.
+**Fish-Speech v1.5**: [Video](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
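+
+The model weights can be fetched locally with `huggingface-cli` (the same command used later in the inference guide):
+
+```bash
+huggingface-cli download fishaudio/openaudio-s1-mini \
+    --local-dir checkpoints/openaudio-s1-mini
+```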
-apt install libsox-dev ffmpeg # If needed.
+## **Highlights** ✨
+
+### **Emotion Control**
+OpenAudio S1 **supports a variety of emotion, tone, and special markers** to enhance speech synthesis:
+
+- **Basic emotions**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
```
-!!! warning
- The `compile` option is not supported on windows and macOS, if you want to run with compile, you need to install trition by yourself.
+- **Advanced emotions**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)
+```
-## Acknowledgements
+- **Tone markers**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- **Special audio effects**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
+```
+
+You can also write laughter directly into the text, such as `Ha,ha,ha`; many other cases are waiting to be explored.
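+
+For example, markers are placed inline in the text passed to the model (an illustrative snippet; see the inference guide for the full command and its flags):
+
+```bash
+# Emotion, tone, and effect markers are embedded directly in the input text
+python fish_speech/models/text2semantic/inference.py \
+    --text "(excited) We finally did it! (laughing) Ha,ha,ha (whispering) Don't tell anyone yet."
+```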
+
+### **Excellent TTS quality**
+
+We use the Seed TTS Eval metrics to evaluate model performance. The results show that OpenAudio S1 achieves **0.008 WER** and **0.004 CER** on English text, significantly better than previous models. (English, automatic evaluation, based on OpenAI gpt-4o-transcribe; speaker distance computed with Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Model | Word Error Rate (WER) | Character Error Rate (CER) | Speaker Distance |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **Two Types of Models**
+
+| Model | Size | Availability | Features |
+|-------|------|--------------|----------|
+| **S1** | 4B parameters | Available on [fish.audio](https://fish.audio) | Full-featured flagship model |
+| **S1-mini** | 0.5B parameters | Available on the Hugging Face [Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Distilled version with core capabilities |
+
+Both S1 and S1-mini incorporate online Reinforcement Learning from Human Feedback (RLHF).
+
+## **Features**
+
+1. **Zero-shot & Few-shot TTS:** Input a 10 to 30-second vocal sample to generate high-quality TTS output. **For detailed guidelines, see [Voice Cloning Best Practices](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
+
+2. **Multilingual & Cross-lingual Support:** Simply copy and paste multilingual text into the input box—no need to worry about the language. Currently supports English, Japanese, Korean, Chinese, French, German, Arabic, and Spanish.
+
+3. **No Phoneme Dependency:** The model has strong generalization capabilities and does not rely on phonemes for TTS. It can handle text in any language script.
+
+4. **Highly Accurate:** Achieves a low CER (Character Error Rate) of around 0.4% and WER (Word Error Rate) of around 0.8% for Seed-TTS Eval.
+
+5. **Fast:** With fish-tech acceleration, the real-time factor is approximately 1:5 on an Nvidia RTX 4060 laptop and 1:15 on an Nvidia RTX 4090.
+
+6. **WebUI Inference:** Features an easy-to-use, Gradio-based web UI compatible with Chrome, Firefox, Edge, and other browsers (a launch example follows this list).
+
+7. **GUI Inference:** Offers a PyQt6 graphical interface that works seamlessly with the API server. Supports Linux, Windows, and macOS. [See GUI](https://github.com/AnyaCoder/fish-speech-gui).
+
+8. **Deploy-Friendly:** Easily set up an inference server with native support for Linux and Windows (macOS coming soon), minimizing speed loss.
+
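+For example, the Gradio WebUI can be launched locally once the environment is set up (a minimal sketch; `tools.run_webui` and the Gradio environment variables are described in the inference guide):
+
+```bash
+# Optional: pick a port via Gradio's standard environment variable
+GRADIO_SERVER_PORT=7860 python -m tools.run_webui
+```
+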
+## **Disclaimer**
+
+We assume no responsibility for any illegal use of the codebase. Please refer to your local laws regarding the DMCA and other related laws.
+
+## **Media & Demos**
+
+#### 🚧 Coming Soon
+Video demonstrations and tutorials are currently in development.
+
+## **Documentation**
+
+### Quick Start
+- [Build Environment](install.md) - Set up your development environment
+- [Inference Guide](inference.md) - Run the model and generate speech
+
+
+## **Community & Support**
+
+- **Discord:** Join our [Discord community](https://discord.gg/Es5qTB9BcN)
+- **Website:** Visit [OpenAudio.com](https://openaudio.com) for the latest updates
+- **Try Online:** [Fish Audio Playground](https://fish.audio)
diff --git a/docs/en/inference.md b/docs/en/inference.md
index a1eb81d..4d9cf17 100644
--- a/docs/en/inference.md
+++ b/docs/en/inference.md
@@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
--text "The text you want to convert" \
--prompt-text "Your reference text" \
--prompt-tokens "fake.npy" \
- --checkpoint-path "checkpoints/openaudio-s1-mini" \
- --num-samples 2 \
- --compile # if you want a faster speed
+ --compile
```
This command will create a `codes_N` file in the working directory, where N is an integer starting from 0.
@@ -50,15 +48,12 @@ This command will create a `codes_N` file in the working directory, where N is a
### 3. Generate vocals from semantic tokens:
-#### VQGAN Decoder
-
!!! warning "Future Warning"
We have kept the interface accessible from the original path (tools/vqgan/inference.py), but this interface may be removed in subsequent releases, so please change your code as soon as possible.
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
- --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
```
## HTTP API Inference
diff --git a/docs/en/install.md b/docs/en/install.md
new file mode 100644
index 0000000..6830156
--- /dev/null
+++ b/docs/en/install.md
@@ -0,0 +1,31 @@
+## Requirements
+
+- GPU Memory: 12GB (Inference)
+- System: Linux, WSL
+
+## Setup
+
+First, you need to install pyaudio and sox, which are used for audio processing.
+
+``` bash
+apt install portaudio19-dev libsox-dev ffmpeg
+```
+
+### Conda
+
+```bash
+conda create -n fish-speech python=3.12
+conda activate fish-speech
+
+pip install -e .
+```
+
+### UV
+
+```bash
+uv sync --python 3.12
+```
+
+!!! warning
+ The `compile` option is not supported on Windows and macOS. If you want to run with compile, you need to install triton yourself.
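+
+To verify the editable install, run a quick import check (a minimal smoke test, assuming the `fish_speech` package installed by `pip install -e .`):
+
+```bash
+python -c "import fish_speech; print('fish-speech is installed')"
+```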
diff --git a/docs/ja/index.md b/docs/ja/index.md
index bd937d7..bbb66c7 100644
--- a/docs/ja/index.md
+++ b/docs/ja/index.md
@@ -1,4 +1,14 @@
-# 紹介
+# OpenAudio (旧 Fish-Speech)
+
+
+
+
+
+

+
+
+
+
先進的なText-to-Speechモデルシリーズ
-!!! warning
- このコードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCA(デジタルミレニアム著作権法)およびその他の関連法規をご参照ください。
- このコードベースはApache 2.0ライセンスの下でリリースされ、すべてのモデルはCC-BY-NC-SA-4.0ライセンスの下でリリースされています。
+
今すぐ試す: Fish Audio Playground |
詳細情報: OpenAudio ウェブサイト
-## システム要件
+
-- GPU メモリ:12GB(推論)
-- システム:Linux、Windows
+---
-## セットアップ
+!!! warning "法的通知"
+ このコードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCA(デジタルミレニアム著作権法)およびその他の関連法規をご参照ください。
+
+ **ライセンス:** このコードベースはApache 2.0ライセンスの下でリリースされ、すべてのモデルはCC-BY-NC-SA-4.0ライセンスの下でリリースされています。
-まず、パッケージをインストールするためのconda環境を作成する必要があります。
+## **紹介**
-```bash
+私たちは **OpenAudio** への改名を発表できることを嬉しく思います。Fish-Speechを基盤とし、大幅な改善と新機能を加えた、新しい先進的なText-to-Speechモデルシリーズを紹介します。
-conda create -n fish-speech python=3.12
-conda activate fish-speech
+**OpenAudio-S1-mini**: [動画](アップロード予定); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
-pip install sudo apt-get install portaudio19-dev # pyaudio用
-pip install -e . # これにより残りのパッケージがすべてダウンロードされます。
+**Fish-Speech v1.5**: [動画](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
-apt install libsox-dev ffmpeg # 必要に応じて。
+## **ハイライト** ✨
+
+### **感情制御**
+OpenAudio S1は**多様な感情、トーン、特殊マーカーをサポート**して音声合成を強化します:
+
+- **基本感情**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
```
-!!! warning
- `compile`オプションはWindowsとmacOSでサポートされていません。compileで実行したい場合は、tritionを自分でインストールする必要があります。
+- **高度な感情**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)
+```
-## 謝辞
+- **トーンマーカー**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- **特殊音響効果**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
+```
+
+Ha,ha,haを使用してコントロールすることもでき、他にも多くの使用法があなた自身の探索を待っています。
+
+### **優秀なTTS品質**
+
+Seed TTS評価指標を使用してモデルのパフォーマンスを評価した結果、OpenAudio S1は英語テキストで**0.008 WER**と**0.004 CER**を達成し、以前のモデルより大幅に改善されました。(英語、自動評価、OpenAI gpt-4o-transcribeに基づく、話者距離はRevai/pyannote-wespeaker-voxceleb-resnet34-LMを使用)
+
+| モデル | 単語誤り率 (WER) | 文字誤り率 (CER) | 話者距離 |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **2つのモデルタイプ**
+
+| モデル | サイズ | 利用可能性 | 特徴 |
+|-------|------|--------------|----------|
+| **S1** | 40億パラメータ | [fish.audio](https://fish.audio) で利用可能 | 全機能搭載のフラッグシップモデル |
+| **S1-mini** | 5億パラメータ | Hugging Face [Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) で利用可能 | コア機能を備えた蒸留版 |
+
+S1とS1-miniの両方にオンライン人間フィードバック強化学習(RLHF)が組み込まれています。
+
+## **機能**
+
+1. **ゼロショット・フューショットTTS:** 10〜30秒の音声サンプルを入力するだけで高品質なTTS出力を生成します。**詳細なガイドラインについては、[音声クローニングのベストプラクティス](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)をご覧ください。**
+
+2. **多言語・言語横断サポート:** 多言語テキストを入力ボックスにコピー&ペーストするだけで、言語を気にする必要はありません。現在、英語、日本語、韓国語、中国語、フランス語、ドイツ語、アラビア語、スペイン語をサポートしています。
+
+3. **音素依存なし:** このモデルは強力な汎化能力を持ち、TTSに音素に依存しません。あらゆる言語スクリプトのテキストを処理できます。
+
+4. **高精度:** Seed-TTS Evalで低い文字誤り率(CER)約0.4%と単語誤り率(WER)約0.8%を達成します。
+
+5. **高速:** fish-tech加速により、Nvidia RTX 4060ラップトップでリアルタイム係数約1:5、Nvidia RTX 4090で約1:15を実現します。
+
+6. **WebUI推論:** Chrome、Firefox、Edge、その他のブラウザと互換性のあるGradioベースの使いやすいWebUIを備えています。
+
+7. **GUI推論:** APIサーバーとシームレスに連携するPyQt6グラフィカルインターフェースを提供します。Linux、Windows、macOSをサポートします。[GUIを見る](https://github.com/AnyaCoder/fish-speech-gui)。
+
+8. **デプロイフレンドリー:** Linux、Windows、macOS をネイティブサポートしており、推論サーバーを簡単にセットアップして速度低下を最小限に抑えます。
+
+## **免責事項**
+
+コードベースの違法な使用について、当方は一切の責任を負いません。お住まいの地域のDMCAやその他の関連法律をご参照ください。
+
+## **メディア・デモ**
+
+#### 🚧 近日公開
+動画デモとチュートリアルは現在開発中です。
+
+## **ドキュメント**
+
+### クイックスタート
+- [環境構築](install.md) - 開発環境をセットアップ
+- [推論ガイド](inference.md) - モデルを実行して音声を生成
+
+## **コミュニティ・サポート**
+
+- **Discord:** [Discordコミュニティ](https://discord.gg/Es5qTB9BcN)に参加
+- **ウェブサイト:** 最新アップデートは[OpenAudio.com](https://openaudio.com)をご覧ください
+- **オンライン試用:** [Fish Audio Playground](https://fish.audio)
diff --git a/docs/ja/inference.md b/docs/ja/inference.md
index db4132e..8cbde0d 100644
--- a/docs/ja/inference.md
+++ b/docs/ja/inference.md
@@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
--text "変換したいテキスト" \
--prompt-text "参照テキスト" \
--prompt-tokens "fake.npy" \
- --checkpoint-path "checkpoints/openaudio-s1-mini" \
- --num-samples 2 \
- --compile # より高速化を求める場合
+ --compile
```
このコマンドは、作業ディレクトリに `codes_N` ファイルを作成します(Nは0から始まる整数)。
@@ -50,15 +48,12 @@ python fish_speech/models/text2semantic/inference.py \
### 3. セマンティックトークンから音声を生成:
-#### VQGANデコーダー
-
!!! warning "将来の警告"
元のパス(tools/vqgan/inference.py)からアクセス可能なインターフェースを維持していますが、このインターフェースは後続のリリースで削除される可能性があるため、できるだけ早くコードを変更してください。
```bash
python fish_speech/models/dac/inference.py \
- -i "codes_0.npy" \
- --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
+ -i "codes_0.npy"
```
## HTTP API推論
@@ -103,5 +98,3 @@ python -m tools.run_webui
!!! note
`GRADIO_SHARE`、`GRADIO_SERVER_PORT`、`GRADIO_SERVER_NAME` などのGradio環境変数を使用してWebUIを設定できます。
-
-お楽しみください!
diff --git a/docs/ja/install.md b/docs/ja/install.md
new file mode 100644
index 0000000..5d815ab
--- /dev/null
+++ b/docs/ja/install.md
@@ -0,0 +1,30 @@
+## システム要件
+
+- GPU メモリ:12GB(推論)
+- システム:Linux、WSL
+
+## セットアップ
+
+まず、音声処理に使用される pyaudio と sox をインストールする必要があります。
+
+``` bash
+apt install portaudio19-dev libsox-dev ffmpeg
+```
+
+### Conda
+
+```bash
+conda create -n fish-speech python=3.12
+conda activate fish-speech
+
+pip install -e .
+```
+
+### UV
+
+```bash
+uv sync --python 3.12
+```
+
+!!! warning
+ `compile` オプションは Windows と macOS でサポートされていません。compile で実行したい場合は、triton を自分でインストールする必要があります。
diff --git a/docs/ko/index.md b/docs/ko/index.md
index 612d7b8..15cf280 100644
--- a/docs/ko/index.md
+++ b/docs/ko/index.md
@@ -1,4 +1,14 @@
-# 소개
+# OpenAudio (구 Fish-Speech)
+
+
+
+
+
+

+
+
+
+
고급 텍스트-음성 변환 모델 시리즈
-!!! warning
- 코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA(디지털 밀레니엄 저작권법) 및 기타 관련 법률을 참고하시기 바랍니다.
- 이 코드베이스는 Apache 2.0 라이선스 하에 배포되며, 모든 모델은 CC-BY-NC-SA-4.0 라이선스 하에 배포됩니다.
+
지금 체험: Fish Audio Playground |
자세히 알아보기: OpenAudio 웹사이트
-## 시스템 요구사항
+
-- GPU 메모리: 12GB (추론)
-- 시스템: Linux, Windows
+---
-## 설치
+!!! warning "법적 고지"
+ 코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하의 지역의 DMCA(디지털 밀레니엄 저작권법) 및 기타 관련 법률을 참고하시기 바랍니다.
+
+ **라이선스:** 이 코드베이스는 Apache 2.0 라이선스 하에 배포되며, 모든 모델은 CC-BY-NC-SA-4.0 라이선스 하에 배포됩니다.
-먼저 패키지를 설치하기 위한 conda 환경을 만들어야 합니다.
+## **소개**
-```bash
+저희는 **OpenAudio**로의 브랜드 변경을 발표하게 되어 기쁩니다. Fish-Speech를 기반으로 하여 상당한 개선과 새로운 기능을 추가한 새로운 고급 텍스트-음성 변환 모델 시리즈를 소개합니다.
-conda create -n fish-speech python=3.12
-conda activate fish-speech
+**OpenAudio-S1-mini**: [동영상](업로드 예정); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
-pip install sudo apt-get install portaudio19-dev # pyaudio용
-pip install -e . # 나머지 모든 패키지를 다운로드합니다.
+**Fish-Speech v1.5**: [동영상](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
-apt install libsox-dev ffmpeg # 필요한 경우.
+## **주요 특징** ✨
+
+### **감정 제어**
+OpenAudio S1은 **다양한 감정, 톤, 특수 마커를 지원**하여 음성 합성을 향상시킵니다:
+
+- **기본 감정**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
```
-!!! warning
- `compile` 옵션은 Windows와 macOS에서 지원되지 않습니다. compile로 실행하려면 trition을 직접 설치해야 합니다.
+- **고급 감정**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)
+```
-## 감사의 말
+- **톤 마커**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- **특수 음향 효과**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
+```
+
+Ha,ha,ha를 사용하여 제어할 수도 있으며, 여러분 스스로 탐구할 수 있는 다른 많은 사용법이 있습니다.
+
+### **뛰어난 TTS 품질**
+
+Seed TTS 평가 지표를 사용하여 모델 성능을 평가한 결과, OpenAudio S1은 영어 텍스트에서 **0.008 WER**과 **0.004 CER**을 달성하여 이전 모델보다 현저히 향상되었습니다. (영어, 자동 평가, OpenAI gpt-4o-transcribe 기반, 화자 거리는 Revai/pyannote-wespeaker-voxceleb-resnet34-LM 사용)
+
+| 모델 | 단어 오류율 (WER) | 문자 오류율 (CER) | 화자 거리 |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **두 가지 모델 유형**
+
+| 모델 | 크기 | 가용성 | 특징 |
+|-------|------|--------------|----------|
+| **S1** | 40억 매개변수 | [fish.audio](https://fish.audio)에서 이용 가능 | 모든 기능을 갖춘 플래그십 모델 |
+| **S1-mini** | 5억 매개변수 | Hugging Face [Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini)에서 이용 가능 | 핵심 기능을 갖춘 경량화 버전 |
+
+S1과 S1-mini 모두 온라인 인간 피드백 강화 학습(RLHF)이 통합되어 있습니다.
+
+## **기능**
+
+1. **제로샷 및 퓨샷 TTS:** 10~30초의 음성 샘플을 입력하여 고품질 TTS 출력을 생성합니다. **자세한 가이드라인은 [음성 복제 모범 사례](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)를 참조하세요.**
+
+2. **다국어 및 교차 언어 지원:** 다국어 텍스트를 입력 상자에 복사하여 붙여넣기만 하면 됩니다. 언어에 대해 걱정할 필요가 없습니다. 현재 영어, 일본어, 한국어, 중국어, 프랑스어, 독일어, 아랍어, 스페인어를 지원합니다.
+
+3. **음소 의존성 없음:** 이 모델은 강력한 일반화 능력을 가지고 있으며 TTS에 음소에 의존하지 않습니다. 어떤 언어 스크립트의 텍스트도 처리할 수 있습니다.
+
+4. **높은 정확도:** Seed-TTS Eval에서 약 0.4%의 낮은 문자 오류율(CER)과 약 0.8%의 단어 오류율(WER)을 달성합니다.
+
+5. **빠른 속도:** fish-tech 가속을 통해 Nvidia RTX 4060 노트북에서 실시간 계수 약 1:5, Nvidia RTX 4090에서 약 1:15를 달성합니다.
+
+6. **WebUI 추론:** Chrome, Firefox, Edge 및 기타 브라우저와 호환되는 사용하기 쉬운 Gradio 기반 웹 UI를 제공합니다.
+
+7. **GUI 추론:** API 서버와 원활하게 작동하는 PyQt6 그래픽 인터페이스를 제공합니다. Linux, Windows, macOS를 지원합니다. [GUI 보기](https://github.com/AnyaCoder/fish-speech-gui).
+
+8. **배포 친화적:** Linux, Windows, macOS의 네이티브 지원으로 추론 서버를 쉽게 설정하여 속도 손실을 최소화합니다.
+
+## **면책 조항**
+
+코드베이스의 불법적인 사용에 대해서는 일체 책임을 지지 않습니다. 귀하 지역의 DMCA 및 기타 관련 법률을 참고하시기 바랍니다.
+
+## **미디어 및 데모**
+
+#### 🚧 곧 출시 예정
+동영상 데모와 튜토리얼이 현재 개발 중입니다.
+
+## **문서**
+
+### 빠른 시작
+- [환경 구축](install.md) - 개발 환경 설정
+- [추론 가이드](inference.md) - 모델 실행 및 음성 생성
+
+## **커뮤니티 및 지원**
+
+- **Discord:** [Discord 커뮤니티](https://discord.gg/Es5qTB9BcN)에 참여하세요
+- **웹사이트:** 최신 업데이트는 [OpenAudio.com](https://openaudio.com)을 방문하세요
+- **온라인 체험:** [Fish Audio Playground](https://fish.audio)
diff --git a/docs/ko/inference.md b/docs/ko/inference.md
index b32eaad..268f107 100644
--- a/docs/ko/inference.md
+++ b/docs/ko/inference.md
@@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
--text "변환하고 싶은 텍스트" \
--prompt-text "참조 텍스트" \
--prompt-tokens "fake.npy" \
- --checkpoint-path "checkpoints/openaudio-s1-mini" \
- --num-samples 2 \
- --compile # 더 빠른 속도를 원한다면
+ --compile
```
이 명령은 작업 디렉토리에 `codes_N` 파일을 생성합니다. 여기서 N은 0부터 시작하는 정수입니다.
@@ -50,15 +48,12 @@ python fish_speech/models/text2semantic/inference.py \
### 3. 의미 토큰에서 음성 생성:
-#### VQGAN 디코더
-
!!! warning "향후 경고"
원래 경로(tools/vqgan/inference.py)에서 액세스 가능한 인터페이스를 유지하고 있지만, 이 인터페이스는 향후 릴리스에서 제거될 수 있으므로 가능한 한 빨리 코드를 변경해 주세요.
```bash
python fish_speech/models/dac/inference.py \
- -i "codes_0.npy" \
- --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
+ -i "codes_0.npy"
```
## HTTP API 추론
@@ -103,5 +98,3 @@ python -m tools.run_webui
!!! note
`GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME`과 같은 Gradio 환경 변수를 사용하여 WebUI를 구성할 수 있습니다.
-
-즐기세요!
diff --git a/docs/ko/install.md b/docs/ko/install.md
new file mode 100644
index 0000000..6cddc5f
--- /dev/null
+++ b/docs/ko/install.md
@@ -0,0 +1,30 @@
+## 시스템 요구사항
+
+- GPU 메모리: 12GB (추론)
+- 시스템: Linux, WSL
+
+## 설정
+
+먼저 오디오 처리에 사용되는 pyaudio와 sox를 설치해야 합니다.
+
+``` bash
+apt install portaudio19-dev libsox-dev ffmpeg
+```
+
+### Conda
+
+```bash
+conda create -n fish-speech python=3.12
+conda activate fish-speech
+
+pip install -e .
+```
+
+### UV
+
+```bash
+uv sync --python 3.12
+```
+
+!!! warning
+ `compile` 옵션은 Windows와 macOS에서 지원되지 않습니다. compile로 실행하려면 triton을 직접 설치해야 합니다.
diff --git a/docs/pt/index.md b/docs/pt/index.md
index 5477c4d..2f611ba 100644
--- a/docs/pt/index.md
+++ b/docs/pt/index.md
@@ -1,4 +1,14 @@
-# Introdução
+# OpenAudio (anteriormente Fish-Speech)
+
+
+
+
+
+

+
+
+
+
Série Avançada de Modelos Text-to-Speech
-!!! warning
- Não assumimos nenhuma responsabilidade pelo uso ilegal da base de código. Consulte as leis locais sobre DMCA (Digital Millennium Copyright Act) e outras leis relevantes em sua área.
- Esta base de código é lançada sob a licença Apache 2.0 e todos os modelos são lançados sob a licença CC-BY-NC-SA-4.0.
+
Experimente agora: Fish Audio Playground |
Saiba mais: Site OpenAudio
-## Requisitos
+
-- Memória GPU: 12GB (Inferência)
-- Sistema: Linux, Windows
+---
-## Configuração
+!!! warning "Aviso Legal"
+ Não assumimos nenhuma responsabilidade pelo uso ilegal da base de código. Consulte as leis locais sobre DMCA (Digital Millennium Copyright Act) e outras leis relevantes em sua área.
+
+ **Licença:** Esta base de código é lançada sob a licença Apache 2.0 e todos os modelos são lançados sob a licença CC-BY-NC-SA-4.0.
-Primeiro, precisamos criar um ambiente conda para instalar os pacotes.
+## **Introdução**
-```bash
+Estamos empolgados em anunciar que mudamos nossa marca para **OpenAudio** - introduzindo uma nova série de modelos avançados de Text-to-Speech que se baseia na fundação do Fish-Speech com melhorias significativas e novas capacidades.
-conda create -n fish-speech python=3.12
-conda activate fish-speech
+**OpenAudio-S1-mini**: [Vídeo](A ser carregado); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
-pip install sudo apt-get install portaudio19-dev # Para pyaudio
-pip install -e . # Isso baixará todos os pacotes restantes.
+**Fish-Speech v1.5**: [Vídeo](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
-apt install libsox-dev ffmpeg # Se necessário.
+## **Destaques** ✨
+
+### **Controle Emocional**
+O OpenAudio S1 **suporta uma variedade de marcadores emocionais, de tom e especiais** para aprimorar a síntese de fala:
+
+- **Emoções básicas**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
```
-!!! warning
- A opção `compile` não é suportada no Windows e macOS, se você quiser executar com compile, precisa instalar o trition por conta própria.
+- **Emoções avançadas**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)
+```
-## Agradecimentos
+- **Marcadores de tom**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- **Efeitos sonoros especiais**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
+```
+
+Você também pode usar Ha,ha,ha para controlar; há muitos outros casos esperando para serem explorados por você mesmo.
+
+### **Qualidade TTS Excelente**
+
+Utilizamos as métricas Seed TTS Eval para avaliar o desempenho do modelo, e os resultados mostram que o OpenAudio S1 alcança **0.008 WER** e **0.004 CER** em texto inglês, o que é significativamente melhor que os modelos anteriores. (Inglês, avaliação automática, baseada no OpenAI gpt-4o-transcribe, distância do falante usando Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| Modelo | Taxa de Erro de Palavras (WER) | Taxa de Erro de Caracteres (CER) | Distância do Falante |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **Dois Tipos de Modelos**
+
+| Modelo | Tamanho | Disponibilidade | Características |
+|-------|------|--------------|----------|
+| **S1** | 4B parâmetros | Disponível em [fish.audio](https://fish.audio) | Modelo principal com todas as funcionalidades |
+| **S1-mini** | 0.5B parâmetros | Disponível no Hugging Face [Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) | Versão destilada com capacidades principais |
+
+Tanto o S1 quanto o S1-mini incorporam Aprendizado por Reforço Online com Feedback Humano (RLHF).
+
+## **Características**
+
+1. **TTS Zero-shot e Few-shot:** Insira uma amostra vocal de 10 a 30 segundos para gerar saída TTS de alta qualidade. **Para diretrizes detalhadas, veja [Melhores Práticas de Clonagem de Voz](https://docs.fish.audio/text-to-speech/voice-clone-best-practices).**
+
+2. **Suporte Multilíngue e Cross-lingual:** Simplesmente copie e cole texto multilíngue na caixa de entrada—não precisa se preocupar com o idioma. Atualmente suporta inglês, japonês, coreano, chinês, francês, alemão, árabe e espanhol.
+
+3. **Sem Dependência de Fonemas:** O modelo tem fortes capacidades de generalização e não depende de fonemas para TTS. Pode lidar com texto em qualquer script de idioma.
+
+4. **Altamente Preciso:** Alcança uma baixa Taxa de Erro de Caracteres (CER) de cerca de 0,4% e Taxa de Erro de Palavras (WER) de cerca de 0,8% para Seed-TTS Eval.
+
+5. **Rápido:** Com aceleração fish-tech, o fator de tempo real é aproximadamente 1:5 em um laptop Nvidia RTX 4060 e 1:15 em um Nvidia RTX 4090.
+
+6. **Inferência WebUI:** Apresenta uma interface web fácil de usar baseada em Gradio, compatível com Chrome, Firefox, Edge e outros navegadores.
+
+7. **Inferência GUI:** Oferece uma interface gráfica PyQt6 que funciona perfeitamente com o servidor API. Suporta Linux, Windows e macOS. [Ver GUI](https://github.com/AnyaCoder/fish-speech-gui).
+
+8. **Amigável para Deploy:** Configure facilmente um servidor de inferência com suporte nativo para Linux, Windows e macOS, minimizando a perda de velocidade.
+
+## **Isenção de Responsabilidade**
+
+Não assumimos nenhuma responsabilidade pelo uso ilegal da base de código. Consulte suas leis locais sobre DMCA e outras leis relacionadas.
+
+## **Mídia e Demos**
+
+#### 🚧 Em Breve
+Demonstrações em vídeo e tutoriais estão atualmente em desenvolvimento.
+
+## **Documentação**
+
+### Início Rápido
+- [Configurar Ambiente](install.md) - Configure seu ambiente de desenvolvimento
+- [Guia de Inferência](inference.md) - Execute o modelo e gere fala
+
+## **Comunidade e Suporte**
+
+- **Discord:** Junte-se à nossa [comunidade Discord](https://discord.gg/Es5qTB9BcN)
+- **Site:** Visite [OpenAudio.com](https://openaudio.com) para as últimas atualizações
+- **Experimente Online:** [Fish Audio Playground](https://fish.audio)
diff --git a/docs/pt/inference.md b/docs/pt/inference.md
index d8b9b7f..10b129d 100644
--- a/docs/pt/inference.md
+++ b/docs/pt/inference.md
@@ -34,9 +34,7 @@ python fish_speech/models/text2semantic/inference.py \
--text "O texto que você quer converter" \
--prompt-text "Seu texto de referência" \
--prompt-tokens "fake.npy" \
- --checkpoint-path "checkpoints/openaudio-s1-mini" \
- --num-samples 2 \
- --compile # se você quiser uma velocidade mais rápida
+ --compile
```
Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é um inteiro começando de 0.
@@ -50,15 +48,12 @@ Este comando criará um arquivo `codes_N` no diretório de trabalho, onde N é u
### 3. Gerar vocais a partir de tokens semânticos:
-#### Decodificador VQGAN
-
!!! warning "Aviso Futuro"
Mantivemos a interface acessível do caminho original (tools/vqgan/inference.py), mas esta interface pode ser removida em versões subsequentes, então por favor altere seu código o mais breve possível.
```bash
python fish_speech/models/dac/inference.py \
- -i "codes_0.npy" \
- --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
+ -i "codes_0.npy"
```
## Inferência com API HTTP
@@ -103,5 +98,3 @@ python -m tools.run_webui
!!! note
Você pode usar variáveis de ambiente do Gradio, como `GRADIO_SHARE`, `GRADIO_SERVER_PORT`, `GRADIO_SERVER_NAME` para configurar o WebUI.
-
-Divirta-se!
diff --git a/docs/pt/install.md b/docs/pt/install.md
new file mode 100644
index 0000000..005237a
--- /dev/null
+++ b/docs/pt/install.md
@@ -0,0 +1,30 @@
+## Requisitos
+
+- Memória GPU: 12GB (Inferência)
+- Sistema: Linux, WSL
+
+## Configuração
+
+Primeiro você precisa instalar pyaudio e sox, que são usados para processamento de áudio.
+
+``` bash
+apt install portaudio19-dev libsox-dev ffmpeg
+```
+
+### Conda
+
+```bash
+conda create -n fish-speech python=3.12
+conda activate fish-speech
+
+pip install -e .
+```
+
+### UV
+
+```bash
+uv sync --python 3.12
+```
+
+!!! warning
+ A opção `compile` não é suportada no Windows e macOS, se você quiser executar com compile, precisa instalar o triton por conta própria.
diff --git a/docs/zh/index.md b/docs/zh/index.md
index 64e373b..bde91b5 100644
--- a/docs/zh/index.md
+++ b/docs/zh/index.md
@@ -1,4 +1,14 @@
-# 简介
+# OpenAudio (原 Fish-Speech)
+
+
+
+
+
+

+
+
+
+
先进的文字转语音模型系列
-!!! warning
- 我们不对代码库的任何非法使用承担责任。请参考您所在地区有关 DMCA(数字千年版权法)和其他相关法律的规定。
- 此代码库在 Apache 2.0 许可证下发布,所有模型在 CC-BY-NC-SA-4.0 许可证下发布。
+
立即试用: Fish Audio Playground |
了解更多: OpenAudio 网站
-## 系统要求
+
-- GPU 内存:12GB(推理)
-- 系统:Linux、Windows
+---
-## 安装
+!!! warning "法律声明"
+ 我们不对代码库的任何非法使用承担责任。请参考您所在地区有关 DMCA(数字千年版权法)和其他相关法律的规定。
+
+ **许可证:** 此代码库在 Apache 2.0 许可证下发布,所有模型在 CC-BY-NC-SA-4.0 许可证下发布。
-首先,我们需要创建一个 conda 环境来安装包。
+## **介绍**
-```bash
+我们很高兴地宣布,我们已经更名为 **OpenAudio** - 推出全新的先进文字转语音模型系列,在 Fish-Speech 的基础上进行了重大改进并增加了新功能。
-conda create -n fish-speech python=3.12
-conda activate fish-speech
+**OpenAudio-S1-mini**: [视频](即将上传); [Hugging Face](https://huggingface.co/fishaudio/openaudio-s1-mini);
-pip install sudo apt-get install portaudio19-dev # 用于 pyaudio
-pip install -e . # 这将下载所有其余的包。
+**Fish-Speech v1.5**: [视频](https://www.bilibili.com/video/BV1EKiDYBE4o/); [Hugging Face](https://huggingface.co/fishaudio/fish-speech-1.5);
-apt install libsox-dev ffmpeg # 如果需要的话。
+## **亮点** ✨
+
+### **情感控制**
+OpenAudio S1 **支持多种情感、语调和特殊标记**来增强语音合成效果:
+
+- **基础情感**:
+```
+(angry) (sad) (excited) (surprised) (satisfied) (delighted)
+(scared) (worried) (upset) (nervous) (frustrated) (depressed)
+(empathetic) (embarrassed) (disgusted) (moved) (proud) (relaxed)
+(grateful) (confident) (interested) (curious) (confused) (joyful)
```
-!!! warning
- `compile` 选项在 Windows 和 macOS 上不受支持,如果您想使用 compile 运行,需要自己安装 trition。
+- **高级情感**:
+```
+(disdainful) (unhappy) (anxious) (hysterical) (indifferent)
+(impatient) (guilty) (scornful) (panicked) (furious) (reluctant)
+(keen) (disapproving) (negative) (denying) (astonished) (serious)
+(sarcastic) (conciliative) (comforting) (sincere) (sneering)
+(hesitating) (yielding) (painful) (awkward) (amused)
+```
-## 致谢
+- **语调标记**:
+```
+(in a hurry tone) (shouting) (screaming) (whispering) (soft tone)
+```
-- [VITS2 (daniilrobnikov)](https://github.com/daniilrobnikov/vits2)
-- [Bert-VITS2](https://github.com/fishaudio/Bert-VITS2)
-- [GPT VITS](https://github.com/innnky/gpt-vits)
-- [MQTTS](https://github.com/b04901014/MQTTS)
-- [GPT Fast](https://github.com/pytorch-labs/gpt-fast)
-- [Transformers](https://github.com/huggingface/transformers)
-- [GPT-SoVITS](https://github.com/RVC-Boss/GPT-SoVITS)
+- **特殊音效**:
+```
+(laughing) (chuckling) (sobbing) (crying loudly) (sighing) (panting)
+(groaning) (crowd laughing) (background laughter) (audience laughing)
+```
+
+您还可以使用 Ha,ha,ha 来控制,还有许多其他用法等待您自己探索。
+
+### **卓越的 TTS 质量**
+
+我们使用 Seed TTS 评估指标来评估模型性能,结果显示 OpenAudio S1 在英文文本上达到了 **0.008 WER** 和 **0.004 CER**,明显优于以前的模型。(英语,自动评估,基于 OpenAI gpt-4o-transcribe,说话人距离使用 Revai/pyannote-wespeaker-voxceleb-resnet34-LM)
+
+| 模型 | 词错误率 (WER) | 字符错误率 (CER) | 说话人距离 |
+|-------|----------------------|---------------------------|------------------|
+| **S1** | **0.008** | **0.004** | **0.332** |
+| **S1-mini** | **0.011** | **0.005** | **0.380** |
+
+### **两种模型类型**
+
+| 模型 | 规模 | 可用性 | 特性 |
+|-------|------|--------------|----------|
+| **S1** | 40亿参数 | 在 [fish.audio](https://fish.audio) 上可用 | 功能齐全的旗舰模型 |
+| **S1-mini** | 5亿参数 | 在 Hugging Face [Space](https://huggingface.co/spaces/fishaudio/openaudio-s1-mini) 上可用 | 具有核心功能的蒸馏版本 |
+
+S1 和 S1-mini 都集成了在线人类反馈强化学习 (RLHF)。
+
+## **功能特性**
+
+1. **零样本和少样本 TTS:** 输入 10 到 30 秒的语音样本即可生成高质量的 TTS 输出。**详细指南请参见 [语音克隆最佳实践](https://docs.fish.audio/text-to-speech/voice-clone-best-practices)。**
+
+2. **多语言和跨语言支持:** 只需复制粘贴多语言文本到输入框即可——无需担心语言问题。目前支持英语、日语、韩语、中文、法语、德语、阿拉伯语和西班牙语。
+
+3. **无音素依赖:** 该模型具有强大的泛化能力,不依赖音素进行 TTS。它可以处理任何语言文字的文本。
+
+4. **高度准确:** 在 Seed-TTS Eval 中实现低字符错误率 (CER) 约 0.4% 和词错误率 (WER) 约 0.8%。
+
+5. **快速:** 通过 fish-tech 加速,在 Nvidia RTX 4060 笔记本电脑上实时因子约为 1:5,在 Nvidia RTX 4090 上约为 1:15。
+
+6. **WebUI 推理:** 具有易于使用的基于 Gradio 的网络界面,兼容 Chrome、Firefox、Edge 和其他浏览器。
+
+7. **GUI 推理:** 提供与 API 服务器无缝配合的 PyQt6 图形界面。支持 Linux、Windows 和 macOS。[查看 GUI](https://github.com/AnyaCoder/fish-speech-gui)。
+
+8. **部署友好:** 轻松设置推理服务器,原生支持 Linux、Windows 和 macOS,最小化速度损失。
+
+## **免责声明**
+
+我们不对代码库的任何非法使用承担责任。请参考您当地关于 DMCA 和其他相关法律的规定。
+
+## **媒体和演示**
+
+#### 🚧 即将推出
+视频演示和教程正在开发中。
+
+## **文档**
+
+### 快速开始
+- [构建环境](install.md) - 设置您的开发环境
+- [推理指南](inference.md) - 运行模型并生成语音
+
+## **社区和支持**
+
+- **Discord:** 加入我们的 [Discord 社区](https://discord.gg/Es5qTB9BcN)
+- **网站:** 访问 [OpenAudio.com](https://openaudio.com) 获取最新更新
+- **在线试用:** [Fish Audio Playground](https://fish.audio)
diff --git a/docs/zh/inference.md b/docs/zh/inference.md
index 50ac2a7..de821ad 100644
--- a/docs/zh/inference.md
+++ b/docs/zh/inference.md
@@ -1,6 +1,6 @@
# 推理
-由于声码器模型已更改,您需要比以前更多的显存,建议使用12GB显存以便流畅推理。
+由于声码器模型已更改,您需要比以前更多的 VRAM,建议使用 12GB 进行流畅推理。
我们支持命令行、HTTP API 和 WebUI 进行推理,您可以选择任何您喜欢的方法。
@@ -17,7 +17,7 @@ huggingface-cli download fishaudio/openaudio-s1-mini --local-dir checkpoints/ope
!!! note
如果您计划让模型随机选择音色,可以跳过此步骤。
-### 1. 从参考音频获取VQ tokens
+### 1. 从参考音频获取 VQ 令牌
```bash
python fish_speech/models/dac/inference.py \
@@ -27,38 +27,33 @@ python fish_speech/models/dac/inference.py \
您应该会得到一个 `fake.npy` 和一个 `fake.wav`。
-### 2. 从文本生成语义tokens:
+### 2. 从文本生成语义令牌:
```bash
python fish_speech/models/text2semantic/inference.py \
--text "您想要转换的文本" \
--prompt-text "您的参考文本" \
--prompt-tokens "fake.npy" \
- --checkpoint-path "checkpoints/openaudio-s1-mini" \
- --num-samples 2 \
- --compile # 如果您想要更快的速度
+ --compile
```
-此命令将在工作目录中创建一个 `codes_N` 文件,其中N是从0开始的整数。
+此命令将在工作目录中创建一个 `codes_N` 文件,其中 N 是从 0 开始的整数。
!!! note
- 您可能想要使用 `--compile` 来融合CUDA内核以获得更快的推理速度(约30 tokens/秒 -> 约500 tokens/秒)。
- 相应地,如果您不打算使用加速,可以删除 `--compile` 参数的注释。
+ 您可能希望使用 `--compile` 来融合 CUDA 内核以实现更快的推理(~30 令牌/秒 -> ~500 令牌/秒)。
+ 相应地,如果您不计划使用加速,可以移除 `--compile` 参数。
!!! info
- 对于不支持bf16的GPU,您可能需要使用 `--half` 参数。
+ 对于不支持 bf16 的 GPU,您可能需要使用 `--half` 参数。
-### 3. 从语义tokens生成人声:
-
-#### VQGAN 解码器
+### 3. 从语义令牌生成声音:
!!! warning "未来警告"
- 我们保留了从原始路径(tools/vqgan/inference.py)访问的接口,但此接口可能在后续版本中被移除,请尽快更改您的代码。
+ 我们保留了从原始路径(tools/vqgan/inference.py)访问接口的能力,但此接口可能在后续版本中被删除,因此请尽快更改您的代码。
```bash
python fish_speech/models/dac/inference.py \
-i "codes_0.npy" \
- --checkpoint-path "checkpoints/openaudiio-s1-mini/codec.pth"
```
## HTTP API 推理
diff --git a/docs/zh/install.md b/docs/zh/install.md
new file mode 100644
index 0000000..be82665
--- /dev/null
+++ b/docs/zh/install.md
@@ -0,0 +1,30 @@
+## 系统要求
+
+- GPU 内存:12GB(推理)
+- 系统:Linux、WSL
+
+## 安装
+
+首先需要安装 pyaudio 和 sox,用于音频处理。
+
+``` bash
+apt install portaudio19-dev libsox-dev ffmpeg
+```
+
+### Conda
+
+```bash
+conda create -n fish-speech python=3.12
+conda activate fish-speech
+
+pip install -e .
+```
+
+### UV
+
+```bash
+uv sync --python 3.12
+```
+
+!!! warning
+ `compile` 选项在 Windows 和 macOS 上不受支持,如果您想使用 compile 运行,需要自己安装 triton。
diff --git a/mkdocs.yml b/mkdocs.yml
index 214c4e3..f2f62f9 100644
--- a/mkdocs.yml
+++ b/mkdocs.yml
@@ -1,4 +1,4 @@
-site_name: Fish Speech
+site_name: OpenAudio
site_description: Targeting SOTA TTS solutions.
site_url: https://speech.fish.audio
@@ -12,7 +12,7 @@ copyright: Copyright © 2023-2025 by Fish Audio
theme:
name: material
- favicon: assets/figs/logo-circle.png
+ favicon: assets/openaudio.png
language: en
features:
- content.action.edit
@@ -25,8 +25,7 @@ theme:
- search.highlight
- search.share
- content.code.copy
- icon:
- logo: fontawesome/solid/fish
+ logo: assets/openaudio.png
palette:
# Palette toggle for automatic mode
@@ -56,7 +55,8 @@ theme:
code: Roboto Mono
nav:
- - Installation: en/index.md
+ - Introduction: en/index.md
+ - Installation: en/install.md
- Inference: en/inference.md
# Plugins
@@ -80,25 +80,29 @@ plugins:
name: 简体中文
build: true
nav:
- - 安装: zh/index.md
+ - 介绍: zh/index.md
+ - 安装: zh/install.md
- 推理: zh/inference.md
- locale: ja
name: 日本語
build: true
nav:
- - インストール: ja/index.md
+ - はじめに: ja/index.md
+ - インストール: ja/install.md
- 推論: ja/inference.md
- locale: pt
name: Português (Brasil)
build: true
nav:
- - Instalação: pt/index.md
+ - Introdução: pt/index.md
+ - Instalação: pt/install.md
- Inferência: pt/inference.md
- locale: ko
name: 한국어
build: true
nav:
- - 설치: ko/index.md
+ - 소개: ko/index.md
+ - 설치: ko/install.md
- 추론: ko/inference.md
markdown_extensions: