A Python demo for Volcengine (ByteDance Cloud) Text-to-Speech APIs, supporting both HTTP REST and WebSocket bidirectional streaming synthesis.
- Features
- Prerequisites
- Installation
- Configuration
- Usage
- Project Structure
- Testing
- Troubleshooting
- License
- HTTP TTS -- Simple one-shot text-to-speech via the REST API (`/api/v1/tts`)
- WebSocket Bidirectional TTS -- Real-time streaming synthesis over a persistent WebSocket connection (`/api/v3/tts/bidirection`), with character-by-character streaming and per-sentence audio output
- Binary protocol layer -- Full implementation of the Volcengine binary message protocol for WebSocket communication
- Environment-based configuration -- All credentials loaded from a `.env` file; nothing hard-coded
- Python 3.9 or higher
- A Volcengine account with the Speech Synthesis (TTS) service activated
```bash
# Clone the repository
git clone <repository-url>
cd volcengine_bidirection_demo

# Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate   # Linux / macOS
# .venv\Scripts\activate    # Windows

# Install dependencies
pip install -e .
```

This installs the project in editable mode along with all required packages:
| Package | Purpose |
|---|---|
| `websockets` | WebSocket client for bidirectional TTS |
| `requests` | HTTP client for REST TTS |
| `python-dotenv` | Loads `.env` into environment variables |
```bash
# Copy the example file and fill in your real values
cp .env.example .env
```

Edit `.env`:

```
DOUBAO_APP_ID=your_app_id_here
DOUBAO_ACCESS_TOKEN=your_access_token_here
DOUBAO_API_KEY=your_api_key_here
```

Security note -- `.env` is listed in `.gitignore` and must never be committed. Only `.env.example` (which contains placeholders) is tracked in version control.
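A pre-flight check can catch missing or placeholder credentials before any request is sent. The sketch below is illustrative only: `missing_credentials` and the `REQUIRED` mapping are hypothetical names, not part of this repository, and the grouping assumes the HTTP demo needs only the API key while the WebSocket demo needs the App ID and Access Token.

```python
import os
from typing import Mapping

# Hypothetical grouping of the three .env variables by demo mode.
REQUIRED = {
    "http": ("DOUBAO_API_KEY",),
    "websocket": ("DOUBAO_APP_ID", "DOUBAO_ACCESS_TOKEN"),
}

# Values copied unchanged from .env.example end with this suffix.
PLACEHOLDER_SUFFIX = "_here"


def _is_set(value: str) -> bool:
    """True when the value is non-blank and not an unedited placeholder."""
    value = value.strip()
    return bool(value) and not value.endswith(PLACEHOLDER_SUFFIX)


def missing_credentials(mode: str, env: Mapping[str, str] = os.environ) -> list[str]:
    """Return the names of required variables that are unset, blank, or placeholders."""
    return [name for name in REQUIRED[mode] if not _is_set(env.get(name, ""))]
```

Calling `missing_credentials("websocket")` after the `.env` file has been loaded returns an empty list when both WebSocket credentials are present.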
All credentials are obtained from the Volcengine Speech Console:
- Sign up / Log in at console.volcengine.com.
- Navigate to Voice Technology (语音技术) and activate the Speech Synthesis service.
- Create an application -- after activation you will be assigned an App ID and related tokens.
- Locate your credentials on the application's credential page. The console displays your App ID, Access Token (Token), and API Key after service activation.
For a detailed walkthrough, see the official Volcengine documentation.
| Variable | Used by | Description |
|---|---|---|
| `DOUBAO_APP_ID` | WebSocket TTS | Application ID assigned when you create an app in the Volcengine console. Sent as the `X-Api-App-Key` WebSocket header to identify your application. |
| `DOUBAO_ACCESS_TOKEN` | WebSocket TTS | Access token (also called Token) generated per application. Sent as the `X-Api-Access-Key` WebSocket header for authentication. |
| `DOUBAO_API_KEY` | HTTP TTS | API key for REST endpoint authentication. Sent as the `x-api-key` HTTP header. Generated in the console under API Key Management. |
How they relate to the API:
- The HTTP TTS endpoint (`/api/v1/tts`) uses `DOUBAO_API_KEY` in the `x-api-key` request header. No App ID or Access Token is needed.
- The WebSocket bidirectional TTS endpoint (`/api/v3/tts/bidirection`) uses `DOUBAO_APP_ID` and `DOUBAO_ACCESS_TOKEN` as WebSocket upgrade headers. A `Resource ID` is also required and is auto-detected from the voice type (or can be overridden via `--resource_id`).
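The two authentication schemes can be sketched as a pair of header builders. The helper names are hypothetical, and the `X-Api-Resource-Id` header name is an assumption: this README names only the `x-api-key`, `X-Api-App-Key`, and `X-Api-Access-Key` headers.

```python
def http_headers(api_key: str) -> dict[str, str]:
    """Headers for the REST endpoint: only the API key is required."""
    return {"x-api-key": api_key}


def websocket_headers(app_id: str, access_token: str, resource_id: str) -> dict[str, str]:
    """Upgrade headers for the bidirectional WebSocket endpoint."""
    return {
        "X-Api-App-Key": app_id,
        "X-Api-Access-Key": access_token,
        # Assumed header name for the Resource ID; check the demo source.
        "X-Api-Resource-Id": resource_id,
    }
```

Keeping the two builders separate mirrors the fact that the endpoints share no credentials.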
Synthesizes text using a single HTTP POST request and saves the result as an MP3 file.
```bash
python tts_http.py
```

This runs the built-in demo text (a Chinese passage about a Three Kingdoms visualisation) and writes the output to `sanguo_intro.mp3`.
You can also import and call the function programmatically:
```python
from tts_http import tts_request

audio = tts_request(
    text="你好,世界",
    voice_type="zh_female_vv_uranus_bigtts",
    encoding="mp3",
    speed_ratio=1.0,
)

with open("output.mp3", "wb") as f:
    f.write(audio)
```

Parameters:
| Parameter | Default | Description |
|---|---|---|
| `text` | (required) | Text to synthesise |
| `voice_type` | `zh_female_vv_uranus_bigtts` | Voice identifier |
| `encoding` | `mp3` | Output format (`mp3`, `wav`, etc.) |
| `speed_ratio` | `1.0` | Playback speed multiplier |
| `volume_ratio` | `1.0` | Volume multiplier |
| `pitch_ratio` | `1.0` | Pitch multiplier |
Streams text character-by-character over a WebSocket connection. Each sentence (split on 。) is processed as a separate session, with its audio saved to an individual file.
```bash
python examples/volcengine/bidirection.py \
    --text "你好世界。这是一个测试。" \
    --voice_type zh_female_vv_uranus_bigtts
```

CLI arguments:
| Argument | Default | Description |
|---|---|---|
| `--text` | (required) | Text to convert to speech |
| `--voice_type` | (required) | Voice type identifier |
| `--appid` | `$DOUBAO_APP_ID` | Override App ID |
| `--access_token` | `$DOUBAO_ACCESS_TOKEN` | Override Access Token |
| `--resource_id` | (auto-detected) | Resource ID (auto-selected from voice type) |
| `--encoding` | `mp3` | Output audio encoding |
| `--endpoint` | `wss://openspeech.bytedance.com/api/v3/tts/bidirection` | WebSocket endpoint |
Output files are named {voice_type}_session_{index}.{encoding}, e.g. zh_female_vv_uranus_bigtts_session_0.mp3.
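The per-sentence session flow and the file naming can be sketched as follows. This is a minimal illustration, not the code in `bidirection.py`: the helper names are hypothetical, and dropping the `。` delimiter and skipping empty segments are assumptions about how the demo splits text.

```python
from typing import Iterator


def split_sentences(text: str, delimiter: str = "。") -> list[str]:
    """Split text into sentences on the delimiter, skipping empty segments."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]


def session_filename(voice_type: str, index: int, encoding: str = "mp3") -> str:
    """Build the output name in the {voice_type}_session_{index}.{encoding} pattern."""
    return f"{voice_type}_session_{index}.{encoding}"


def iter_characters(sentence: str) -> Iterator[str]:
    """Yield one character at a time, as a character-streaming client would send them."""
    yield from sentence
```

With the example text `"你好世界。这是一个测试。"`, this yields two sentences and therefore two output files, indexed 0 and 1.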
```
.
├── .env.example              # Template for environment variables
├── .gitignore
├── pyproject.toml            # Project metadata and dependencies
├── tts_http.py               # HTTP TTS client (REST API)
├── protocols/
│   ├── __init__.py           # Public API exports
│   └── protocols.py          # Binary message protocol implementation
└── examples/
    └── volcengine/
        └── bidirection.py    # WebSocket bidirectional TTS example
```
| Module / File | Description |
|---|---|
| `tts_http.py` | Standalone HTTP-based TTS client. Sends a POST request, receives base64-encoded audio. |
| `protocols/protocols.py` | Implements the Volcengine binary WebSocket message protocol: header serialization, event types, message marshalling/unmarshalling, and connection/session lifecycle helpers. |
| `examples/volcengine/bidirection.py` | End-to-end WebSocket TTS demo with argument parsing, session management, character streaming, and audio collection. |
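To give a feel for what "header serialization" involves, here is a rough sketch of packing and unpacking a four-byte header whose fields each occupy four bits. The layout (version and header size, message type and flags, serialization and compression, one reserved byte) is an assumption based on how Volcengine's speech WebSocket protocol is commonly documented. The function names and defaults are hypothetical and do not come from `protocols/protocols.py`.

```python
def pack_header(message_type: int, flags: int = 0, serialization: int = 0,
                compression: int = 0, version: int = 1, header_size: int = 1) -> bytes:
    """Pack six 4-bit fields plus a reserved byte into a 4-byte header (assumed layout)."""
    fields = {
        "version": version, "header_size": header_size,
        "message_type": message_type, "flags": flags,
        "serialization": serialization, "compression": compression,
    }
    for name, value in fields.items():
        if not 0 <= value <= 0xF:
            raise ValueError(f"{name} must fit in 4 bits, got {value}")
    return bytes([
        (version << 4) | header_size,
        (message_type << 4) | flags,
        (serialization << 4) | compression,
        0x00,  # reserved
    ])


def unpack_header(data: bytes) -> dict[str, int]:
    """Reverse pack_header: split the first three bytes into their 4-bit fields."""
    if len(data) < 4:
        raise ValueError("header needs at least 4 bytes")
    return {
        "version": data[0] >> 4, "header_size": data[0] & 0xF,
        "message_type": data[1] >> 4, "flags": data[1] & 0xF,
        "serialization": data[2] >> 4, "compression": data[2] & 0xF,
    }
```

Packing two fields per byte keeps the fixed header small, at the cost of limiting each field to 16 values.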
Since this project is a demo client for a remote API, testing requires valid credentials.
1. Verify HTTP TTS:
```bash
python tts_http.py
```

Expected: prints `Saved to sanguo_intro.mp3 (...)` and creates a playable MP3 file.
2. Verify WebSocket TTS:
```bash
python examples/volcengine/bidirection.py \
    --text "测试语音合成。" \
    --voice_type zh_female_vv_uranus_bigtts
```

Expected: prints connection and session logs, creates `zh_female_vv_uranus_bigtts_session_0.mp3`.
3. Quick smoke test (both modes):
```bash
# HTTP -- should exit 0 and create an MP3
python tts_http.py && echo "HTTP TTS: OK"

# WebSocket -- should exit 0 and create per-sentence MP3s
python examples/volcengine/bidirection.py \
    --text "第一句。第二句。" \
    --voice_type zh_female_vv_uranus_bigtts \
    && echo "WebSocket TTS: OK"
```

| Scenario | Expected behaviour |
|---|---|
| Missing `DOUBAO_API_KEY` | `tts_http.py` prints error and exits with code 1 |
| Missing `DOUBAO_APP_ID` / token | `bidirection.py` logs error and exits gracefully |
| Invalid API key | HTTP response with error code and message |
| Invalid access token | WebSocket connection rejected or `ConnectionFailed` event |
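Beyond the exit code, a smoke test can confirm that an output file looks like MP3 data. The sketch below is a heuristic and the helper names are hypothetical. It accepts a file that starts with an ID3 tag or with an MPEG frame sync (eleven set bits), which is a sanity check rather than full validation.

```python
from pathlib import Path


def looks_like_mp3(data: bytes) -> bool:
    """Heuristic: data starts with an ID3v2 tag or an MPEG audio frame sync."""
    if data[:3] == b"ID3":
        return True
    # Frame sync: first byte 0xFF and the top three bits of the second byte set.
    return len(data) >= 2 and data[0] == 0xFF and (data[1] & 0xE0) == 0xE0


def check_output(path: str) -> bool:
    """Return True when the file exists and its first bytes look like MP3."""
    file = Path(path)
    return file.is_file() and looks_like_mp3(file.read_bytes()[:3])
```

For example, `check_output("sanguo_intro.mp3")` could be run after the HTTP demo finishes.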
| Problem | Solution |
|---|---|
| `Error: DOUBAO_API_KEY is not set` | Ensure `.env` exists in the project root with the correct key. Run from the project directory so `load_dotenv()` can find it. |
| `ConnectionFailed` on WebSocket | Check that `DOUBAO_APP_ID` and `DOUBAO_ACCESS_TOKEN` are correct and the TTS service is activated in your Volcengine console. |
| HTTP error `code != 3000` | The API returned an error. Check the printed `message` field. Common causes: invalid API key, quota exceeded, or unsupported voice type. |
| `No audio data received` | The server returned no audio. Verify the text is non-empty and the voice type is valid for your account. |
| `ModuleNotFoundError: protocols` | Run from the project root or install with `pip install -e .` |
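The two HTTP failure modes in the table (a `code` other than 3000, and an empty audio payload) can be handled in one place when decoding a response. This is a sketch under assumptions: `extract_audio` and `TTSError` are hypothetical names, and the `data` field holding the base64 audio is assumed, since this README mentions only `code` and `message`.

```python
import base64

SUCCESS_CODE = 3000  # per the troubleshooting table, any other code is an error


class TTSError(RuntimeError):
    """Raised when the response signals an error or carries no audio."""


def extract_audio(payload: dict) -> bytes:
    """Validate a decoded JSON response and return the raw audio bytes."""
    code = payload.get("code")
    if code != SUCCESS_CODE:
        raise TTSError(f"TTS request failed: code={code} message={payload.get('message')}")
    data = payload.get("data")  # assumed field name for the base64 audio
    if not data:
        raise TTSError("No audio data received")
    return base64.b64decode(data)
```

Raising a dedicated exception lets a caller tell API-level failures apart from network errors.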
This project is provided as a demo. See the Volcengine Terms of Service for API usage terms.