Volcengine Bidirectional TTS Demo

A Python demo for Volcengine (ByteDance Cloud) Text-to-Speech APIs, supporting both HTTP REST and WebSocket bidirectional streaming synthesis.

Features
Prerequisites
Installation
Configuration
- Obtaining Credentials
- Credential Reference
Usage
- HTTP TTS
- WebSocket Bidirectional TTS
Project Structure
Testing
Troubleshooting
License

Features

HTTP TTS -- Simple one-shot text-to-speech via the REST API (/api/v1/tts)
WebSocket Bidirectional TTS -- Real-time streaming synthesis over a persistent WebSocket connection (/api/v3/tts/bidirection), with character-by-character streaming and per-sentence audio output
Binary protocol layer -- Full implementation of the Volcengine binary message protocol for WebSocket communication
Environment-based configuration -- All credentials loaded from a .env file; nothing hard-coded

Prerequisites

Python 3.9 or higher
A Volcengine account with the Speech Synthesis (TTS) service activated

Installation

# Clone the repository
git clone <repository-url>
cd volcengine_bidirection_demo

# Create and activate a virtual environment (recommended)
python -m venv .venv
source .venv/bin/activate   # Linux / macOS
# .venv\Scripts\activate    # Windows

# Install dependencies
pip install -e .

This installs the project in editable mode along with all required packages:

Package	Purpose
`websockets`	WebSocket client for bidirectional TTS
`requests`	HTTP client for REST TTS
`python-dotenv`	Loads `.env` into environment variables

Configuration

# Copy the example file and fill in your real values
cp .env.example .env

Edit .env:

DOUBAO_APP_ID=your_app_id_here
DOUBAO_ACCESS_TOKEN=your_access_token_here
DOUBAO_API_KEY=your_api_key_here

Security note -- .env is listed in .gitignore and must never be committed. Only .env.example (which contains placeholders) is tracked in version control.
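As a rough illustration of how a script might fail fast when credentials are absent, the sketch below checks the environment for the variables listed above. The helper `missing_credentials` and the `REQUIRED_VARS` mapping are hypothetical names, not part of this project, and the sketch assumes the `.env` file has already been loaded into the environment (for example by `load_dotenv()`).

```python
import os

# Variable names come from the .env example above. Which client needs which
# variable follows the Credential Reference section of this README.
REQUIRED_VARS = {
    "http": ["DOUBAO_API_KEY"],
    "websocket": ["DOUBAO_APP_ID", "DOUBAO_ACCESS_TOKEN"],
}


def missing_credentials(mode, env=None):
    """Return the required variable names that are unset or empty.

    `mode` is "http" or "websocket"; `env` defaults to os.environ and can be
    replaced with a plain dict for testing.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_VARS[mode] if not env.get(name)]


if __name__ == "__main__":
    missing = missing_credentials("http")
    if missing:
        # Exit code 1 mirrors the behaviour described for tts_http.py.
        raise SystemExit(f"Error: {', '.join(missing)} is not set")
```

Passing `env` explicitly keeps the check easy to exercise without touching the real environment.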

Obtaining Credentials

All credentials are obtained from the Volcengine Speech Console:

1. Sign up / Log in at console.volcengine.com.
2. Navigate to Voice Technology (语音技术) and activate the Speech Synthesis service.
3. Create an application -- after activation you will be assigned an App ID and related tokens.
4. Locate your credentials on the application's credential page. The console displays your App ID, Access Token (Token), and API Key after service activation.

For a detailed walkthrough, see the official Volcengine docs.

Credential Reference

Variable	Used by	Description
`DOUBAO_APP_ID`	WebSocket TTS	Application ID assigned when you create an app in the Volcengine console. Sent as the `X-Api-App-Key` WebSocket header to identify your application.
`DOUBAO_ACCESS_TOKEN`	WebSocket TTS	Access token (also called Token) generated per application. Sent as the `X-Api-Access-Key` WebSocket header for authentication.
`DOUBAO_API_KEY`	HTTP TTS	API key for REST endpoint authentication. Sent as the `x-api-key` HTTP header. Generated in the console under API Key Management.

How they relate to the API:

The HTTP TTS endpoint (/api/v1/tts) uses DOUBAO_API_KEY in the x-api-key request header. No App ID or Access Token is needed.
The WebSocket bidirectional TTS endpoint (/api/v3/tts/bidirection) uses DOUBAO_APP_ID and DOUBAO_ACCESS_TOKEN as WebSocket upgrade headers. A Resource ID is also required and is auto-detected from the voice type (or can be overridden via --resource_id).
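The mapping from credentials to headers described above can be sketched as two small helpers. The function names are hypothetical and are not taken from this project. The `x-api-key`, `X-Api-App-Key`, and `X-Api-Access-Key` header names come from the table above; the header name used for the Resource ID is an assumption, since this README does not state it.

```python
def http_headers(api_key):
    """Headers for the HTTP TTS endpoint: only the API key is needed."""
    return {"x-api-key": api_key}


def websocket_headers(app_id, access_token, resource_id):
    """Headers sent with the WebSocket upgrade request."""
    return {
        "X-Api-App-Key": app_id,
        "X-Api-Access-Key": access_token,
        # Assumption: the README says a Resource ID is required but does not
        # name the header that carries it. Adjust to match the real client.
        "X-Api-Resource-Id": resource_id,
    }
```

Keeping the two builders separate makes it harder to send WebSocket credentials to the REST endpoint by accident, or the reverse.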

Usage

HTTP TTS

Synthesizes text using a single HTTP POST request and saves the result as an MP3 file.

python tts_http.py

This synthesizes the built-in demo text (a Chinese passage about a Three Kingdoms visualization) and writes the output to sanguo_intro.mp3.

You can also import and call the function programmatically:

from tts_http import tts_request

audio = tts_request(
    text="你好，世界",
    voice_type="zh_female_vv_uranus_bigtts",
    encoding="mp3",
    speed_ratio=1.0,
)

with open("output.mp3", "wb") as f:
    f.write(audio)

Parameters:

Parameter	Default	Description
`text`	(required)	Text to synthesize
`voice_type`	`zh_female_vv_uranus_bigtts`	Voice identifier
`encoding`	`mp3`	Output format (`mp3`, `wav`, etc.)
`speed_ratio`	`1.0`	Playback speed multiplier
`volume_ratio`	`1.0`	Volume multiplier
`pitch_ratio`	`1.0`	Pitch multiplier

WebSocket Bidirectional TTS

Streams text character-by-character over a WebSocket connection. Each sentence (split on 。) is processed as a separate session, with its audio saved to an individual file.
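The splitting and streaming behaviour described above might look roughly like the sketch below. The function names are hypothetical, and two details are assumptions rather than facts about `bidirection.py`: that empty segments are skipped, and that the 。 delimiter itself is dropped rather than sent to the server.

```python
def split_sentences(text, delimiter="。"):
    """Split text on the delimiter, dropping empty or whitespace-only parts."""
    return [part.strip() for part in text.split(delimiter) if part.strip()]


def iter_sessions(text):
    """Yield (session_index, characters) pairs, one pair per sentence.

    In a real client each character would be sent as its own message within
    the session; here the characters are simply collected into a list.
    """
    for index, sentence in enumerate(split_sentences(text)):
        yield index, list(sentence)


if __name__ == "__main__":
    for index, chars in iter_sessions("你好世界。这是一个测试。"):
        print(index, chars)
```

Each yielded index corresponds to one session, so the example text above would produce two sessions.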

python examples/volcengine/bidirection.py \
    --text "你好世界。这是一个测试。" \
    --voice_type zh_female_vv_uranus_bigtts

CLI arguments:

Argument	Default	Description
`--text`	(required)	Text to convert to speech
`--voice_type`	(required)	Voice type identifier
`--appid`	`$DOUBAO_APP_ID`	Override App ID
`--access_token`	`$DOUBAO_ACCESS_TOKEN`	Override Access Token
`--resource_id`	(auto-detected)	Override Resource ID (otherwise selected from the voice type)
`--encoding`	`mp3`	Output audio encoding
`--endpoint`	`wss://openspeech.bytedance.com/api/v3/tts/bidirection`	WebSocket endpoint

Output files are named {voice_type}_session_{index}.{encoding}, e.g. zh_female_vv_uranus_bigtts_session_0.mp3.
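The naming pattern can be reproduced with a one-line helper, which is handy when a script needs to locate the generated files afterwards. `session_filename` is a hypothetical name used for illustration only.

```python
def session_filename(voice_type, index, encoding="mp3"):
    """Build an output file name following {voice_type}_session_{index}.{encoding}."""
    return f"{voice_type}_session_{index}.{encoding}"


if __name__ == "__main__":
    print(session_filename("zh_female_vv_uranus_bigtts", 0))
```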

Project Structure

.
├── .env.example                        # Template for environment variables
├── .gitignore
├── pyproject.toml                      # Project metadata and dependencies
├── tts_http.py                         # HTTP TTS client (REST API)
├── protocols/
│   ├── __init__.py                     # Public API exports
│   └── protocols.py                    # Binary message protocol implementation
└── examples/
    └── volcengine/
        └── bidirection.py              # WebSocket bidirectional TTS example

Module / File	Description
`tts_http.py`	Standalone HTTP-based TTS client. Sends a POST request, receives base64-encoded audio.
`protocols/protocols.py`	Implements the Volcengine binary WebSocket message protocol: header serialization, event types, message marshalling/unmarshalling, and connection/session lifecycle helpers.
`examples/volcengine/bidirection.py`	End-to-end WebSocket TTS demo with argument parsing, session management, character streaming, and audio collection.

Testing

Manual Verification

Since this project is a demo client for a remote API, testing requires valid credentials.

1. Verify HTTP TTS:

python tts_http.py

Expected: prints Saved to sanguo_intro.mp3 (...) and creates a playable MP3 file.

2. Verify WebSocket TTS:

python examples/volcengine/bidirection.py \
    --text "测试语音合成。" \
    --voice_type zh_female_vv_uranus_bigtts

Expected: prints connection and session logs, creates zh_female_vv_uranus_bigtts_session_0.mp3.

3. Quick smoke test (both modes):

# HTTP -- should exit 0 and create an MP3
python tts_http.py && echo "HTTP TTS: OK"

# WebSocket -- should exit 0 and create per-sentence MP3s
python examples/volcengine/bidirection.py \
    --text "第一句。第二句。" \
    --voice_type zh_female_vv_uranus_bigtts \
  && echo "WebSocket TTS: OK"

Error Cases to Verify

Scenario	Expected behavior
Missing `DOUBAO_API_KEY`	`tts_http.py` prints error and exits with code 1
Missing `DOUBAO_APP_ID` / token	`bidirection.py` logs error and exits gracefully
Invalid API key	HTTP response with error code and message
Invalid access token	WebSocket connection rejected or `ConnectionFailed` event

Troubleshooting

Problem	Solution
`Error: DOUBAO_API_KEY is not set`	Ensure `.env` exists in the project root with the correct key. Run from the project directory so `load_dotenv()` can find it.
`ConnectionFailed` on WebSocket	Check that `DOUBAO_APP_ID` and `DOUBAO_ACCESS_TOKEN` are correct and the TTS service is activated in your Volcengine console.
HTTP error `code != 3000`	The API returned an error. Check the printed `message` field. Common causes: invalid API key, quota exceeded, or unsupported voice type.
`No audio data received`	The server returned no audio. Verify the text is non-empty and the voice type is valid for your account.
`ModuleNotFoundError: protocols`	Run from the project root or install with `pip install -e .`

License

This project is provided as a demo. See the Volcengine Terms of Service for API usage terms.
