Whisper

Quick Start

1. Install Docker

Run the following commands on the development board to install Docker:

bash

# Download the installation script
curl -fsSL https://get.docker.com -o get-docker.sh
# Install using the Aliyun mirror
sudo sh get-docker.sh --mirror Aliyun
# Start Docker and enable auto-start on boot
sudo systemctl enable docker
sudo systemctl start docker

2. Run the Project (One command, dual-mode preview)

The service is accessed through a web browser: once the container starts, it serves a speech-recognition web interface on port 8000.

Step A: Pull Images

bash

sudo docker pull ghcr.io/seeed-projects/recomputer-rk-cv/rk3588-whisper:latest
sudo docker pull ghcr.io/seeed-projects/recomputer-rk-cv/rk3576-whisper:latest

Step B: Run with One Command

For RK3588:

bash

sudo docker run --rm --privileged --net=host \
    -e PYTHONUNBUFFERED=1 \
    -e RKNN_LOG_LEVEL=0 \
    -v /proc/device-tree/compatible:/proc/device-tree/compatible \
    ghcr.io/seeed-projects/recomputer-rk-cv/rk3588-whisper:latest \
    python3 web_service.py

Access via: http://<Board_IP>:8000

For RK3576:

bash

sudo docker run --rm --privileged --net=host \
    -e PYTHONUNBUFFERED=1 \
    -e RKNN_LOG_LEVEL=0 \
    -v /proc/device-tree/compatible:/proc/device-tree/compatible \
    ghcr.io/seeed-projects/recomputer-rk-cv/rk3576-whisper:latest \
    python3 web_service.py

Access via: http://<Board_IP>:8000
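The web service may not answer immediately after the container starts (for example, while models are still loading). A minimal readiness check is sketched below, assuming it runs on the board itself (hence `127.0.0.1:8000`) and that the service counts as ready once `GET /api/system/status`, documented in the API section, returns a successful HTTP response. The function name `wait_for_service` and its arguments are hypothetical, not part of this project.

```bash
#!/bin/sh
# Hypothetical helper (not part of this project): wait until the service answers.
# ASSUMPTION: GET /api/system/status succeeds once the service is up.
# Usage: wait_for_service [base_url] [max_attempts]
wait_for_service() {
    base_url="${1:-http://127.0.0.1:8000}"
    max_attempts="${2:-60}"
    attempt=1
    while [ "$attempt" -le "$max_attempts" ]; do
        # -s: silent; -f: exit non-zero on HTTP errors (4xx/5xx)
        if curl -sf "${base_url}/api/system/status" > /dev/null; then
            echo "Service ready after ${attempt} attempt(s)"
            return 0
        fi
        attempt=$((attempt + 1))
        sleep 1
    done
    echo "Service not reachable at ${base_url}" >&2
    return 1
}
```

For example, `wait_for_service http://127.0.0.1:8000 120` makes up to 120 attempts, one second apart, before giving up with a non-zero exit status.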

🔌 API Documentation

This project provides REST endpoints for automatic speech recognition (ASR), supporting both synchronous and asynchronous transcription of audio files.

1. Synchronous Transcription Interface (Short Audio)

Endpoint: POST /api/models/whisper/predict

Intended for audio shorter than 20 seconds; use the asynchronous interface below for longer files.

Request Parameters (Multipart/Form-Data):

- file: (Required) Audio file to transcribe (e.g., .wav, .mp3).
- language: (Optional) Target language code (e.g., en, zh). If it differs from the language currently in use, the service hot-swaps the tokenizer.

Usage Example:

bash

curl -X POST "http://127.0.0.1:8000/api/models/whisper/predict" \
     -F "file=@/home/user/audio/test_en.wav" \
     -F "language=en"

Response Format (JSON):

json

{
  "status": "success",
  "data": {
    "text": "Hello world, this is a test.",
    "language": "en",
    "duration": 3.5,
    "inference_time": 0.8
  }
}
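For several short clips, the synchronous endpoint can be called in a loop. The sketch below assumes the clips are `.wav` files in one directory and that saving each raw JSON response next to its audio file is acceptable; `transcribe_dir` is a hypothetical helper name, not part of this project.

```bash
#!/bin/sh
# Hypothetical helper: send every .wav file in a directory to the synchronous
# endpoint and save each JSON response as <name>.json next to the audio file.
# Usage: transcribe_dir <dir> [language] [base_url]
transcribe_dir() {
    dir="$1"
    lang="${2:-en}"
    base_url="${3:-http://127.0.0.1:8000}"
    failed=0
    for f in "$dir"/*.wav; do
        [ -e "$f" ] || continue   # the glob matched no files
        if curl -sf -X POST "${base_url}/api/models/whisper/predict" \
                -F "file=@${f}" -F "language=${lang}" > "${f%.wav}.json"; then
            echo "ok:     $f"
        else
            echo "failed: $f" >&2
            rm -f "${f%.wav}.json"   # drop the empty or partial output file
            failed=1
        fi
    done
    return "$failed"
}
```

For example, `transcribe_dir /home/user/audio zh` writes `test_en.json` beside `test_en.wav`. Clips of 20 seconds or longer should go to the asynchronous interface instead.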

2. Asynchronous Transcription Interface (Long Audio)

Endpoint: POST /api/models/whisper/task

Creates an asynchronous task for longer audio or video files and immediately returns a task_id; poll the task status endpoint with that ID to get the result.

Usage Example:

bash

curl -X POST "http://127.0.0.1:8000/api/models/whisper/task" \
     -F "file=@/home/user/audio/long_podcast.wav" \
     -F "language=zh"

Response Format (JSON):

json

{
  "status": "success",
  "data": {
    "task_id": "29c7b932-a77f-480c-a18b-8a958c7911c3",
    "message": "Task created successfully. Poll /api/models/whisper/task/{task_id} for status."
  }
}
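Scripts usually need only the `task_id` from this response. The sketch below creates a task and prints its ID, extracting the field with `sed` so that no JSON tool is required; it assumes `task_id` appears once in the response as a quoted string. `create_task` is a hypothetical helper name.

```bash
#!/bin/sh
# Hypothetical helper: upload a file to the asynchronous endpoint and print the task ID.
# Usage: create_task <file> [language] [base_url]
create_task() {
    file="$1"
    lang="${2:-en}"
    base_url="${3:-http://127.0.0.1:8000}"
    # Extract the value of "task_id" (works for compact or pretty-printed JSON)
    task_id=$(curl -sf -X POST "${base_url}/api/models/whisper/task" \
            -F "file=@${file}" -F "language=${lang}" |
        sed -n 's/.*"task_id": *"\([^"]*\)".*/\1/p')
    if [ -z "$task_id" ]; then
        echo "task creation failed for ${file}" >&2
        return 1
    fi
    echo "$task_id"
}
```

For example, `id=$(create_task /home/user/audio/long_podcast.wav zh)` stores the ID for later polling.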

3. Task Status Polling

Endpoint: GET /api/models/whisper/task/{task_id}

Usage Example:

bash

curl "http://127.0.0.1:8000/api/models/whisper/task/29c7b932-a77f-480c-a18b-8a958c7911c3"
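The response format of this endpoint is not documented here, so the polling sketch below assumes no field names: it repeats the request until the response matches a caller-supplied extended regex (whatever completion marker the service actually returns) or a retry limit is reached. `poll_task`, its arguments, and the two-second interval are illustrative choices, not part of this project.

```bash
#!/bin/sh
# Hypothetical helper: poll a task until its JSON response matches <done_regex>.
# The completion marker is NOT documented; pass whatever the real response contains.
# Usage: poll_task <task_id> <done_regex> [base_url] [max_attempts]
poll_task() {
    task_id="$1"
    done_regex="$2"
    base_url="${3:-http://127.0.0.1:8000}"
    max_attempts="${4:-150}"
    attempt=0
    while [ "$attempt" -lt "$max_attempts" ]; do
        resp=$(curl -sf "${base_url}/api/models/whisper/task/${task_id}") || resp=""
        if printf '%s\n' "$resp" | grep -Eq -- "$done_regex"; then
            printf '%s\n' "$resp"   # print the final response for the caller
            return 0
        fi
        attempt=$((attempt + 1))
        sleep 2
    done
    echo "gave up on task ${task_id} after ${max_attempts} attempts" >&2
    return 1
}
```

For example, `poll_task 29c7b932-a77f-480c-a18b-8a958c7911c3 '"completed"'` uses `"completed"` only as a placeholder; inspect one real response first to see what the service reports when a task finishes.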

4. System Configuration Interface (Config)

These endpoints report the current configuration and switch the model and language at runtime.

Get Current System Status

Endpoint: GET /api/system/status
Response: {"status": "success", "data": {"model_size": "base", "language": "en", "max_tokens": 12, "rknn_lite_available": true}}
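The status can be fetched with `curl` and individual fields read without a JSON tool. The sketch below assumes a single-line JSON object like the response above, in which each field name appears once and values are plain strings, numbers, or booleans; `get_status_field` is a hypothetical helper name and is not a general JSON parser.

```bash
#!/bin/sh
# Hypothetical helper: print one field of GET /api/system/status.
# Handles quoted strings and bare numbers/booleans only.
# Usage: get_status_field <field> [base_url]
get_status_field() {
    field="$1"
    base_url="${2:-http://127.0.0.1:8000}"
    curl -sf "${base_url}/api/system/status" |
        sed -n "s/.*\"${field}\": *\"\{0,1\}\([^\",}]*\).*/\1/p"
}
```

For the example response above, `get_status_field model_size` prints `base` and `get_status_field max_tokens` prints `12`.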

Update System Configuration (Hot-Swap)

Endpoint: POST /api/system/config
Request Parameters (Form-Data): model_size (e.g., base) and language (e.g., zh)
Response: {"status": "success", "message": "Successfully loaded base model."}
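The configuration call can be made with `curl`. The sketch below wraps it in a hypothetical `set_config` helper that treats any response without `"status": "success"` as a failure; it assumes the server accepts the two fields as multipart form data (`curl -F`), as the transcription endpoints above do.

```bash
#!/bin/sh
# Hypothetical helper: switch the loaded model size and language at runtime.
# Usage: set_config <model_size> <language> [base_url]
set_config() {
    model_size="$1"
    lang="$2"
    base_url="${3:-http://127.0.0.1:8000}"
    resp=$(curl -sf -X POST "${base_url}/api/system/config" \
        -F "model_size=${model_size}" -F "language=${lang}") || return 1
    printf '%s\n' "$resp"
    # Succeed only if the JSON reports "status": "success" (spacing may vary)
    printf '%s\n' "$resp" | grep -Eq '"status": *"success"'
}
```

For example, `set_config base zh` prints the server's JSON reply and returns a non-zero status if the switch did not succeed.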

🛠️ Developer Guide (Production Recommendations)

Code Description

web_service.py:
- Web API: Built on FastAPI; supports audio upload, asynchronous task queuing, and model hot-swapping.
- RKNN Inference: Wraps RKNN initialization for the encoder and decoder models and implements the autoregressive decoding loop.
py_utils/whisper_utils.py:
- Audio Processing: Computes log-Mel spectrograms that match OpenAI's reference implementation.
- Tokenizer: Handles BPE tokenization and vocabulary management.

Modifying Models

Place the trained and converted .rknn encoder and decoder models in the model/ directory.
The service selects model files by the model size setting (e.g., whisper_encoder_base_20s.rknn for base), so new files must follow the same naming convention.
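Before restarting the service with new models, it can help to confirm that both files exist under the expected names. The sketch below assumes the directory is `model/` and that the decoder follows the same pattern as the documented encoder example; the name `whisper_decoder_<size>_20s.rknn` is an assumption, not confirmed by this document. `check_models` is a hypothetical helper name.

```bash
#!/bin/sh
# Hypothetical helper: check that encoder and decoder files exist for a model size.
# ASSUMPTION: both follow whisper_<part>_<size>_20s.rknn, extrapolated from
# the documented example whisper_encoder_base_20s.rknn.
# Usage: check_models <size> [model_dir]
check_models() {
    size="$1"
    model_dir="${2:-model}"
    missing=0
    for part in encoder decoder; do
        f="${model_dir}/whisper_${part}_${size}_20s.rknn"
        if [ -f "$f" ]; then
            echo "found:   $f"
        else
            echo "missing: $f" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

For example, `check_models base` lists both expected paths and exits non-zero if either is missing.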
