docs: add Whisper speech-to-text capability

2026-03-30 02:43:24 +08:00
parent 7ce166affb
commit 0247020c82
7 changed files with 26 additions and 0 deletions
--- a/TOOLS.md
+++ b/TOOLS.md
@@ -144,6 +144,19 @@ bash scripts/dingtalk_tts.sh "text to speak"

 **Note**: the duration parameter is in seconds (integer), not milliseconds.

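+## Speech Recognition (Whisper)
+
+**Command**: `whisper <audio_file> --language Chinese --model small`
+
+**Supported formats**: AMR, OGG, MP3, WAV, and any other format ffmpeg can decode (Whisper loads audio through ffmpeg, so ffmpeg must be installed)
+
+**Use case**: user sends a voice message → transcribe it to text with Whisper → process it and reply
+
+**Example**: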
+```bash
+whisper /root/.openclaw/workspace-assistant/media/inbound/audio-xxx.ogg --language Chinese --model small
+```
+
 ## ⚠️ Email Operation Safety Rules

 **Read only, never delete!** Never perform any operation that deletes email, including but not limited to:
--- a/audio-1774809727190.json
+++ b/audio-1774809727190.json
@@ -0,0 +1 @@
+{"text": "\u90a3\u4f60\u770b\u770b\u8fd9\u6761\u661f\u671f\u4f60\u80fd\u4e0d\u80fd\u8bc6\u522b\u91cc\u9762\u7684\u5185\u5bb9", "segments": [{"id": 0, "seek": 0, "start": 0.0, "end": 5.0, "text": "\u90a3\u4f60\u770b\u770b\u8fd9\u6761\u661f\u671f\u4f60\u80fd\u4e0d\u80fd\u8bc6\u522b\u91cc\u9762\u7684\u5185\u5bb9", "tokens": [50364, 4184, 16529, 4200, 5562, 48837, 20682, 16786, 2166, 8225, 28590, 5233, 228, 18453, 15759, 8833, 1546, 34742, 25750, 50614], "temperature": 0.0, "avg_logprob": -0.45115266527448383, "compression_ratio": 0.8769230769230769, "no_speech_prob": 0.17235969007015228}], "language": "Chinese"}
--- a/audio-1774809727190.srt
+++ b/audio-1774809727190.srt
@@ -0,0 +1,4 @@
+1
+00:00:00,000 --> 00:00:05,000
+那你看看这条星期你能不能识别里面的内容
+
--- a/audio-1774809727190.tsv
+++ b/audio-1774809727190.tsv
@@ -0,0 +1,2 @@
+start	end	text
+0	5000	那你看看这条星期你能不能识别里面的内容
--- a/audio-1774809727190.txt
+++ b/audio-1774809727190.txt
@@ -0,0 +1 @@
+那你看看这条星期你能不能识别里面的内容
--- a/audio-1774809727190.vtt
+++ b/audio-1774809727190.vtt
@@ -0,0 +1,5 @@
+WEBVTT
+
+00:00.000 --> 00:05.000
+那你看看这条星期你能不能识别里面的内容
+
--- a/media/inbound/audio-1774809727190.amr
+++ b/media/inbound/audio-1774809727190.amr
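The five `audio-1774809727190.*` files in this commit (`.json`, `.srt`, `.tsv`, `.txt`, `.vtt`) appear to be the default output of the command documented in TOOLS.md: unless told otherwise, the openai-whisper CLI writes every output format into the current working directory. The following is a minimal sketch, assuming the openai-whisper CLI is on `PATH`. It keeps only the plain-text transcript in a separate directory and prints it. The function name `transcribe_voice` and the default directory `/tmp/whisper-out` are hypothetical and not part of the original setup.

```bash
#!/usr/bin/env bash
# Sketch: transcribe one voice message with the openai-whisper CLI and print
# only the recognized text. transcribe_voice and /tmp/whisper-out are
# hypothetical names, not part of the original setup.

transcribe_voice() {
  local audio="$1"
  local out_dir="${2:-/tmp/whisper-out}"   # hypothetical default location
  mkdir -p "$out_dir" || return 1

  # --output_format txt writes only the .txt transcript (the CLI default,
  # "all", also writes .json, .srt, .tsv and .vtt); --output_dir keeps the
  # files out of the current working directory. Whisper's per-segment log
  # on stdout is discarded.
  whisper "$audio" --language Chinese --model small \
    --output_format txt --output_dir "$out_dir" >/dev/null || return 1

  # Whisper names the output after the input file without its extension,
  # e.g. audio-xxx.ogg -> audio-xxx.txt.
  local name
  name="$(basename "$audio")"
  cat "$out_dir/${name%.*}.txt"
}
```

For example, `transcribe_voice /root/.openclaw/workspace-assistant/media/inbound/audio-xxx.ogg` should print just the transcript, which can then be handed to the reply step. Writing a single format to a dedicated directory also keeps generated files out of the repository.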