AI Voice Studio

Read text aloud and generate speech from Raycast using Alibaba Cloud Qwen-TTS, Xiaomi MiMo, and OpenAI voices, with playback speed control and focused per-provider defaults.

AI Voice Studio is a Raycast extension for reading selected text, clipboard text, or typed text with multiple AI speech providers.

The current provider set is intentionally small:

  • Qwen-TTS through Alibaba Cloud Model Studio / DashScope, with region-aware endpoints, official language selection, narration instructions for the instruct model, and the current system voice catalog.
  • MiMo through Xiaomi MiMo Token Plan.
  • OpenAI through the OpenAI Speech API.
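
The OpenAI route targets the Speech API endpoint `POST https://api.openai.com/v1/audio/speech`. The following is a minimal sketch, assuming the OpenAI TTS Studio's tone, delivery, and style controls are folded into the API's `instructions` field, which `gpt-4o-mini-tts` accepts. The helper name `buildOpenAiSpeechRequest` and its option names are hypothetical, not the extension's code.

```typescript
// Hypothetical options for one OpenAI speech request.
interface OpenAiSpeechOptions {
  apiKey: string;
  text: string;
  voice: string; // e.g. "alloy"
  model?: string; // defaults to "gpt-4o-mini-tts", which supports `instructions`
  tone?: string;
  delivery?: string;
  style?: string;
}

interface SpeechRequest {
  url: string;
  init: { method: "POST"; headers: Record<string, string>; body: string };
}

function buildOpenAiSpeechRequest(opts: OpenAiSpeechOptions): SpeechRequest {
  // Assumption: each style control becomes one sentence of the instruction text.
  const parts = [
    opts.tone && `Tone: ${opts.tone}.`,
    opts.delivery && `Delivery: ${opts.delivery}.`,
    opts.style && `Style: ${opts.style}.`,
  ].filter((p): p is string => Boolean(p));

  const body: Record<string, unknown> = {
    model: opts.model ?? "gpt-4o-mini-tts",
    input: opts.text,
    voice: opts.voice,
    response_format: "mp3",
  };
  // Only send `instructions` when at least one control is set.
  if (parts.length > 0) body.instructions = parts.join(" ");

  return {
    url: "https://api.openai.com/v1/audio/speech",
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${opts.apiKey}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(body),
    },
  };
}
```

The returned `url` and `init` can be passed straight to `fetch`; the response body is the encoded audio.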

The extension focuses on everyday reading and quick voice testing rather than full audio production. It provides shared Quick Read commands, provider-specific voice pickers, a single setup command for defaults, playback speed controls, menu-bar status items, and local verification scripts for provider routing and playback behavior.
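
Stepping playback speed "by one step" can be sketched as moving along a fixed ladder of rates and clamping at both ends. The `SPEED_STEPS` values and the `stepSpeed` helper are hypothetical; the extension's real step size and limits are not documented here.

```typescript
// Hypothetical speed ladder; the real steps may differ.
const SPEED_STEPS = [0.75, 1.0, 1.25, 1.5, 1.75, 2.0] as const;

// Small tolerance so a current speed equal to a step is not treated as "between" steps.
const EPS = 1e-9;

function stepSpeed(current: number, direction: 1 | -1): number {
  if (direction === 1) {
    // First step strictly faster than the current speed, or stay at the top.
    return SPEED_STEPS.find((s) => s > current + EPS) ?? SPEED_STEPS[SPEED_STEPS.length - 1];
  }
  // Last step strictly slower than the current speed, or stay at the bottom.
  const slower = SPEED_STEPS.filter((s) => s < current - EPS);
  return slower.length > 0 ? slower[slower.length - 1] : SPEED_STEPS[0];
}
```

An off-ladder value such as 1.1 snaps to the nearest step in the requested direction, so repeated presses always land on known rates.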

Commands

Command | Provider | Purpose
--- | --- | ---
Quick Read | Default provider | Reads selected text, or clipboard text if no selection is available. Running it again stops playback.
Setup Voice Defaults | All | Chooses the default provider and the provider-specific model, voice, speed, and advanced defaults.
Test Voice Setup | Default provider | Synthesizes and plays a short sample to check credentials, latency, and playback.
Quick Read with Qwen-TTS | Qwen-TTS | Reads selected or clipboard text with the configured Qwen-TTS voice.
Read with Qwen-TTS Voice | Qwen-TTS | Browses Qwen-TTS voices before reading.
Set Qwen-TTS Quick Read Voice | Qwen-TTS | Sets the Qwen-TTS voice used by Quick Read.
Qwen-TTS Reading Status | Qwen-TTS | Shows Qwen-TTS playback status and controls in the menu bar.
TTS Studio | MiMo | Generates MiMo speech from typed, selected, or pasted text.
Quick Read with MiMo | MiMo | Reads selected or clipboard text with the configured MiMo voice.
Read with MiMo Voice | MiMo | Browses MiMo voices before reading.
Set MiMo Quick Read Voice | MiMo | Sets the MiMo voice used by Quick Read.
MiMo Reading Status | MiMo | Shows MiMo playback status and controls in the menu bar.
OpenAI TTS Studio | OpenAI | Generates OpenAI speech with voice, tone, delivery, and style controls.
Quick Read with OpenAI | OpenAI | Reads selected or clipboard text with the configured OpenAI voice.
Read with OpenAI Voice | OpenAI | Browses OpenAI voices before reading.
Set OpenAI Quick Read Voice | OpenAI | Sets the OpenAI voice used by Quick Read.
OpenAI Reading Status | OpenAI | Shows OpenAI playback status and controls in the menu bar.
Stop Reading | All | Stops current playback across providers.
Increase Reading Speed | Default provider | Raises playback speed by one step.
Decrease Reading Speed | Default provider | Lowers playback speed by one step.
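
The Quick Read behavior (selection first, clipboard as fallback, second run stops playback) can be sketched with injected dependencies so the flow is testable. `getSelectedText()` and `Clipboard.readText()` are real `@raycast/api` functions that could back `getSelection` and `readClipboard`; the `QuickReadDeps` interface and the `quickRead` / `resolveText` names are hypothetical.

```typescript
// Hypothetical dependencies. In a Raycast command, getSelection could wrap
// getSelectedText() and readClipboard could wrap Clipboard.readText().
interface QuickReadDeps {
  isPlaying: () => boolean;
  stop: () => void;
  getSelection: () => Promise<string>; // assumed to reject when nothing is selected
  readClipboard: () => Promise<string | undefined>;
  speak: (text: string) => Promise<void>;
}

type QuickReadOutcome = "stopped" | "reading" | "empty";

async function resolveText(deps: QuickReadDeps): Promise<string | undefined> {
  try {
    const selected = (await deps.getSelection()).trim();
    if (selected) return selected;
  } catch {
    // No selection available: fall through to the clipboard.
  }
  const clip = (await deps.readClipboard())?.trim();
  return clip || undefined;
}

async function quickRead(deps: QuickReadDeps): Promise<QuickReadOutcome> {
  // A second run while audio is playing acts as a stop toggle.
  if (deps.isPlaying()) {
    deps.stop();
    return "stopped";
  }
  const text = await resolveText(deps);
  if (!text) return "empty";
  await deps.speak(text);
  return "reading";
}
```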

Credentials

Configure credentials in Raycast extension preferences:

  • DashScope API Key for Qwen-TTS.
  • MiMo Token Plan API Key.
  • OpenAI API Key.

Provider defaults live in Setup Voice Defaults instead of command-level preferences, so the Raycast extension preferences hold only credentials.
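
Because each provider needs exactly one key, credential lookup can be sketched as a provider-to-preference map that fails early with a readable message. In Raycast the preferences object would come from `getPreferenceValues()`; the preference field names and the `requireApiKey` helper are hypothetical.

```typescript
type Provider = "qwen" | "mimo" | "openai";

// Hypothetical preference field names.
interface Preferences {
  dashscopeApiKey?: string;
  mimoApiKey?: string;
  openaiApiKey?: string;
}

const KEY_FOR: Record<Provider, { pref: keyof Preferences; label: string }> = {
  qwen: { pref: "dashscopeApiKey", label: "DashScope API Key" },
  mimo: { pref: "mimoApiKey", label: "MiMo Token Plan API Key" },
  openai: { pref: "openaiApiKey", label: "OpenAI API Key" },
};

function requireApiKey(prefs: Preferences, provider: Provider): string {
  const { pref, label } = KEY_FOR[provider];
  const value = prefs[pref]?.trim();
  if (!value) {
    // Name the missing preference so the user knows what to fill in.
    throw new Error(`Missing ${label}. Add it in the extension preferences.`);
  }
  return value;
}
```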

Qwen-TTS

The Qwen-TTS integration uses Alibaba Cloud's non-realtime HTTP generation endpoint. When DashScope returns an audio URL (valid for 24 hours) instead of inline audio data, the extension downloads the audio from that URL. It then plays the resulting chunks through the shared lookahead playback engine.
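
The URL-or-inline handling and the lookahead idea can be sketched together: split text into sentence-sized chunks, and while one chunk plays, synthesize the next. This is a minimal sketch, not the extension's engine; the `SynthResult` shape (`data` for base64 audio, `url` for a download link) and every function name here are assumptions.

```typescript
// Hypothetical synthesis response: inline base64 audio or a temporary URL.
interface SynthResult {
  data?: string;
  url?: string;
}

type Fetcher = (url: string) => Promise<Uint8Array>;

async function resolveAudio(result: SynthResult, fetchBytes: Fetcher): Promise<Uint8Array> {
  if (result.data) return new Uint8Array(Buffer.from(result.data, "base64"));
  if (result.url) return fetchBytes(result.url); // download when only a URL came back
  throw new Error("Synthesis returned neither audio data nor a URL");
}

// Split on sentence punctuation (ASCII and CJK), then pack sentences into
// chunks of at most maxLen characters. A single longer sentence stays whole.
function splitIntoChunks(text: string, maxLen = 200): string[] {
  const sentences = (text.match(/[^.!?。！？]+[.!?。！？]*/g) ?? [])
    .map((s) => s.trim())
    .filter(Boolean);
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    if (current && current.length + s.length + 1 > maxLen) {
      chunks.push(current);
      current = s;
    } else {
      current = current ? `${current} ${s}` : s;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

async function playWithLookahead(
  chunks: string[],
  synthesize: (chunk: string) => Promise<SynthResult>,
  fetchBytes: Fetcher,
  play: (audio: Uint8Array) => Promise<void>,
): Promise<void> {
  if (chunks.length === 0) return;
  const prepare = (c: string) => synthesize(c).then((r) => resolveAudio(r, fetchBytes));
  let pending = prepare(chunks[0]);
  for (let i = 0; i < chunks.length; i++) {
    const audio = await pending;
    if (i + 1 < chunks.length) {
      pending = prepare(chunks[i + 1]);
      // Mark as handled so a failure during playback is not reported as an
      // unhandled rejection; the error still surfaces at the next `await`.
      pending.catch(() => undefined);
    }
    await play(audio); // the next chunk synthesizes while this one plays
  }
}
```

Keeping exactly one chunk in flight overlaps network latency with playback without buffering the whole text up front.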

Qwen defaults to qwen3-tts-flash for low-latency everyday reading. qwen3-tts-instruct-flash is available when you need narration instructions and supports DashScope's optional optimize_instructions flag. Region presets cover the Beijing and Singapore endpoints, while Custom Endpoint remains available for advanced DashScope-compatible routing.
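
Region presets, the model default, and instruct-only options can be sketched as a small routing function. The endpoint URLs below reflect DashScope's multimodal-generation HTTP path as the author of this sketch understands it, and the placement of `language_type`, `instructions`, and `optimize_instructions` under `input` is an assumption; check the current Model Studio documentation before relying on either. All type and function names are hypothetical.

```typescript
type QwenRegion = "beijing" | "singapore" | "custom";
type QwenModel = "qwen3-tts-flash" | "qwen3-tts-instruct-flash";

// Assumed endpoint path and regional hosts.
const PATH = "/api/v1/services/aigc/multimodal-generation/generation";
const REGION_BASE: Record<Exclude<QwenRegion, "custom">, string> = {
  beijing: "https://dashscope.aliyuncs.com",
  singapore: "https://dashscope-intl.aliyuncs.com",
};

interface QwenSettings {
  region: QwenRegion;
  customEndpoint?: string; // full URL, used only when region is "custom"
  model?: QwenModel;
  voice: string;
  languageType?: string; // e.g. "English"
  instructions?: string; // honored only for the instruct model in this sketch
  optimizeInstructions?: boolean;
}

function qwenEndpoint(s: QwenSettings): string {
  if (s.region === "custom") {
    if (!s.customEndpoint) throw new Error("Custom region requires an endpoint URL");
    return s.customEndpoint;
  }
  return REGION_BASE[s.region] + PATH;
}

function qwenBody(s: QwenSettings, text: string): { model: QwenModel; input: Record<string, unknown> } {
  const model = s.model ?? "qwen3-tts-flash"; // low-latency default
  const input: Record<string, unknown> = { text, voice: s.voice };
  if (s.languageType) input.language_type = s.languageType;
  // Attach narration instructions only for the instruct model.
  if (model === "qwen3-tts-instruct-flash" && s.instructions) {
    input.instructions = s.instructions;
    if (s.optimizeInstructions) input.optimize_instructions = true;
  }
  return { model, input };
}
```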

Development

npm install
npm run verify

Useful targeted checks:

npm run verify:runtime
npm run verify:provider-settings
npm run verify:provider-contracts
npm run verify:pipeline-lookahead
npm run verify:audio-player

npm run verify runs the runtime checks, provider setting checks, provider contract mocks, live-smoke guardrails, pipeline lookahead checks, audio-player regressions, build, lint, TypeScript check, and audit, and writes local verification evidence.
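
An aggregate check like this is typically a fail-fast sequence. The sketch below runs the targeted scripts listed above in order and stops at the first non-zero exit code. It is illustrative only: the real `verify` script's order and contents may differ, and `runSteps` is a hypothetical name.

```typescript
import { spawnSync } from "node:child_process";

// Hypothetical step list mirroring the targeted checks listed above.
const STEPS = [
  "verify:runtime",
  "verify:provider-settings",
  "verify:provider-contracts",
  "verify:pipeline-lookahead",
  "verify:audio-player",
];

type Exec = (script: string) => number; // returns an exit code

// Default executor: run `npm run <script>` and treat a missing status as failure.
const npmRun: Exec = (script) =>
  spawnSync("npm", ["run", script], { stdio: "inherit" }).status ?? 1;

function runSteps(steps: string[], exec: Exec = npmRun): { ok: boolean; failed?: string } {
  for (const step of steps) {
    // Stop at the first failing step so later output does not bury the cause.
    if (exec(step) !== 0) return { ok: false, failed: step };
  }
  return { ok: true };
}

// Usage: const result = runSteps(STEPS);
```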

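Summary

AI Voice Studio is a Raycast text-to-speech extension that sends selected text, clipboard text, or manually typed text to an AI TTS provider to be read aloud.

Only three provider routes are currently kept:

  • Qwen-TTS: through Alibaba Cloud Model Studio / DashScope, with region-aware endpoints, official language selection, instruction control for the Instruct model, and the current system voice catalog.
  • MiMo: through Xiaomi MiMo Token Plan.
  • OpenAI: through the OpenAI Speech API.

The extension is not meant to be a full audio workstation; its goal is to make everyday reading, voice selection, default settings, speed control, and quick self-checks reliable. Per-provider settings are centralized in Setup Voice Defaults, and the Raycast extension preferences hold only API keys.

Development and verification: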

npm install
npm run verify

Compatibility

  • macOS