# Fun-CosyVoice 3.0

## Highlights

🔥 Fun-CosyVoice 3.0 is an advanced text-to-speech (TTS) system built on large language models (LLMs). It surpasses its predecessor (CosyVoice 2.0) across the board in content consistency, speaker similarity, and prosodic naturalness, and is designed for zero-shot multilingual speech synthesis in open, in-the-wild scenarios.

## Key Features

- **Language coverage**: supports 9 common languages (Chinese, English, Japanese, Korean, German, Spanish, French, Italian, Russian) and 18 Chinese dialects/accents (Cantonese, Hokkien, Sichuanese, Northeastern Mandarin, Shaanxi, Shanxi, Shanghainese, Tianjin, Shandong, Ningxia, Gansu, etc.), along with multilingual/cross-lingual zero-shot voice cloning.
- **Content consistency and naturalness**: industry-leading text fidelity, timbre similarity, and prosodic fluency.
- **Pronunciation hotfix**: supports pronunciation correction via Chinese pinyin and English CMU phonemes, providing stronger controllability for production use.
- **Text normalization**: correctly reads numbers, special symbols, and various text formats without a traditional frontend module.
- **Bidirectional streaming**: supports both streaming text input and streaming audio output, achieving latency as low as 150 ms while maintaining high-quality audio.
- **Instruction control**: supports instructions controlling language, dialect, emotion, speaking rate, volume, and more.

## Roadmap

- **2025/12**: released the Fun-CosyVoice3-0.5B-2512 base model, the reinforcement-learning model, and their training/inference scripts; released the Fun-CosyVoice3-0.5B ModelScope Gradio space.
- **2025/08**: thanks to Yuekai Zhang (张悦铠) of NVIDIA, added Triton + TensorRT-LLM runtime support and CosyVoice2 GRPO training support.
- **2025/07**: released the Fun-CosyVoice 3.0 evaluation set.
- **2025/05**: added CosyVoice2-0.5B vLLM support.
- **2024/12**: released the 25 Hz CosyVoice2-0.5B model.
- **2024/09**: 25 Hz CosyVoice-300M base model; 25 Hz CosyVoice-300M voice-conversion support.
- **2024/08**: adopted repetition-aware sampling (RAS) inference to improve LLM stability; added streaming inference, including KV cache and SDPA for real-time-factor optimization.
- **2024/07**: flow-matching training support; WeTextProcessing support when ttsfrd is unavailable; FastAPI server and client.

## Evaluation

| Model | Open-Source | Model Size | test-zh CER (%) ↓ | test-zh Speaker Similarity (%) ↑ | test-en WER (%) ↓ | test-en Speaker Similarity (%) ↑ | test-hard CER (%) ↓ | test-hard Speaker Similarity (%) ↑ |
|---|---|---|---|---|---|---|---|---|
| Human | - | - | 1.26 | 75.5 | 2.14 | 73.4 | - | - |
| Seed-TTS | ❌ | - | 1.12 | 79.6 | 2.25 | 76.2 | 7.59 | 77.6 |
| MiniMax-Speech | ❌ | - | 0.83 | 78.3 | 1.65 | 69.2 | - | - |
| F5-TTS | ✅ | 0.3B | 1.52 | 74.1 | 2.00 | 64.7 | 8.67 | 71.3 |
| Spark TTS | ✅ | 0.5B | 1.2 | 66.0 | 1.98 | 57.3 | - | - |
| CosyVoice2 | ✅ | 0.5B | 1.45 | 75.7 | 2.57 | 65.9 | 6.83 | 72.4 |
| FireRedTTS2 | ✅ | 1.5B | 1.14 | 73.2 | 1.95 | 66.5 | - | - |
| Index-TTS2 | ✅ | 1.5B | 1.03 | 76.5 | 2.23 | 70.6 | 7.12 | 75.5 |
| VibeVoice-1.5B | ✅ | 1.5B | 1.16 | 74.4 | 3.04 | 68.9 | - | - |
| VibeVoice-Realtime | ✅ | 0.5B | - | - | 2.05 | 63.3 | - | - |
| HiggsAudio-v2 | ✅ | 3B | 1.50 | 74.0 | 2.44 | 67.7 | - | - |
| VoxCPM | ✅ | 0.5B | 0.93 | 77.2 | 1.85 | 72.9 | 8.87 | 73.0 |
| GLM-TTS | ✅ | 1.5B | 1.03 | 76.1 | - | - | - | - |
| GLM-TTS RL | ✅ | 1.5B | 0.89 | 76.4 | - | - | - | - |
| Fun-CosyVoice3-0.5B-2512 | ✅ | 0.5B | 1.21 | 78.0 | 2.24 | 71.8 | 6.71 | 75.8 |
| Fun-CosyVoice3-0.5B-2512_RL | ✅ | 0.5B | 0.81 | 77.4 | 1.68 | 69.5 | 5.44 | 75.0 |

## Install

### Clone and install

Clone the repository:

```shell
git clone --recursive https://github.com/FunAudioLLM/CosyVoice.git
# If you failed to clone the submodule due to network failures, please run the following command until success
cd CosyVoice
git submodule update --init --recursive
```

### Install Conda

See https://docs.conda.io/en/latest/miniconda.html

### Create a Conda environment

```shell
conda create -n cosyvoice -y python=3.10
conda activate cosyvoice
pip install -r requirements.txt -i https://mirrors.aliyun.com/pypi/simple/ --trusted-host mirrors.aliyun.com

# If you encounter sox compatibility issues
# ubuntu
sudo apt-get install sox libsox-dev
# centos
sudo yum install sox sox-devel
```

### Model download

```python
from huggingface_hub import snapshot_download

snapshot_download('FunAudioLLM/Fun-CosyVoice3-0.5B-2512', local_dir='pretrained_models/Fun-CosyVoice3-0.5B')
snapshot_download('FunAudioLLM/CosyVoice-ttsfrd', local_dir='pretrained_models/CosyVoice-ttsfrd')
```

Optionally, you can unzip the ttsfrd resource and install the ttsfrd package for better text-normalization performance. This step is not required; if the ttsfrd package is not installed, wetext is used by default.

```shell
cd pretrained_models/CosyVoice-ttsfrd/
unzip resource.zip -d .
pip install ttsfrd_dependency-0.1-py3-none-any.whl
pip install ttsfrd-0.4.2-cp310-cp310-linux_x86_64.whl
```

## Basic usage

```python
import sys
sys.path.append('third_party/Matcha-TTS')
from cosyvoice.cli.cosyvoice import AutoModel
import torchaudio

# CosyVoice3 usage, check https://funaudiollm.github.io/cosyvoice3/ for more details
cosyvoice = AutoModel(model_dir='pretrained_models/Fun-CosyVoice3-0.5B')

# en zero_shot usage
for i, j in enumerate(cosyvoice.inference_zero_shot('CosyVoice is undergoing a comprehensive upgrade, providing more accurate, stable, faster, and better voice generation capabilities.', 'You are a helpful assistant.|endofprompt|希望你以后能够做的比我还好呦。', './asset/zero_shot_prompt.wav', stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# zh zero_shot usage
for i, j in enumerate(cosyvoice.inference_zero_shot('八百标兵奔北坡北坡炮兵并排跑炮兵怕把标兵碰标兵怕碰炮兵炮。', 'You are a helpful assistant.|endofprompt|希望你以后能够做的比我还好呦。', './asset/zero_shot_prompt.wav', stream=False)):
    torchaudio.save('zero_shot_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# fine grained control, for supported control, check cosyvoice/tokenizer/tokenizer.py#L280
for i, j in enumerate(cosyvoice.inference_cross_lingual('You are a helpful assistant.|endofprompt|[breath]因为他们那一辈人[breath]在乡里面住的要习惯一点[breath]邻居都很活络[breath]嗯都很熟悉。[breath]', './asset/zero_shot_prompt.wav', stream=False)):
    torchaudio.save('fine_grained_control_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# instruct usage, for supported control, check cosyvoice/utils/common.py#L28
for i, j in enumerate(cosyvoice.inference_instruct2('好少咯一般系放嗰啲国庆啊中秋嗰啲可能会咯。', 'You are a helpful assistant. 请用广东话表达。|endofprompt|', './asset/zero_shot_prompt.wav', stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
for i, j in enumerate(cosyvoice.inference_instruct2('收到好友从远方寄来的生日礼物那份意外的惊喜与深深的祝福让我心中充满了甜蜜的快乐笑容如花儿般绽放。', 'You are a helpful assistant. 请用尽可能快地语速说一句话。|endofprompt|', './asset/zero_shot_prompt.wav', stream=False)):
    torchaudio.save('instruct_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)

# hotfix usage
for i, j in enumerate(cosyvoice.inference_zero_shot('高管也通过电话、短信、微信等方式对报道[j][ǐ]予好评。', 'You are a helpful assistant.|endofprompt|希望你以后能够做的比我还好呦。', './asset/zero_shot_prompt.wav', stream=False)):
    torchaudio.save('hotfix_{}.wav'.format(i), j['tts_speech'], cosyvoice.sample_rate)
```

## Acknowledgements

We borrowed a lot of code from FunASR, FunCodec, Matcha-TTS, AcademiCodec, and WeNet.

## Citations

```bibtex
@article{du2024cosyvoice,
  title={CosyVoice: A scalable multilingual zero-shot text-to-speech synthesizer based on supervised semantic tokens},
  author={Du, Zhihao and Chen, Qian and Zhang, Shiliang and Hu, Kai and Lu, Heng and Yang, Yexin and Hu, Hangrui and Zheng, Siqi and Gu, Yue and Ma, Ziyang and others},
  journal={arXiv preprint arXiv:2407.05407},
  year={2024}
}
@article{du2024cosyvoice2,
  title={CosyVoice 2: Scalable streaming speech synthesis with large language models},
  author={Du, Zhihao and Wang, Yuxuan and Chen, Qian and Shi, Xian and Lv, Xiang and Zhao, Tianyu and Gao, Zhifu and Yang, Yexin and Gao, Changfeng and Wang, Hui and others},
  journal={arXiv preprint arXiv:2412.10117},
  year={2024}
}
@article{du2025cosyvoice,
  title={CosyVoice 3: Towards In-the-wild Speech Generation via Scaling-up and Post-training},
  author={Du, Zhihao and Gao, Changfeng and Wang, Yuxuan and Yu, Fan and Zhao, Tianyu and Wang, Hao and Lv, Xiang and Wang, Hui and Shi, Xian and An, Keyu and others},
  journal={arXiv preprint arXiv:2505.17589},
  year={2025}
}
@inproceedings{lyu2025build,
  title={Build LLM-Based Zero-Shot Streaming TTS System with CosyVoice},
  author={Lyu, Xiang and Wang, Yuxuan and Zhao, Tianyu and Wang, Hao and Liu, Huadai and Du, Zhihao},
  booktitle={ICASSP 2025-2025 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)},
  pages={1--2},
  year={2025},
  organization={IEEE}
}
```
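The evaluation table reports CER (character error rate) for Chinese test sets and WER (word error rate) for English, both computed from the edit distance between the recognized transcript and the reference text. As a minimal illustration of the metric (this helper is a sketch, not the project's actual scoring script):

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences, via dynamic programming."""
    dp = list(range(len(hyp) + 1))
    for i, r in enumerate(ref, 1):
        prev, dp[0] = dp[0], i
        for j, h in enumerate(hyp, 1):
            prev, dp[j] = dp[j], min(dp[j] + 1,        # deletion
                                     dp[j - 1] + 1,    # insertion
                                     prev + (r != h))  # substitution (free if equal)
    return dp[-1]

def cer(ref, hyp):
    """Character error rate: edit operations / reference length, in percent."""
    return 100.0 * edit_distance(ref, hyp) / len(ref)

print(cer('八百标兵奔北坡', '八百标兵奔北坡'))  # 0.0
print(cer('八百标兵奔北坡', '八白标兵奔坡'))    # one substitution + one deletion over 7 chars
```

Lower is better; a CER of 1.21 % on test-zh means roughly one wrong character per hundred reference characters.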

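The 2024/08 roadmap entry mentions repetition-aware sampling (RAS) for LLM stability. The core idea is to decode greedily but fall back to probability-weighted random sampling when the greedy token already dominates the recent decoding window, which breaks repetition loops. A toy sketch of that idea, with illustrative function names and threshold values that are not the repository's actual API:

```python
import random

def ras_pick(history, probs, window=10, threshold=0.5, rng=random):
    """Repetition-aware pick: greedy unless the greedy token is looping.

    history: previously emitted token ids
    probs:   dict mapping candidate token id -> probability
    """
    greedy = max(probs, key=probs.get)
    rate = history[-window:].count(greedy) / window  # repetition rate in the window
    if rate >= threshold:
        # greedy token repeats too often: sample from the distribution instead
        tokens = list(probs)
        return rng.choices(tokens, weights=[probs[t] for t in tokens])[0]
    return greedy

print(ras_pick([1, 2, 3], {7: 0.9, 8: 0.1}))  # no repetition -> greedy pick 7
print(ras_pick([7] * 10, {7: 0.9, 8: 0.1}))   # looping -> random fallback, 7 or 8
```

In the real system this decision happens per decoding step inside the LLM's autoregressive loop; the sketch only shows the selection rule.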