Dataconomy

MyShell: We tried the new OpenVoice model

You can either sign up for an account or use HuggingFace to try out the tool

by Kerem Gülen
January 3, 2024
in Artificial Intelligence

MyShell, in collaboration with leading academic institutions, has unveiled OpenVoice, a groundbreaking open-source voice cloning technology, setting new standards in the field of AI-driven audio replication.

Voice cloning technology is making strides and a noteworthy advancement has been made by startups such as ElevenLabs, which have secured significant funding to advance their proprietary algorithms and AI-based software. These tools are designed to create audio replicas of human voices.

However, a cool development has emerged with OpenVoice, a collaborative creation by teams from the Massachusetts Institute of Technology (MIT), Tsinghua University in Beijing, and the Canadian AI firm MyShell. OpenVoice is an open-source platform for voice cloning, distinguished by its rapid processing and advanced customization options, setting it apart from existing voice cloning technologies.

Today, we proudly open source our OpenVoice algorithm, embracing our core ethos – AI for all.

Experience it now: https://t.co/zHJpeVpX3t. Clone voices with unparalleled precision, with granular control of tone, from emotion to accent, rhythm, pauses, and intonation, using just a… pic.twitter.com/RwmYajpxOt

— MyShell (@myshell_ai) January 2, 2024

To enhance accessibility and transparency, the company has shared a link to its research paper detailing the development of OpenVoice; the paper is a preprint that had not yet been peer reviewed at the time of writing. Additionally, they’ve provided access points for users to experiment with this technology. These include the MyShell web app interface, which requires user registration, and HuggingFace, accessible to the public without any account.

MyShell is committed to contributing to the broader research community, viewing OpenVoice as just the beginning. Looking ahead, they plan to extend support through grants, datasets, and computing resources to bolster open-source research. MyShell’s guiding principle is ‘AI for All,’ emphasizing the significance of language, vision, and voice as the three key components of future Artificial General Intelligence (AGI).

In the research domain, while language and vision modalities have seen substantial developments in open-source models, there remains a gap in the voice sector. Specifically, there’s a need for a robust, instantly responsive voice cloning model that offers customizable voice generation capabilities. MyShell aims to fill this gap, pushing the boundaries of voice technology in AGI.


How to use MyShell AI?

Follow these steps:

  • Go to the official website of MyShell AI.
  • Click on “Start the App.”
  • Select “Chat” from the left-hand side.
  • To use the “MyShell Voice Clone” feature, you need to sign up for an account. You can use a Google account.
  • Next, click on “Start,” located at the bottom of the page.
  • Upload a voice recording and input the English text you want to convert to audio.
  • Hit “Generate.” This will cost 10 units of in-app currency.
  • Your output will be sent to you via chat.

Editor’s note: For reference, I uploaded a voice recording of my own, which says: “Voice cloning technology is making strides and a noteworthy advancement has been made by startups such as ElevenLabs.”

Then I asked for an output that reads: “This audio file was created using MyShell AI. You be the judge of how successful it was!”

Input:

https://dataconomy.com/wp-content/uploads/2024/01/input.mp3

Output:

https://dataconomy.com/wp-content/uploads/2024/01/output.mp3

I wouldn’t call the output very successful, but it’s amazing to see how fast it is. Keep in mind that I’m not a native speaker.


How does OpenVoice technology work?

The OpenVoice technology, developed by Zengyi Qin of MIT and MyShell, Wenliang Zhao and Xumin Yu of Tsinghua University, and Xin Sun of MyShell, is described in their paper. This voice cloning AI is based on a dual-model architecture: a Text-to-Speech (TTS) model and a “tone converter.”

The TTS model is responsible for managing style parameters and languages. It underwent training using 30,000 sentences of audio samples, which included voices with American and British accents in English, as well as Chinese and Japanese speakers. These samples were distinctively labeled to reflect the emotions expressed in them. The model learned nuances like intonation, rhythm, and pauses from these clips.

On the other hand, the tone converter model was trained with an extensive dataset of over 300,000 audio samples from more than 20,000 different speakers.

In both models, the audio of human speech was transformed into phonemes – the basic sound units that differentiate words. These were then represented through vector embeddings.
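As a rough illustration of the phoneme-to-vector step described above, the sketch below maps a short phoneme sequence to embedding vectors through a lookup table. The phoneme inventory, the 4-number embedding size, and the function names are illustrative assumptions, not details taken from the OpenVoice paper; a real model would learn its table during training rather than fill it with random numbers.

```python
import random

# Hypothetical toy phoneme inventory (ARPAbet-style symbols); not OpenVoice's actual set.
PHONEMES = ["HH", "AH", "L", "OW"]
EMBED_DIM = 4  # assumed size, chosen only to keep the example readable


def build_embedding_table(phonemes, dim, seed=0):
    """Give each phoneme a fixed random vector; a trained model would learn these."""
    rng = random.Random(seed)
    return {p: [rng.uniform(-1.0, 1.0) for _ in range(dim)] for p in phonemes}


def embed_phonemes(sequence, table):
    """Look up one vector per phoneme, preserving the order of the sequence."""
    return [table[p] for p in sequence]


table = build_embedding_table(PHONEMES, EMBED_DIM)
vectors = embed_phonemes(["HH", "AH", "L", "OW"], table)  # "hello" as phonemes
print(len(vectors), "phonemes ->", len(vectors[0]), "numbers each")
```

The point of the lookup is that the same phoneme always maps to the same vector, so downstream layers work with numbers instead of symbols.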

The process uses a “base speaker” in the TTS model, combined with a tone derived from a user’s recorded audio. This combination lets the system reproduce the “tone color,” meaning the timbre of the user’s voice, while the base speaker TTS model controls the style, such as the emotional expression of the spoken text.
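The two-stage flow can be sketched in a few lines of Python. This is a toy model of the data flow only, under stated assumptions: the `Speech` record and the functions `base_speaker_tts`, `extract_tone_color`, and `tone_converter` are hypothetical names invented for illustration, and no real audio is processed. It is not the OpenVoice API.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class Speech:
    """Toy stand-in for generated audio: what is said, how, and in whose voice."""
    text: str
    style: str       # e.g. emotion or accent, set by the base speaker TTS stage
    tone_color: str  # voice characteristic taken from a recording by the converter stage


def base_speaker_tts(text: str, style: str) -> Speech:
    """Stage 1 (hypothetical): synthesize text in a fixed base voice with a chosen style."""
    return Speech(text=text, style=style, tone_color="base_speaker")


def extract_tone_color(reference_recording: str) -> str:
    """Hypothetical: derive a tone label from a user's recording (here, just its name)."""
    return f"tone_of:{reference_recording}"


def tone_converter(speech: Speech, target_tone_color: str) -> Speech:
    """Stage 2 (hypothetical): change only the tone color; text and style pass through."""
    return replace(speech, tone_color=target_tone_color)


base = base_speaker_tts("This audio file was created using MyShell AI.", style="cheerful")
cloned = tone_converter(base, extract_tone_color("input.mp3"))
print(cloned)
```

Splitting the work this way means the first stage never needs to see the user's voice, and the second stage never needs to know what is being said or in which style.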

The team included a diagram in their paper to illustrate how these two models interact:

[Image: diagram from the OpenVoice paper showing how the two models interact]

They highlight that their method is conceptually straightforward yet effective. It also requires significantly fewer computing resources compared to other voice cloning methods, such as Meta’s Voicebox.

“We wanted to develop the most flexible instant voice cloning model to date. Flexibility here means flexible control over styles/emotions/accent etc, and can adapt to any language. Nobody could do this before, because it is too difficult. I lead a group of experienced AI scientists and spent several months to figure out the solution. We found that there is a very elegant way to decouple the difficult task into some doable subtasks to achieve what seems to be too difficult as a whole. The decoupled pipeline turns out to be very effective but also very simple,” Qin stated in an email reported by VentureBeat.



COPYRIGHT © DATACONOMY MEDIA GMBH, ALL RIGHTS RESERVED.
