Meta recently announced MusicGen, an AI-powered music generator that can turn a text description into 12 seconds of audio. The company has also open-sourced it, unlike Google, which has not opened up its comparable MusicLM model. Today, we will explain what Meta’s MusicGen is and how to use it!
AI has already touched most creative fields, and the music business is now firmly under its influence. Just as ChatGPT and other large language models generate text, Meta has now released an open-source version of its AI model for generating music.
What is MusicGen?
MusicGen is built on a Transformer architecture, like most language models in use today. It predicts the next segment of a piece of music much as a language model predicts the next tokens in a sentence.
The researchers use Meta’s EnCodec audio tokenizer to break the audio data down into smaller pieces, i.e., discrete tokens. MusicGen is a fast and efficient single-stage model that processes the resulting token streams in parallel.
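To make the tokenization step more concrete, here is a minimal sketch using the EnCodec port in Hugging Face’s transformers library. The checkpoint name “facebook/encodec_32khz” and the exact calls are assumptions based on the public MusicGen release, so treat this as an illustration rather than Meta’s exact training pipeline:

```python
# Illustrative only: encode a waveform into discrete EnCodec tokens.
# Assumes the transformers EnCodec port and the "facebook/encodec_32khz" checkpoint.
import numpy as np
from transformers import AutoProcessor, EncodecModel

processor = AutoProcessor.from_pretrained("facebook/encodec_32khz")
model = EncodecModel.from_pretrained("facebook/encodec_32khz")

# One second of silence stands in for a real music clip (32 kHz mono).
raw_audio = np.zeros(32_000, dtype=np.float32)
inputs = processor(raw_audio=raw_audio, sampling_rate=32_000, return_tensors="pt")

# The encoder turns the waveform into parallel streams of codebook indices;
# MusicGen's transformer models sequences of such tokens instead of raw samples.
encoded = model.encode(inputs["input_values"], inputs["padding_mask"])
print(encoded.audio_codes.shape)  # discrete token indices, far fewer than raw samples
```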
For training, the team used 20,000 hours of licensed music: specifically, 10,000 high-quality audio recordings from an internal dataset, plus music data from Shutterstock and Pond5.
Beyond its efficient design and fast generation, MusicGen stands out for its ability to handle both text and musical cues. The text prompt establishes the fundamental style, which the melody from the audio file then follows.
This melody conditioning is not exact, however: you cannot simply take a melody and hear it faithfully reproduced in various musical styles. The melody is not perfectly mirrored in the output and serves only as a general guide for generation.
Few high-quality music generation models have been released to the public, even though plenty of models now handle text generation, voice synthesis, image creation, and even short videos.
The accompanying research paper, available on the arXiv preprint server, notes that one of the key difficulties with music is that it spans the full frequency spectrum, which calls for more intensive sampling, not to mention music’s intricate structures and overlapping instruments.
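A rough back-of-the-envelope calculation shows why the tokenization matters here; the 32 kHz sample rate, 50 Hz frame rate, and four codebooks used below are figures taken from the accompanying paper and should be treated as approximate:

```python
# Raw samples versus EnCodec tokens for one 12-second clip (approximate figures).
seconds = 12
raw_samples = 32_000 * seconds       # 384,000 raw audio samples at 32 kHz
encodec_tokens = 50 * seconds * 4    # ~2,400 tokens at 50 frames/s with 4 codebooks
print(raw_samples, encodec_tokens)   # the transformer works on the much shorter sequence
```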
How to use MusicGen?
Through the Hugging Face API, users can try out MusicGen, though depending on how many people are using it at once, it can take some time to generate any music. For considerably quicker results, you can set up your own instance of the model on the Hugging Face website. If you have the necessary skills and hardware, you can also download the code and run it yourself, as sketched below.
If you want to try the website version, like most people, here is how you do it:
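If you go the local route, a minimal sketch using Meta’s audiocraft library might look like this; the checkpoint name, duration, and output handling are assumptions based on the public release, so check the official repository for the current interface:

```python
# Minimal local MusicGen sketch with audiocraft (pip install audiocraft).
from audiocraft.models import MusicGen
from audiocraft.data.audio import audio_write

# "facebook/musicgen-small" is the smallest released checkpoint.
model = MusicGen.get_pretrained("facebook/musicgen-small")
model.set_generation_params(duration=12)  # seconds of audio, as in the web demo

# Generate audio from text descriptions.
descriptions = ["lo-fi hip hop beat with mellow piano", "upbeat 80s synth-pop"]
wav = model.generate(descriptions)  # one waveform per description

# Save each clip as a loudness-normalized WAV file.
for i, one_wav in enumerate(wav):
    audio_write(f"clip_{i}", one_wav.cpu(), model.sample_rate, strategy="loudness")
```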
- Open your web browser.
- Go to the Hugging Face website.
- Click Spaces at the top right.
- Type “MusicGen” into the search box.
- Find the Space published by Facebook.
- Enter your prompt into the box on the left.
- Click Generate.
How does it work?
The MusicGen model uses your description to generate 12 seconds of audio. You can also provide a reference audio file from which a melody is extracted. With reference audio, the model tries to create music that better reflects your taste by following both the description and the provided melody.
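In code, melody conditioning looks roughly like the following sketch with the audiocraft library; the melody checkpoint name, the reference file, and the generate_with_chroma call are assumptions based on the public release:

```python
# Rough sketch of melody-conditioned generation (assumed audiocraft API).
import torchaudio
from audiocraft.models import MusicGen

model = MusicGen.get_pretrained("facebook/musicgen-melody")
model.set_generation_params(duration=12)

# Load the reference audio whose melody should guide the output.
melody, sr = torchaudio.load("reference.mp3")  # hypothetical local file

# Follow both the text description and the reference melody.
wav = model.generate_with_chroma(
    descriptions=["gentle acoustic folk with fingerpicked guitar"],
    melody_wavs=melody[None],  # add a batch dimension
    melody_sample_rate=sr,
)
```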
“We present MusicGen: A simple and controllable music generation model. MusicGen can be prompted by both text and melody. We release code (MIT) and models (CC-BY NC) for open research, reproducibility, and for the music community: https://t.co/OkYjL4xDN7” (Felix Kreuk, @FelixKreuk, June 9, 2023)
As Kreuk noted, the model pairs an EnCodec audio tokenizer with a transformer language model.
MusicGen vs. Google MusicLM
MusicGen appears to produce slightly better results. The researchers demonstrate this by comparing MusicGen’s output with that of MusicLM, Riffusion, and Mousai on an example website. It comes in four model sizes, ranging from small (300 million parameters) to large (3.3 billion parameters), with the largest having the most potential for intricate music. It can also be run locally (a GPU with at least 16 GB of memory is recommended).
MusicGen is one of many examples of how Meta uses AI to build immersive and compelling experiences for its users. Meta’s goal is to create a shared virtual world, the metaverse, where people can interact, collaborate, and have fun across many platforms and devices. Thanks to MusicGen, the metaverse may become a bit more entertaining and musical.