Meta has released NotebookLlama, an open-source implementation of the podcast-generation feature that Google offers in NotebookLM. The project uses Meta’s own Llama models for most of its processing. Like NotebookLM, NotebookLlama turns text files, such as PDFs of articles or blog posts, into podcast-style digests.
How NotebookLlama works
NotebookLlama first generates a transcript from an input file, such as a PDF. It then rewrites the transcript, adding dramatization and interruptions so the result sounds more like a real conversation. Finally, open text-to-speech models convert the transcript into audio.
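To make the data flow of these three stages concrete, here is a minimal, model-free sketch in Python. It is not NotebookLlama’s code: the function names are hypothetical, the transcript is assumed to be a list of (speaker, line) pairs, and simple string rules stand in for the Llama prompts that write and dramatize the script.

```python
from typing import List, Tuple

# Hypothetical transcript format: an ordered list of (speaker, line) pairs.
Transcript = List[Tuple[str, str]]

# Canned reactions used by the dramatization stub below.
REACTIONS = ("Wait, really? ", "Huh. ", "Oh, interesting! ")


def write_transcript(document_text: str) -> Transcript:
    """Stage 1 stand-in: turn source text into a two-speaker dialogue.

    A real pipeline would prompt an LLM here. This stub naively splits the
    text on periods and hands the sentences to alternating speakers.
    """
    sentences = [s.strip() for s in document_text.split(".") if s.strip()]
    speakers = ("Speaker 1", "Speaker 2")
    return [(speakers[i % 2], sentence + ".") for i, sentence in enumerate(sentences)]


def dramatize(transcript: Transcript) -> Transcript:
    """Stage 2 stand-in: make the exchange feel more conversational.

    A real rewrite pass would ask an LLM for interruptions and asides; this
    stub prefixes each Speaker 2 line with a rotating canned reaction.
    """
    dramatized: Transcript = []
    reaction_index = 0
    for speaker, line in transcript:
        if speaker == "Speaker 2":
            line = REACTIONS[reaction_index % len(REACTIONS)] + line
            reaction_index += 1
        dramatized.append((speaker, line))
    return dramatized


def to_tts_inputs(transcript: Transcript) -> List[str]:
    """Stage 3 stand-in: format each turn as the text a TTS model would voice."""
    return [f"[{speaker}] {line}" for speaker, line in transcript]


if __name__ == "__main__":
    document = "Llama models write the script. Open TTS models read it aloud."
    for tts_input in to_tts_inputs(dramatize(write_transcript(document))):
        print(tts_input)
    # prints:
    # [Speaker 1] Llama models write the script.
    # [Speaker 2] Wait, really? Open TTS models read it aloud.
```

Replacing each stub with a model call (an LLM for the first two functions, a text-to-speech model after the last) would keep the same overall shape; the naive split on periods exists only to keep the example short.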
For now, NotebookLlama’s podcasts sound noticeably rougher than those from Google’s NotebookLM. The voices are distinctly robotic, and the speakers often talk over one another at odd moments. Meta’s researchers say stronger text-to-speech models would improve the output; as they put it on NotebookLlama’s GitHub page, “The text-to-speech model is the limitation of how natural this will sound.”
Meta’s researchers also suggest a possible improvement: having two separate agents debate a topic to produce the podcast outline, instead of relying on a single model for that step. Like NotebookLM and other AI tools, NotebookLlama is also prone to “hallucinations,” so the generated podcasts may contain incorrect information.
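Meta’s two-agent suggestion is only a proposal so far. The sketch below shows one hypothetical way to structure it: two callables take turns extending a shared outline, one proposing points and the other challenging them. The `Agent` signature, the round count, and the stub agents are all assumptions; in practice each agent would wrap a separate LLM call with its own prompt.

```python
from typing import Callable, List

# Hypothetical agent interface: given the topic and the outline so far,
# return the next outline entry. A real agent would wrap an LLM call.
Agent = Callable[[str, List[str]], str]


def debate_outline(topic: str, proposer: Agent, critic: Agent, rounds: int = 2) -> List[str]:
    """Let two agents take turns extending a shared podcast outline."""
    outline: List[str] = []
    for _ in range(rounds):
        # Agents receive a copy so they cannot mutate the shared outline directly.
        outline.append(proposer(topic, list(outline)))  # proposer adds a new point
        outline.append(critic(topic, list(outline)))    # critic responds to that point
    return outline


def proposer_stub(topic: str, outline: List[str]) -> str:
    """Deterministic stand-in for an LLM that proposes talking points."""
    return f"Point {len(outline) // 2 + 1} on {topic}"


def critic_stub(topic: str, outline: List[str]) -> str:
    """Deterministic stand-in for an LLM that challenges the latest point."""
    return f"Counterpoint to '{outline[-1]}'"


if __name__ == "__main__":
    for entry in debate_outline("open TTS models", proposer_stub, critic_stub):
        print(entry)
```

Keeping the agents as plain callables means the orchestration loop stays the same whether the stubs are replaced by two prompts to one Llama model or by two different models.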
Features
NotebookLlama aims to provide an open-source and accessible version of NotebookLM, offering several benefits to users:
- NotebookLlama is fully open source, so anyone can use, modify, and adapt it free of charge.
- Its step-by-step Jupyter notebooks make it approachable for people with little experience with large language models (LLMs), prompting, or audio models.
- Although the core feature is converting PDFs into podcasts, the principles behind NotebookLlama could be adapted for other creative text-to-speech workflows.
Building a podcast with NotebookLlama
NotebookLlama uses Jupyter notebooks to guide users through each step of creating a podcast from a text file. Here’s a simplified look at the steps involved:
- Step 1: Install required libraries. Users begin by installing dependencies such as Hugging Face’s Optimum and Transformers libraries.
- Step 2: Import libraries. The notebooks import the Python libraries the workflow relies on, such as IPython (for playing audio inside the notebook), tqdm (for progress bars), and PyTorch.
- Step 3: Process data and generate audio. NotebookLlama generates audio segments with two text-to-speech models, Bark and Parler-TTS. Each model takes a line of text as its prompt and returns audio, and the resulting segments are later assembled into the full podcast.
- Step 4: Utility functions. Helper functions give each speaker a distinct voice, which makes the podcast more dynamic and the hosts easier to tell apart.
- Step 5: Assemble the podcast. The generated audio segments are combined into a single, shareable audio file.
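Steps 3 to 5 can be illustrated with a standard-library sketch that runs without downloading any models. Here a sine-tone generator stands in for the Bark and Parler models, a per-speaker pitch table (hypothetical) stands in for the speaker-voice utility functions, and the assembly step concatenates the segments with short pauses into one WAV file. The names `VOICES`, `synthesize`, and `assemble`, the sample rate, and the timing constants are illustrative assumptions, not taken from the NotebookLlama notebooks.

```python
import array
import io
import math
import sys
import wave
from typing import Dict, List, Tuple

SAMPLE_RATE = 16_000  # assumed output rate for this sketch; real TTS models define their own

# Stand-in for Step 4: one hypothetical "voice" (here just a pitch in Hz) per speaker.
VOICES: Dict[str, float] = {"Speaker 1": 220.0, "Speaker 2": 330.0}


def synthesize(text: str, pitch_hz: float) -> array.array:
    """Stand-in for Step 3: return 16-bit samples for one line of dialogue.

    A sine tone replaces real speech; its length grows with the word count
    (0.1 s per word) so longer lines still yield longer segments.
    """
    n_samples = int(SAMPLE_RATE * 0.1 * max(1, len(text.split())))
    return array.array(
        "h",
        (int(8000 * math.sin(2 * math.pi * pitch_hz * i / SAMPLE_RATE)) for i in range(n_samples)),
    )


def assemble(transcript: List[Tuple[str, str]], gap_s: float = 0.2) -> bytes:
    """Step 5: concatenate every segment, separated by silence, into one mono WAV file."""
    silence = array.array("h", [0] * int(SAMPLE_RATE * gap_s))
    samples = array.array("h")
    for speaker, line in transcript:
        samples.extend(synthesize(line, VOICES[speaker]))
        samples.extend(silence)
    if sys.byteorder == "big":
        samples.byteswap()  # WAV stores 16-bit samples little-endian
    buffer = io.BytesIO()
    with wave.open(buffer, "wb") as wav:
        wav.setnchannels(1)
        wav.setsampwidth(2)  # 2 bytes per sample, matching the "h" typecode
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(samples.tobytes())
    return buffer.getvalue()
```

For example, `open("podcast.wav", "wb").write(assemble([("Speaker 1", "Welcome to the show"), ("Speaker 2", "Glad to be here")]))` writes a playable file in which each line is a tone whose pitch identifies the speaker. With a real TTS model, mainly `synthesize` and the voice table would change, and the model would dictate its own sample rate.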
NotebookLlama is still in development and has clear room for improvement. Better text-to-speech models would do the most to make its podcasts sound natural, and future versions could explore other approaches, such as using multiple agents to produce more engaging content.
Even with these limitations, NotebookLlama offers an open-source way to turn text into audio content. Its approach could also be applied beyond PDF-to-podcast conversion, which could make it a useful starting point for anyone who wants to automate podcast creation or experiment with other text-to-speech workflows.
Featured image credit: Kerem Gülen/Ideogram