Google unveiled its new AI-powered feature, Gemini Live, at the recent Made by Google event. The tool lets users hold voice-based conversations with an AI powered by Google’s latest large language model.
Positioned as a direct competitor to OpenAI’s Advanced Voice Mode in ChatGPT, which remains in limited alpha testing, Gemini Live marks a significant step: Google is the first to ship such a feature in fully developed form. Although OpenAI introduced the concept first, Google’s quick follow-through has surprised many.
What does Gemini Live have to offer?
Gemini Live enhances mobile AI interactions by enabling dynamic, free-flowing conversations. This feature uniquely allows users to interrupt the AI mid-response to explore specific points in greater depth or to pause and resume chats at their convenience—essentially offering a digital sidekick that’s accessible anytime.
Furthermore, Gemini Live supports hands-free operation. Users can continue their interactions with the AI even when their phone is in the background or locked, mimicking the natural flow of a traditional phone call. Starting today, this feature is available in English for Gemini Advanced subscribers on Android, with plans to extend support to iOS users and additional languages in the near future.
Users will soon enjoy new extensions such as Keep, Tasks, Utilities, and advanced features on YouTube Music. For instance, users can retrieve recipes from emails, compile shopping lists, or create nostalgic music playlists—all without the hassle of switching between applications.
Additionally, the Calendar extension will allow users to manage their schedules more efficiently. By simply taking a photo of a concert flyer, users can check their availability on that date and set reminders to purchase tickets.
Further enriching the Android ecosystem, Gemini’s deep integration provides context-aware capabilities that elevate the user experience. Users can access Gemini through a simple long press on the power button or by saying, “Hey Google.” This integration allows users to engage with the content on their screens directly, such as requesting details about a video they are watching on YouTube or asking Gemini to add restaurants from a travel vlog into Google Maps.
Google is also tackling the dual challenge of expanding AI capabilities while maintaining speed and accuracy. New models like Gemini 1.5 Flash are being introduced to deliver faster and more reliable responses, and Google plans to keep refining these aspects while expanding integrations with other Google services, including Home and Messages.
Google has placed certain restrictions on Gemini Live, as explained by Product Manager Leland Rechis. Notably, the feature won’t sing or mimic any voices beyond the ten predefined options, a decision likely intended to avoid copyright issues.
Additionally, unlike OpenAI, which emphasized emotional voice recognition during its demos, Google has opted not to prioritize Gemini Live’s ability to detect emotional nuances in users’ voices. This omission could be read as a strategic divergence from OpenAI, especially in light of past controversies such as the incident in which an OpenAI voice closely resembled actress Scarlett Johansson.
Featured image credit: Google