OpenAI Transforms ChatGPT: A Voice-Enabled AI Evolution

OpenAI empowers ChatGPT with vocal capabilities for interactive discussions ChatGPT, originally conceived as a text-centric search tool, is undergoing a remarkable transformation. OpenAI has unveiled plans to imbue it with voice and image-driven functionalities.

The immensely popular generative AI assistant, which stormed onto the technology scene approximately nine months ago, has been a resounding success. Its primary function has been enabling users to craft essays, poems, and summaries from straightforward text-based prompts. However, ChatGPT is now poised to become significantly more interactive by granting users the ability to engage in spoken conversations with the AI.

This announcement coincides with Amazon’s commitment to invest a substantial sum of up to $4 billion in Anthropic, a rival of OpenAI. This move forms part of a larger battle among tech behemoths in the realm of generative AI, featuring Google’s Bard chatbot endeavoring to catch up, Meta embracing an open-source philosophy, and Microsoft closely aligning itself with OpenAI.

A Pioneering Milestone Today marks a momentous step forward in the generative AI landscape, as OpenAI merges the familiar domain of voice-driven assistants with its formidable large language models (LLMs).

For example, a user can now verbally request ChatGPT to spontaneously craft a bedtime story, offering only vocal cues to steer the narrative. Alternatively, users can simply pose questions, and ChatGPT will respond in the form of spoken words.

Moreover, ChatGPT users can also employ image-based queries. For instance, they can upload an image and ask ChatGPT to elucidate its content or provide instructions for accomplishing a particular task.

The vocal feature is underpinned by a novel text-to-speech model, capable of generating remarkably human-like voices from textual input and brief snippets of spoken speech. OpenAI collaborated with accomplished voice actors to create a roster of five distinct voices, while its open-source Whisper speech recognition system transcribes spoken words into text.

Spotify has emerged as a key launch partner in this endeavor. The music streaming giant has introduced a captivating feature for podcasters, enabling them to capture their voice and translate their content from English into Spanish, French, or German while retaining their unique vocal qualities. Notably, OpenAI is exercising caution to avoid potential criticism, restricting access to this technology. They have exclusively collaborated with podcasters including Dax Shepard, Monica Padman, Lex Fridman, Bill Simmons, and Steven Bartlett for this launch.

OpenAI stated in a blog post, “The new voice technology, capable of crafting realistic synthetic voices from just a few seconds of real speech, opens doors to many creative and accessibility-focused applications. However, these capabilities also present new risks, such as the potential for malicious actors to impersonate public figures or commit fraud.”

These new features will gradually roll out to subscribers of the Plus and Enterprise tiers over the next two weeks. To activate the voice functionalities, users should navigate to the “settings” menu within the app, access “new features,” and opt in to voice conversations. Following this, they must tap the headphone icon in the top-right corner and select their preferred voice.

Initially, voice functionality will be available as an opt-in beta exclusively on the ChatGPT Android and iOS apps, while image search will become a default feature across all platforms.