
OpenAI Debuts DALL-E 3 API and Introduces New Text-to-Speech Models

OpenAI's DALL-E 3 API transforms text into stunning visuals, while the new Audio API breathes life into applications with natural-sounding speech.


In a spectacular leap forward for AI-driven creativity, OpenAI has launched the eagerly awaited DALL-E 3 API alongside an innovative text-to-speech offering, the Audio API. Together, these tools give developers unprecedented capabilities to turn plain text into striking visuals and natural-sounding speech.

DALL-E 3, an artistic prodigy among text-to-image models, steps beyond its original homes in ChatGPT and Bing Chat with an API equipped with built-in moderation to guard against misuse. Aspiring AI Picassos can now transform textual descriptions into images at resolutions up to 1792×1024, with pricing starting at $0.04 per image.
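In practice, a generation request is a single call. Here is a minimal sketch using OpenAI's official Python SDK (v1), assuming an API key is set in the `OPENAI_API_KEY` environment variable; the cost helper simply reflects the $0.04 launch price quoted above:

```python
def estimate_cost_cents(n_images: int, cents_per_image: int = 4) -> int:
    """Cost estimate in US cents at the $0.04-per-image launch price."""
    return n_images * cents_per_image

def generate_image(prompt: str, size: str = "1792x1024") -> str:
    """Request one DALL-E 3 image and return its hosted URL."""
    from openai import OpenAI  # pip install openai>=1.0
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.images.generate(
        model="dall-e-3",
        prompt=prompt,
        size=size,  # "1024x1024", "1024x1792", or "1792x1024"
        n=1,        # the DALL-E 3 endpoint generates one image per request
    )
    return response.data[0].url
```

Ten standard images would run `estimate_cost_cents(10)`, i.e. 40 cents; larger sizes and higher quality tiers are priced above the base rate.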

However, the journey of DALL-E 3 is just beginning. The new API does not yet support the image-editing and variation features available with DALL-E 2, and it automatically rewrites incoming prompts for enhanced safety and detail, a design choice that may occasionally trade precision for protection.
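Because the endpoint may rewrite a prompt before generating, the response carries a `revised_prompt` field alongside each image, so applications can surface what was actually rendered. A small sketch, again assuming the v1 Python SDK; the stub below only illustrates the shape of the response object rather than making a live call:

```python
def revision_report(response) -> str:
    """Summarize a client.images.generate(...) response: the image URL
    that was served and the prompt DALL-E 3 actually ran after rewriting."""
    image = response.data[0]
    return f"revised prompt: {image.revised_prompt}\nurl: {image.url}"

# Shape of the object the API returns (stubbed here for illustration):
from types import SimpleNamespace
stub = SimpleNamespace(data=[SimpleNamespace(
    revised_prompt="A detailed watercolor painting of a lighthouse at dawn",
    url="https://example.com/image.png",  # hypothetical URL
)])
print(revision_report(stub))
```

Logging the revised prompt is a cheap way to diagnose cases where the safety rewrite drifted from the user's intent.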

In parallel, the Audio API makes its debut, offering a symphony of six preset voices that provide a naturalistic auditory experience. With prices beginning at $0.015 per 1,000 characters, developers can now infuse applications with voices that range from the informative tones of Alloy to the storytelling lilt of Fable.
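Generating speech is similarly a single call. A minimal sketch with the v1 Python SDK, assuming `OPENAI_API_KEY` is set; the six preset voice names and the per-character rate come from OpenAI's announcement:

```python
VOICES = ("alloy", "echo", "fable", "onyx", "nova", "shimmer")  # the six presets

def estimate_tts_cost_usd(text: str, usd_per_1k_chars: float = 0.015) -> float:
    """Launch pricing for the tts-1 model: $0.015 per 1,000 input characters."""
    return len(text) / 1000 * usd_per_1k_chars

def speak(text: str, voice: str = "alloy", out_path: str = "speech.mp3") -> None:
    """Synthesize `text` with a preset voice and write an MP3 to disk."""
    if voice not in VOICES:
        raise ValueError(f"unknown voice {voice!r}")
    from openai import OpenAI  # pip install openai>=1.0
    client = OpenAI()  # reads OPENAI_API_KEY from the environment
    response = client.audio.speech.create(model="tts-1", voice=voice, input=text)
    response.stream_to_file(out_path)
```

A 2,000-character script therefore costs about three cents; a higher-quality tier, `tts-1-hd`, is priced above the base model.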

OpenAI's Sam Altman enthuses about the natural quality of this audio generation, envisioning a future where applications converse seamlessly and learning languages becomes a more engaging, AI-assisted journey. Yet, while the Audio API creates speech that feels authentic, it doesn't allow for emotional fine-tuning—something developers will creatively navigate using text nuances.

Developers leveraging OpenAI's latest offerings are entrusted with a new responsibility: ensuring users are aware they're interacting with AI-generated content.

Further expanding the AI ecosystem, OpenAI also released Whisper large-v3, the latest iteration of its open-source automatic speech recognition model. Boasting performance enhancements and broader language coverage, it's freely available on GitHub, signaling OpenAI's commitment to community-driven innovation.
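Because large-v3 ships as open-source weights rather than a hosted endpoint, it runs locally. A sketch assuming the `openai-whisper` package (with ffmpeg on the PATH); the import is deferred because the first call triggers a multi-gigabyte weight download:

```python
def transcribe(audio_path: str, model_name: str = "large-v3") -> str:
    """Transcribe an audio file locally with the open-source Whisper model."""
    import whisper  # pip install openai-whisper; requires ffmpeg installed
    model = whisper.load_model(model_name)  # downloads weights on first use
    result = model.transcribe(audio_path)   # dict with "text", "segments", ...
    return result["text"].strip()
```

Calling `transcribe("interview.mp3")` returns the full transcript as a string; the smaller checkpoints (`base`, `small`, `medium`) trade accuracy for speed on modest hardware.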

With these powerful tools at their fingertips, developers are invited to push the boundaries of AI and redefine the realms of the possible in digital creativity and communication.