A couple of years ago, few people would have suspected that this image is anything other than a real photo. While there are mistakes, especially in the hands, it’s a very real-looking image.
OpenAI has launched a new AI image generator that is a technological step forward, and some of the examples the company shared achieve a frightening degree of verisimilitude.
Called “Images in ChatGPT,” the feature differs from DALL-E, OpenAI’s previous image generator, which appears to be on its way out, because the images are generated from within GPT-4o itself.
Describing the model as a “step change,” research lead Gabriel Goh tells The Verge that GPT-4o is “omnimodal,” meaning it is a model that can generate any kind of data, including text, images, audio, and video.
This new type of model reflects a wider shift in the AI industry toward systems that combine all types of data. Yesterday, PetaPixel reported on Google’s “Project Astra,” which can see the world around it through a smartphone camera and answer questions.
Image Generation Capabilities
In a blog post revealing Images in ChatGPT, OpenAI shared some impressive examples. The pictures of an “OpenAI researcher” writing on a whiteboard in a room “overlooking the Bay Bridge,” complete with the photographer’s reflection, are scarily good.
Prompt: A wide image taken with a phone of a glass whiteboard, in a room overlooking the Bay Bridge. The field of view shows a woman writing, sporting a t-shirt with a large OpenAI logo. The handwriting looks natural and a bit messy, and we see the photographer’s reflection.
Prompt: selfie view of the photographer, as she turns around to high-five him
OpenAI also shared other examples that showcase the model’s ability to generate photorealistic images.
Prompt: Generate a photorealistic image of farmer’s market in toronto on a saturday in summer 2006, it’s a beautiful late june day, people are shopping and eating sandwiches. in focus should be a young asian girl wearing denim overalls and sipping on a strawberry banana smoothie – rest can be blurred. the photo should be reminiscent of that a digital camera from 2006 would take, with a timestamp like a printed photo would have. aspect ratio should be 3:2.
Prompt: Generate a candid, Polaroid-style photograph of four diverse friends in their early 20s at a gritty dive bar. The lighting features a very harsh, direct flash, creating sharp shadows and giving the photo a very overexposed, vintage instant-camera feel. Colors should be slightly muted, evoking nostalgic, early-2000s party vibes. The aesthetic is casually emo. No border or logos or signs. There’s an interesting looking wall behind them with some light graffiti. Quality of the image should be very sharp and detailed (very little grain). The energy should be silly and chaotic. They’re either playfully grimacing, smiling, or pretending to look tough. One of them should have their friend in a silly, playful headlock. Their mouths are closed.
Prompt: Realistic photograph of a horse galloping from right to left across a vast, calm ocean surface, accurately depicting splashes, reflections, and subtle ripple patterns beneath their hooves.
Prompt: blurry old analog film photograph, picture of parked car on side street, quiet night. | Credit: Roope Rainisto
Images in ChatGPT doesn’t have a visual watermark the way DALL-E did. However, ChatGPT multimodal product lead Jackie Shannon tells The Verge that “all of our generated images will include standard C2PA metadata to mark the image as having been created by OpenAI.”
The new version of ChatGPT started rolling out yesterday (Tuesday) and will be available to people using the free and paid versions of the chatbot.
Image credits: OpenAI.