Google debuts new agents, content creation tools and search features powered by generative AI

Google’s vision for a world assisted by AI became clearer yesterday as the tech giant announced a wide range of updates to its generative AI capabilities across various software platforms and hardware devices.

At its annual developer conference Google I/O, the company debuted ways to use generative AI for everything from searching online and offline worlds to creating content and performing tasks. It also announced new AI models for its Gemini family, with demos of Gemini 1.5 Flash making AI faster and more efficient and Gemini Nano potentially making it more private. Other upgrades included allowing AI models to digest larger amounts of information and new ways for platforms to process video, audio, images and text.

Separately, Google debuted new ways to create and edit video through a new AI video model called Veo. It also touted ways to create AI music through its Lyria model and Music AI Sandbox, which Google created in collaboration with YouTube and major artists like Björn Ulvaeus (of ABBA) and Wyclef Jean. While Veo will compete with rival platforms like Runway and OpenAI’s Sora, the music feature puts it up against apps like Suno AI that have become increasingly popular. 

When it comes to imaging, Google rolled out improvements to its AI image model, Imagen 3, which is available to developers in private preview mode. One improvement is rendering text that’s actually legible instead of distorted into unrecognizable words. Distorted text has been one of the easier ways to identify AI-generated images, even unwatermarked ones, so the improvement could make such images harder to spot.

Google’s updates aren’t necessarily a sea change in how companies might use AI, according to Rowan Curran, a Forrester analyst focused on AI and machine learning. Instead, they show an increased focus on improving existing use cases with multi-modal capabilities.

“We’ve already seen over the course of this year that multi-modality has really emerged as one of the leading battlegrounds for who has had the [advantage] in the race around models at this point in time,” Curran said. “It’s very much expected to see a kind of continued evolution in this direction.”

Project Astra and AI agents 

One of the ways Google plans to scale its capabilities is through Project Astra, a new AI assistant that can answer queries through text, audio, images and video. Incorporating sight, sound and text will allow Project Astra to “understand and respond to our complex and dynamic world just like we do,” said Sir Demis Hassabis, co-founder of DeepMind, which Google acquired in 2014.

“It would need to take in and remember what it sees so [it] can understand context and take action,” Hassabis said on stage at Google I/O. “And it would have to be proactive, teachable and personal so you can talk to it naturally without lag or delay.”

In many ways, some of Project Astra’s capabilities are similar to ChatGPT’s new updates from OpenAI’s new AI model GPT-4o, which debuted a day earlier in an apparent attempt to upstage Google I/O. It’s also similar to what Meta debuted a few weeks ago with its update to Meta AI, which powers various Meta apps and its Meta Ray-Ban smart glasses. Many have noted similarities between the latest updates in the AI arms race and the AI capabilities imagined a decade ago in director Spike Jonze’s 2013 sci-fi film “Her,” starring Joaquin Phoenix and Scarlett Johansson.

Marketers will want to know how AI agents influence people, according to Geoffrey Colon, co-founder of Feelr Media, a new creative agency focused on design, production and strategy. Although it’s too early to tell how good Veo will be, it could benefit YouTube by giving creators tools for crafting cinematic video without technical expertise — which could bring more highly produced content to both smaller devices and larger connected TVs.

By accomplishing tasks on behalf of users, Colon said Project Astra could finally fulfill what was previously promised by earlier assistants like Microsoft’s Cortana. Having previously led marketing and content teams at Microsoft and Dell, he thinks Project Astra and Google’s other AI agents should be seen not as AI but as IA: “intelligent assistants.”

“The story of AI will be less about the models themselves and all about what they can do for you,” Colon said. “And that story is all about agents: bots that don’t just talk with you but actually accomplish stuff on your behalf. Some of these agents will be ultra-simple tools for getting things done, while others will be more like collaborators and companions.”

How Google is addressing AI deepfakes, misinformation and privacy 

Google also addressed concerns about AI-generated content being misused in the form of deepfakes and misinformation. For example, execs on stage announced that Google’s SynthID watermarking tool will be expanded for use across AI-generated text and video content — including watermarking video generated by Veo.

Google execs also discussed how the company plans to improve privacy protection across its various platforms and devices. One way is through a new AI model called Gemini Nano, which will arrive on Google Pixel devices later this year and give people multi-modal generative AI capabilities on their phone instead of sending data off the device. Google is also adding ways for devices to detect fraud attempts, such as AI scams using video and audio deepfakes or text-based scams. 

Generative AI and the future of search 

Google plans to expand how it uses generative AI for search with new ways for users to interact with Google Search and new search features for Gmail, Google Photos and other apps. One way is through AI Overviews, which summarizes traditional search results. The feature, which is rolling out in the U.S. this week and then globally to 1 billion users by the end of 2024, builds on Google’s year of testing the Search Generative Experience (SGE) through Search Labs, which debuted at Google I/O 2023.

Other AI updates for search will help people find their photos, create meal plans, plan trips and break complex queries into the various parts of a question. Google is also going beyond text, adding ways for users to search in real time with audio and video inputs to ask questions about the world around them. To ground its answers, Google is indexing information about location, business hours and ratings so that place-based queries return up-to-date information.

Combining location data with other context from language helps improve accuracy, depending on what a person is looking for. When Yext examined the locations of more than 700,000 businesses, it found that companies with complete and accurate information online saw a 278% increase in visibility in search results. That also makes it more important for businesses to ensure their online information is accurate and up to date. 

As chat-based search becomes more common and more useful, it could shift some platforms from ad-driven models to offer-driven ones, according to Christian Ward, Yext’s chief data officer. He thinks Google is in a strong position to switch from ads to offers, but he added that the transition won’t be easy. 

“Google is in a phenomenal position to move from an ad model to an offers engine,” said Ward. “They can even do it as an auction the way they’re already designed with ads. People bet against Google, but that’s not a great idea … Please understand this is Innovator’s Dilemma land where they’re going to be dragged into that kicking and screaming.”

Despite all the innovations laid out at Google I/O, another wild card could also leave Google kicking and screaming: the pending decision from the federal judge overseeing the ongoing antitrust case against the company. Although it’s still unclear how he might rule in the coming weeks or months, experts have said the decision could affect Google’s search ambitions depending on the outcome.

Author: Rayne Chancer