In the rapidly evolving world of large language models and generative AI, a new concept is gaining momentum: AI agents.
AI agents are advanced tools designed to handle complex tasks that traditionally required human intervention. While they may be confused with robotic process automation (RPA) bots, AI agents are much more sophisticated, leveraging generative AI technology to execute tasks autonomously. Companies like Google are positioning AI agents as virtual assistants that can drive productivity across industries.
In this Q&A, Jason Gelman, Director of Product Management for Vertex AI at Google Cloud, shares insights into Google’s vision for AI agents and some of the challenges that come with this emerging technology.
AI Agents Interview
How does Google define AI agents?
Jason Gelman: An AI agent is something that acts on your behalf.
There are two key components. First, you empower the agent to act on your behalf by providing instructions and granting necessary permissions—like authentication to access systems. Second, the agent must be capable of completing tasks. This is where large language models (LLMs) come in, as they can plan out the steps to accomplish a task. What used to require human planning is now handled by the AI, including gathering information and executing various steps.
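The two components Gelman describes can be sketched in a few lines of Python. This is only an illustration, not Google's or Vertex AI's implementation: the `Agent` class, the `plan` method, and the `search` tool are hypothetical names, and the planning step is a hard-coded stand-in for what an LLM would produce.

```python
from dataclasses import dataclass, field
from typing import Callable


@dataclass
class Agent:
    # Component 1: instructions plus the tools the user has authorized.
    instructions: str
    permitted_tools: dict[str, Callable[[str], str]] = field(default_factory=dict)

    def plan(self) -> list[tuple[str, str]]:
        # Component 2: planning. A real agent would ask an LLM for these steps;
        # this hypothetical stub returns a fixed list of (tool_name, argument) pairs.
        return [("search", self.instructions), ("summarize", "search results")]

    def run(self) -> list[str]:
        results = []
        for tool_name, argument in self.plan():
            tool = self.permitted_tools.get(tool_name)
            if tool is None:
                # The agent may only act where permission was granted.
                results.append(f"skipped {tool_name}: not permitted")
                continue
            results.append(tool(argument))
        return results


def search(query: str) -> str:
    # Hypothetical tool standing in for a real system the agent can access.
    return f"found documents for '{query}'"


agent = Agent(instructions="quarterly sales figures", permitted_tools={"search": search})
outcome = agent.run()
```

Here the agent carries out the permitted `search` step and skips `summarize`, because no permission for that tool was granted.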
What are current use cases where AI agents can thrive?
Gelman: AI agents can be useful across a wide range of industries. Call centers are a common example where customers already expect AI support, and we’re seeing demand there. In healthcare, organizations like Mayo Clinic are using AI agents to sift through vast amounts of information, helping professionals navigate data more efficiently.
Different industries are exploring this technology in unique ways, and it’s gaining traction across many sectors.
What are some misconceptions about AI agents?
Gelman: One major misconception is that the technology is more advanced than it actually is. We’re still in the early stages, building critical infrastructure like authentication and function-calling capabilities. Right now, AI agents are more like interns—they can assist, but they’re not yet fully autonomous decision-makers.
While LLMs appear powerful, we’re still some time away from having AI agents that can handle everything independently. Developing the technology and building trust with users are key challenges.
I often compare this to driverless cars. While they might be safer than human drivers, we still roll them out cautiously. With AI agents, the risks aren’t physical, but we still need transparency, monitoring, and debugging capabilities to ensure they operate effectively.
How can enterprises build trust in AI agents while acknowledging that the technology is still evolving?
Gelman: Start simple and set clear guardrails. Build an AI agent that does one task reliably, then expand from there. Once you’ve proven the technology’s capability, you can layer in additional tasks, eventually creating a network of agents that handle multiple responsibilities.
Right now, most organizations are still in the proof-of-concept phase. Some companies are using AI agents for more complex tasks, but for critical areas like financial services or healthcare, humans remain in the loop to oversee decision-making. It will take time before we can fully hand over tasks to AI agents.
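As a rough illustration of "clear guardrails" with a human in the loop, the sketch below restricts an agent to an explicit allow-list and holds sensitive actions until a person approves them. The action names and the `execute` function are assumptions made for this example, not part of any product discussed in the interview.

```python
from typing import Callable

# Hypothetical guardrails: the only actions this single-task agent may take,
# and the subset that a human must approve first.
ALLOWED_ACTIONS = {"draft_reply", "issue_refund"}
NEEDS_HUMAN_APPROVAL = {"issue_refund"}


def execute(action: str, approve: Callable[[str], bool]) -> str:
    """Apply the guardrails to one action the agent has proposed."""
    if action not in ALLOWED_ACTIONS:
        return "rejected"
    if action in NEEDS_HUMAN_APPROVAL and not approve(action):
        return "held for review"
    return "executed"
```

Starting with a small allow-list and widening it as the agent proves reliable mirrors the "start simple, then expand" approach described above.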
What is the difference between Google’s AI agent and Microsoft Copilot?
Gelman: Microsoft Copilot is a product designed for business users to assist with personal tasks. Google’s approach with AI agents, particularly through Vertex AI, is more focused on API-driven, developer-based solutions that can be integrated into applications.
In essence, while Copilot serves as a visible assistant for users, Vertex AI operates behind the scenes, embedded within applications, offering greater flexibility and control for enterprise customers.
The real potential of AI agents lies in their ability to execute a wide range of tasks at the API level, without the limitations of a low-code/no-code interface.