Lessons Learned in the First Year of Developing AI Agents
In the first year of working on AI agents, valuable insights emerged from direct collaboration with engineers and UX designers while iterating on the overall product experience. The objective was to create a platform for customers to use standard data analysis agents and build custom agents tailored to specific tasks and data structures relevant to their business. This platform integrates connectors to databases like Snowflake and BigQuery with built-in security, supports RAG over a metadata layer describing database contents, and facilitates data analysis through SQL, Python, and data visualization tools.
Feedback on the effectiveness of these developments came from both internal evaluations and customer insights. Users at Fortune 500 companies use these agents daily to analyze their internal data.
Key Insights on AI Agents
Reasoning Over Knowledge
- A significant lesson is that reasoning capabilities are more crucial than knowledge. Echoing Sam Altman’s perspective, the focus should be on using AI as a reasoning engine rather than a database.
- For instance, generated SQL queries often fail on the first attempt, so the emphasis should be on the agent’s ability to diagnose and resolve errors rather than on getting the query right the first time. Providing context and capturing SQL errors for the agent to learn from improves performance significantly, as the sketch below illustrates.
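A minimal sketch of that feedback loop, assuming a hypothetical `call_model` helper that wraps whatever LLM API is in use; the key move is appending the exact database error to the conversation and letting the model retry:

```python
import sqlite3

def call_model(messages: list[dict]) -> str:
    """Hypothetical wrapper around your LLM API; returns a SQL string."""
    raise NotImplementedError

def answer_with_sql(question: str, conn: sqlite3.Connection, max_attempts: int = 3):
    messages = [
        {"role": "system", "content": "You write SQLite queries. Reply with SQL only."},
        {"role": "user", "content": question},
    ]
    for _ in range(max_attempts):
        sql = call_model(messages)
        try:
            return conn.execute(sql).fetchall()
        except sqlite3.Error as exc:
            # Feed the exact database error back so the model can reason about a fix.
            messages.append({"role": "assistant", "content": sql})
            messages.append({"role": "user", "content": f"That query failed: {exc}. Fix it."})
    raise RuntimeError(f"No working query after {max_attempts} attempts")
```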
Improving Performance via Agent-Computer Interface (ACI)
- The Agent-Computer Interface (ACI) is pivotal for agent performance. It involves the syntax and structure of the agent’s tool calls and the responses it receives.
- Different models (e.g., GPT-4, Claude Opus) exhibit varying behaviors, necessitating frequent iterations and tweaks to the ACI for optimal performance. Small changes in the ACI can lead to significant improvements.
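As a concrete illustration, one ACI knob is how tool output is rendered back to the model. A sketch (the function and its formatting conventions are illustrative assumptions, not from the original post):

```python
def format_tool_result(rows: list[tuple], columns: list[str], limit: int = 20) -> str:
    """Render a SQL result for the model. Details like column headers, row
    limits, and explicit truncation notices are part of the ACI: changing
    them changes how reliably the agent interprets the result."""
    header = " | ".join(columns)
    body = "\n".join(" | ".join(str(v) for v in row) for row in rows[:limit])
    if len(rows) > limit:
        body += f"\n... truncated; {len(rows) - limit} more rows not shown"
    return f"{header}\n{body}"
```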
Model Limitations
- The underlying model(s) are the core of the agent. Superior decision-making models, like GPT-4, perform better in complex tasks compared to faster but less capable models like GPT-3.5-turbo.
- Understanding the failure modes and how the agent hallucinates can provide valuable insights into improving the ACI and overall agent behavior.
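One way to make those failure modes visible is to trace every step the agent takes; a minimal sketch (the file name and record schema are arbitrary choices here):

```python
import json
import time

def log_agent_step(step: dict, path: str = "agent_traces.jsonl") -> None:
    """Append each model response and tool call to a JSONL trace file.
    Reviewing these traces is how recurring failure modes (hallucinated
    column names, malformed tool calls) show up and inform ACI changes."""
    step["ts"] = time.time()
    with open(path, "a") as f:
        f.write(json.dumps(step) + "\n")

# Example: log_agent_step({"type": "tool_call", "tool": "run_sql", "args": {"sql": "SELECT 1"}})
```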
Fine-Tuning Models
- Fine-tuning models for specific tasks often diminishes their reasoning ability. Agents tend to rely on learned examples rather than independent reasoning.
- Fine-tuning can still be beneficial for specific tool calls, but the primary reasoning model should remain unmodified for optimal performance.
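A sketch of that split, with hypothetical model identifiers: a fine-tuned model handles one narrow, well-specified tool call, while the unmodified base model keeps the open-ended reasoning.

```python
REASONING_MODEL = "gpt-4"  # never fine-tuned; does planning and tool selection
CHART_SPEC_MODEL = "ft:gpt-3.5-turbo:acme::chart-spec"  # hypothetical fine-tune ID

def model_for(task: str) -> str:
    # Route only the narrow, repetitive task to the fine-tuned model.
    return CHART_SPEC_MODEL if task == "generate_chart_spec" else REASONING_MODEL
```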
Avoiding Abstractions
- Using abstractions like LangChain and LlamaIndex can hinder debugging, scaling, and understanding the agent’s actions. Owning each model call allows for greater control and flexibility.
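In practice this can be as small as one owned entry point for every model call; a sketch assuming the OpenAI Python SDK (v1.x):

```python
from openai import OpenAI

client = OpenAI()

def chat(messages: list[dict], model: str = "gpt-4", **kwargs) -> str:
    """A single, owned entry point for every model call: trivial to log,
    time, retry, or swap models, with no framework in between."""
    response = client.chat.completions.create(model=model, messages=messages, **kwargs)
    return response.choices[0].message.content
```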
Agents Are Not a Moat
- Building a great agent is not enough; the supporting infrastructure—security, data connectors, user interface, long-term memory, and evaluation frameworks—is critical for differentiation and success.
Continuous Improvement of Models
- Models will continue to improve, and it’s essential to remain adaptable. Customers will expect agents to leverage the latest and most advanced models, making flexibility a competitive advantage.
Additional Insights
Further insights on code and infrastructure include:
- Starting with pgvector in Postgres for vector similarity search before moving to a dedicated vector database (sketched after this list).
- Recognizing that open-source models currently lack robust reasoning capabilities.
- Understanding the nuances of the Assistants API and advocating for more granular control over tools like the code interpreter.
- Avoiding premature cost optimization.
- Streaming tokens to the user to mask model latency and improve perceived responsiveness (also sketched after this list).
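On the pgvector point, a minimal sketch using psycopg2, assuming Postgres with the pgvector extension available; the table name, column names, and embedding dimension are illustrative assumptions:

```python
import psycopg2

conn = psycopg2.connect("dbname=app")
cur = conn.cursor()
cur.execute("CREATE EXTENSION IF NOT EXISTS vector;")
cur.execute("""
    CREATE TABLE IF NOT EXISTS table_docs (
        id bigserial PRIMARY KEY,
        description text,
        embedding vector(1536)  -- must match your embedding model's dimension
    );
""")
conn.commit()

query_embedding = [0.0] * 1536  # stand-in: use a real embedding of the user's query
vector_literal = "[" + ",".join(map(str, query_embedding)) + "]"
# Nearest neighbours by L2 distance; pgvector also supports cosine distance (<=>).
cur.execute(
    "SELECT description FROM table_docs ORDER BY embedding <-> %s::vector LIMIT 5;",
    (vector_literal,),
)
for (description,) in cur.fetchall():
    print(description)
```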
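And on streaming, a sketch with the OpenAI Python SDK (v1.x); printing tokens as they arrive hides most of the perceived latency:

```python
from openai import OpenAI

client = OpenAI()
stream = client.chat.completions.create(
    model="gpt-4",
    messages=[{"role": "user", "content": "Summarize yesterday's sales."}],
    stream=True,
)
for chunk in stream:
    # Each chunk carries an incremental delta; print it immediately.
    if chunk.choices and chunk.choices[0].delta.content:
        print(chunk.choices[0].delta.content, end="", flush=True)
```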
These lessons underscore the importance of focusing on reasoning, iterative improvements to the agent-computer interface, understanding model limitations, and building robust supporting infrastructure to enhance AI agent performance and user satisfaction.