The evaluation of agentic applications is most effective when integrated into the development process rather than treated as an afterthought. For this to succeed, developers must be able to mock both the internal and external dependencies of the agent being built. PydanticAI is an agent framework that supports dependency injection from the start, enabling developers to build agentic applications with an evaluation-driven approach.

An architectural parallel can be drawn to the historic Krakow Cloth Hall, a structure refined over centuries through evaluation-driven enhancements. Similarly, PydanticAI allows developers to iteratively address challenges during development, ensuring optimal outcomes.

Challenges in Developing GenAI Applications

Developers of LLM-based applications face recurring challenges, which become significant during production deployment:

  1. Non-Determinism: Unlike conventional software APIs, identical inputs to LLMs may yield different outputs, complicating testing.
  2. LLM Limitations: Foundation models such as GPT-4, Claude, and Gemini are constrained by their training data (e.g., no access to confidential enterprise data), cannot invoke APIs or databases on their own, and have limited reasoning capabilities.
  3. LLM Flexibility: Applications often require different models for varying tasks (e.g., low-latency for one step, code generation for another).
  4. Rapid Evolution: GenAI technologies evolve quickly, with foundational models now offering multimodal capabilities, structured outputs, and memory. Maintaining low-level API access is essential for leveraging these advancements.

To address non-determinism, developers must adopt evaluation-driven development, an approach akin to test-driven development. It focuses on designing software with guardrails, real-time monitoring, and human oversight, so that systems that are only x% correct can still be deployed responsibly.
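
The idea can be sketched in a few lines of plain Python: score the system against a small reference set and gate deployment on a minimum pass rate. The helper names and the 0.8 threshold below are illustrative assumptions, not part of any framework.

```python
# Minimal sketch of an evaluation gate: run the (possibly non-deterministic)
# system over a reference set and require a minimum pass rate.
# eval_gate, toy_system, and the 0.8 threshold are illustrative assumptions.
def eval_gate(system, cases, threshold=0.8):
    passed = sum(1 for question, expected in cases if system(question) == expected)
    rate = passed / len(cases)
    return rate >= threshold, rate


# Deterministic stand-in for an LLM call; a real system would be stochastic.
def toy_system(question):
    return {"capital of France?": "Paris"}.get(question, "unknown")


cases = [("capital of France?", "Paris"), ("capital of Peru?", "Lima")]
ok, rate = eval_gate(toy_system, cases)
print(ok, rate)  # → False 0.5
```

In a real pipeline the gate would run in CI, and the reference set would grow as failures observed in production are converted into new cases.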

The Promise of PydanticAI

PydanticAI stands out as an agent framework that supports dependency injection, model-agnostic workflows, and evaluation-driven development. Its design is Pythonic, and it simplifies testing by allowing mock dependencies to be injected. In contrast to frameworks such as LangChain, where dependency injection is cumbersome, PydanticAI streamlines the process, making workflows more readable and easier to test.
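
The pattern behind this can be sketched in plain Python: when a tool receives its dependencies through a typed container rather than constructing them internally, a test double is a one-line swap. The names below (ElevationSource, Tools, FakeWiki) are illustrative, not PydanticAI APIs.

```python
from dataclasses import dataclass
from typing import Protocol


# Structural interface for an elevation lookup service (illustrative name).
class ElevationSource(Protocol):
    def snippet(self, mountain_name: str) -> str: ...


# Typed dependency container; production and tests build different instances.
@dataclass
class Tools:
    elev_wiki: ElevationSource


# Test double: same interface as a production Wikipedia client,
# but fully deterministic.
class FakeWiki:
    def snippet(self, mountain_name: str) -> str:
        return f"{mountain_name}: elevation 4,810 m"


tools = Tools(elev_wiki=FakeWiki())
print(tools.elev_wiki.snippet("Mont Blanc"))  # → Mont Blanc: elevation 4,810 m
```

Because nothing in the workflow hard-codes the live client, swapping in FakeWiki requires no monkeypatching, which is precisely what makes evaluation cheap to run on every change.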

Building an Evaluation-Driven Application with PydanticAI

  1. Creating an Agent: PydanticAI simplifies agent creation. For example:

     ```python
     def default_model() -> pydantic_ai.models.Model:
         return GeminiModel('gemini-1.5-flash', api_key=os.getenv('GOOGLE_API_KEY'))

     def agent() -> pydantic_ai.Agent:
         return pydantic_ai.Agent(default_model())
     ```

     This setup keeps the workflow flexible, allowing different models to be assigned to specific steps.
  2. Structured Outputs: Developers can define dataclasses for structured responses, enhancing usability:

     ```python
     @dataclass
     class Mountain:
         name: str
         location: str
         height: float
     ```

     With PydanticAI, structured outputs are returned directly, improving the precision of agentic workflows.
  3. Evaluation with Reference Answers: PydanticAI makes evaluation straightforward by supporting custom metrics:

     ```python
     def evaluate(answer: Mountain, reference: Mountain) -> Tuple[float, str]:
         score = 0
         reason = []
         # Evaluation logic...
         return score, ';'.join(reason)
     ```
  4. Dependency Injection: PydanticAI allows developers to inject mock services for external dependencies, facilitating efficient testing:

     ```python
     @agent.tool
     def get_height_of_mountain(ctx: RunContext[Tools], mountain_name: str) -> str:
         return ctx.deps.elev_wiki.snippet(mountain_name)
     ```
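
To make the evaluation step concrete, here is one way the elided scoring logic might look: exact matches on the text fields plus a tolerance on height. The weights and the 1% tolerance are assumptions for illustration, not part of PydanticAI.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Mountain:
    name: str
    location: str
    height: float


# Illustrative scoring: 0.5 for name, 0.25 for location, 0.25 for height
# within 1% of the reference. Weights and tolerance are assumptions.
def evaluate(answer: Mountain, reference: Mountain) -> Tuple[float, str]:
    score = 0.0
    reason = []
    if answer.name.strip().lower() == reference.name.strip().lower():
        score += 0.5
        reason.append("name matches")
    if answer.location.strip().lower() == reference.location.strip().lower():
        score += 0.25
        reason.append("location matches")
    if abs(answer.height - reference.height) <= 0.01 * reference.height:
        score += 0.25
        reason.append("height within 1%")
    return score, ';'.join(reason)


ref = Mountain("K2", "Pakistan", 8611.0)
print(evaluate(Mountain("K2", "Pakistan", 8611.0), ref))
# → (1.0, 'name matches;location matches;height within 1%')
```

Returning the reason string alongside the score makes regressions easy to diagnose when a batch of evaluations is run after every change.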

Example Use Case: Evaluating Mountain Data

By employing tools like Wikipedia as a data source, the agent can fetch accurate mountain heights during production. For testing, developers can inject mocked responses, ensuring predictable outputs and faster development cycles.
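
That test-time wiring can be sketched without any framework machinery: the tool body stays the same, and only the injected dependency changes. FakeElevationWiki and the stand-in context below are hypothetical names used for illustration.

```python
from types import SimpleNamespace


# Canned-response fake standing in for the live Wikipedia client.
class FakeElevationWiki:
    def __init__(self, canned):
        self.canned = canned

    def snippet(self, mountain_name: str) -> str:
        return self.canned.get(mountain_name, "no data found")


# Minimal stand-in for a run context that carries injected dependencies.
class Ctx:
    def __init__(self, deps):
        self.deps = deps


# Same tool body as in production: it only talks to ctx.deps.elev_wiki.
def get_height_of_mountain(ctx, mountain_name: str) -> str:
    return ctx.deps.elev_wiki.snippet(mountain_name)


ctx = Ctx(SimpleNamespace(elev_wiki=FakeElevationWiki(
    {"Aconcagua": "Aconcagua rises to 6,961 m."})))
print(get_height_of_mountain(ctx, "Aconcagua"))  # → Aconcagua rises to 6,961 m.
```

Because the fake returns the same answer every time, tests become deterministic and fast, with no network calls and no LLM invocations in the inner loop.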

Advancing Agentic Applications with PydanticAI

PydanticAI provides the building blocks for creating scalable, evaluation-driven GenAI applications. Its support for dependency injection, structured outputs, and model-agnostic workflows addresses core challenges, empowering developers to create robust and adaptive LLM-powered systems. This paradigm shift ensures that evaluation is seamlessly embedded into the development lifecycle, paving the way for more reliable and efficient agentic applications.
