The evaluation of agentic applications is most effective when integrated into the development process, rather than treated as an afterthought. For this to succeed, developers must be able to mock both internal and external dependencies of the agent being built. PydanticAI is an agent framework that supports dependency injection from the start, enabling developers to build agentic applications with an evaluation-driven approach.

An architectural parallel can be drawn to the historic Krakow Cloth Hall, a structure refined over centuries through incremental, evaluation-driven enhancements. In the same way, PydanticAI lets developers identify and address challenges iteratively during development, rather than discovering them after release.

Challenges in Developing GenAI Applications

Developers of LLM-based applications face recurring challenges, which become significant during production deployment:

  1. Non-Determinism: Unlike conventional software APIs, identical inputs to LLMs may yield different outputs, complicating testing.
  2. LLM Limitations: Foundational models such as GPT-4, Claude, and Gemini are constrained by their training data (e.g., no access to confidential enterprise data), cannot invoke APIs or query databases on their own, and have limited reasoning capabilities.
  3. LLM Flexibility: Applications often require different models for varying tasks (e.g., low-latency for one step, code generation for another).
  4. Rapid Evolution: GenAI technologies evolve quickly, with foundational models now offering multimodal capabilities, structured outputs, and memory. Maintaining low-level API access is essential for leveraging these advancements.

To address non-determinism, developers must adopt evaluation-driven development, a method akin to test-driven development. This approach focuses on designing software with guardrails, real-time monitoring, and human oversight, accommodating systems that are only x% correct.
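
To make this concrete, here is a minimal sketch of such a guardrail. The harness, the run_agent callable, and the 90% threshold are illustrative assumptions, not from the original post: instead of asserting exact equality once, the test runs the agent several times and gates on an acceptable pass rate.

    from typing import Callable, Sequence

    def pass_rate(run_agent: Callable[[str], str],
                  questions: Sequence[str],
                  references: Sequence[str],
                  num_trials: int = 5) -> float:
        """Fraction of trials whose answer exactly matches the reference."""
        passed = total = 0
        for question, reference in zip(questions, references):
            for _ in range(num_trials):
                passed += int(run_agent(question) == reference)
                total += 1
        return passed / total

    # Gate a change on an acceptable pass rate instead of demanding determinism.
    fake_agent = lambda q: 'K2'  # stand-in for a real agent invocation
    assert pass_rate(fake_agent, ['Second-highest mountain?'], ['K2']) >= 0.9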

The Promise of PydanticAI

PydanticAI stands out as an agent framework that supports dependency injection, model-agnostic workflows, and evaluation-driven development. Its design is Pythonic, and it simplifies testing by allowing mock dependencies to be injected. In contrast to frameworks such as LangChain, where dependency injection is cumbersome, PydanticAI streamlines this process, making workflows more readable and efficient.
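
As a minimal sketch of the pattern (the Deps container and the search service are illustrative names, not from the original post), an agent declares the type of its dependencies up front, and concrete services are supplied only at run time:

    import os
    from dataclasses import dataclass

    from pydantic_ai import Agent
    from pydantic_ai.models.gemini import GeminiModel

    @dataclass
    class Deps:
        search: object  # any external service: a live client in production, a fake in tests

    agent = Agent(
        GeminiModel('gemini-1.5-flash', api_key=os.getenv('GOOGLE_API_KEY')),
        deps_type=Deps,
    )

    # The same agent runs unchanged against real services or test doubles:
    #   agent.run_sync(question, deps=Deps(search=live_client))
    #   agent.run_sync(question, deps=Deps(search=canned_fake))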

Building an Evaluation-Driven Application with PydanticAI

  1. Creating an Agent: PydanticAI simplifies agent creation. For example:

         import os

         import pydantic_ai
         from pydantic_ai.models.gemini import GeminiModel

         def default_model() -> pydantic_ai.models.Model:
             return GeminiModel('gemini-1.5-flash', api_key=os.getenv('GOOGLE_API_KEY'))

         def agent() -> pydantic_ai.Agent:
             return pydantic_ai.Agent(default_model())

     This setup keeps the model choice flexible, allowing different models to be assigned to specific workflow steps.
  2. Structured Outputs: Developers can define dataclasses for structured responses, enhancing usability:

         from dataclasses import dataclass

         @dataclass
         class Mountain:
             name: str
             location: str
             height: float

     With PydanticAI, structured outputs are returned directly as instances of this class, improving the precision of agentic workflows.
  3. Evaluation with Reference Answers: PydanticAI makes evaluation straightforward by supporting custom metrics:

         from typing import Tuple

         def evaluate(answer: Mountain, reference: Mountain) -> Tuple[float, str]:
             score = 0
             reason = []
             # Evaluation logic...
             return score, ';'.join(reason)

     (An illustrative implementation of this metric appears in the sketch after this list.)
  4. Dependency Injection: PydanticAI allows developers to inject mock services for external dependencies, facilitating efficient testing:

         @agent.tool
         def get_height_of_mountain(ctx: RunContext[Tools], mountain_name: str) -> str:
             return ctx.deps.elev_wiki.snippet(mountain_name)

     A sketch that wires all four steps together follows this list.
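
Putting the four steps together, a minimal end-to-end sketch might look like this. The WikiSource protocol, the Tools container, the prompt, and the body of evaluate are illustrative assumptions filled in around the snippets above; the result_type and deps_type keywords match the PydanticAI API these snippets imply (recent releases rename result_type to output_type):

    import os
    from dataclasses import dataclass
    from typing import Protocol, Tuple

    from pydantic_ai import Agent, RunContext
    from pydantic_ai.models.gemini import GeminiModel

    @dataclass
    class Mountain:
        name: str
        location: str
        height: float

    class WikiSource(Protocol):
        """Anything with a snippet() method can be injected: a live client or a fake."""
        def snippet(self, page: str) -> str: ...

    @dataclass
    class Tools:
        elev_wiki: WikiSource

    agent = Agent(
        GeminiModel('gemini-1.5-flash', api_key=os.getenv('GOOGLE_API_KEY')),
        result_type=Mountain,  # structured output (step 2)
        deps_type=Tools,       # injectable dependencies (step 4)
    )

    @agent.tool
    def get_height_of_mountain(ctx: RunContext[Tools], mountain_name: str) -> str:
        # Delegates to whatever was injected at run time.
        return ctx.deps.elev_wiki.snippet(mountain_name)

    def evaluate(answer: Mountain, reference: Mountain) -> Tuple[float, str]:
        # Illustrative metric: partial credit per field, with reasons collected.
        score, reasons = 0.0, []
        if answer.name.lower() == reference.name.lower():
            score += 0.5
        else:
            reasons.append(f'wrong name: {answer.name}')
        if abs(answer.height - reference.height) <= 10:
            score += 0.5
        else:
            reasons.append(f'height off by {abs(answer.height - reference.height):.0f} m')
        return score, ';'.join(reasons)

    # In production, inject a real Wikipedia wrapper:
    #   result = agent.run_sync('Tell me about K2.', deps=Tools(elev_wiki=wiki_client))
    #   score, reason = evaluate(result.data, Mountain('K2', 'China/Pakistan', 8611.0))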

Example Use Case: Evaluating Mountain Data

Using Wikipedia as a data source, the agent can fetch accurate mountain heights in production. For testing, developers can inject mocked responses, ensuring predictable outputs and faster development cycles.
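
Continuing the sketch above, a test double with canned responses is injected in place of the live client, making the tool's output deterministic. Only the Wikipedia dependency is mocked here; the LLM itself still runs, though it too could be stubbed out. The .data accessor matches the same API version as the snippets above; recent releases use .output:

    class FakeWiki:
        """Deterministic stand-in for the live Wikipedia client."""
        def snippet(self, page: str) -> str:
            canned = {'K2': 'K2, at 8,611 m (28,251 ft), is the second-highest mountain on Earth.'}
            return canned.get(page, 'No article found.')

    def test_k2():
        result = agent.run_sync('Tell me about K2.', deps=Tools(elev_wiki=FakeWiki()))
        score, reason = evaluate(result.data, Mountain('K2', 'China/Pakistan', 8611.0))
        assert score >= 0.5, reason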

Advancing Agentic Applications with PydanticAI

PydanticAI provides the building blocks for creating scalable, evaluation-driven GenAI applications. Its support for dependency injection, structured outputs, and model-agnostic workflows addresses core challenges, empowering developers to create robust and adaptive LLM-powered systems. This paradigm shift ensures that evaluation is seamlessly embedded into the development lifecycle, paving the way for more reliable and efficient agentic applications.
