Function Calling Archives - gettectonic.com

From Chatbots to Agentic AI

The transition from LLM-powered chatbots to agentic systems, or agentic AI, can be summed up by the old saying: “Less talk, more action.” Keeping up with advancements in AI can be overwhelming, especially when managing an existing business. The speed and complexity of innovation can make it feel like the first day of school all over again. This insight offers a comprehensive look at AI agents, their components, and key characteristics. The introductory section breaks down the elements that form the term “AI agent,” providing a clear definition. After establishing this foundation, we explore the evolution of LLM applications, particularly the shift from traditional chatbots to agentic systems. The goal is to understand why AI agents are becoming increasingly vital in AI development and how they differ from LLM-powered chatbots. By the end of this guide, you will have a deeper understanding of AI agents, their potential applications, and their impact on organizational workflows.

What is an AI Agent?

Components of AI Agents

To understand the term “AI agent,” we need to examine its two main components. First, let’s consider artificial intelligence, or AI. Artificial Intelligence (AI) refers to non-biological intelligence that mimics human cognition to perform tasks traditionally requiring human intellect. Through machine learning and deep learning techniques, algorithms—especially neural networks—learn patterns from data. AI systems are used for tasks such as detection, classification, and prediction, with content generation becoming a prominent domain due to transformer-based models. These systems can match or exceed human performance in specific scenarios.

The second component is “agent,” a term commonly used in both technology and human contexts. In computer science, an agent refers to a software entity with environmental awareness: it can perceive its surroundings, process what it observes, and act within its environment. In human contexts, an agent is someone who acts on behalf of another person or organization, making decisions, gathering information, and facilitating interactions. They often play intermediary roles in transactions and decision-making.

To define an AI agent, we combine these two perspectives: it is a computational entity with environmental awareness, capable of perceiving inputs, acting with tools, and processing information using foundation models backed by both long-term and short-term memory.

From LLMs to AI Agents

Now, let’s take a step back and understand how we arrived at the concept of AI agents, particularly by looking at how LLM applications have evolved. The shift from traditional chatbots to LLM-powered applications has been rapid and transformative.

Form Factor Evolution of LLM Applications

Traditional Chatbots to LLM-Powered Chatbots

Traditional chatbots, which existed before generative AI, were simpler and relied on heuristic responses: “If this, then that.” They followed predefined rules and decision trees to generate responses. These systems had limited interactivity, with the fallback option of “Speak to a human” for complex scenarios.

LLM-Powered Chatbots

The release of OpenAI’s ChatGPT on November 30, 2022, marked the introduction of LLM-powered chatbots, fundamentally changing the game.
These chatbots, like ChatGPT, were built on GPT-3.5, a large language model trained on massive datasets. Unlike traditional chatbots, LLM-powered systems can generate human-like responses, offering a much more flexible and intelligent interaction. However, challenges remained. LLM-powered chatbots struggled with personalization and consistency, often generating plausible but incorrect information—a phenomenon known as “hallucination.” This led to efforts in grounding LLM responses through techniques like retrieval-augmented generation (RAG).

RAG Chatbots

RAG is a method that combines data retrieval with LLM generation, allowing systems to access real-time or proprietary data, improving accuracy and relevance. This hybrid approach mitigates the hallucination problem, producing more reliable outputs.

LLM-Powered Chatbots to AI Agents

As LLMs expanded, their abilities grew more sophisticated, incorporating advanced reasoning, multi-step planning, and the use of external tools (function calling). Tool use refers to an LLM’s ability to invoke specific functions, enabling it to perform more complex tasks.

Tool-Augmented LLMs and AI Agents

As LLMs became tool-augmented, the emergence of AI agents followed. These agents integrate reasoning, planning, and tool use into an autonomous, goal-driven system that can operate iteratively within a dynamic environment. Unlike traditional chatbot interfaces, AI agents leverage a broader set of tools to interact with various systems and accomplish tasks.

Agentic Systems

Agentic systems—computational architectures that include AI agents—embody these advanced capabilities. They can autonomously interact with systems, make decisions, and adapt to feedback, forming the foundation for more complex AI applications.

Components of an AI Agent

AI agents consist of several key components: a foundation model that drives reasoning, tools the agent can invoke, short-term and long-term memory, and an interface for perceiving and acting on the environment.

Characteristics of AI Agents

AI agents are defined by traits such as autonomy, adaptability, goal-driven iteration, and proactive behavior.

Conclusion

AI agents represent a significant leap from traditional chatbots, offering greater autonomy, complexity, and interactivity. However, the term “AI agent” remains fluid, with no universal industry standard. Instead, it exists on a continuum, with varying degrees of autonomy, adaptability, and proactive behavior defining agentic systems.

Value and Impact of AI Agents

The key benefits of AI agents lie in their ability to automate manual processes, reduce decision-making burdens, and enhance workflows in enterprise environments. By “agentifying” repetitive tasks, AI agents offer substantial productivity gains and the potential to transform how businesses operate. As AI agents evolve, their applications will only expand, driving new efficiencies and enabling organizations to leverage AI in increasingly sophisticated ways.
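For hands-on readers, here is a minimal sketch of the RAG pattern described above, using the OpenAI Python SDK with a hypothetical search_documents retriever. The model name and the retriever are illustrative assumptions, not part of the original article.

```python
# Minimal RAG sketch: retrieve context first, then ground the LLM's answer in it.
# search_documents is a hypothetical retriever; back it with a vector store
# or search index in a real system.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def search_documents(query: str, k: int = 3) -> list[str]:
    """Hypothetical retriever: return the k most relevant text chunks."""
    return ["(stub) replace with chunks from your vector store"][:k]

def rag_answer(question: str) -> str:
    context = "\n\n".join(search_documents(question))
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model choice
        messages=[
            {"role": "system",
             "content": "Answer using ONLY the provided context. "
                        "If the context is insufficient, say so."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nQuestion: {question}"},
        ],
    )
    return response.choices[0].message.content
```

Grounding the prompt in retrieved text is what lets the model cite real-time or proprietary data instead of relying on its training distribution.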


Why Build a General-Purpose Agent?

A general-purpose LLM agent serves as an excellent starting point for prototyping use cases and establishing the foundation for a custom agentic architecture tailored to your needs.

What is an LLM Agent?

An LLM (Large Language Model) agent is a program whose execution logic is governed by the underlying model. Unlike approaches such as few-shot prompting or fixed workflows, LLM agents adapt dynamically. They can determine which tools to use (e.g., web search or code execution), how to use them, and iterate based on results. This adaptability enables handling diverse tasks with minimal configuration.

Agentic Architectures Explained

Agentic systems range from the reliability of fixed workflows to the flexibility of fully autonomous agents. Your architecture choice will depend on the desired balance between reliability and flexibility for your use case.

Building a General-Purpose LLM Agent

Step 1: Select the Right LLM

Choosing the right model is critical for performance; evaluate candidates against the reasoning, latency, and cost requirements of your task. For simpler use cases, smaller models running locally can also be effective, but with limited functionality.

Step 2: Define the Agent’s Control Logic

The system prompt differentiates an LLM agent from a standalone model. This prompt contains the rules, instructions, and structures that guide the agent’s behavior. Common agentic patterns include ReAct (interleaved reasoning and acting) and Plan-then-Execute; starting with one of these is recommended for general-purpose agents.

Step 3: Define the Agent’s Core Instructions

To optimize the agent’s behavior, clearly define its features and constraints in the system prompt.

Step 4: Define and Optimize Core Tools

Tools expand an agent’s capabilities; common examples include web search, code execution, and API access. For each tool, define its name, purpose, input parameters, and output format. An Arxiv API tool for scientific queries is one example; a sketch of such a tool appears at the end of this section.

Step 5: Memory Handling Strategy

Since LLMs have a limited context window, a strategy is necessary to manage past interactions; common approaches include truncating older turns and summarizing prior conversation. For personalization, long-term memory can store user preferences or critical information.

Step 6: Parse the Agent’s Output

To make raw LLM outputs actionable, implement a parser to convert outputs into a structured format like JSON. Structured outputs simplify execution and ensure consistency.

Step 7: Orchestrate the Agent’s Workflow

Define orchestration logic to handle the agent’s next steps after receiving an output:

```python
def orchestrator(llm_agent, llm_output, tools, user_query):
    """Route the agent's structured output to tool calls or a final answer."""
    while True:
        action = llm_output.get("action")
        if action == "tool_call":
            tool_name = llm_output.get("tool_name")
            tool_params = llm_output.get("tool_params", {})
            if tool_name in tools:
                try:
                    # Execute the tool and feed its result back to the agent.
                    tool_result = tools[tool_name](**tool_params)
                    llm_output = llm_agent({"tool_output": tool_result})
                except Exception as e:
                    return f"Error executing tool '{tool_name}': {str(e)}"
            else:
                return f"Error: Tool '{tool_name}' not found."
        elif action == "return_answer":
            return llm_output.get("answer", "No answer provided.")
        else:
            return "Error: Unrecognized action type from LLM output."
```

This orchestration ensures seamless interaction between tools, memory, and user queries.

When to Consider Multi-Agent Systems

A single-agent setup works well for prototyping but may hit limits with complex workflows or extensive toolsets. Multi-agent architectures can split responsibilities across specialized agents, each with a focused prompt and toolset. Starting with a single agent helps refine workflows, identify bottlenecks, and scale effectively.
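As a concrete illustration of Step 4, here is a minimal sketch of the Arxiv tool mentioned above. It assumes the public arXiv query API at export.arxiv.org; the function name and signature are illustrative rather than taken from the original post.

```python
import urllib.parse
import urllib.request

def search_arxiv(query: str, max_results: int = 5) -> str:
    """Query the public arXiv API and return the raw Atom feed as text.

    Illustrative sketch: a production tool would parse the feed and
    return structured fields (title, authors, abstract) for the agent.
    """
    params = urllib.parse.urlencode({
        "search_query": f"all:{query}",
        "start": 0,
        "max_results": max_results,
    })
    url = f"http://export.arxiv.org/api/query?{params}"
    with urllib.request.urlopen(url) as response:
        return response.read().decode("utf-8")

# Register the tool so the orchestrator above can dispatch to it by name.
tools = {"search_arxiv": search_arxiv}
```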
By following these steps, you’ll have a versatile system capable of handling diverse use cases, from competitive analysis to automating workflows.


Fivetran’s Hybrid Deployment

Fivetran’s Hybrid Deployment: A Breakthrough in Data Engineering

In the data engineering world, balancing efficiency with security has long been a challenge. Fivetran aims to shift this dynamic with its Hybrid Deployment solution, designed to seamlessly move data across any environment while maintaining control and flexibility.

The Hybrid Advantage: Flexibility Meets Control

Fivetran’s Hybrid Deployment offers a new approach for enterprises, particularly those handling sensitive data or operating in regulated sectors. Often, these businesses struggle to adopt data-driven practices due to security concerns. Hybrid Deployment changes this by enabling the secure movement of data across cloud and on-premises environments, giving businesses full control over their data while maintaining the agility of the cloud.

As George Fraser, Fivetran’s CEO, notes, “Businesses no longer have to choose between managed automation and data control. They can now securely move data from all their critical sources—like Salesforce, Workday, Oracle, SAP—into a data warehouse or data lake, while keeping that data under their own control.”

How it Works: A Secure, Streamlined Approach

Fivetran’s Hybrid Deployment relies on a lightweight local agent to move data securely within a customer’s environment, while the Fivetran platform handles management and monitoring. This separation of the control plane from the data plane ensures that sensitive information stays within the customer’s secure perimeter. Vinay Kumar Katta, a managing delivery architect at Capgemini, highlights the flexibility this provides, enabling businesses to design pipelines without sacrificing security.

Beyond Security: Additional Benefits

Hybrid Deployment’s benefits go beyond security alone. Early adopters are already seeing its value. Troy Fokken, chief architect at phData, praises how it “streamlines data pipeline processes,” especially for customers in regulated industries.

AI Agent Architectures: Defining the Future of Autonomous Systems

In the rapidly evolving world of AI, a new framework is emerging—AI agents designed to act autonomously, adapt dynamically, and explore digital environments. These AI agents are built on core architectural principles, bringing the next generation of autonomy to AI-driven tasks.

What Are AI Agents?

AI agents are systems designed to autonomously or semi-autonomously perform tasks, leveraging tools to achieve objectives. For instance, these agents may use APIs, perform web searches, or interact with digital environments. At their core, AI agents use Large Language Models (LLMs) and Foundation Models (FMs) to break down complex tasks, similar to human reasoning.

Large Action Models (LAMs)

Just as LLMs transformed natural language processing, Large Action Models (LAMs) are revolutionizing how AI agents interact with environments. These models excel at function calling—turning natural language into structured, executable actions, enabling AI agents to perform real-world tasks like scheduling or triggering API calls. Salesforce AI Research, for instance, has open-sourced several LAMs designed to facilitate meaningful actions. LAMs bridge the gap between unstructured inputs and structured outputs, making AI agents more effective in complex environments.

Model Orchestration and Small Language Models (SLMs)

Model orchestration complements LAMs by utilizing smaller, specialized models (SLMs) for niche tasks.
Instead of relying on resource-heavy models, AI agents can call upon these smaller models for specific functions—such as summarizing data or executing commands—creating a more efficient system. SLMs, combined with techniques like Retrieval-Augmented Generation (RAG), can perform comparably to their larger counterparts on knowledge-intensive tasks.

Vision-Enabled Language Models for Digital Exploration

AI agents are becoming even more capable with vision-enabled language models, allowing them to interact with digital environments. Projects like Apple’s Ferret-UI and WebVoyager exemplify this: agents can navigate user interfaces, recognize elements via OCR, and explore websites autonomously.

Function Calling: Structured, Actionable Outputs

A fundamental shift is happening with function calling in AI agents, moving from unstructured text to structured, actionable outputs. This allows AI agents to interact with systems more efficiently, triggering specific actions like booking meetings or executing API calls.

The Role of Tools and Human-in-the-Loop

AI agents rely on tools—algorithms, scripts, or even humans in the loop—to perform tasks and guide actions. This approach is particularly valuable in high-stakes industries like healthcare and finance, where precision is crucial.

The Future of AI Agents

With the advent of Large Action Models, model orchestration, and function calling, AI agents are becoming powerful problem solvers. These agents are evolving to explore, learn, and act within digital ecosystems, bringing us closer to a future where AI mimics human problem-solving processes. As AI agents become more sophisticated, they will redefine how we approach digital tasks and interactions.
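To make the structured-output idea concrete, here is a minimal sketch of a LAM-style action being dispatched by application code. The action schema and the schedule_meeting helper are illustrative assumptions, not a specific vendor's format.

```python
# Illustrative only: a function-calling model turns a natural-language request
# ("book a 30-minute meeting with Dana tomorrow at 10") into a structured
# action like this, which application code can execute directly.
action = {
    "function": "schedule_meeting",
    "arguments": {
        "attendee": "dana@example.com",
        "start": "2024-09-13T10:00:00",
        "duration_minutes": 30,
    },
}

def schedule_meeting(attendee: str, start: str, duration_minutes: int) -> str:
    # Hypothetical calendar integration; replace with a real API call.
    return f"Booked {duration_minutes} min with {attendee} at {start}"

# A dispatch table maps function names from the model to executable code.
dispatch = {"schedule_meeting": schedule_meeting}
result = dispatch[action["function"]](**action["arguments"])
print(result)
```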


AI Agents Connect Tool Calling and Reasoning

AI Agents: Bridging Tool Calling and Reasoning in Generative AI

Exploring Problem Solving and Tool-Driven Decision Making in AI

Introduction: The Emergence of Agentic AI

Recent advancements in libraries and low-code platforms have simplified the creation of AI agents, often referred to as digital workers. Tool calling stands out as a key capability that enhances the “agentic” nature of Generative AI models, enabling them to move beyond mere conversational tasks. By executing tools (functions), these agents can act on your behalf and tackle intricate, multi-step problems requiring sound decision-making and interaction with diverse external data sources. This insight explores the role of reasoning in tool calling, examines the challenges associated with tool usage, discusses common evaluation methods for tool-calling proficiency, and provides examples of how various models and agents engage with tools.

Reasoning as a Means of Problem-Solving

Successful agents rely on two fundamental expressions of reasoning: reasoning through evaluation and planning, and reasoning through tool use. While both reasoning expressions are vital, they don’t always need to be combined to yield powerful solutions. For instance, OpenAI’s new o1 model excels in reasoning through evaluation and planning, having been trained to utilize chain of thought effectively. This has notably enhanced its ability to address complex challenges, achieving human PhD-level accuracy on benchmarks like GPQA across physics, biology, and chemistry, and ranking in the 86th-93rd percentile on Codeforces contests. However, the o1 model currently lacks explicit tool calling capabilities.

Conversely, many models are specifically fine-tuned for reasoning through tool use, allowing them to generate function calls and interact with APIs effectively. These models focus on executing the right tool at the right moment but may not evaluate their results as thoroughly as the o1 model. The Berkeley Function Calling Leaderboard (BFCL) serves as an excellent resource for comparing the performance of various models on tool-calling tasks and provides an evaluation suite for assessing fine-tuned models against challenging scenarios. The recently released BFCL v3 now includes multi-step, multi-turn function calling, raising the standards for tool-based reasoning tasks.

Both reasoning types are powerful in their own right, and their combination holds the potential to develop agents that can effectively deconstruct complex tasks and autonomously interact with their environments. For more insights into AI agent architectures for reasoning, planning, and tool calling, check out my team’s survey paper on ArXiv.

Challenges in Tool Calling: Navigating Complex Agent Behaviors

Creating robust and reliable agents necessitates overcoming various challenges. In tackling complex problems, an agent often must juggle multiple tasks simultaneously, including planning, timely tool interactions, accurate formatting of tool calls, retaining outputs from prior steps, avoiding repetitive loops, and adhering to guidelines that safeguard the system against jailbreaks and prompt injections. Such demands can easily overwhelm a single agent, leading to a trend where what appears to an end user as a single agent is actually a coordinated effort of multiple agents and prompts working in unison to divide and conquer the task. This division enables tasks to be segmented and addressed concurrently by distinct models and agents, each tailored to tackle specific components of the problem.
This is where models with exceptional tool-calling capabilities come into play. While tool calling is a potent method for empowering productive agents, it introduces its own set of challenges. Agents must grasp the available tools, choose the appropriate one from a potentially similar set, accurately format the inputs, execute calls in the correct sequence, and potentially integrate feedback or instructions from other agents or humans. Many models are fine-tuned specifically for tool calling, allowing them to specialize in selecting functions accurately at the right time; key considerations when fine-tuning include the breadth of tools the model must recognize, the accuracy of argument formatting, and its ability to sequence calls correctly.

Common Benchmarks for Evaluating Tool Calling

As tool usage in language models becomes increasingly significant, numerous datasets have emerged to facilitate the evaluation and enhancement of model tool-calling capabilities. Two prominent benchmarks are the Berkeley Function Calling Leaderboard and the Nexus Function Calling Benchmark, both utilized by Meta to assess the performance of their Llama 3.1 model series. The recent ToolACE paper illustrates how agents can generate a diverse dataset for fine-tuning and evaluating model tool use. Each of these benchmarks enhances our ability to evaluate model reasoning through tool calling, and they reflect a growing trend toward developing specialized models for specific tasks and extending the capabilities of LLMs to interact with the real world.

Practical Applications of Tool Calling

If you’re interested in observing tool calling in action, here are some examples to consider, ordered by complexity, from simple built-in tools to fine-tuned models and agents with tool-calling capabilities. While a built-in web search feature is convenient, most applications require defining custom tools that can be integrated into your model workflows. This leads us to the next complexity level.

To observe how models articulate tool calls, you can use the Databricks Playground. For example, select the Llama 3.1 405B model and grant access to sample tools like get_distance_between_locations and get_current_weather. When prompted with, “I am going on a trip from LA to New York. How far are these two cities? And what’s the weather like in New York? I want to be prepared for when I get there,” the model will decide which tools to call and what parameters to provide for an effective response. In this scenario, the model suggests two tool calls. Since the Playground model cannot execute the tools itself, the user must input a sample result to simulate each tool’s output.

Suppose you instead employ a model fine-tuned on the Berkeley Function Calling Leaderboard dataset. When prompted, “How many times has the word ‘freedom’ appeared in the entire works of Shakespeare?” the model will successfully retrieve and return the answer, executing the required tool calls without the user needing to define any input or manage the output format. Such models handle multi-turn interactions adeptly, processing past user messages, managing context, and generating coherent, task-specific outputs. As AI agents evolve to encompass advanced reasoning and problem-solving capabilities, they will become increasingly adept at managing complex, multi-step tasks on our behalf.
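To ground the Playground example above, here is a minimal sketch of how the two sample tools might be declared using the widely used OpenAI-style function-calling schema. The exact schema the Playground expects may differ, so treat the shapes below as illustrative.

```python
# OpenAI-style tool declarations for the two sample tools mentioned above.
# The model reads these names, descriptions, and JSON Schema parameter
# definitions to decide which tool to call and with what arguments.
tools = [
    {
        "type": "function",
        "function": {
            "name": "get_distance_between_locations",
            "description": "Distance between two cities.",
            "parameters": {
                "type": "object",
                "properties": {
                    "origin": {"type": "string"},
                    "destination": {"type": "string"},
                },
                "required": ["origin", "destination"],
            },
        },
    },
    {
        "type": "function",
        "function": {
            "name": "get_current_weather",
            "description": "Current weather for a city.",
            "parameters": {
                "type": "object",
                "properties": {"location": {"type": "string"}},
                "required": ["location"],
            },
        },
    },
]
```

Given the trip-planning prompt, a capable model would emit two calls against these declarations, one per tool, with the city names filled in as arguments.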


OpenAI Update

OpenAI has established itself as a leading force in the generative AI space, with its ChatGPT being one of the most widely recognized AI tools. Powered by the GPT series of large language models (LLMs), as of September 2024, ChatGPT primarily uses GPT-4o and GPT-3.5. This insight provides an OpenAI update.

In August and September 2024, rumors circulated about a new model from OpenAI, codenamed “Strawberry.” Initially, it was unclear if this model would be a successor to GPT-4o or something entirely different. On September 12, 2024, the mystery was resolved with the official launch of OpenAI’s o1 models, including o1-preview and o1-mini.

What is OpenAI o1?

OpenAI o1 is a new family of LLMs optimized for advanced reasoning tasks. Unlike earlier models, o1 is designed to improve problem-solving by reasoning through queries rather than just generating quick responses. This deeper processing aims to produce more accurate answers to complex questions, particularly in fields like STEM (science, technology, engineering, and mathematics). The o1 models, currently available in preview form, are intended to provide a new type of LLM experience beyond what GPT-4o offers. Like all OpenAI LLMs, the o1 series is built on transformer architecture and can be used for tasks such as content summarization, new content generation, question answering, and writing code.

Key Features of OpenAI o1

The standout feature of the o1 models is their ability to engage in multistep reasoning. By adopting a “chain-of-thought” approach, o1 models break down complex problems and reason through them iteratively. This makes them particularly adept at handling intricate queries that require a more thoughtful response. The initial September 2024 launch included two models: o1-preview and o1-mini.

Use Cases for OpenAI o1

The o1 models can perform many of the same functions as GPT-4o, such as answering questions, summarizing content, and generating text. However, they are particularly suited for tasks that benefit from enhanced reasoning, such as STEM problem solving, coding, and complex multistep questions.

Availability and Access

The o1-preview and o1-mini models are available to users of ChatGPT Plus and Team as of September 12, 2024. OpenAI plans to extend access to ChatGPT Enterprise and Education users starting September 19, 2024. While free ChatGPT users do not have access to these models at launch, OpenAI intends to introduce o1-mini to free users in the future. Developers can also access the models through OpenAI’s API, and third-party platforms such as Microsoft Azure AI Studio and GitHub Models offer integration.

Limitations of OpenAI o1

As preview models, the o1 series comes with certain limitations: responses are slower because of the deeper reasoning process, and the launch versions lack GPT-4o conveniences such as web browsing and file uploads.

Enhancing Safety with OpenAI o1

To ensure safety, OpenAI released a System Card that outlines how the o1 models were evaluated for risks like cybersecurity threats, persuasion, and model autonomy. The o1 models improve safety through stronger alignment training and better resistance to jailbreak attempts.

GPT-4o vs. OpenAI o1

Here’s a quick comparison between GPT-4o and OpenAI’s new o1 models:

| Feature | GPT-4o | o1 Models |
|---|---|---|
| Release Date | May 13, 2024 | Sept. 12, 2024 |
| Model Variants | Single model | Two variants: o1-preview and o1-mini |
| Reasoning Capabilities | Good | Enhanced, especially for STEM fields |
| Mathematics Olympiad Score | 13% | 83% |
| Context Window | 128K tokens | 128K tokens |
| Speed | Faster | Slower due to in-depth reasoning |
| Cost (per million tokens) | Input: $5; Output: $15 | o1-preview: $15 input, $60 output; o1-mini: $3 input, $12 output |
| Safety and Alignment | Standard | Enhanced safety, better jailbreak resistance |

OpenAI’s o1 models bring a new level of reasoning and accuracy, making them a promising advancement in generative AI.
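For developers taking the API route mentioned under Availability and Access, a minimal sketch with the OpenAI Python SDK might look like the following. Model availability varies by account tier, so treat this as illustrative rather than a definitive integration.

```python
# Minimal sketch: calling o1-preview through the OpenAI Python SDK.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

response = client.chat.completions.create(
    model="o1-preview",
    messages=[
        {
            "role": "user",
            "content": "A bat and a ball cost $1.10 together; the bat costs "
                       "$1.00 more than the ball. What does the ball cost?",
        }
    ],
)
print(response.choices[0].message.content)
```

Because the model spends extra time reasoning before it answers, expect noticeably higher latency per request than with GPT-4o.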


ChatGPT Open AI o1

OpenAI has firmly established itself as a leader in the generative AI space, with its ChatGPT being one of the most well-known applications of AI today. Powered by the GPT family of large language models (LLMs), ChatGPT’s primary models, as of September 2024, are GPT-4o and GPT-3.5. In August and September 2024, rumors surfaced about a new model from OpenAI, codenamed “Strawberry.” Speculation grew as to whether this was a successor to GPT-4o or something else entirely. The mystery was resolved on September 12, 2024, when OpenAI launched its new o1 models, including o1-preview and o1-mini.

What Is OpenAI o1?

The OpenAI o1 family is a series of large language models optimized for enhanced reasoning capabilities. Unlike GPT-4o, the o1 models are designed to offer a different type of user experience, focusing more on multistep reasoning and complex problem-solving. As with all OpenAI models, o1 is a transformer-based architecture that excels in tasks such as content summarization, content generation, coding, and answering questions.

What sets o1 apart is its improved reasoning ability. Instead of prioritizing speed, the o1 models spend more time “thinking” about the best approach to solve a problem, making them better suited for complex queries. The o1 models use chain-of-thought prompting, reasoning step by step through a problem, and employ reinforcement learning techniques to enhance performance.

Initial Launch

On September 12, 2024, OpenAI introduced two versions of the o1 models: o1-preview and o1-mini.

Key Capabilities of OpenAI o1

OpenAI o1 can handle a variety of tasks, but its advanced reasoning functionality makes it particularly well-suited for STEM problem solving, coding, and other multistep analytical work.

How to Use OpenAI o1

There are several ways to access the o1 models: through ChatGPT Plus and Team subscriptions, through OpenAI’s API, and via third-party platforms such as Microsoft Azure AI Studio and GitHub Models.

Limitations of OpenAI o1

As an early iteration, the o1 models have several limitations: they respond more slowly than GPT-4o, remain limited to specific user tiers, and lack features such as web browsing and file uploads.

How OpenAI o1 Enhances Safety

OpenAI released a System Card alongside the o1 models, detailing the safety and risk assessments conducted during their development. This includes evaluations in areas like cybersecurity, persuasion, and model autonomy. The o1 models incorporate several key safety features, including improved resistance to jailbreaking.

GPT-4o vs. OpenAI o1: A Comparison

Here’s a side-by-side comparison of GPT-4o and OpenAI o1:

| Feature | GPT-4o | o1 Models |
|---|---|---|
| Release Date | May 13, 2024 | Sept. 12, 2024 |
| Model Variants | Single model | Two: o1-preview and o1-mini |
| Reasoning Capabilities | Good | Enhanced, especially in STEM fields |
| Performance Benchmarks | 13% on Math Olympiad | 83% on Math Olympiad, PhD-level accuracy in STEM |
| Multimodal Capabilities | Text, images, audio, video | Primarily text, with developing image capabilities |
| Context Window | 128K tokens | 128K tokens |
| Speed | Fast | Slower due to more reasoning processes |
| Cost (per million tokens) | Input: $5; Output: $15 | o1-preview: $15 input, $60 output; o1-mini: $3 input, $12 output |
| Availability | Widely available | Limited to specific users |
| Features | Includes web browsing, file uploads | Lacks some features from GPT-4o, like web browsing |
| Safety and Alignment | Focus on safety | Improved safety, better resistance to jailbreaking |

OpenAI o1 marks a significant advancement in reasoning capabilities, setting a new standard for complex problem-solving with LLMs. With enhanced safety features and the ability to tackle intricate tasks, the o1 models offer a distinct upgrade over their predecessors.
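The pricing rows in the table lend themselves to a quick back-of-the-envelope comparison; the token counts in the sketch below are invented for illustration.

```python
# Rough cost comparison using the per-million-token prices from the table.
PRICES = {  # (input $/1M tokens, output $/1M tokens)
    "gpt-4o": (5.00, 15.00),
    "o1-preview": (15.00, 60.00),
    "o1-mini": (3.00, 12.00),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    p_in, p_out = PRICES[model]
    return (input_tokens * p_in + output_tokens * p_out) / 1_000_000

for model in PRICES:
    # e.g., a 2,000-token prompt that yields a 1,000-token answer
    print(f"{model}: ${request_cost(model, 2_000, 1_000):.4f}")
```

At these rates, the same request costs $0.025 on GPT-4o, $0.09 on o1-preview, and $0.018 on o1-mini, which is why routing only reasoning-heavy work to o1-preview can matter.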


Large Action Models and AI Agents

The introduction of Large Action Models (LAMs) marks a significant advancement in AI, focusing on actionable intelligence. By enabling robust, dynamic interactions through function calling and structured output generation, LAMs are set to redefine the capabilities of AI agents across industries.


GPT-o1 GPT5 Review

OpenAI has released its latest model series, OpenAI-o1, codenamed Project Strawberry and speculated by some to be GPT-5, positioning it as a significant advancement in AI with PhD-level reasoning capabilities. The series is designed to enhance problem-solving in fields such as science, coding, and mathematics, and the initial results indicate that it lives up to the anticipation.

Key Features of OpenAI-o1

The announcement highlights enhanced reasoning capabilities, safety and alignment work, targeted applications, and two model variants.

Access and Availability

The o1 models are available to ChatGPT Plus and Team users, with broader access expected soon for ChatGPT Enterprise users. Developers can access the models through the API, although certain features like function calling are still in development. Free access to o1-mini is expected to be provided in the near future.

Reinforcement Learning at the Core

The o1 models utilize reinforcement learning to improve their reasoning abilities. This approach focuses on training the models to think more effectively, improving their performance with additional time spent on tasks. OpenAI continues to explore how to scale this approach, though details remain limited.

Major Milestones

The o1 models have achieved impressive results in competitive benchmarks, including a roughly 83% score on a Mathematics Olympiad qualifier, PhD-level accuracy on the GPQA science benchmark, and high percentile rankings in Codeforces contests.

Chain of Thought Reasoning

OpenAI’s o1 models employ the “Chain of Thought” prompt engineering technique, which allows the model to think through problems step by step. This method helps the model approach complex problems in a structured way, similar to human reasoning.

While the o1 models show immense promise, there are still some limitations, which have been covered in detail elsewhere. However, based on early tests, the model is performing impressively, and users are hopeful that these capabilities are as robust as advertised, rather than overhyped like previous OpenAI projects such as Sora or SearchGPT.
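As a simple illustration of the chain-of-thought style described above, here is how a prompt might explicitly ask a chat model for step-by-step reasoning. The wording is illustrative; o1 models apply this kind of reasoning internally without being asked.

```python
# Chain-of-thought prompting, illustrated with an explicit instruction.
# o1-style models reason this way internally; for other chat models you
# can encourage the same behavior in the prompt, as sketched here.
prompt = (
    "Solve the problem step by step, showing your reasoning before the "
    "final answer.\n\n"
    "Problem: A train leaves at 9:40 and arrives at 13:05. "
    "How long is the journey?"
)
# Expected style of response:
#   1. From 9:40 to 12:40 is 3 hours.
#   2. From 12:40 to 13:05 is 25 minutes.
#   3. Total journey time: 3 hours 25 minutes.
```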


Salesforce Tiny Giant LLM

‘On-device Agentic AI is Here!’: Salesforce Announces the ‘Tiny Giant’ LLM

Salesforce CEO Marc Benioff is excited about the company’s latest innovation in AI, introducing the ‘Tiny Giant’ LLM, which he claims is the world’s top-performing “micro-model” for function calling. Salesforce’s new slimline “Tiny Giant” LLM reportedly outperforms larger models, marking a significant advancement in on-device AI.

According to a paper published on Arxiv by Salesforce’s AI Research department, the xLAM-7B model ranked sixth among 46 models, including those from OpenAI and Google, in a competition testing function calling (the execution of tasks or functions through API calls). The xLAM-7B model has just seven billion parameters, a small fraction compared to the 1.7 trillion parameters rumored to be used by GPT-4. However, Salesforce highlights the xLAM-1B, a smaller model, as its true star. Despite having just one billion parameters, the xLAM-1B model finished 24th, surpassing GPT-3.5-Turbo and Claude-3 Haiku in performance.

CEO Marc Benioff proudly shared these results on X (formerly Twitter), stating: “Meet Salesforce Einstein ‘Tiny Giant.’ Our 1B parameter model xLAM-1B is now the best micro-model for function-calling, outperforming models 7x its size… On-device agentic AI is here. Congrats Salesforce Research!”

Salesforce’s research emphasizes that function-calling agents represent a significant advancement in AI and LLMs. Models like GPT-4, Gemini, and Mistral already execute API calls based on natural language prompts, enabling dynamic interactions with various digital services and applications. While many popular models are large and resource-intensive, requiring cloud data centers and extensive infrastructure, Salesforce’s new models demonstrate that smaller, more efficient models can achieve state-of-the-art performance.

To train function-calling LLMs, Salesforce developed APIGen, an “Automated Pipeline for Generating verifiable and diverse function-calling datasets,” to synthesize data for AI training. Salesforce’s findings indicate that models trained on relatively small, carefully curated datasets can outperform those trained on larger ones. “Models trained with our curated datasets, even with only seven billion parameters, can achieve state-of-the-art performance… outperforming multiple GPT-4 models,” the paper states.

The ultimate goal is to create agentic AI models capable of function calling and task execution on devices, minimizing the need for extensive external infrastructure and enabling self-sufficient operations. Dr. Eli David, co-founder of the cybersecurity firm Deep Instinct, commented on X, “Smaller, more efficient models are the way to go for widespread deployment of LLMs.”
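For readers who want to experiment, a minimal sketch of loading a small xLAM function-calling model with Hugging Face transformers follows. The model identifier is an assumption based on Salesforce's published xLAM family; check the model card for the exact ID and the prompt format it expects for tool definitions.

```python
# Minimal sketch: running a small xLAM function-calling model locally with
# Hugging Face transformers. The model ID below is an assumption; verify it
# (and the expected tool-definition prompt format) on the model card.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "Salesforce/xLAM-1b-fc-r"  # assumed identifier; confirm on the Hub
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(model_id, device_map="auto")

prompt = "What's the weather in New York?"  # plus tool specs per the model card
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

A one-billion-parameter model like this can run on a single consumer GPU or even capable edge hardware, which is the point of the on-device agentic AI claim.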
