Generative AI Benchmark from Salesforce

Salesforce Introduces Generative AI Benchmark Tool for CRM

The Generative AI Benchmark from Salesforce is an evaluation tool designed to help businesses select the most suitable large language models (LLMs) for their CRM needs.

Key Benefits of the Generative AI Benchmark for CRM:

Informed Decision-Making: Clara Shih, CEO of Salesforce AI, emphasizes that businesses need LLMs that are not only effective but also compliant, secure, and cost-efficient.
Optimization of Cost and Performance: Shih notes that businesses face a constrained optimization problem, balancing cost, accuracy, trust and safety, and speed to route the right tasks to the right models.
Business-Relevant Evaluation: Unlike academic or theoretical benchmarks, Salesforce’s tool focuses on the practical application of LLMs in business contexts.
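
The constrained optimization Shih describes can be illustrated with a small sketch: treat accuracy, trust and safety, and speed as requirements a task imposes, then pick the cheapest model that meets all of them. Everything below is hypothetical and for illustration only, including the model names, the numeric scales, the scores, and the pick_model helper. None of it comes from the Salesforce benchmark itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ModelProfile:
    """Hypothetical per-model figures; not real benchmark results."""
    name: str
    accuracy: float       # assumed 0-1 scale, higher is better
    cost_per_call: float  # assumed arbitrary cost units
    latency_s: float      # assumed seconds per response
    trust_safety: float   # assumed 0-1 scale, higher is better


def pick_model(candidates: list[ModelProfile], min_accuracy: float,
               min_trust_safety: float,
               max_latency_s: float) -> Optional[ModelProfile]:
    """Return the cheapest candidate that meets every constraint, or None."""
    feasible = [
        m for m in candidates
        if m.accuracy >= min_accuracy
        and m.trust_safety >= min_trust_safety
        and m.latency_s <= max_latency_s
    ]
    if not feasible:
        return None
    return min(feasible, key=lambda m: m.cost_per_call)


# Made-up candidates: one larger, slower, costlier model and one smaller one.
CANDIDATES = [
    ModelProfile("large-model", accuracy=0.92, cost_per_call=10.0,
                 latency_s=4.0, trust_safety=0.95),
    ModelProfile("small-model", accuracy=0.85, cost_per_call=1.0,
                 latency_s=1.0, trust_safety=0.93),
]

# A latency-sensitive task with a moderate accuracy requirement.
routine = pick_model(CANDIDATES, min_accuracy=0.80,
                     min_trust_safety=0.90, max_latency_s=2.0)
# A task that demands higher accuracy and tolerates slower responses.
strict = pick_model(CANDIDATES, min_accuracy=0.90,
                    min_trust_safety=0.90, max_latency_s=5.0)

print(routine.name)  # small-model
print(strict.name)   # large-model
```

With these made-up figures, the smaller model wins the routine task and the larger model is chosen only when the accuracy requirement rules the smaller one out. This is one simple way to read "route the right tasks to the right models".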

Tailored for CRM Applications: Silvio Savarese, EVP and Chief Scientist of Salesforce Research, highlights the importance of aligning generative AI processes with CRM goals. The benchmark helps businesses assess various LLMs using real-world CRM data, covering sales and service use cases.

Human-Centered Evaluation Approach: The benchmark, developed by Salesforce’s Frontier AI applied research group and core product teams, relies on human professionals and real CRM data. It evaluates models across four key areas:

Accuracy: Assessed through factuality, completeness, conciseness, and instruction-following. Accurate predictions enhance customer experience and organizational value, and accuracy can potentially be improved through prompt engineering and fine-tuning.
Cost: Categorized as high, medium, or low, allowing businesses to evaluate cost-effectiveness relative to their budget and resource strategies.
Speed: Measures responsiveness and efficiency in processing and delivering information, crucial for improving user experience and operational efficiency.
Trust and Safety: Evaluates how models handle sensitive data, adhere to privacy regulations, and secure information.
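
As a rough illustration of how the four areas might be recorded for a single model, the sketch below stores the four accuracy sub-dimensions, a high/medium/low cost tier, a speed figure, and a trust and safety score. The class names, the rating scale, the sample values, and the unweighted mean used to combine the accuracy sub-scores are all assumptions made for this example. The article does not say how Salesforce scores or aggregates these dimensions.

```python
from dataclasses import dataclass
from enum import Enum
from statistics import mean


class CostTier(Enum):
    """Cost is reported as a category rather than a number."""
    LOW = "low"
    MEDIUM = "medium"
    HIGH = "high"


@dataclass(frozen=True)
class AccuracyRatings:
    """The four accuracy sub-dimensions; the numeric scale is an assumption."""
    factuality: float
    completeness: float
    conciseness: float
    instruction_following: float

    def overall(self) -> float:
        # Assumption: an unweighted mean. The real aggregation may differ.
        return mean([self.factuality, self.completeness,
                     self.conciseness, self.instruction_following])


@dataclass(frozen=True)
class Scorecard:
    """One hypothetical record per model, covering the four key areas."""
    model: str
    accuracy: AccuracyRatings
    cost: CostTier
    speed_s: float        # assumed: seconds per response
    trust_safety: float   # assumed: 0-1 scale


# Made-up ratings for a made-up model.
card = Scorecard(
    model="example-model",
    accuracy=AccuracyRatings(factuality=3.0, completeness=3.5,
                             conciseness=2.5, instruction_following=3.0),
    cost=CostTier.LOW,
    speed_s=1.2,
    trust_safety=0.9,
)

print(card.accuracy.overall())  # 3.0
print(card.cost.value)          # low
```

Keeping the sub-scores separate, instead of storing only the combined number, lets a team see whether a model falls short on factuality or only on conciseness, for example.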

Notable Insights: Savarese points out that larger models are not always the optimal choice. Smaller, more cost-effective models can offer satisfactory performance. He also mentions that this benchmark is just the beginning, with plans to expand metrics, use cases, and data annotations. Future evaluations will include the performance of fine-tuned models on CRM data, promising further differentiation and improvement.

Salesforce’s generative AI benchmark tool offers a comprehensive and practical framework for businesses to choose the best LLMs for their CRM needs, ensuring a balance of accuracy, cost, speed, and trust and safety.