LLM Performance Archives - gettectonic.com


Statement Accuracy Prediction based on Language Model Activations

When users first began interacting with ChatGPT, they noticed an intriguing behavior: the model would often reverse its stance when told it was wrong. This raised concerns about the reliability of its outputs. How can users trust a system that appears to contradict itself? Recent research suggests that large language models (LLMs) not only generate inaccurate information (often referred to as “hallucinations”) but also appear to be aware of their inaccuracies. Despite this awareness, the models present their responses confidently.

Unveiling LLM Awareness of Hallucinations

Researchers discovered this phenomenon by analyzing the internal mechanisms of LLMs. Whenever an LLM generates a response, it transforms the input query into a numerical representation and performs a series of computations before producing the output. The numerical representations at intermediate stages are called “activations.” These activations contain significantly more information than is reflected in the final output. By examining them, researchers can identify whether the LLM “knows” its response is inaccurate.

A technique called SAPLMA (Statement Accuracy Prediction based on Language Model Activations) has been developed to explore this capability. SAPLMA examines the internal activations of an LLM to predict whether a statement is truthful.

Why Do Hallucinations Occur?

LLMs function as next-word prediction models. Each word is selected based on its likelihood given the preceding words. For example, starting with “I ate,” the model picks a likely next word, appends it, and repeats the process.

The issue arises when earlier predictions constrain subsequent outputs. Once the model commits to a word, it cannot go back to revise its earlier choice. This mechanism shows how the constraints of next-word prediction can lead to hallucinations, even when the model “knows” it is generating an incorrect response.
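The commitment problem described above can be illustrated with a toy greedy decoder. This is only a sketch: the probability table is invented for illustration, and it conditions on the previous word alone, whereas a real LLM conditions on all preceding text.

```python
# Toy next-word table: previous word -> candidate next words with
# made-up probabilities (illustrative values, not from a real model).
NEXT_WORD_PROBS = {
    "ate": {"an": 0.6, "a": 0.4},
    "an": {"apple": 0.7, "orange": 0.3},
    "a": {"banana": 0.8, "pear": 0.2},
}


def greedy_continue(words, steps):
    """Extend the word list by always taking the most likely next word."""
    out = list(words)
    for _ in range(steps):
        candidates = NEXT_WORD_PROBS.get(out[-1])
        if not candidates:
            break  # no known continuation for this word
        # Commit to the single best word; earlier choices are never revised.
        out.append(max(candidates, key=candidates.get))
    return out


sentence = greedy_continue(["I", "ate"], steps=2)
print(" ".join(sentence))
```

Once the decoder has committed to “an,” the continuation “banana” is no longer reachable, even though it is the single most likely word after “a.” Each choice narrows what can follow.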
Detecting Inaccuracies with SAPLMA

To investigate whether an LLM recognizes its own inaccuracies, researchers developed the SAPLMA method, which trains a classifier on the model’s internal activations. The classifier itself is a simple neural network with three dense layers, culminating in a binary output that predicts the truthfulness of the statement.

Results and Insights

The SAPLMA method achieved an accuracy of 60–80%, depending on the topic. While this is a promising result, it is not perfect and has notable limitations. However, if LLMs can learn to detect inaccuracies during the generation process, they could potentially refine their outputs in real time, reducing hallucinations and improving reliability.

The Future of Error Mitigation in LLMs

The SAPLMA method represents a step forward in understanding and mitigating LLM errors. Accurate classification of inaccuracies could pave the way for models that can self-correct and produce more reliable outputs. While the current limitations are significant, ongoing research into these methods could lead to substantial improvements in LLM performance.

By combining techniques like SAPLMA with advancements in LLM architecture, researchers aim to build models that are not only aware of their errors but capable of addressing them dynamically, enhancing both the accuracy and trustworthiness of AI systems.
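To make the classifier in this post more concrete, here is a minimal sketch of a three-layer probe that maps an activation vector to a truth probability. The layer widths, the random initialization, and the stand-in activation vector are all assumptions for illustration; they are not taken from the SAPLMA work, and an untrained probe like this one produces meaningless scores until it is fitted on labeled true and false statements.

```python
import math
import random


def relu(z):
    return max(0.0, z)


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def init_layer(rng, n_in, n_out):
    """Random weights and zero biases for one dense layer."""
    scale = 1.0 / math.sqrt(n_in)
    weights = [[rng.uniform(-scale, scale) for _ in range(n_in)]
               for _ in range(n_out)]
    return weights, [0.0] * n_out


def build_probe(rng, n_in, hidden=(16, 8)):
    """Three dense layers: two hidden layers plus a one-unit output layer.

    The hidden sizes are hypothetical, chosen only to keep the demo small.
    """
    sizes = [n_in, *hidden, 1]
    return [init_layer(rng, a, b) for a, b in zip(sizes, sizes[1:])]


def dense(x, weights, biases, activation):
    """Compute activation(W @ x + b) for one layer."""
    return [activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, biases)]


def truth_probability(probe, activations):
    """Forward pass: ReLU on hidden layers, sigmoid on the binary output."""
    x = activations
    for i, (weights, biases) in enumerate(probe):
        is_output = i == len(probe) - 1
        x = dense(x, weights, biases, sigmoid if is_output else relu)
    return x[0]


rng = random.Random(0)
probe = build_probe(rng, n_in=32)
# Stand-in for a hidden-layer activation vector read out of an LLM.
fake_activations = [rng.gauss(0.0, 1.0) for _ in range(32)]
p_true = truth_probability(probe, fake_activations)
label = "true" if p_true >= 0.5 else "false"
print(f"P(statement is true) = {p_true:.3f} -> {label}")
```

In practice the input would be the activations recorded while the LLM processes a statement, and the weights would be learned from statements with known truth values.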

January 14, 2025 in Data, Salesforce


LLM Economies

Throughout history, disruptive technologies have been the catalysts for major social and economic revolutions. The invention of the plow and irrigation systems 12,000 years ago sparked the Agricultural Revolution, while Johannes Gutenberg’s 15th-century printing press fueled the Protestant Reformation and helped propel Europe out of the Middle Ages into the Renaissance. In the 18th century, James Watt’s steam engine ushered in the Industrial Revolution. More recently, the internet has revolutionized communication, commerce, and information access, shrinking the world into a global village. Similarly, smartphones have transformed how people interact with their surroundings.

Now, we stand at the dawn of the AI revolution. Large Language Models (LLMs) represent a monumental leap forward, with significant economic implications at both macro and micro levels. These models are reshaping global markets, driving new forms of currency, and creating a novel economic landscape.

The reason LLMs are transforming industries and redefining economies is simple: they automate both routine and complex tasks that traditionally require human intelligence. They enhance decision-making processes, boost productivity, and facilitate cost reductions across various sectors. This enables organizations to allocate human resources toward more creative and strategic endeavors, resulting in the development of new products and services. From healthcare to finance to customer service, LLMs are creating new markets and driving AI-driven services like content generation and conversational assistants into the mainstream.

To truly grasp the engine driving this new global economy, it’s essential to understand the inner workings of this disruptive technology. These posts will provide both a macro-level overview of the economic forces at play and a deep dive into the technical mechanics of LLMs, equipping you with a comprehensive understanding of the revolution happening now.

Why Now? The Connection Between Language and Human Intelligence

AI did not begin with ChatGPT’s arrival in November 2022. Many people were developing machine learning classification models in 1999, and the roots of AI go back even further. Artificial Intelligence was formally born in 1950, when Alan Turing, considered the father of theoretical computer science and famed for cracking the Nazi Enigma code during World War II, created the first formal definition of intelligence. This definition, known as the Turing Test, demonstrated the potential for machines to exhibit human-like intelligence through natural language conversations. The test involves a human evaluator who engages in conversations with both a human and a machine. If the evaluator cannot reliably distinguish between the two, the machine is considered to have passed the test. Remarkably, after 72 years of gradual AI development, ChatGPT simulated this very interaction, passing the Turing Test and igniting the current AI explosion.

But why is language so closely tied to human intelligence, rather than, for example, vision? While 70% of our brain’s neurons are devoted to vision, OpenAI’s pioneering image generation model, DALL-E, did not trigger the same level of excitement as ChatGPT. The answer lies in the profound role language has played in human evolution.

The Evolution of Language

The development of language was the turning point in humanity’s rise to dominance on Earth. As Yuval Noah Harari points out in his book Sapiens: A Brief History of Humankind, it was the ability to gossip and discuss abstract concepts that set humans apart from other species. Complex communication, such as gossip, requires a shared, sophisticated language. Human language evolved from primitive cave signs to structured alphabets, which, along with grammar rules, created languages capable of expressing thousands of words.
In today’s digital age, language has further evolved with the inclusion of emojis, and now with the advent of GenAI, tokens have become the latest cornerstone in this progression. These shifts highlight the extraordinary journey of human language, from simple symbols to intricate digital representations.

In the next post, we will explore the intricacies of LLMs, focusing specifically on tokens. But before that, let’s delve into the economic forces shaping the LLM-driven world.

The Forces Shaping the LLM Economy

AI Giants in Competition

Karl Marx and Friedrich Engels argued that those who control the means of production hold power. The tech giants of today understand that AI is the future means of production, and the race to dominate the LLM market is well underway. This competition is fierce, with industry leaders like OpenAI, Google, Microsoft, and Meta (formerly Facebook) battling for supremacy. New challengers such as Mistral (France), AI21 (Israel), Anthropic, and Elon Musk’s xAI are also entering the fray. The LLM industry is expanding rapidly, with billions of dollars of investment pouring in. For example, Anthropic has raised $4.5 billion from 43 investors, including major players like Amazon, Google, and Microsoft.

The Scarcity of GPUs

Just as Bitcoin mining requires vast computational resources, training LLMs demands immense computing power, driving a search for new energy sources. Microsoft’s recent investment in nuclear energy underscores this urgency. At the heart of LLM technology are Graphics Processing Units (GPUs), essential for powering deep neural networks. These GPUs have become scarce and expensive, adding to the competitive tension.

Tokens: The New Currency of the LLM Economy

Tokens are the currency driving the emerging AI economy. Just as money facilitates transactions in traditional markets, tokens are the foundation of LLM economics. But what exactly are tokens?

Tokens are the basic units of text that LLMs process. They can be single characters, parts of words, or entire words. For example, the word “Oscar” might be split into two tokens, “os” and “car.” The performance of LLMs (quality, speed, and cost) hinges on how efficiently they generate these tokens.

LLM providers price their services based on token usage, with different rates for input (prompt) and output (completion) tokens. As companies rely more on LLMs, especially for complex tasks like agentic applications, token usage will significantly impact operational costs.

With fierce competition and the rise of open-source models like Llama-3.1, the cost of tokens is rapidly decreasing. For instance, OpenAI reduced its GPT-4 pricing by about 80% over the past year and a half. This trend enables companies to expand their portfolio of AI-powered products, further fueling the LLM economy.

November 15, 2024 in Artificial Intelligence, Google, Technology


Einstein Code Generation and Amazon SageMaker

Salesforce and the Evolution of AI-Driven CRM Solutions

Salesforce, Inc., headquartered in San Francisco, California, is a leading American cloud-based software company specializing in customer relationship management (CRM) software and applications. Its offerings include sales, customer service, marketing automation, e-commerce, analytics, and application development. Salesforce is at the forefront of integrating artificial intelligence (AI) into its services, enhancing its flagship SaaS CRM platform with predictive and generative AI capabilities and advanced automation features.

Salesforce Einstein: Pioneering AI in Business Applications

Salesforce Einstein represents a suite of AI technologies embedded within Salesforce’s Customer Success Platform, designed to enhance productivity and client engagement. With over 60 features available across different pricing tiers, Einstein’s capabilities are categorized into machine learning (ML), natural language processing (NLP), computer vision, and automatic speech recognition. These tools empower businesses to deliver personalized and predictive customer experiences across various functions, such as sales and customer service. Key components include out-of-the-box AI features like sales email generation in Sales Cloud and service replies in Service Cloud, along with tools like Copilot, Prompt, and Model Builder within Einstein 1 Studio for custom AI development.

The Salesforce Einstein AI Platform Team: Enhancing AI Capabilities

The Salesforce Einstein AI Platform team is responsible for the ongoing development and enhancement of Einstein’s AI applications. They focus on advancing large language models (LLMs) to support a wide range of business applications, aiming to provide cutting-edge NLP capabilities. By partnering with leading technology providers and leveraging open-source communities and cloud services like AWS, the team ensures Salesforce customers have access to the latest AI technologies.

Optimizing LLM Performance with Amazon SageMaker

In early 2023, the Einstein team sought a solution to host CodeGen, Salesforce’s in-house open-source LLM for code understanding and generation. CodeGen enables translation from natural language to programming languages like Python and is particularly tuned for the Apex programming language, integral to Salesforce’s CRM functionality. The team required a hosting solution that could handle a high volume of inference requests and multiple concurrent sessions while meeting strict throughput and latency requirements for their EinsteinGPT for Developers tool, which aids in code generation and review.

After evaluating various hosting solutions, the team selected Amazon SageMaker for its robust GPU access, scalability, flexibility, and performance optimization features. SageMaker’s specialized deep learning containers (DLCs), including the Large Model Inference (LMI) containers, provided a comprehensive solution for efficient LLM hosting and deployment. Key features included advanced batching strategies, efficient request routing, and access to high-end GPUs, which significantly enhanced the model’s performance.

Key Achievements and Learnings

The integration of SageMaker resulted in a dramatic improvement in the performance of the CodeGen model, boosting throughput by over 6,500% and reducing latency significantly. The use of SageMaker’s tools and resources enabled the team to optimize their models, streamline deployment, and effectively manage resource use, setting a benchmark for future projects.

Conclusion and Future Directions

Salesforce’s experience with SageMaker highlights the critical importance of leveraging advanced tools and strategies in AI model optimization. The successful collaboration underscores the need for continuous innovation and adaptation in AI technologies, ensuring that Salesforce remains at the cutting edge of CRM solutions.

For those interested in deploying their LLMs on SageMaker, Salesforce’s experience serves as a valuable case study, demonstrating the platform’s capabilities in enhancing AI performance and scalability. To begin hosting your own LLMs on SageMaker, consider exploring their detailed guides and resources.

July 22, 2024 in Generative AI, Salesforce Einstein


Advanced RAG

Significant Changes in Context Windows and Token Costs

The first significant change in the AI world is the substantial increase in context window size and the decrease in token costs. For example, the context window of Anthropic’s Claude exceeds 200,000 tokens. According to the latest news, Gemini’s context window can reach up to 10 million tokens.

Under these conditions, Retrieval-Augmented Generation (RAG) may not be required for many tasks, as all necessary data can fit into the context window. This shift has been observed in several financial and analytical projects where tasks were completely solved without using a vector database as intermediate storage. The trend of reducing token costs and increasing context window sizes is likely to continue, diminishing the need for external mechanisms for large language models (LLMs). However, they remain necessary for the time being.

If the context size is still insufficient, different methods of summarization and context compression have been devised. LangChain has introduced a class aimed at this: ConversationSummaryMemory.

    # Import paths vary across LangChain versions; older releases
    # expose OpenAI under langchain.llms instead of langchain_openai.
    from langchain.chains import ConversationChain
    from langchain.memory import ConversationSummaryMemory
    from langchain_openai import OpenAI

    llm = OpenAI(temperature=0)
    conversation_with_summary = ConversationChain(
        llm=llm,
        # A second LLM instance condenses the running conversation into a summary.
        memory=ConversationSummaryMemory(llm=OpenAI()),
        verbose=True,
    )
    conversation_with_summary.predict(input="Hi, what's up?")

Knowledge Graphs

As the amount of data LLMs must navigate grows, the ability to navigate this data becomes increasingly important. Without the ability to analyze the data structure and other attributes, it’s impossible to use them effectively.

For example, suppose the data source is a company’s wiki with a page containing the company’s phone number, but this isn’t explicitly indicated anywhere. How does the LLM understand that this is the company’s phone number? It doesn’t, which is why standard RAG won’t provide any information about the company’s phone number (as it sees no connection). A person can understand that this is the company’s phone number from the convention of how the data is stored (i.e., from the structure or metadata).

For LLMs, this problem is solved with Knowledge Graphs with metadata (also known as Knowledge Maps), which means the LLM has not only the raw data but also information about the storage structure and the connections between different data entities. This approach is also known as Graph Retrieval-Augmented Generation (GraphRAG).

Graphs are excellent for representing and storing heterogeneous and interconnected information in a structured form, easily capturing complex relationships and attributes among different types of data, which vector databases struggle with.

Example of a Knowledge Graph

Creating a Knowledge Graph typically involves collecting and structuring data, requiring a deep understanding of both the subject area and graph modeling. This process can largely be automated with LLMs. Thanks to their understanding of language and context, LLMs can automate significant parts of the Knowledge Graph creation process. By analyzing textual data, these models can identify entities, understand their relationships, and suggest how best to represent them in a graph structure.

Combining a vector database with a knowledge graph generally improves accuracy. Such an ensemble often also includes a search through a regular database or by keywords (e.g., Elasticsearch).

Knowledge Graph Retriever Example

For example, a user asks a question about the company’s phone number. If this is done in code, the entities from the question can be formatted in JSON or using with_structured_output from LangChain. These entities are then searched for in the Knowledge Graph. How this is done depends on where the graph is stored.
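Independent of any particular graph database, the lookup step described above can be sketched with an in-memory list of triples. Everything here is hypothetical: the company name, the relations, and the substring-based entity extractor, which merely stands in for the LLM-based structured extraction mentioned in the post.

```python
# Toy knowledge graph as (subject, relation, object) triples.
# All names and values are invented for illustration.
TRIPLES = [
    ("Acme Corp", "has_phone_number", "+1-555-0100"),
    ("Acme Corp", "has_office_in", "Springfield"),
    ("Springfield", "located_in", "USA"),
]


def extract_entities(question, known_entities):
    """Naive stand-in for LLM entity extraction: match known names in the text."""
    lowered = question.lower()
    return [e for e in known_entities if e.lower() in lowered]


def related_facts(entity, triples):
    """Return every triple in which the entity is the subject or the object."""
    return [t for t in triples if entity in (t[0], t[2])]


known = sorted({s for s, _, _ in TRIPLES} | {o for _, _, o in TRIPLES})
question = "What is the phone number of Acme Corp?"

entities = extract_entities(question, known)
facts = [fact for e in entities for fact in related_facts(e, TRIPLES)]

for subject, relation, obj in facts:
    print(subject, relation, obj)
```

The phone number is found because it is linked to the company by an explicit relation, not because its text resembles the question. This is the difference from a similarity search over vectors.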
With LlamaIndex and a NebulaGraph store, for example, the index can be built and queried like this:

    # Import paths assume llama-index 0.10 or later; earlier releases
    # exposed these classes directly under the llama_index package.
    from llama_index.core import KnowledgeGraphIndex, StorageContext
    from llama_index.graph_stores.nebula import NebulaGraphStore

    # Placeholder from the original post: load wiki pages together with their metadata.
    documents = parse_and_load_data_from_wiki_including_metadata()

    graph_store = NebulaGraphStore(
        space_name="Company Wiki",
        tags=["entity"],
    )
    storage_context = StorageContext.from_defaults(graph_store=graph_store)

    index = KnowledgeGraphIndex.from_documents(
        documents,
        storage_context=storage_context,  # without this the graph store is never used
        max_triplets_per_chunk=2,
        space_name="Company Wiki",
        tags=["entity"],
    )

    query_engine = index.as_query_engine()
    response = query_engine.query("Tell me more about our Company")

The search differs from a vector database search in that it searches for attributes and related entities, not similar vectors. Returning to the initial question, if the wiki structure was transferred to the graph correctly, the company’s phone number would be added as a related entity in the graph. The data from the graph and the vector database search are then passed to the LLM to generate a complete answer.

Challenges and Solutions

Access Control

Access to data may not be uniform. In the same wiki, there may be roles and permissions, and not every user can see all information. This problem exists for both graph and vector database searches, requiring access management. Role-Based Access Control (RBAC), Attribute-Based Access Control (ABAC), and Relationship-Based Access Control (ReBAC) are common methods.

Permissions and categories are also forms of metadata, which must be preserved at the data ingestion stage in the knowledge graph and vector database. When searching in the vector database, it is necessary to check whether the role or other access attributes match what the user has access to. Some commercial vector databases already include this functionality. Access control cannot be enforced in the same way for data baked into the LLM during training: protecting it would depend on the LLM’s own judgment, which is not recommended.

Ingestion and Parsing

Data needs to be inserted into the graph and the vector database. For the graph, the format is critical as it reflects the data structure and serves as metadata. Parsing data, especially from PDFs, can be challenging. Frameworks like LlamaParse attempt this with varying degrees of success. However, OCR or recognizing a document image can sometimes be easier.

Improving Answer Quality

Several approaches aim to improve answer quality beyond using knowledge graphs:

Corrective Retrieval Augmented Generation (CRAG)

CRAG addresses incorrect RAG results by automating correction processes. LangGraph can implement this approach, which essentially forms a state machine.

Self-RAG

Self-reflective RAG fine-tunes the LLM to generate self-reflection tokens in addition to regular ones, helping build a state machine for better results.

HyDE

HyDE (Hypothetical Document Embeddings) modifies the usual RAG retrieval process by using the LLM to generate a response and then searching the vector database with that response. This is useful when users’ questions are too abstract and require more context.

These methods, including CRAG, Self-RAG, and HyDE, provide various ways to enhance the performance of LLMs and improve the quality of their answers.
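The hypothetical-document retrieval idea described above can be sketched in a few lines. This is a toy under loud assumptions: the “embedding” is a plain bag-of-words count rather than a learned model, the LLM call is replaced by a canned draft answer, and the three documents are invented.

```python
import math
from collections import Counter


def embed(text):
    """Stand-in embedding: bag-of-words counts (a real system uses a learned model)."""
    return Counter(text.lower().split())


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0


def generate_hypothetical_answer(question):
    """Placeholder for the LLM call; returns a canned draft for this demo."""
    return ("employees request vacation days through the hr portal "
            "and need manager approval")


# Invented document store.
DOCUMENTS = [
    "vacation days are requested in the hr portal and require manager approval",
    "the office kitchen is cleaned every friday afternoon",
    "expense reports must be filed within thirty days",
]


def hyde_search(question, documents):
    """Search with the embedding of a drafted answer instead of the raw question."""
    draft = generate_hypothetical_answer(question)
    query_vector = embed(draft)
    return max(documents, key=lambda doc: cosine(query_vector, embed(doc)))


question = "How do I take time off?"
direct_score = cosine(embed(question), embed(DOCUMENTS[0]))
best = hyde_search(question, DOCUMENTS)
print(f"direct similarity: {direct_score:.2f}")
print(f"retrieved via drafted answer: {best}")
```

The raw question shares no vocabulary with the relevant document, so a direct search finds nothing, while the drafted answer lands close to it. The draft itself may be factually wrong; it is used only as a search key, and the final answer is generated from the retrieved documents.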

July 21, 2024 in Salesforce


Gen AI Role in Healthcare

Generative AI’s Growing Role in Healthcare: Potential and Challenges

The rapid advancements in large language models (LLMs) have introduced generative AI tools into nearly every business sector, including healthcare. As defined by the Government Accountability Office, generative AI is “a technology that can create content, including text, images, audio, or video, when prompted by a user.” These systems learn patterns and relationships from vast datasets, enabling them to generate new content that resembles but is not identical to the original training data. This capability is powered by machine learning algorithms and statistical models.

In healthcare, generative AI is being utilized for various applications, including clinical documentation, patient communication, and clinical text summarization.

Streamlining Clinical Documentation

Excessive documentation is a leading cause of clinician burnout, as highlighted by a 2022 athenahealth survey conducted by the Harris Poll. Generative AI shows promise in easing these documentation burdens, potentially improving clinician satisfaction and reducing burnout.

A 2024 study published in NEJM Catalyst explored the use of ambient AI scribes within The Permanente Medical Group (TPMG). This technology employs smartphone microphones and generative AI to transcribe patient encounters in real time, providing clinicians with draft documentation for review. In October 2023, TPMG deployed this ambient AI technology across various settings, benefiting 10,000 physicians and staff. Physicians who used the ambient AI scribe reported positive outcomes, including more personal and meaningful patient interactions and reduced after-hours electronic health record (EHR) documentation. Early patient feedback was also favorable, with improved provider interactions noted. Additionally, ambient AI produced high-quality clinical documentation for clinician review.

However, a 2023 study in the Journal of the American Medical Informatics Association (JAMIA) cautioned that ambient AI might struggle with non-lexical conversational sounds (NLCSes), such as “mm-hm” or “uh-uh,” which can convey clinically relevant information. The study found that while the ambient AI tools had a word error rate of about 12% for all words, the error rate for NLCSes was significantly higher, reaching up to 98.7% for those conveying critical information. Misinterpretation of these sounds could lead to inaccuracies in clinical documentation and potential patient safety issues.

Enhancing Patient Communication

With the digital transformation in healthcare, patient portal messages have surged. A 2021 study in JAMIA reported a 157% increase in patient portal inbox messages since 2020. In response, some healthcare organizations are exploring the use of generative AI to draft replies to these messages.

A 2024 study published in JAMA Network Open evaluated the adoption of AI-generated draft replies to patient messages at an academic medical center. After five weeks, clinicians used the AI-generated drafts 20% of the time, a notable rate considering the LLMs were not fine-tuned for patient communication. Clinicians reported reduced task load and emotional exhaustion, suggesting that AI-generated replies could help alleviate burnout.

However, the study found no significant changes in reply time, read time, or write time between the pre-pilot and pilot periods. Despite this, clinicians expressed optimism about time savings, indicating that the cognitive ease of editing drafts rather than writing from scratch might not be fully captured by time metrics.

Summarizing Clinical Data

Summarizing information within patient records is a time-consuming task for clinicians, and errors in this process can negatively impact clinical decision support. Generative AI has shown potential in this area, with a 2023 study finding that LLM-generated summaries could outperform human expert summaries in terms of conciseness, completeness, and correctness.

However, using generative AI for clinical data summarization presents risks. A viewpoint in JAMA argued that LLMs performing summarization tasks might not fall under FDA medical device oversight, as they provide language-based outputs rather than disease predictions or numerical estimates. Without statutory changes, the FDA’s authority to regulate these LLMs remains unclear.

The authors also noted that differences in summary length, organization, and tone could influence clinician interpretations and subsequent decision-making. Furthermore, LLMs might exhibit biases, such as sycophancy, where responses are tailored to user expectations. To address these concerns, the authors called for comprehensive standards for LLM-generated summaries, including testing for biases and errors, as well as clinical trials to quantify potential harms and benefits.

The Path Forward

Generative AI holds significant promise for transforming healthcare and reducing clinician burnout, but realizing this potential requires comprehensive standards and regulatory clarity. A 2024 study published in npj Digital Medicine emphasized the need for defined leadership, adoption incentives, and ongoing regulation to deliver on the promise of generative AI in healthcare.

Leadership should focus on establishing guidelines for LLM performance and identifying optimal clinical settings for AI tool trials. The study suggested that a subcommittee within the FDA, comprising physicians, healthcare administrators, developers, and investors, could effectively lead this effort. Additionally, widespread deployment of generative AI will likely require payer incentives, as most providers view these tools as capital expenses.

With the right leadership, incentives, and regulatory framework, generative AI can be effectively implemented across the healthcare continuum to streamline clinical workflows and improve patient care.
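The word error rate cited for ambient AI scribes in this post is conventionally computed as the word-level edit distance between a reference transcript and the system’s transcript, divided by the number of reference words. The sketch below assumes that conventional definition (the post does not say how the study computed it), and the two sentences are invented.

```python
def word_error_rate(reference, hypothesis):
    """Word-level edit distance (substitutions, deletions, insertions) / reference length."""
    ref = reference.split()
    hyp = hypothesis.split()
    if not ref:
        raise ValueError("reference transcript must not be empty")

    # Dynamic programming over edit distance, one row at a time.
    previous = list(range(len(hyp) + 1))
    for i, ref_word in enumerate(ref, start=1):
        current = [i]
        for j, hyp_word in enumerate(hyp, start=1):
            substitution = previous[j - 1] + (0 if ref_word == hyp_word else 1)
            deletion = previous[j] + 1
            insertion = current[j - 1] + 1
            current.append(min(substitution, deletion, insertion))
        previous = current
    return previous[-1] / len(ref)


# Invented example: the transcript drops the patient's affirmative "mm-hm".
reference = "do you have chest pain mm-hm it started yesterday"
hypothesis = "do you have chest pain it started yesterday"

wer = word_error_rate(reference, hypothesis)
print(f"WER = {wer:.3f}")
```

A single dropped “mm-hm” barely moves the overall error rate, yet it removes the patient’s answer from the record. An aggregate word error rate can therefore hide errors on clinically meaningful sounds.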

May 11, 2024 in Einstein 1 Platform, Generative AI, Salesforce Health and Life Sciences Solutions, Salesforce Health Cloud, Salesforce Implementation Services, Salesforce Optimization
