GPU Archives - gettectonic.com - Page 2
Salesforce LlamaRank

Document ranking remains a critical challenge in information retrieval and natural language processing. Effective document retrieval and ranking are crucial to the performance of search engines, question-answering systems, and Retrieval-Augmented Generation (RAG) systems. Traditional ranking models often struggle to balance result precision with computational efficiency, especially when dealing with large datasets and diverse query types. This challenge underscores the growing need for models that can return accurate, contextually relevant results in real time from continuous data streams and increasingly complex queries.

Salesforce AI Research has introduced a reranker named LlamaRank, designed to improve document ranking and code search across various datasets. Built on the Llama3-8B-Instruct architecture, LlamaRank integrates linear, calibrated scoring, aiming for both speed and interpretability. The Salesforce AI Research team developed LlamaRank as a specialized tool for document relevancy ranking. Refined through iterative feedback from their dedicated RLHF data annotation team, LlamaRank outperforms many leading APIs in general document ranking and sets a new standard for code search performance. The model’s training data includes high-quality synthesized data from Llama3-70B and Llama3-405B, along with human-labeled annotations, covering a broad range of domains from topic-based search and document QA to code QA.

In RAG systems, LlamaRank plays a crucial role. Initially, a query is processed using a less precise but cost-effective method, such as semantic search with embeddings, to generate a list of candidate documents. The reranker then refines this list to identify the most relevant documents, so that the language model receives only the most pertinent information as context, which improves the accuracy and coherence of its responses.
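The two-stage flow described above (cheap retrieval first, then reranking of the shortlist) can be sketched roughly as follows. This is a minimal illustration, not LlamaRank itself: the bag-of-words `embed` function and `toy_reranker_score` are hypothetical stand-ins for a real embedding model and a real reranker, and all function names are assumptions made for the example.

```python
from math import sqrt


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def embed(text, vocab):
    """Toy bag-of-words embedding (stand-in for a real embedding model)."""
    words = text.lower().split()
    return [float(words.count(term)) for term in vocab]


def retrieve(query, docs, vocab, top_n):
    """Stage 1: cheap, less precise candidate generation by embedding similarity."""
    q = embed(query, vocab)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d, vocab)), reverse=True)
    return ranked[:top_n]


def toy_reranker_score(query, doc):
    """Hypothetical reranker score: fraction of query words found in the document."""
    q_words = set(query.lower().split())
    d_words = set(doc.lower().split())
    return len(q_words & d_words) / len(q_words) if q_words else 0.0


def rerank(query, candidates, score_fn, top_k):
    """Stage 2: score each candidate against the query and keep the best top_k."""
    ranked = sorted(candidates, key=lambda d: score_fn(query, d), reverse=True)
    return ranked[:top_k]


docs = [
    "llamas are domesticated animals from south america",
    "a reranker reorders retrieved documents by relevance",
    "embeddings enable cheap semantic search over documents",
    "kubernetes schedules containers onto nodes",
]
vocab = sorted({word for doc in docs for word in doc.split()})
query = "how does a reranker order documents"

candidates = retrieve(query, docs, vocab, top_n=3)
context = rerank(query, candidates, toy_reranker_score, top_k=1)
print(context)  # the documents that would be passed to the language model
```

In a real system the scoring function would call the reranker model; the point of the split is that the expensive scorer only sees the small shortlist, not the whole corpus.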
LlamaRank’s architecture, based on Llama3-8B-Instruct, leverages a diverse training corpus of synthetic and human-labeled data. This extensive dataset enables LlamaRank to handle tasks ranging from general document retrieval to specialized code search. The model underwent multiple feedback cycles from Salesforce’s data annotation team to improve the accuracy and relevance of its scoring predictions. During inference, LlamaRank predicts token probabilities and calculates a numeric relevance score, facilitating efficient reranking.

LlamaRank has been evaluated on several public datasets. On the SQuAD dataset for question answering, it achieved a hit rate of 99.3%, and on the TriviaQA dataset a hit rate of 92.0%. In code search benchmarks, LlamaRank recorded a hit rate of 81.8% on the Neural Code Search dataset and 98.6% on the TrailheadQA dataset. These results highlight LlamaRank’s versatility across various document types and query scenarios.

LlamaRank’s technical specifications further emphasize its advantages. It supports up to 8,000 tokens per document, a longer input limit than competitors like Cohere’s reranker. It delivers low-latency performance, ranking 64 documents in under 200 ms on a single H100 GPU, compared to approximately 3.13 seconds on Cohere’s serverless API. Additionally, LlamaRank features linear scoring calibration, offering clear and interpretable relevance scores.

While LlamaRank’s size of 8 billion parameters contributes to its high performance, it is approaching the upper limits of reranking model size. Future research may focus on optimizing model size to balance quality and efficiency. Overall, LlamaRank from Salesforce AI Research marks a significant advancement in reranking technology, promising to improve the effectiveness of RAG systems across a wide range of applications.
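To make the scoring and evaluation ideas above more concrete, here is a small sketch under stated assumptions. The text says only that LlamaRank derives a numeric score from predicted token probabilities; framing this as a softmax over hypothetical "yes" and "no" token logits is an assumption for illustration, not the documented method. Likewise, hit rate is computed here as the share of queries whose relevant document appears in the top k results, with k chosen by the caller, since the text does not state the cutoff used.

```python
import math


def relevance_score(logit_yes, logit_no):
    """Probability of the "yes" token under a two-way softmax (assumed framing)."""
    m = max(logit_yes, logit_no)  # subtract the max for numerical stability
    e_yes = math.exp(logit_yes - m)
    e_no = math.exp(logit_no - m)
    return e_yes / (e_yes + e_no)


def hit_rate_at_k(ranked_ids_per_query, relevant_id_per_query, k):
    """Fraction of queries whose relevant document id is within the top k results."""
    if not relevant_id_per_query:
        return 0.0
    hits = sum(
        1
        for ranked, relevant in zip(ranked_ids_per_query, relevant_id_per_query)
        if relevant in ranked[:k]
    )
    return hits / len(relevant_id_per_query)


# Hypothetical logits for one query/document pair.
print(round(relevance_score(2.0, -2.0), 3))

# Two queries: the first finds its relevant document in the top 2, the second does not.
rankings = [["d3", "d1", "d7"], ["d2", "d9", "d4"]]
relevant = ["d1", "d5"]
print(hit_rate_at_k(rankings, relevant, k=2))
```

A score in the 0 to 1 range is easy to threshold and compare across queries, which is one reason calibrated scores are useful in practice.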
With its powerful performance, efficiency, and clear scoring, LlamaRank represents a major step forward in document retrieval and search accuracy. The community eagerly anticipates its broader adoption and further development.

Anthropic’s New Approach to RAG

This advanced RAG methodology demonstrates how AI can overcome traditional challenges, delivering more precise, context-aware responses while maintaining efficiency and scalability.

Small Language Models Explained

Exploring Small Language Models (SLMs): Capabilities and Applications

Large Language Models (LLMs) have been prominent in AI for some time, but Small Language Models (SLMs) are now enhancing our ability to work with natural and programming languages. While LLMs excel in general language understanding, certain applications require more accuracy and domain-specific knowledge than these models can provide. This has created a demand for custom SLMs that offer LLM-like performance while reducing runtime costs and providing a secure, manageable environment.

In this insight, we dig into the world of SLMs, exploring their unique characteristics, benefits, and applications. We also discuss fine-tuning methods applied to Llama-2-13b, an SLM, to address specific challenges. The goal is to investigate how to make the fine-tuning process platform-independent. We selected Databricks for this purpose due to its compatibility with major cloud providers like Azure, Amazon Web Services (AWS), and Google Cloud Platform.

What Are Small Language Models?

In AI and natural language processing, SLMs are lightweight generative models with a focus on specific tasks. The term “small” refers to:

SLMs like Google Gemini Nano, Microsoft’s Orca-2-7b, and Meta’s Llama-2-13b run efficiently on a single GPU and have parameter counts in the billions.

SLMs vs. LLMs

Applications of SLMs

SLMs are increasingly used across various sectors, including healthcare, technology, and beyond. Common applications include:

Fine-Tuning Small Language Models

Fine-tuning involves additional training of a pre-trained model to make it more domain-specific. This process updates the model’s parameters with new data to enhance its performance in targeted applications, such as text generation or question answering.

Hardware Requirements for Fine-Tuning

The hardware needs depend on the model size, project scale, and dataset.
General recommendations include:

Data Preparation

Preparing data involves extracting text from PDFs, cleaning it, and generating question-and-answer pairs, which are then used to fine-tune the model. Although GPT-3.5 was used for generating Q&A pairs, SLMs can also be used for this purpose, depending on the use case.

Fine-Tuning Process

You can use HuggingFace tools for fine-tuning Llama-2-13b-chat-hf. The dataset was converted into a HuggingFace-compatible format, and quantization techniques were applied to optimize performance. The fine-tuning lasted about 16 hours over 50 epochs, at a cost of around $100/£83, excluding trial costs.

Results and Observations

The fine-tuned model demonstrated strong performance, with over 70% of answers being highly similar to those generated by GPT-3.5. The SLM achieved comparable results despite having fewer parameters. The process was successful on both AWS and Databricks platforms, showcasing the model’s adaptability.

SLMs have some limitations compared to LLMs, such as higher operational costs and restricted knowledge bases. However, they offer benefits in efficiency, versatility, and environmental impact. As SLMs continue to evolve, their relevance and popularity are likely to increase, especially with new models like Gemini Nano and Mixtral entering the market.
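The comparison of fine-tuned answers against GPT-3.5 answers mentioned above could be measured in many ways, and the article does not say which one was used. As one possible sketch, the snippet below uses the standard library's `difflib.SequenceMatcher` as a rough character-level similarity and reports the share of answers above a cutoff. The function names and the 0.8 threshold are assumptions for illustration only.

```python
from difflib import SequenceMatcher


def similarity(a, b):
    """Rough character-level similarity in [0, 1] after light normalization."""
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def share_highly_similar(candidate_answers, reference_answers, threshold=0.8):
    """Fraction of candidate answers whose similarity to the reference meets the threshold."""
    if len(candidate_answers) != len(reference_answers):
        raise ValueError("answer lists must have the same length")
    if not candidate_answers:
        return 0.0
    matches = sum(
        1
        for cand, ref in zip(candidate_answers, reference_answers)
        if similarity(cand, ref) >= threshold
    )
    return matches / len(candidate_answers)


# Hypothetical answers from a fine-tuned model and from a reference model.
fine_tuned = ["Paris is the capital of France.", "xyz"]
reference = ["Paris is the capital of France", "The answer is forty-two"]
print(share_highly_similar(fine_tuned, reference))
```

Character-level matching is crude; an embedding-based or human-judged comparison would likely track meaning more closely, at the cost of extra dependencies or effort.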

Large and Small Language Models

Understanding Language Models in AI

Language models are sophisticated AI systems designed to generate natural human language, a task that is far from simple. These models operate as probabilistic machine learning systems, predicting the likelihood of word sequences to emulate human-like intelligence. In the scientific realm, the focus of language models has been twofold:

While today’s cutting-edge AI models in Natural Language Processing (NLP) are impressive, they have not yet fully passed the Turing Test, a benchmark where a machine’s communication is indistinguishable from that of a human.

The Emergence of Language Models

We are approaching this milestone with advancements in Large Language Models (LLMs) and the promising but less discussed Small Language Models (SLMs).

Large Language Models compared to Small Language Models

LLMs like ChatGPT have garnered significant attention due to their ability to handle complex interactions and provide insightful responses. These models distill vast amounts of internet data into concise and relevant information, offering an alternative to traditional search methods.

Conversely, SLMs, such as Mistral 7B, while less flashy, are valuable for specific applications. They typically contain fewer parameters and focus on specialized domains, providing targeted expertise without the broad capabilities of LLMs.

How LLMs Work

Comparing LLMs and SLMs

Choosing the Right Language Model

The decision between LLMs and SLMs depends on your specific needs and available resources. LLMs are well-suited for broad applications like chatbots and customer support. In contrast, SLMs are ideal for specialized tasks in fields such as medicine, law, and finance, where domain-specific knowledge is crucial.

Large and Small Language Models’ Roles

Language models are powerful tools that, depending on their size and focus, can either provide broad capabilities or specialized expertise.
Understanding their strengths and limitations helps in selecting the right model for your use case.
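The description of language models as probabilistic systems that predict the likelihood of word sequences can be illustrated with a deliberately tiny example. The sketch below is a bigram model built from word counts, which is far simpler than any LLM or SLM but follows the same principle of multiplying conditional probabilities along a sequence. The corpus and function names are invented for the example.

```python
from collections import Counter, defaultdict


def train_bigram(sentences):
    """Count how often each word follows each preceding word."""
    counts = defaultdict(Counter)
    for sentence in sentences:
        tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
        for prev, cur in zip(tokens, tokens[1:]):
            counts[prev][cur] += 1
    return counts


def sequence_probability(counts, sentence):
    """P(sentence) as the product of P(word | previous word), without smoothing."""
    tokens = ["<s>"] + sentence.lower().split() + ["</s>"]
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        followers = counts.get(prev, Counter())
        total = sum(followers.values())
        if total == 0:
            return 0.0  # the preceding word was never seen in training
        prob *= followers[cur] / total
    return prob


corpus = ["the cat sat", "the cat ran", "the dog sat"]
model = train_bigram(corpus)

print(sequence_probability(model, "the cat sat"))  # likely sequence
print(sequence_probability(model, "the dog ran"))  # contains an unseen word pair
```

Modern models replace the count table with a neural network conditioned on much longer context, which is where parameter count, and the LLM versus SLM distinction, comes in.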

Machine Learning on Kubernetes

How and Why to Run Machine Learning Workloads on Kubernetes

Running machine learning (ML) model development and deployment on Kubernetes has become essential for optimizing resources and managing costs. As AI and ML tools gain mainstream acceptance, business and IT professionals are increasingly familiar with these technologies. With the growing buzz around AI, engineering needs in ML and AI have expanded, particularly in managing the complexities and costs associated with these workloads.

The Need for Kubernetes in ML

As ML use cases become more complex, training models has become increasingly resource-intensive and costly. This has driven up demand and costs for GPUs, a key resource for ML tasks. Containerizing ML workloads offers a solution to these challenges by improving scalability, automation, and infrastructure efficiency. Kubernetes, a leading tool for container orchestration, is particularly effective for managing ML processes. By decoupling workloads into manageable containers, Kubernetes helps streamline ML operations and reduce costs.

Understanding Kubernetes

Engineering priorities have consistently focused on minimizing application footprints. From mainframes to modern servers and virtualization, the trend has been towards reducing operational overhead. Containers emerged as a solution to this trend, offering a way to isolate application stacks while maintaining performance. Containers build on Linux cgroups and namespaces, and their popularity surged with Docker. However, Docker on its own had limitations in scaling and automatic recovery. Kubernetes was developed to address these issues. As an open-source orchestration platform, Kubernetes manages containerized workloads by ensuring containers are always running and properly scaled. Containers run inside resources called pods, which include everything needed to run the application.
Kubernetes has also expanded its capabilities to orchestrate other resources like virtual machines.

Running ML Workloads on Kubernetes

ML systems demand significant computing power, including CPU, memory, and GPU resources. Traditionally, this required multiple servers, which was inefficient and costly. Kubernetes addresses this challenge by orchestrating containers and decoupling workloads, allowing multiple pods to run models simultaneously and share resources like CPU, memory, and GPU power. Using Kubernetes for ML can enhance practices such as:

Challenges of ML on Kubernetes

Despite its advantages, running ML workloads on Kubernetes comes with challenges:

Key Tools for ML on Kubernetes

Kubernetes requires specific tools to manage ML workloads effectively. These tools integrate with Kubernetes to address the unique needs of ML tasks:

TensorFlow is another option, but it lacks the dedicated integration and optimization of Kubernetes-specific tools like Kubeflow. For those new to running ML workloads on Kubernetes, Kubeflow is often the best starting point. It is the most advanced and mature tool in terms of capabilities, ease of use, community support, and functionality.
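As a rough illustration of how a pod can declare the CPU, memory, and GPU resources discussed above, the sketch below builds a Pod manifest as a Python dictionary and prints it as JSON, a format `kubectl apply -f` accepts alongside YAML. The pod name, image, training command, and resource sizes are hypothetical, and the `nvidia.com/gpu` resource assumes the cluster runs the NVIDIA device plugin.

```python
import json


def gpu_training_pod(name, image, gpus=1, cpu="4", memory="16Gi"):
    """Build a minimal Pod manifest that requests CPU, memory, and GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name, "labels": {"workload": "ml-training"}},
        "spec": {
            # A training run should finish rather than restart forever.
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": "trainer",
                    "image": image,
                    "command": ["python", "train.py"],  # hypothetical entry point
                    "resources": {
                        "requests": {"cpu": cpu, "memory": memory},
                        # GPUs are declared under limits as whole devices.
                        "limits": {
                            "cpu": cpu,
                            "memory": memory,
                            "nvidia.com/gpu": gpus,
                        },
                    },
                }
            ],
        },
    }


manifest = gpu_training_pod(
    name="llm-finetune",
    image="registry.example.com/ml/trainer:latest",  # placeholder image
    gpus=1,
)
print(json.dumps(manifest, indent=2))
```

Declaring resources explicitly is what lets the scheduler place the pod on a node with a free GPU instead of the operator reserving a whole server by hand.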

Einstein Code Generation and Amazon SageMaker

Salesforce and the Evolution of AI-Driven CRM Solutions

Salesforce, Inc., headquartered in San Francisco, California, is a leading American cloud-based software company specializing in customer relationship management (CRM) software and applications. Its offerings include sales, customer service, marketing automation, e-commerce, analytics, and application development. Salesforce is at the forefront of integrating artificial intelligence (AI) into its services, enhancing its flagship SaaS CRM platform with predictive and generative AI capabilities and advanced automation features.

Salesforce Einstein: Pioneering AI in Business Applications

Salesforce Einstein is a suite of AI technologies embedded within Salesforce’s Customer Success Platform, designed to enhance productivity and client engagement. With over 60 features available across different pricing tiers, Einstein’s capabilities fall into four categories: machine learning (ML), natural language processing (NLP), computer vision, and automatic speech recognition. These tools empower businesses to deliver personalized and predictive customer experiences across various functions, such as sales and customer service. Key components include out-of-the-box AI features like sales email generation in Sales Cloud and service replies in Service Cloud, along with tools like Copilot, Prompt, and Model Builder within Einstein 1 Studio for custom AI development.

The Salesforce Einstein AI Platform Team: Enhancing AI Capabilities

The Salesforce Einstein AI Platform team is responsible for the ongoing development and enhancement of Einstein’s AI applications. They focus on advancing large language models (LLMs) to support a wide range of business applications, aiming to provide cutting-edge NLP capabilities.
By partnering with leading technology providers and leveraging open-source communities and cloud services like AWS, the team ensures Salesforce customers have access to the latest AI technologies.

Optimizing LLM Performance with Amazon SageMaker

In early 2023, the Einstein team sought a solution to host CodeGen, Salesforce’s in-house open-source LLM for code understanding and generation. CodeGen enables translation from natural language to programming languages like Python and is particularly tuned for the Apex programming language, integral to Salesforce’s CRM functionality. The team required a hosting solution that could handle a high volume of inference requests and multiple concurrent sessions while meeting strict throughput and latency requirements for their EinsteinGPT for Developers tool, which aids in code generation and review.

After evaluating various hosting solutions, the team selected Amazon SageMaker for its robust GPU access, scalability, flexibility, and performance optimization features. SageMaker’s specialized deep learning containers (DLCs), including the Large Model Inference (LMI) containers, provided a comprehensive solution for efficient LLM hosting and deployment. Key features included advanced batching strategies, efficient request routing, and access to high-end GPUs, which significantly enhanced the model’s performance.

Key Achievements and Learnings

The integration of SageMaker resulted in a dramatic improvement in the performance of the CodeGen model, boosting throughput by over 6,500% and reducing latency significantly. The use of SageMaker’s tools and resources enabled the team to optimize their models, streamline deployment, and effectively manage resource use, setting a benchmark for future projects.

Conclusion and Future Directions

Salesforce’s experience with SageMaker highlights the critical importance of leveraging advanced tools and strategies in AI model optimization.
The successful collaboration underscores the need for continuous innovation and adaptation in AI technologies, ensuring that Salesforce remains at the cutting edge of CRM solutions. For those interested in deploying their LLMs on SageMaker, Salesforce’s experience serves as a valuable case study, demonstrating the platform’s capabilities in enhancing AI performance and scalability. To begin hosting your own LLMs on SageMaker, consider exploring their detailed guides and resources.
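Percentage gains of the size quoted for CodeGen throughput are easy to misread, so a short worked calculation may help. A 6,500% increase means the new value is the baseline plus 65 times the baseline, or 66 times the original. The baseline of 10 requests per second below is purely hypothetical, since the article gives no absolute figures.

```python
def multiplier_from_percent_increase(percent_increase):
    """Convert a percentage increase into a multiplier on the baseline."""
    return 1.0 + percent_increase / 100.0


def throughput_after(baseline, percent_increase):
    """Throughput after applying a percentage increase to a baseline value."""
    return baseline * multiplier_from_percent_increase(percent_increase)


factor = multiplier_from_percent_increase(6500)
print(factor)

# Hypothetical baseline of 10 requests per second.
print(throughput_after(10.0, 6500))
```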

AI Capability Maturity Model

The AI Capability Maturity Model (AI CMM), devised by the Artificial Intelligence Center of Excellence within the GSA IT Modernization Centers of Excellence (CoE), is a standardized framework that federal agencies can use to evaluate their organizational and operational maturity levels. It is equally useful for private organizations that want to align themselves with predefined objectives. Instead of imposing normative capability assessments, the AI CMM concentrates on highlighting significant milestones that indicate maturity levels along the AI journey.

The AI Capability Maturity Model focuses primarily on the development of AI capabilities within an organization. It evaluates an organization’s maturity across four main areas: data, algorithms, technology, and people. The AI CMM helps organizations shape their own AI roadmap and investment strategy. The outcomes of an AI CMM analysis help decision-makers identify investment areas that address immediate goals for rapid AI adoption while aligning with broader enterprise objectives in the long run.

Maturity vs capability models

A maturity model tends to measure activities, such as whether a certain tool or process has been implemented. In contrast, capability models are outcome-based, which means you need to use measurements of key outcomes to confirm that changes result in improvements.

AI development rooted in sound software practices underpins much of the content discussed in this and other chapters. Although the AI CMM does not explicitly delve into agile development methodology, Dev(Sec)Ops, or cloud and infrastructure strategies, these elements are fundamental to the successful development of AI solutions. The AI CMM elaborates on how a robust IT infrastructure leads to the most successful development of an organization’s AI practice.

What are the maturity levels of AI?

Or it can be measured this way.
AI Maturity Model

Why is AI maturity important?

The AI Maturity Assessment is a process designed to help organizations evaluate their current AI capabilities, identify gaps and areas for improvement, and develop a roadmap to build a more effective AI program.

Organizational Maturity Areas

Organizational maturity areas represent the capacity to embed AI capabilities across the organization. Two approaches, top-down and user-centric, offer distinct perspectives on organizational maturity.

Top-Down, Organizational View

Bottom-Up, User-centric View

Operational Maturity Areas

Operational maturity areas represent organizational functions impacting the implementation of AI capabilities. Each area is treated as a discrete capability for maturity evaluation, yet they generally depend on one another.

PeopleOps
CloudOps
DevOps
SecOps
DataOps
MLOps
AIOps

This comprehensive overview of organizational and operational maturity areas underlines the multifaceted nature of AI implementation and the critical role played by diverse elements in ensuring success across different layers of an organization.

How is AI transforming the world?

AI-powered technologies such as natural language processing, image and audio recognition, and computer vision have revolutionized the way we interact with and consume media. With AI, we are able to process and analyze vast amounts of data quickly, making it easier to find and access the information we need.

Layers of the AI Stack

The AI stack refers to the layered architecture of technologies and components that work together to build, deploy, and manage artificial intelligence (AI) systems. Each layer of the stack plays a critical role in enabling AI capabilities, from data collection to model deployment and beyond. Here’s a breakdown of the key layers of the AI stack:

1. Data Layer
The foundation of any AI system is data. This layer involves collecting, storing, and managing the data required to train and operate AI models. Key Components:

2. Infrastructure Layer
This layer provides the computational power and hardware needed to process data and run AI models. Key Components:

3. Framework and Tools Layer
This layer includes the software frameworks and tools used to build, train, and optimize AI models. Key Components:

4. Model Layer
This is the core layer where AI models are developed, trained, and fine-tuned. Key Components:

5. Application Layer
This layer focuses on deploying AI models into real-world applications and integrating them with existing systems. Key Components:

6. Orchestration and Management Layer
This layer ensures that AI systems are scalable, reliable, and efficient in production environments. Key Components:

7. Business Layer
This layer focuses on the business value of AI, including use cases, ROI, and ethical considerations. Key Components:

8. Ecosystem Layer
This layer includes the external tools, services, and communities that support AI development and deployment. Key Components:

How the Layers Work Together

Why the AI Stack Matters

The AI stack provides a structured approach to building and deploying AI systems. By understanding and optimizing each layer, organizations can:

Conclusion

The AI stack is a comprehensive framework that enables organizations to harness the power of AI effectively. By mastering each layer, from data collection to business value, you can build robust, scalable, and impactful AI solutions.
Whether you’re a startup or an enterprise, understanding the AI stack is key to staying competitive in the age of artificial intelligence. Content updated March 2025.
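One simple way to work with the eight layers listed in this post is to model them as an ordered enumeration and check which ones a given project has addressed. The sketch below is only an illustration of that idea: the class name, member names, and the `missing_layers` helper are assumptions, and the numbering simply follows the order used in the list.

```python
from enum import IntEnum


class AIStackLayer(IntEnum):
    """The eight layers, numbered in the order the post lists them."""
    DATA = 1
    INFRASTRUCTURE = 2
    FRAMEWORK_AND_TOOLS = 3
    MODEL = 4
    APPLICATION = 5
    ORCHESTRATION_AND_MANAGEMENT = 6
    BUSINESS = 7
    ECOSYSTEM = 8


def missing_layers(addressed):
    """Return the layers not yet addressed, in stack order."""
    return [layer for layer in AIStackLayer if layer not in addressed]


# Hypothetical project that has so far covered only data and modeling.
covered = {AIStackLayer.DATA, AIStackLayer.MODEL}
for layer in missing_layers(covered):
    print(layer.value, layer.name)
```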
