GPU - gettectonic.com
Where LLMs Fall Short

LLM Economies

Throughout history, disruptive technologies have been the catalyst for major social and economic revolutions. The invention of the plow and irrigation systems roughly 12,000 years ago sparked the Agricultural Revolution, while Johannes Gutenberg’s 15th-century printing press fueled the Protestant Reformation and helped propel Europe out of the Middle Ages into the Renaissance. In the 18th century, James Watt’s steam engine ushered in the Industrial Revolution. More recently, the internet has revolutionized communication, commerce, and information access, shrinking the world into a global village, and smartphones have transformed how people interact with their surroundings. Now we stand at the dawn of the AI revolution.

Large Language Models (LLMs) represent a monumental leap forward, with significant economic implications at both macro and micro levels. These models are reshaping global markets, driving new forms of currency, and creating a novel economic landscape. The reason LLMs are transforming industries and redefining economies is simple: they automate both routine and complex tasks that traditionally require human intelligence. They enhance decision-making, boost productivity, and facilitate cost reductions across various sectors. This enables organizations to allocate human resources toward more creative and strategic endeavors, resulting in the development of new products and services. From healthcare to finance to customer service, LLMs are creating new markets and driving AI-powered services like content generation and conversational assistants into the mainstream.

To truly grasp the engine driving this new global economy, it’s essential to understand the inner workings of this disruptive technology. These posts will provide both a macro-level overview of the economic forces at play and a deep dive into the technical mechanics of LLMs, equipping you with a comprehensive understanding of the revolution happening now.

Why Now?
The Connection Between Language and Human Intelligence

AI did not begin with ChatGPT’s arrival in November 2022. Researchers were building machine learning classification models as early as the 1990s, and the roots of AI go back even further. Artificial Intelligence was formally born in 1950, when Alan Turing—considered the father of theoretical computer science and famed for cracking the Nazi Enigma code during World War II—proposed the first formal definition of machine intelligence. This definition, known as the Turing Test, demonstrated the potential for machines to exhibit human-like intelligence through natural language conversation: a human evaluator engages in conversations with both a human and a machine, and if the evaluator cannot reliably distinguish between the two, the machine is considered to have passed the test. Remarkably, after 72 years of gradual AI development, ChatGPT delivered this very interaction, arguably passing the Turing Test and igniting the current AI explosion.

But why is language so closely tied to human intelligence, rather than, say, vision? Although a large share of the brain’s processing is devoted to vision, OpenAI’s pioneering image generation model, DALL-E, did not trigger the same level of excitement as ChatGPT. The answer lies in the profound role language has played in human evolution.

The Evolution of Language

The development of language was the turning point in humanity’s rise to dominance on Earth. As Yuval Noah Harari points out in his book Sapiens: A Brief History of Humankind, it was the ability to gossip and discuss abstract concepts that set humans apart from other species. Complex communication, such as gossip, requires a shared, sophisticated language. Human language evolved from primitive cave signs to structured alphabets, which, along with grammar rules, created languages capable of expressing thousands of words.
In today’s digital age, language has further evolved with the inclusion of emojis, and now, with the advent of generative AI, tokens have become the latest cornerstone in this progression. These shifts highlight the extraordinary journey of human language, from simple symbols to intricate digital representations. In the next post, we will explore the intricacies of LLMs, focusing specifically on tokens. But before that, let’s delve into the economic forces shaping the LLM-driven world.

The Forces Shaping the LLM Economy

AI Giants in Competition

Karl Marx and Friedrich Engels argued that those who control the means of production hold power. Today’s tech giants understand that AI is the future means of production, and the race to dominate the LLM market is well underway. This competition is fierce, with industry leaders like OpenAI, Google, Microsoft, and Meta battling for supremacy. New challengers such as Mistral (France), AI21 (Israel), Elon Musk’s xAI, and Anthropic are also entering the fray. The LLM industry is expanding exponentially, with billions of dollars of investment pouring in. Anthropic, for example, has raised $4.5 billion from 43 investors, including major players like Amazon, Google, and Microsoft.

The Scarcity of GPUs

Just as Bitcoin mining requires vast computational resources, training LLMs demands immense computing power, driving a search for new energy sources; Microsoft’s recent investment in nuclear energy underscores this urgency. At the heart of LLM technology are graphics processing units (GPUs), essential for powering deep neural networks. These GPUs have become scarce and expensive, adding to the competitive tension.

Tokens: The New Currency of the LLM Economy

Tokens are the currency driving the emerging AI economy. Just as money facilitates transactions in traditional markets, tokens are the foundation of LLM economics. But what exactly are tokens? Tokens are the basic units of text that LLMs process.
They can be single characters, parts of words, or entire words. For example, the word “Oscar” might be split into two tokens, “os” and “car.” The performance of LLMs—quality, speed, and cost—hinges on how efficiently they generate these tokens.

LLM providers price their services based on token usage, with different rates for input (prompt) and output (completion) tokens. As companies rely more on LLMs, especially for complex tasks like agentic applications, token usage will significantly impact operational costs. With fierce competition and the rise of open-source models like Llama-3.1, the cost of tokens is rapidly decreasing; OpenAI, for instance, reduced its GPT-4 pricing by about 80% over the past year and a half. This trend enables companies to expand their portfolio of AI-powered products, further fueling the LLM economy.

Context Windows: Expanding Capabilities
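Returning to token pricing: because providers bill prompt and completion tokens at different rates, token counts translate directly into operating cost. A minimal sketch of the arithmetic, using hypothetical per-token prices (real rates vary widely by provider and model):

```python
# Hypothetical per-token prices, loosely modeled on typical LLM API
# pricing pages; the actual rates here are assumptions, not real quotes.
PRICE_PER_1M_INPUT = 2.50    # USD per 1M prompt tokens (assumed)
PRICE_PER_1M_OUTPUT = 10.00  # USD per 1M completion tokens (assumed)

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the cost of a single LLM call in USD."""
    return (input_tokens / 1_000_000 * PRICE_PER_1M_INPUT
            + output_tokens / 1_000_000 * PRICE_PER_1M_OUTPUT)

# A chatbot handling 10,000 requests/day, each averaging
# 1,500 prompt tokens and 400 completion tokens:
daily_cost = 10_000 * estimate_cost(1_500, 400)
print(f"${daily_cost:,.2f} per day")  # $77.50 per day
```

Output tokens are typically priced higher than input tokens, since generating a completion dominates compute at inference time; this is why verbose model responses cost disproportionately more than long prompts.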

Snowflake Security and Development

Snowflake Unveils AI Development and Enhanced Security Features

At its annual Build virtual developer conference, Snowflake introduced a suite of new capabilities focused on AI development and strengthened security measures. These enhancements aim to simplify the creation of conversational AI tools, improve collaboration, and address data security challenges following a significant breach earlier this year.

AI Development Updates

Snowflake announced updates to its Cortex AI suite to streamline the development of conversational AI applications. These new tools focus on enabling faster, more efficient development while ensuring data integrity and trust. The features address enterprise demands for generative AI tools that boost productivity while maintaining governance over proprietary data. Snowflake aims to eliminate barriers to data-driven decision-making by enabling natural language queries and easy integration of structured and unstructured data into AI models. According to Christian Kleinerman, Snowflake’s EVP of Product, the goal is to reduce the time it takes for developers to build reliable, cost-effective AI applications: “We want to help customers build conversational applications for structured and unstructured data faster and more efficiently.”

Security Enhancements

Following a breach last May, in which hackers accessed customer data via stolen login credentials, Snowflake has implemented new security features. These additions come alongside existing tools like the Horizon Catalog for data governance. Kleinerman noted that while Snowflake’s previous security measures were effective at preventing unauthorized access, the company recognizes the need to improve user adoption of these tools: “It’s on us to ensure our customers can fully leverage the security capabilities we offer.
That’s why we’re adding more monitoring, insights, and recommendations.”

Collaboration Features

Snowflake is also enhancing collaboration through its new Internal Marketplace, which enables organizations to share data, AI tools, and applications across business units. The Native App Framework now integrates with Snowpark Container Services to simplify the distribution and monetization of analytics and AI products.

AI Governance and Competitive Position

Industry analysts highlight the growing importance of AI governance as enterprises increasingly adopt generative AI tools. David Menninger of ISG’s Ventana Research emphasized that Snowflake’s governance-focused features, such as LLM observability, fill a critical gap in AI tooling: “Trustworthy AI enhancements like model explainability and observability are vital as enterprises scale their use of AI.” With these updates, Snowflake continues to compete with Databricks and other vendors. Its strategy focuses on offering both API-based flexibility for developers and built-in tools for users seeking simpler solutions. By combining innovative AI development tools with robust security and collaboration features, Snowflake aims to meet the evolving needs of enterprises while positioning itself as a leader in the data platform and AI space.

Scaling Generative AI

Many organizations follow a hybrid approach to AI infrastructure, combining public clouds, colocation facilities, and on-prem solutions. Specialized GPU-as-a-service vendors are becoming popular for handling high-demand AI computations, helping businesses manage costs without compromising performance. Business process outsourcing company TaskUs, for example, focuses on optimizing compute and data flows as it scales its gen AI deployments, while Cognizant advises companies to distinguish between training and inference needs, each of which has different latency requirements.

LLMs and AI

Large Language Models (LLMs): Revolutionizing AI and Custom Solutions

Large Language Models (LLMs) are transforming artificial intelligence by enabling machines to generate and comprehend human-like text, making them indispensable across numerous industries. The global LLM market is experiencing explosive growth, projected to rise from $1.59 billion in 2023 to $259.8 billion by 2030. This surge is driven by the increasing demand for automated content creation, advances in AI technology, and the need for improved human-machine communication. Several factors are propelling this growth, including advancements in AI and Natural Language Processing (NLP), the availability of large datasets, and the rising importance of seamless human-machine interaction. Additionally, private LLMs are gaining traction as businesses seek more control over their data and customization. These private models provide tailored solutions, reduce dependency on third-party providers, and enhance data privacy. This guide will walk you through building your own private LLM, offering valuable insights for both newcomers and seasoned professionals.

What are Large Language Models?

Large Language Models (LLMs) are advanced AI systems that generate human-like text by processing vast amounts of data using sophisticated neural networks, such as transformers. These models excel in tasks such as content creation, language translation, question answering, and conversation, making them valuable across industries, from customer service to data analysis. LLMs learn language rules by analyzing vast text datasets, similar to how reading numerous books helps someone understand a language. Once trained, these models can generate content, answer questions, and engage in meaningful conversations.
For example, an LLM can write a story about a space mission based on knowledge gained from reading space adventure stories, or it can explain photosynthesis using information drawn from biology texts.

Building a Private LLM

Data Curation for LLMs

Recent LLMs, such as Llama 3 and GPT-4, are trained on massive datasets—Llama 3 on 15 trillion tokens and GPT-4 reportedly on 6.5 trillion tokens. These datasets are drawn from diverse sources, including social media, academic texts, and private data, with sizes ranging from hundreds of terabytes to multiple petabytes. This breadth of training enables LLMs to develop a deep understanding of language, covering diverse patterns, vocabularies, and contexts.

Data Preprocessing

After data collection, the data must be cleaned and structured before training.

LLM Training Loop

Evaluating Your LLM

After training, it is crucial to assess the LLM’s performance using industry-standard benchmarks. When fine-tuning LLMs for specific applications, tailor your evaluation metrics to the task. For instance, in healthcare, matching disease descriptions with appropriate codes may be a top priority.

Conclusion

Building a private LLM provides unmatched customization, enhanced data privacy, and optimized performance. From data curation to model evaluation, this guide has outlined the essential steps to create an LLM tailored to your specific needs. Whether you’re just starting or seeking to refine your skills, building a private LLM can empower your organization with state-of-the-art AI capabilities. For expert guidance or to kickstart your LLM journey, feel free to contact us for a free consultation.
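The data preprocessing stage described above (cleaning and structuring collected text before training) can be sketched with two of its simplest steps, whitespace normalization and exact deduplication. This is a minimal illustration only; production pipelines also handle encoding repair, quality filtering, and near-duplicate detection:

```python
import re

def clean_text(raw: str) -> str:
    """Basic cleaning: strip control characters, collapse whitespace."""
    text = re.sub(r"[\x00-\x08\x0b-\x1f]", " ", raw)
    return re.sub(r"\s+", " ", text).strip()

def deduplicate(docs: list[str]) -> list[str]:
    """Drop exact duplicates while preserving document order."""
    seen, out = set(), []
    for d in docs:
        if d not in seen:
            seen.add(d)
            out.append(d)
    return out

docs = ["Photosynthesis converts light\tinto  energy.",
        "Photosynthesis converts light into energy."]
cleaned = deduplicate([clean_text(d) for d in docs])
print(cleaned)  # ['Photosynthesis converts light into energy.']
```

Note how cleaning before deduplication lets the two near-identical raw documents collapse into one; running the steps in the opposite order would keep both.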

GPUs and AI Development

Graphics processing units (GPUs) have become widely recognized due to their growing role in AI development. However, a lesser-known but critical technology is also gaining attention: high-bandwidth memory (HBM). HBM is a high-density memory designed to overcome bottlenecks and maximize data transfer speeds between memory and processors. AI chipmakers like Nvidia rely on HBM for its superior bandwidth and energy efficiency. Its placement next to the GPU’s processor chip gives it a performance edge over traditional server RAM, which sits much farther from the processing unit. HBM’s lower power consumption also makes it well suited to AI model training, which demands significant energy resources.

However, as the AI landscape transitions from model training to AI inferencing, HBM’s widespread adoption may slow. According to Gartner’s 2023 forecast, the use of accelerator chips incorporating HBM for AI model training is expected to decline from 65% in 2022 to 30% by 2027, as inferencing becomes more cost-effective with traditional technologies.

How HBM Differs from Other Memory

HBM shares similarities with other memory technologies, such as graphics double data rate (GDDR) memory, in delivering high bandwidth for graphics-intensive tasks. But HBM stands out due to its unique positioning. Unlike GDDR, which sits on the GPU’s printed circuit board, HBM is placed directly beside the processor, enhancing speed by reducing the signal delays caused by longer interconnections. This proximity, combined with its stacked DRAM architecture, boosts performance compared to GDDR’s side-by-side chip design. The stacked approach adds complexity, however. HBM relies on through-silicon vias (TSVs), vertical electrical connections that pass through the DRAM dies to link the stack, which requires larger die sizes and increases production costs. According to analysts, this makes HBM more expensive and less efficient to manufacture than server DRAM, leading to higher yield losses during production.
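To get a feel for why bandwidth matters, consider how long it takes just to read a large model’s weights out of memory. The bandwidth figures below are assumed, round numbers for illustration only; real parts vary widely by generation and configuration:

```python
# Illustrative transfer-time comparison. Both bandwidth figures are
# assumptions for a single memory configuration, not product specs.
HBM_BANDWIDTH_GBPS = 819    # GB/s, assumed per-stack figure
GDDR_BANDWIDTH_GBPS = 64    # GB/s, assumed per-chip figure

WEIGHTS_GB = 140  # e.g., a 70B-parameter model stored in fp16

def transfer_seconds(size_gb: float, bandwidth_gbps: float) -> float:
    """Time to stream `size_gb` of data at the given bandwidth."""
    return size_gb / bandwidth_gbps

print(f"HBM:  {transfer_seconds(WEIGHTS_GB, HBM_BANDWIDTH_GBPS):.2f} s")
print(f"GDDR: {transfer_seconds(WEIGHTS_GB, GDDR_BANDWIDTH_GBPS):.2f} s")
```

Since training repeatedly streams weights, activations, and gradients through memory, even a modest bandwidth gap compounds into a large difference in wall-clock training time.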
AI’s Demand for HBM

Despite its manufacturing challenges, demand for HBM is surging due to its importance in AI model training. Major suppliers like SK Hynix, Samsung, and Micron have expanded production to meet this demand, with Micron reporting that its HBM is sold out through 2025. In fact, TrendForce predicts that HBM will contribute to record revenues for the memory industry in 2025. The high demand for GPUs, especially from Nvidia, drives the need for HBM as AI companies focus on accelerating model training. Hyperscalers, looking to monetize AI, are investing heavily in HBM to speed up the process.

HBM’s Future in AI

While HBM has proven essential for AI training, its future may be uncertain as the focus shifts to AI inferencing, which requires less intensive memory resources. As inferencing becomes more prevalent, companies may opt for more affordable and widely available memory solutions. Experts also see HBM following the same trajectory as other memory technologies, with continuous efforts to increase bandwidth and density. The next generation, HBM3E, is already in production, with HBM4 planned for release in 2026, promising even higher speeds. Ultimately, the adoption of HBM will depend on market demand, especially from hyperscalers. If AI continues to push the limits of GPU performance, HBM could remain a critical component. However, if businesses prioritize cost efficiency over peak performance, HBM’s growth may level off.

Salesforce LlamaRank

Document ranking remains a critical challenge in information retrieval and natural language processing. Effective document retrieval and ranking are crucial for enhancing the performance of search engines, question-answering systems, and Retrieval-Augmented Generation (RAG) systems. Traditional ranking models often struggle to balance result precision with computational efficiency, especially when dealing with large datasets and diverse query types. This challenge underscores the growing need for advanced models that can provide accurate, contextually relevant results in real time from continuous data streams and increasingly complex queries.

Salesforce AI Research has introduced a reranker named LlamaRank, designed to significantly enhance document ranking and code search across various datasets. Built on the Llama3-8B-Instruct architecture, LlamaRank integrates linear and calibrated scoring mechanisms, achieving both speed and interpretability. Refined through iterative feedback from Salesforce’s dedicated RLHF data annotation team, LlamaRank outperforms many leading APIs in general document ranking and sets a new standard for code search performance. The model’s training data includes high-quality synthesized data from Llama3-70B and Llama3-405B, along with human-labeled annotations, covering a broad range of domains from topic-based search and document QA to code QA.

In RAG systems, LlamaRank plays a crucial role. Initially, a query is processed using a less precise but cost-effective method, such as semantic search with embeddings, to generate a list of candidate documents. The reranker then refines this list to identify the most relevant documents, ensuring that the language model is prompted with only the most pertinent information, thereby improving the accuracy and coherence of its responses.
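The two-stage retrieve-then-rerank flow described above can be sketched end to end. The scoring function here is a naive term-overlap stand-in (LlamaRank itself derives calibrated scores from model token probabilities); only the pipeline shape is the point:

```python
import re

def terms(text: str) -> set[str]:
    """Lowercase word set, ignoring punctuation."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))

def lexical_score(query: str, doc: str) -> float:
    """Stand-in relevance score: fraction of query terms found in the doc.
    A real reranker would replace this with a learned model score."""
    q = terms(query)
    return len(q & terms(doc)) / len(q)

def rerank(query: str, candidates: list[str], top_k: int = 2) -> list[str]:
    """Re-order a retriever's candidate list and keep the top_k documents."""
    ranked = sorted(candidates, key=lambda d: lexical_score(query, d),
                    reverse=True)
    return ranked[:top_k]

candidates = [
    "Salesforce reported quarterly earnings today.",
    "Rerankers refine retrieved documents for RAG systems.",
    "RAG systems pair retrieval with language model generation.",
]
print(rerank("how do RAG systems use rerankers", candidates))
```

The cheap first stage casts a wide net; the (normally expensive) second stage spends its compute only on the shortlist, which is what keeps the overall pipeline economical.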
LlamaRank’s architecture, based on Llama3-8B-Instruct, leverages a diverse training corpus of synthetic and human-labeled data. This extensive dataset enables LlamaRank to excel in various tasks, from general document retrieval to specialized code searches. The model underwent multiple feedback cycles from Salesforce’s data annotation team to achieve optimal accuracy and relevance in its scoring predictions. During inference, LlamaRank predicts token probabilities and calculates a numeric relevance score, facilitating efficient reranking.

LlamaRank has shown impressive performance on several public datasets. On the SQuAD dataset for question answering, it achieved a hit rate of 99.3%; on TriviaQA, 92.0%. In code search benchmarks, it recorded a hit rate of 81.8% on the Neural Code Search dataset and 98.6% on the TrailheadQA dataset. These results highlight LlamaRank’s versatility and efficiency across various document types and query scenarios.

LlamaRank’s technical specifications further emphasize its advantages. Supporting up to 8,000 tokens per document, it significantly outperforms competitors like Cohere’s reranker. It delivers low-latency performance, ranking 64 documents in under 200 ms on a single H100 GPU, compared to approximately 3.13 seconds on Cohere’s serverless API. Additionally, LlamaRank features linear scoring calibration, offering clear and interpretable relevance scores. While LlamaRank’s size of 8 billion parameters contributes to its high performance, it is approaching the upper limits of practical reranking model size, and future research may focus on optimizing model size to balance quality and efficiency. Overall, LlamaRank from Salesforce AI Research marks a significant advancement in reranking technology, promising to greatly enhance RAG systems’ effectiveness across a wide range of applications.
With its powerful performance, efficiency, and clear scoring, LlamaRank represents a major step forward in document retrieval and search accuracy. The community eagerly anticipates its broader adoption and further development.

Anthropic’s New Approach to RAG

Anthropic’s advanced RAG methodology demonstrates how AI can overcome traditional retrieval challenges, delivering more precise, context-aware responses while maintaining efficiency and scalability.

Small Language Models Explained

Exploring Small Language Models (SLMs): Capabilities and Applications

Large Language Models (LLMs) have been prominent in AI for some time, but Small Language Models (SLMs) are now enhancing our ability to work with natural and programming languages. While LLMs excel in general language understanding, certain applications require more accuracy and domain-specific knowledge than these models can provide. This has created a demand for custom SLMs that offer LLM-like performance while reducing runtime costs and providing a secure, manageable environment. In this insight, we dig into the world of SLMs, exploring their unique characteristics, benefits, and applications. We also discuss fine-tuning methods applied to Llama-2-13b, an SLM, to address specific challenges. The goal is to investigate how to make the fine-tuning process platform-independent. We selected Databricks for this purpose due to its compatibility with major cloud providers like Azure, Amazon Web Services (AWS), and Google Cloud Platform.

What Are Small Language Models?

In AI and natural language processing, SLMs are lightweight generative models focused on specific tasks. The term “small” refers to the model’s parameter count and resource footprint: SLMs like Google Gemini Nano, Microsoft’s Orca-2-7b, and Meta’s Llama-2-13b run efficiently on a single GPU while still containing up to several billion parameters.

Applications of SLMs

SLMs are increasingly used across various sectors, including healthcare, technology, and beyond.

Fine-Tuning Small Language Models

Fine-tuning involves additional training of a pre-trained model to make it more domain-specific. This process updates the model’s parameters with new data to enhance its performance in targeted applications, such as text generation or question answering.

Hardware Requirements for Fine-Tuning

The hardware needs depend on the model size, project scale, and dataset.
Data Preparation

Preparing data involves extracting text from PDFs, cleaning it, generating question-and-answer pairs, and then fine-tuning the model. Although GPT-3.5 was used here for generating Q&A pairs, SLMs can also be used for this purpose, depending on the use case.

Fine-Tuning Process

You can use Hugging Face tools for fine-tuning Llama-2-13b-chat-hf. The dataset was converted into a Hugging Face-compatible format, and quantization techniques were applied to optimize performance. The fine-tuning took about 16 hours over 50 epochs and cost around $100/£83, excluding trial costs.

Results and Observations

The fine-tuned model performed strongly, with over 70% of its answers being highly similar to those generated by GPT-3.5. The SLM achieved comparable results despite having far fewer parameters, and the process succeeded on both AWS and Databricks, showcasing the model’s adaptability. SLMs do have limitations compared to LLMs, such as more restricted knowledge bases, but they offer benefits in efficiency, versatility, and environmental impact. As SLMs continue to evolve, their relevance and popularity are likely to increase, especially with new models like Gemini Nano and Mixtral entering the market.
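The "highly similar answers" comparison reported above can be approximated with a simple string-similarity check. This sketch uses Python’s difflib with an assumed 0.8 threshold; the original evaluation method is not specified, so treat this purely as an illustration:

```python
import difflib

def similarity(a: str, b: str) -> float:
    """Ratio in [0, 1] measuring how closely two answers match."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

# Hypothetical answer pairs: (fine-tuned SLM, reference model).
pairs = [
    ("Photosynthesis converts sunlight into chemical energy.",
     "Photosynthesis converts sunlight into chemical energy in plants."),
    ("Water boils at 100 degrees Celsius.",
     "The model could not answer this question."),
]
THRESHOLD = 0.8  # assumed cut-off for "highly similar"
matches = sum(similarity(a, b) >= THRESHOLD for a, b in pairs)
print(f"{matches}/{len(pairs)} answers highly similar")  # 1/2 answers highly similar
```

Character-level ratios like this are a crude proxy; a production evaluation would more likely use embedding cosine similarity or an LLM-as-judge, which tolerate paraphrasing far better.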

Large and Small Language Models

Understanding Language Models in AI

Language models are sophisticated AI systems designed to generate natural human language, a task that is far from simple. These models operate as probabilistic machine learning systems, predicting the likelihood of word sequences to emulate human-like intelligence. While today’s cutting-edge AI models in Natural Language Processing (NLP) are impressive, they have not yet fully passed the Turing Test—a benchmark where a machine’s communication is indistinguishable from that of a human.

The Emergence of Language Models

We are approaching this milestone with advancements in Large Language Models (LLMs) and the promising but less discussed Small Language Models (SLMs). LLMs like ChatGPT have garnered significant attention due to their ability to handle complex interactions and provide insightful responses. These models distill vast amounts of internet data into concise and relevant information, offering an alternative to traditional search methods. Conversely, SLMs, such as Mistral 7B, while less flashy, are valuable for specific applications. They typically contain fewer parameters and focus on specialized domains, providing targeted expertise without the broad capabilities of LLMs.

Choosing the Right Language Model

The decision between LLMs and SLMs depends on your specific needs and available resources. LLMs are well-suited for broad applications like chatbots and customer support. In contrast, SLMs are ideal for specialized tasks in fields such as medicine, law, and finance, where domain-specific knowledge is crucial. Language models are powerful tools that, depending on their size and focus, can either provide broad capabilities or specialized expertise.
Understanding their strengths and limitations helps in selecting the right model for your use case.
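The probabilistic word-sequence prediction described above can be illustrated with a toy bigram model. This is a deliberate simplification (real LLMs use transformer networks over subword tokens, not word bigrams), but the core idea of estimating next-token probabilities is the same:

```python
from collections import Counter, defaultdict

corpus = "the cat sat on the mat the cat saw the dog".split()

# Count how often each word follows each other word.
bigrams = defaultdict(Counter)
for prev, nxt in zip(corpus, corpus[1:]):
    bigrams[prev][nxt] += 1

def next_word_probs(word: str) -> dict[str, float]:
    """Empirical probability distribution over the next word."""
    counts = bigrams[word]
    total = sum(counts.values())
    return {w: c / total for w, c in counts.items()}

print(next_word_probs("the"))  # {'cat': 0.5, 'mat': 0.25, 'dog': 0.25}
```

Generating text is then just repeated sampling from these distributions; an LLM does the same thing, but its distribution is computed by billions of learned parameters conditioned on the entire context rather than a single preceding word.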

Machine Learning on Kubernetes

How and Why to Run Machine Learning Workloads on Kubernetes

Running machine learning (ML) model development and deployment on Kubernetes has become essential for optimizing resources and managing costs. As AI and ML tools gain mainstream acceptance, business and IT professionals are increasingly familiar with these technologies. With the growing buzz around AI, engineering needs in ML and AI have expanded, particularly in managing the complexities and costs associated with these workloads.

The Need for Kubernetes in ML

As ML use cases become more complex, training models has become increasingly resource-intensive and costly. This has driven up demand and costs for GPUs, a key resource for ML tasks. Containerizing ML workloads offers a solution to these challenges by improving scalability, automation, and infrastructure efficiency. Kubernetes, a leading tool for container orchestration, is particularly effective for managing ML processes. By decoupling workloads into manageable containers, Kubernetes helps streamline ML operations and reduce costs.

Understanding Kubernetes

The evolution of engineering priorities has consistently focused on minimizing application footprints. From mainframes to modern servers and virtualization, the trend has been towards reducing operational overhead. Containers emerged as a solution to this trend, offering a way to isolate application stacks while maintaining performance. Initially, containers used Linux cgroups and namespaces, but their popularity surged with Docker. However, Docker containers had limitations in scaling and automatic recovery, and Kubernetes was developed to address these issues. As an open-source orchestration platform, Kubernetes manages containerized workloads by ensuring containers are always running and properly scaled. Containers run inside resources called pods, which include everything needed to run the application.
Kubernetes has also expanded its capabilities to orchestrate other resources, such as virtual machines.

Running ML Workloads on Kubernetes

ML systems demand significant computing power, including CPU, memory, and GPU resources. Traditionally, this required multiple dedicated servers, which was inefficient and costly. Kubernetes addresses this challenge by orchestrating containers and decoupling workloads, allowing multiple pods to run models simultaneously and share resources like CPU, memory, and GPU power. Using Kubernetes for ML can enhance a range of MLOps practices.

Challenges of ML on Kubernetes

Despite its advantages, running ML workloads on Kubernetes comes with challenges of its own.

Key Tools for ML on Kubernetes

Kubernetes requires specific tools to manage ML workloads effectively; these integrate with Kubernetes to address the unique needs of ML tasks. TensorFlow is another option, but it lacks the dedicated integration and optimization of Kubernetes-specific tools like Kubeflow. For those new to running ML workloads on Kubernetes, Kubeflow is often the best starting point: it is the most advanced and mature tool in terms of capabilities, ease of use, community support, and functionality.
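To show how several model-serving pods can run side by side and share cluster resources, here is a hedged sketch of a Deployment manifest built in Python (the replica count, image, and resource figures are assumptions for illustration).

```python
import json

def make_model_deployment(name: str, image: str, replicas: int) -> dict:
    """Build a Deployment manifest that runs several model-serving pods,
    each with bounded CPU/memory so they can share cluster nodes."""
    return {
        "apiVersion": "apps/v1",
        "kind": "Deployment",
        "metadata": {"name": name},
        "spec": {
            "replicas": replicas,
            "selector": {"matchLabels": {"app": name}},
            "template": {
                "metadata": {"labels": {"app": name}},
                "spec": {
                    "containers": [{
                        "name": "model-server",
                        "image": image,  # hypothetical serving image
                        "resources": {
                            # Requests let the scheduler pack pods onto nodes;
                            # limits cap what any single pod may consume.
                            "requests": {"cpu": "1", "memory": "2Gi"},
                            "limits": {"cpu": "2", "memory": "4Gi"},
                        },
                    }],
                },
            },
        },
    }

deployment = make_model_deployment(
    "llm-inference", "example.com/model-server:latest", replicas=3
)
print(json.dumps(deployment, indent=2))
```

Because each pod declares requests and limits, Kubernetes can pack the three replicas onto shared nodes while preventing any one of them from starving the others.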

Einstein Code Generation and Amazon SageMaker
Salesforce and the Evolution of AI-Driven CRM Solutions

Salesforce, Inc., headquartered in San Francisco, California, is a leading American cloud-based software company specializing in customer relationship management (CRM) software and applications. Its offerings include sales, customer service, marketing automation, e-commerce, analytics, and application development. Salesforce is at the forefront of integrating artificial intelligence (AI) into its services, enhancing its flagship SaaS CRM platform with predictive and generative AI capabilities and advanced automation features.

Salesforce Einstein: Pioneering AI in Business Applications

Salesforce Einstein is a suite of AI technologies embedded within Salesforce's Customer Success Platform, designed to enhance productivity and client engagement. With over 60 features available across different pricing tiers, Einstein's capabilities span machine learning (ML), natural language processing (NLP), computer vision, and automatic speech recognition. These tools help businesses deliver personalized and predictive customer experiences across functions such as sales and customer service. Key components include out-of-the-box AI features like sales email generation in Sales Cloud and service replies in Service Cloud, along with tools like Copilot Builder, Prompt Builder, and Model Builder within Einstein 1 Studio for custom AI development.

The Salesforce Einstein AI Platform Team: Enhancing AI Capabilities

The Salesforce Einstein AI Platform team is responsible for the ongoing development and enhancement of Einstein's AI applications. It focuses on advancing large language models (LLMs) to support a wide range of business applications, aiming to provide cutting-edge NLP capabilities.
By partnering with leading technology providers and leveraging open-source communities and cloud services like AWS, the team ensures Salesforce customers have access to the latest AI technologies.

Optimizing LLM Performance with Amazon SageMaker

In early 2023, the Einstein team sought a solution to host CodeGen, Salesforce's in-house open-source LLM for code understanding and generation. CodeGen translates natural language into programming languages such as Python and is specifically tuned for Apex, the programming language integral to Salesforce's CRM functionality. The team needed a hosting solution that could handle a high volume of inference requests and many concurrent sessions while meeting strict throughput and latency requirements for EinsteinGPT for Developers, a tool that aids in code generation and review. After evaluating various hosting options, the team selected Amazon SageMaker for its robust GPU access, scalability, flexibility, and performance-optimization features. SageMaker's specialized deep learning containers (DLCs), including the Large Model Inference (LMI) containers, provided a comprehensive solution for efficient LLM hosting and deployment. Key features included advanced batching strategies, efficient request routing, and access to high-end GPUs, which significantly enhanced the model's performance.

Key Achievements and Learnings

The integration of SageMaker dramatically improved the CodeGen model's performance, boosting throughput by over 6,500% and significantly reducing latency. SageMaker's tools and resources enabled the team to optimize their models, streamline deployment, and manage resource use effectively, setting a benchmark for future projects.

Conclusion and Future Directions

Salesforce's experience with SageMaker highlights the importance of leveraging advanced tools and strategies in AI model optimization. The successful collaboration underscores the need for continuous innovation and adaptation in AI technologies, ensuring that Salesforce remains at the cutting edge of CRM solutions. For those interested in deploying their own LLMs on SageMaker, Salesforce's experience serves as a valuable case study, demonstrating the platform's capabilities in enhancing AI performance and scalability. To begin hosting your own LLMs on SageMaker, explore AWS's detailed guides and resources.
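The general hosting workflow described in this post can be outlined in code. The sketch below builds the configuration dicts that would be passed to the SageMaker API via boto3; it is a hedged outline, not Salesforce's actual setup, and every name, ARN, image URI, and instance type is a placeholder assumption.

```python
# Sketch of deploying an LLM to a SageMaker real-time endpoint.
# All names, ARNs, and URIs below are illustrative placeholders.
model_config = {
    "ModelName": "codegen-demo",
    "PrimaryContainer": {
        # A Large Model Inference (LMI) deep learning container image (placeholder URI).
        "Image": "123456789012.dkr.ecr.us-east-1.amazonaws.com/djl-inference:latest",
        # Model artifacts packaged as a tarball in S3 (placeholder path).
        "ModelDataUrl": "s3://example-bucket/codegen/model.tar.gz",
    },
    "ExecutionRoleArn": "arn:aws:iam::123456789012:role/ExampleSageMakerRole",
}

endpoint_config = {
    "EndpointConfigName": "codegen-demo-config",
    "ProductionVariants": [{
        "VariantName": "primary",
        "ModelName": model_config["ModelName"],
        # A GPU instance type; the right choice depends on model size and traffic.
        "InstanceType": "ml.g5.2xlarge",
        "InitialInstanceCount": 1,
    }],
}

# With AWS credentials configured, these dicts would be submitted like so:
# import boto3
# sm = boto3.client("sagemaker")
# sm.create_model(**model_config)
# sm.create_endpoint_config(**endpoint_config)
# sm.create_endpoint(EndpointName="codegen-demo",
#                    EndpointConfigName=endpoint_config["EndpointConfigName"])
```

Separating the model, endpoint configuration, and endpoint lets the same model artifact be redeployed on different instance types or scaled out without rebuilding the container.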

gettectonic.com