Tokens Archives - gettectonic.com

Salesforce AI Research Introduces BLIP-3-Video

Salesforce AI Research Introduces BLIP-3-Video: A Groundbreaking Multimodal Model for Efficient Video Understanding

Vision-language models (VLMs) are transforming artificial intelligence by merging visual and textual data, enabling advancements in video analysis, human-computer interaction, and multimedia applications. These tools empower systems to generate captions, answer questions, and support decision-making, driving innovation in industries like entertainment, healthcare, and autonomous systems. However, the exponential growth in video-based tasks has created a demand for more efficient processing solutions that can manage the vast amounts of visual and temporal data inherent in videos.

The Challenge of Scaling Video Understanding

Existing video-processing models face significant inefficiencies. Many rely on processing each frame individually, creating thousands of visual tokens that demand extensive computational resources. This approach struggles with long or complex videos, where balancing computational efficiency and accurate temporal understanding becomes crucial. Attempts to address this issue, such as the pooling techniques used by models like Video-ChatGPT and LLaVA-OneVision, have only partially succeeded, as they still produce thousands of tokens.

Introducing BLIP-3-Video: A Breakthrough in Token Efficiency

To tackle these challenges, Salesforce AI Research has developed BLIP-3-Video, a cutting-edge vision-language model optimized for video processing. The key innovation lies in its temporal encoder, which reduces visual tokens to just 16–32 tokens per video, significantly lowering computational requirements while maintaining strong performance. The temporal encoder employs a spatio-temporal attentional pooling mechanism, selectively extracting the most informative data from video frames. By consolidating spatial and temporal information into compact video-level tokens, BLIP-3-Video streamlines video processing without sacrificing accuracy.

Efficient Architecture for Scalable Video Tasks

BLIP-3-Video's architecture integrates: This design ensures that the model efficiently captures essential temporal information while minimizing redundant data.

Performance Highlights

BLIP-3-Video demonstrates remarkable efficiency, achieving accuracy comparable to state-of-the-art models like Tarsier-34B while using a fraction of the tokens: For context, Tarsier-34B requires 4,608 tokens for eight video frames, whereas BLIP-3-Video achieves similar results with only 32 tokens. On multiple-choice tasks, the model excelled: These results highlight BLIP-3-Video as one of the most token-efficient models in video understanding, offering top-tier performance while dramatically reducing computational costs.

Advancing AI for Real-World Video Applications

BLIP-3-Video addresses the critical challenge of token inefficiency, proving that complex video data can be processed effectively with far fewer resources. Developed by Salesforce AI Research, the model paves the way for scalable, real-time video processing across industries, including healthcare, autonomous systems, and entertainment. By combining efficiency with high performance, BLIP-3-Video sets a new standard for vision-language models, driving the practical application of AI in video-based systems.
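The announcement does not include reference code, but the core idea of attentional pooling, in which many per-frame tokens are compressed into a small, fixed set of learnable video-level tokens via cross-attention, can be sketched in a few lines of PyTorch. This is a minimal, hypothetical illustration rather than BLIP-3-Video's actual implementation; the tensor sizes and the choice of 32 learnable queries are assumptions made for the example.

```python
import torch
import torch.nn as nn

class AttentionTokenPooler(nn.Module):
    """Compress per-frame visual tokens into a small set of video-level tokens
    using cross-attention with learnable query vectors (illustrative only)."""
    def __init__(self, dim=768, num_video_tokens=32, num_heads=8):
        super().__init__()
        # One learnable query per output video-level token.
        self.queries = nn.Parameter(torch.randn(num_video_tokens, dim) * 0.02)
        self.attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)

    def forward(self, frame_tokens):
        # frame_tokens: (batch, num_frames * tokens_per_frame, dim)
        batch = frame_tokens.size(0)
        q = self.queries.unsqueeze(0).expand(batch, -1, -1)   # (batch, 32, dim)
        pooled, _ = self.attn(q, frame_tokens, frame_tokens)  # cross-attention over all frame tokens
        return self.norm(pooled)                              # (batch, 32, dim)

# Example: 8 frames x 256 tokens per frame -> 32 video-level tokens.
pooler = AttentionTokenPooler()
frames = torch.randn(2, 8 * 256, 768)
video_tokens = pooler(frames)
print(video_tokens.shape)  # torch.Size([2, 32, 768])
```

In the real model, the pooled tokens would then be passed, together with the text prompt, to the language model backbone; the point of the sketch is only to show how a few learnable queries can stand in for thousands of frame tokens.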


Why Build a General-Purpose Agent?

A general-purpose LLM agent serves as an excellent starting point for prototyping use cases and establishing the foundation for a custom agentic architecture tailored to your needs.

What is an LLM Agent?

An LLM (Large Language Model) agent is a program whose execution logic is governed by the underlying model. Unlike approaches such as few-shot prompting or fixed workflows, LLM agents adapt dynamically. They can determine which tools to use (e.g., web search or code execution), how to use them, and iterate based on results. This adaptability enables handling diverse tasks with minimal configuration.

Agentic Architectures Explained

Agentic systems range from the reliability of fixed workflows to the flexibility of autonomous agents. For instance: Your architecture choice will depend on the desired balance between reliability and flexibility for your use case.

Building a General-Purpose LLM Agent

Step 1: Select the Right LLM

Choosing the right model is critical for performance. Evaluate based on: Model Recommendations (as of now): For simpler use cases, smaller models running locally can also be effective, but with limited functionality.

Step 2: Define the Agent's Control Logic

The system prompt differentiates an LLM agent from a standalone model. This prompt contains the rules, instructions, and structures that guide the agent's behavior. Common Agentic Patterns: Starting with ReAct or Plan-then-Execute patterns is recommended for general-purpose agents.

Step 3: Define the Agent's Core Instructions

To optimize the agent's behavior, clearly define its features and constraints in the system prompt. Example Instructions:

Step 4: Define and Optimize Core Tools

Tools expand an agent's capabilities. Common tools include: For each tool, define: Example: Implementing an Arxiv API tool for scientific queries.

Step 5: Memory Handling Strategy

Since LLMs have limited memory (the context window), a strategy is necessary to manage past interactions. Common approaches include: For personalization, long-term memory can store user preferences or critical information.

Step 6: Parse the Agent's Output

To make raw LLM outputs actionable, implement a parser to convert outputs into a structured format like JSON. Structured outputs simplify execution and ensure consistency.

Step 7: Orchestrate the Agent's Workflow

Define orchestration logic to handle the agent's next steps after receiving an output. Example orchestration code:

```python
def orchestrator(llm_agent, llm_output, tools, user_query):
    while True:
        action = llm_output.get("action")
        if action == "tool_call":
            tool_name = llm_output.get("tool_name")
            tool_params = llm_output.get("tool_params", {})
            if tool_name in tools:
                try:
                    tool_result = tools[tool_name](**tool_params)
                    llm_output = llm_agent({"tool_output": tool_result})
                except Exception as e:
                    return f"Error executing tool '{tool_name}': {str(e)}"
            else:
                return f"Error: Tool '{tool_name}' not found."
        elif action == "return_answer":
            return llm_output.get("answer", "No answer provided.")
        else:
            return "Error: Unrecognized action type from LLM output."
```

This orchestration ensures seamless interaction between tools, memory, and user queries.

When to Consider Multi-Agent Systems

A single-agent setup works well for prototyping but may hit limits with complex workflows or extensive toolsets. Multi-agent architectures can: Starting with a single agent helps refine workflows, identify bottlenecks, and scale effectively.
By following these steps, you'll have a versatile system capable of handling diverse use cases, from competitive analysis to automating workflows.
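To make Steps 4 and 6 above concrete, here is a minimal sketch of a tool definition and an output parser that would plug into the orchestrator shown earlier. The use of the public arXiv query API and the JSON output schema (action, tool_name, tool_params, answer) are assumptions chosen to match the orchestrator example, not a prescribed interface.

```python
import json
import requests

def search_arxiv(query: str, max_results: int = 3) -> str:
    """Query the public arXiv API and return the raw Atom feed (truncated).
    A production tool would parse out titles and abstracts."""
    resp = requests.get(
        "http://export.arxiv.org/api/query",
        params={"search_query": f"all:{query}", "max_results": max_results},
        timeout=30,
    )
    resp.raise_for_status()
    return resp.text[:2000]

# Tool registry consumed by the orchestrator above (Step 4).
tools = {"search_arxiv": search_arxiv}

def parse_llm_output(raw_text: str) -> dict:
    """Step 6: convert the raw model response into the structured dict the
    orchestrator expects. Falls back to a direct answer on parse errors."""
    try:
        return json.loads(raw_text)
    except json.JSONDecodeError:
        return {"action": "return_answer", "answer": raw_text.strip()}

# Example of the structured output the agent is prompted to produce:
example = '{"action": "tool_call", "tool_name": "search_arxiv", "tool_params": {"query": "vision language models"}}'
print(parse_llm_output(example)["tool_name"])  # search_arxiv
```

The schema itself is arbitrary; what matters is that the system prompt, the parser, and the orchestrator all agree on it.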


MOIRAI-MoE

MOIRAI-MoE represents a groundbreaking advancement in time series forecasting by introducing a flexible, data-driven approach that addresses the limitations of traditional models. Its sparse mixture of experts architecture achieves token-level specialization, offering significant performance improvements and computational efficiency. By dynamically adapting to the unique characteristics of time series data, MOIRAI-MoE sets a new standard for foundation models, paving the way for future innovations and expanding the potential of zero-shot forecasting across diverse industries.
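The post does not reproduce MOIRAI-MoE's routing scheme, but the general idea of token-level specialization in a sparse mixture of experts, where a gating network sends each token only to its top-scoring experts, can be sketched as follows. This is a generic, illustrative PyTorch example; the expert count, top-k value, and expert architecture are assumptions, not details of the actual model.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SparseMoELayer(nn.Module):
    """Generic top-k mixture-of-experts layer: each token is routed to the
    k experts its gate scores highest (illustrative, not MOIRAI-MoE itself)."""
    def __init__(self, dim=64, num_experts=8, top_k=2):
        super().__init__()
        self.experts = nn.ModuleList(
            [nn.Sequential(nn.Linear(dim, dim * 4), nn.GELU(), nn.Linear(dim * 4, dim))
             for _ in range(num_experts)]
        )
        self.gate = nn.Linear(dim, num_experts)
        self.top_k = top_k

    def forward(self, tokens):
        # tokens: (num_tokens, dim); flatten batch and sequence beforehand.
        scores = self.gate(tokens)                          # (T, num_experts)
        weights, indices = scores.topk(self.top_k, dim=-1)  # (T, k)
        weights = F.softmax(weights, dim=-1)
        out = torch.zeros_like(tokens)
        for slot in range(self.top_k):
            for e, expert in enumerate(self.experts):
                mask = indices[:, slot] == e
                if mask.any():
                    out[mask] += weights[mask, slot].unsqueeze(-1) * expert(tokens[mask])
        return out

layer = SparseMoELayer()
print(layer(torch.randn(10, 64)).shape)  # torch.Size([10, 64])
```

Because each token activates only a couple of experts, compute per token stays low even as the total parameter count grows, which is the efficiency property the paragraph above refers to.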


Google’s Gemini 1.5 Flash-8B

Google's Gemini 1.5 Flash-8B: A Game-Changer in Speed and Affordability

Google's latest AI model, Gemini 1.5 Flash-8B, has taken the spotlight as the company's fastest and most cost-effective offering to date. Building on the foundation of the original Flash model, 8B introduces key upgrades in pricing, speed, and rate limits, signaling Google's intent to dominate the affordable AI model market.

What Sets Gemini 1.5 Flash-8B Apart?

Google has implemented several enhancements to this lightweight model, informed by "developer feedback and testing the limits of what's possible," as highlighted in their announcement. These updates focus on three major areas:

1. Unprecedented Price Reduction

The cost of using Flash-8B has been slashed in half compared to its predecessor, making it the most budget-friendly model in its class. This dramatic price drop solidifies Flash-8B as a leading choice for developers seeking an affordable yet reliable AI solution.

2. Enhanced Speed

The Flash-8B model is 40% faster than its closest competitor, GPT-4o, according to data from Artificial Analysis. This improvement underscores Google's focus on speed as a critical feature for developers. Whether working in AI Studio or using the Gemini API, users will notice shorter response times and smoother interactions.

3. Increased Rate Limits

Flash-8B doubles the rate limits of its predecessor, allowing for 4,000 requests per minute. This improvement ensures developers and users can handle higher volumes of smaller, faster tasks without bottlenecks, enhancing efficiency in real-time applications.

Accessing Flash-8B

You can start using Flash-8B today through Google AI Studio or via the Gemini API. AI Studio provides a free testing environment, making it a great starting point before transitioning to API integration for larger-scale projects.

Comparing Flash-8B to Other Gemini Models

Flash-8B positions itself as a faster, cheaper alternative to high-performance models like Gemini 1.5 Pro. While it doesn't outperform the Pro model across all benchmarks, it excels in cost efficiency and speed, making it ideal for tasks requiring rapid processing at scale. In benchmark evaluations, Flash-8B surpasses the base Flash model in four key areas, with only marginal decreases in other metrics. For developers prioritizing speed and affordability, Flash-8B offers a compelling balance between performance and cost.

Why Flash-8B Matters

Gemini 1.5 Flash-8B highlights Google's commitment to providing accessible AI solutions for developers without compromising on quality. With its reduced costs, faster response times, and higher request limits, Flash-8B is poised to redefine expectations for lightweight AI models, catering to a broad spectrum of applications while maintaining an edge in affordability.
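As a quick illustration of the "Accessing Flash-8B" path described above, here is a minimal sketch using Google's google-generativeai Python SDK. Treat the package import, model identifier, and method names as assumptions to verify against the current Gemini API documentation before relying on them.

```python
# pip install google-generativeai
import os
import google.generativeai as genai

# API keys are issued in Google AI Studio; read the key from the environment.
genai.configure(api_key=os.environ["GEMINI_API_KEY"])

# "gemini-1.5-flash-8b" is the model referenced in the article; confirm the
# exact identifier in the model list available to your account.
model = genai.GenerativeModel("gemini-1.5-flash-8b")

response = model.generate_content(
    "Summarize the trade-offs of small, fast LLMs in two sentences."
)
print(response.text)
```

Testing prompts interactively in AI Studio first, then moving the same prompt into API calls like this one, is the workflow the article recommends.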


LLM Economies

Throughout history, disruptive technologies have been the catalyst for major social and economic revolutions. The invention of the plow and irrigation systems 12,000 years ago sparked the Agricultural Revolution, while Johannes Gutenberg's 15th-century printing press fueled the Protestant Reformation and helped propel Europe out of the Middle Ages into the Renaissance. In the 18th century, James Watt's steam engine ushered in the Industrial Revolution. More recently, the internet has revolutionized communication, commerce, and information access, shrinking the world into a global village. Similarly, smartphones have transformed how people interact with their surroundings. Now, we stand at the dawn of the AI revolution.

Large Language Models (LLMs) represent a monumental leap forward, with significant economic implications at both macro and micro levels. These models are reshaping global markets, driving new forms of currency, and creating a novel economic landscape. The reason LLMs are transforming industries and redefining economies is simple: they automate both routine and complex tasks that traditionally require human intelligence. They enhance decision-making processes, boost productivity, and facilitate cost reductions across various sectors. This enables organizations to allocate human resources toward more creative and strategic endeavors, resulting in the development of new products and services. From healthcare to finance to customer service, LLMs are creating new markets and driving AI-driven services like content generation and conversational assistants into the mainstream.

To truly grasp the engine driving this new global economy, it's essential to understand the inner workings of this disruptive technology. These posts will provide both a macro-level overview of the economic forces at play and a deep dive into the technical mechanics of LLMs, equipping you with a comprehensive understanding of the revolution happening now.

Why Now? The Connection Between Language and Human Intelligence

AI did not begin with ChatGPT's arrival in November 2022. Machine learning classification models were already being built in 1999, and the roots of AI go back even further. Artificial intelligence was formally born in 1950, when Alan Turing—considered the father of theoretical computer science and famed for cracking the Nazi Enigma code during World War II—created the first formal definition of intelligence. This definition, known as the Turing Test, demonstrated the potential for machines to exhibit human-like intelligence through natural language conversations. The test involves a human evaluator who engages in conversations with both a human and a machine. If the evaluator cannot reliably distinguish between the two, the machine is considered to have passed the test. Remarkably, after 72 years of gradual AI development, ChatGPT simulated this very interaction, passing the Turing Test and igniting the current AI explosion.

But why is language so closely tied to human intelligence, rather than, for example, vision? While a large portion of the brain's cortex is involved in visual processing, OpenAI's pioneering image generation model, DALL-E, did not trigger the same level of excitement as ChatGPT. The answer lies in the profound role language has played in human evolution.

The Evolution of Language

The development of language was the turning point in humanity's rise to dominance on Earth.
As Yuval Noah Harari points out in his book Sapiens: A Brief History of Humankind, it was the ability to gossip and discuss abstract concepts that set humans apart from other species. Complex communication, such as gossip, requires a shared, sophisticated language. Human language evolved from primitive cave signs to structured alphabets, which, along with grammar rules, created languages capable of expressing thousands of words. In today's digital age, language has further evolved with the inclusion of emojis, and now, with the advent of GenAI, tokens have become the latest cornerstone in this progression. These shifts highlight the extraordinary journey of human language, from simple symbols to intricate digital representations. In the next post, we will explore the intricacies of LLMs, focusing specifically on tokens. But before that, let's delve into the economic forces shaping the LLM-driven world.

The Forces Shaping the LLM Economy

AI Giants in Competition

Karl Marx and Friedrich Engels argued that those who control the means of production hold power. The tech giants of today understand that AI is the future means of production, and the race to dominate the LLM market is well underway. This competition is fierce, with industry leaders like OpenAI, Google, Microsoft, and Facebook battling for supremacy. New challengers such as Mistral (France), AI21 (Israel), Elon Musk's xAI, and Anthropic are also entering the fray. The LLM industry is expanding exponentially, with billions of dollars of investment pouring in. For example, Anthropic has raised $4.5 billion from 43 investors, including major players like Amazon, Google, and Microsoft.

The Scarcity of GPUs

Just as Bitcoin mining requires vast computational resources, training LLMs demands immense computing power, driving a search for new energy sources. Microsoft's recent investment in nuclear energy underscores this urgency. At the heart of LLM technology are Graphics Processing Units (GPUs), essential for powering deep neural networks. These GPUs have become scarce and expensive, adding to the competitive tension.

Tokens: The New Currency of the LLM Economy

Tokens are the currency driving the emerging AI economy. Just as money facilitates transactions in traditional markets, tokens are the foundation of LLM economics. But what exactly are tokens? Tokens are the basic units of text that LLMs process. They can be single characters, parts of words, or entire words. For example, the word "Oscar" might be split into two tokens, "os" and "car." The performance of LLMs—quality, speed, and cost—hinges on how efficiently they generate these tokens. LLM providers price their services based on token usage, with different rates for input (prompt) and output (completion) tokens. As companies rely more on LLMs, especially for complex tasks like agentic applications, token usage will significantly impact operational costs. With fierce competition and the rise of open-source models like Llama 3.1, the cost of tokens is rapidly decreasing. For instance, OpenAI reduced its GPT-4 pricing by about 80% over the past year and a half. This trend enables companies to expand their portfolio of AI-powered products, further fueling the LLM economy.

Context Windows: Expanding Capabilities
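To make the token mechanics described under "Tokens: The New Currency of the LLM Economy" concrete, here is a small sketch using the open-source tiktoken library (the tokenizer family used by several OpenAI models). The exact token splits depend on the tokenizer, and the per-token prices below are placeholder assumptions for illustration, not current list prices.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

text = "Oscar"
token_ids = enc.encode(text)
pieces = [enc.decode([t]) for t in token_ids]
print(f"{text!r} -> {len(token_ids)} token(s): {pieces}")

# Rough cost estimate for one prompt/completion pair.
# Prices below are made-up placeholders; substitute your provider's rates.
PRICE_PER_INPUT_TOKEN = 0.000001   # hypothetical $ per prompt token
PRICE_PER_OUTPUT_TOKEN = 0.000002  # hypothetical $ per completion token

prompt = "Explain what a token is in one paragraph."
completion = "A token is a small unit of text that a language model reads or writes."
cost = (len(enc.encode(prompt)) * PRICE_PER_INPUT_TOKEN
        + len(enc.encode(completion)) * PRICE_PER_OUTPUT_TOKEN)
print(f"Estimated cost: ${cost:.8f}")
```

Because providers bill prompt and completion tokens at different rates, counting both sides of every call, as above, is the basic unit of LLM cost accounting.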


Snowflake Security and Development

Snowflake Unveils AI Development and Enhanced Security Features

At its annual Build virtual developer conference, Snowflake introduced a suite of new capabilities focused on AI development and strengthened security measures. These enhancements aim to simplify the creation of conversational AI tools, improve collaboration, and address data security challenges following a significant breach earlier this year.

AI Development Updates

Snowflake announced updates to its Cortex AI suite to streamline the development of conversational AI applications. These new tools focus on enabling faster, more efficient development while ensuring data integrity and trust. Highlights include: These features address enterprise demands for generative AI tools that boost productivity while maintaining governance over proprietary data. Snowflake aims to eliminate barriers to data-driven decision-making by enabling natural language queries and easy integration of structured and unstructured data into AI models.

According to Christian Kleinerman, Snowflake's EVP of Product, the goal is to reduce the time it takes for developers to build reliable, cost-effective AI applications: "We want to help customers build conversational applications for structured and unstructured data faster and more efficiently."

Security Enhancements

Following a breach last May, where hackers accessed customer data via stolen login credentials, Snowflake has implemented new security features: These additions come alongside existing tools like the Horizon Catalog for data governance. Kleinerman noted that while Snowflake's previous security measures were effective at preventing unauthorized access, the company recognizes the need to improve user adoption of these tools: "It's on us to ensure our customers can fully leverage the security capabilities we offer. That's why we're adding more monitoring, insights, and recommendations."

Collaboration Features

Snowflake is also enhancing collaboration through its new Internal Marketplace, which enables organizations to share data, AI tools, and applications across business units. The Native App Framework now integrates with Snowpark Container Services to simplify the distribution and monetization of analytics and AI products.

AI Governance and Competitive Position

Industry analysts highlight the growing importance of AI governance as enterprises increasingly adopt generative AI tools. David Menninger of ISG's Ventana Research emphasized that Snowflake's governance-focused features, such as LLM observability, fill a critical gap in AI tooling: "Trustworthy AI enhancements like model explainability and observability are vital as enterprises scale their use of AI."

With these updates, Snowflake continues to compete with Databricks and other vendors. Its strategy focuses on offering both API-based flexibility for developers and built-in tools for users seeking simpler solutions. By combining innovative AI development tools with robust security and collaboration features, Snowflake aims to meet the evolving needs of enterprises while positioning itself as a leader in the data platform and AI space.


Where LLMs Fall Short

Large Language Models (LLMs) have transformed natural language processing, showcasing exceptional abilities in text generation, translation, and various language tasks. Models like GPT-4, BERT, and T5 are based on transformer architectures, which enable them to predict the next word in a sequence by training on vast text datasets.

How LLMs Function

LLMs process input text through multiple layers of attention mechanisms, capturing complex relationships between words and phrases. Here's an overview of the process.

Tokenization and Embedding

Initially, the input text is broken down into smaller units, typically words or subwords, through tokenization. Each token is then converted into a numerical representation known as an embedding. For instance, the sentence "The cat sat on the mat" could be tokenized into ["The", "cat", "sat", "on", "the", "mat"], with each token assigned a unique vector.

Multi-Layer Processing

The embedded tokens are passed through multiple transformer layers, each containing self-attention mechanisms and feed-forward neural networks.

Contextual Understanding

As the input progresses through the layers, the model develops a deeper understanding of the text, capturing both local and global context. This enables the model to comprehend relationships such as:

Training and Pattern Recognition

During training, LLMs are exposed to vast datasets, learning patterns related to grammar, syntax, and semantics:

Generating Responses

When generating text, the LLM predicts the next word or token based on its learned patterns. This process is iterative, with each generated token influencing the next. For example, if prompted with "The Eiffel Tower is located in," the model would likely generate "Paris," given its learned associations between these terms.

Limitations in Reasoning and Planning

Despite their capabilities, LLMs face challenges in areas like reasoning and planning. Research by Subbarao Kambhampati highlights several limitations.

Lack of Causal Understanding

LLMs struggle with causal reasoning, which is crucial for understanding how events and actions relate in the real world.

Difficulty with Multi-Step Planning

LLMs often struggle to break down tasks into a logical sequence of actions.

Blocksworld Problem

Kambhampati's research on the Blocksworld problem, which involves stacking and unstacking blocks, shows that LLMs like GPT-3 struggle with even simple planning tasks. When tested on 600 Blocksworld instances, GPT-3 solved only 12.5% of them using natural language prompts. Even after fine-tuning, the model solved only 20% of the instances, highlighting its reliance on pattern recognition rather than true understanding of the planning task.

Performance on GPT-4

Temporal and Counterfactual Reasoning

LLMs also struggle with temporal reasoning (e.g., understanding the sequence of events) and counterfactual reasoning (e.g., constructing hypothetical scenarios).

Token and Numerical Errors

LLMs also exhibit errors in numerical reasoning due to inconsistencies in tokenization and their lack of true numerical understanding.

Tokenization and Numerical Representation

Numbers are often tokenized inconsistently. For example, "380" might be one token, while "381" might split into two tokens ("38" and "1"), leading to confusion in numerical interpretation.

Decimal Comparison Errors

LLMs can struggle with decimal comparisons. For example, comparing 9.9 and 9.11 may result in incorrect conclusions because the model processes these numbers as strings of tokens rather than numerically.
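The tokenization quirks described above are easy to inspect directly. The snippet below, using the tiktoken tokenizer purely for illustration, prints how a few numeric strings are split; the exact splits vary by tokenizer, which is precisely why models can mishandle digits and decimals.

```python
# pip install tiktoken
import tiktoken

enc = tiktoken.get_encoding("cl100k_base")

for text in ["380", "381", "9.9", "9.11"]:
    ids = enc.encode(text)
    pieces = [enc.decode([i]) for i in ids]
    print(f"{text!r:8} -> {len(ids)} token(s): {pieces}")

# The model sees these numbers as sequences of sub-word pieces, not as values.
# Ordinary numeric code, by contrast, compares them without ambiguity:
print(9.9 > 9.11)  # True
```

Whatever splits your tokenizer produces, the underlying point stands: the model never receives a number as a number, only as learned text fragments.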
Examples of Numerical Errors

Hallucinations and Biases

Hallucinations

LLMs are prone to generating false or nonsensical content, known as hallucinations. This can happen when the model produces irrelevant or fabricated information.

Biases

LLMs can perpetuate biases present in their training data, which can lead to the generation of biased or stereotypical content.

Inconsistencies and Context Drift

LLMs often struggle to maintain consistency over long sequences of text or tasks. As the input grows, the model may prioritize more recent information, leading to contradictions or neglect of earlier context. This is particularly problematic in multi-turn conversations or tasks requiring persistence.

Conclusion

While LLMs have advanced the field of natural language processing, they still face significant challenges in reasoning, planning, and maintaining contextual accuracy. These limitations highlight the need for further research and development of hybrid AI systems that integrate LLMs with other techniques to improve reasoning, consistency, and overall performance.


LLMs and AI

Large Language Models (LLMs): Revolutionizing AI and Custom Solutions

Large Language Models (LLMs) are transforming artificial intelligence by enabling machines to generate and comprehend human-like text, making them indispensable across numerous industries. The global LLM market is experiencing explosive growth, projected to rise from $1.59 billion in 2023 to $259.8 billion by 2030. This surge is driven by the increasing demand for automated content creation, advances in AI technology, and the need for improved human-machine communication. Several factors are propelling this growth, including advancements in AI and Natural Language Processing (NLP), large datasets, and the rising importance of seamless human-machine interaction. Additionally, private LLMs are gaining traction as businesses seek more control over their data and customization. These private models provide tailored solutions, reduce dependency on third-party providers, and enhance data privacy. This guide will walk you through building your own private LLM, offering valuable insights for both newcomers and seasoned professionals.

What are Large Language Models?

Large Language Models (LLMs) are advanced AI systems that generate human-like text by processing vast amounts of data using sophisticated neural networks, such as transformers. These models excel in tasks such as content creation, language translation, question answering, and conversation, making them valuable across industries, from customer service to data analysis. LLMs are generally classified into three types: LLMs learn language rules by analyzing vast text datasets, similar to how reading numerous books helps someone understand a language. Once trained, these models can generate content, answer questions, and engage in meaningful conversations. For example, an LLM can write a story about a space mission based on knowledge gained from reading space adventure stories, or it can explain photosynthesis using information drawn from biology texts.

Building a Private LLM

Data Curation for LLMs

Recent LLMs, such as Llama 3 and GPT-4, are trained on massive datasets—Llama 3 on 15 trillion tokens and GPT-4 on 6.5 trillion tokens. These datasets are drawn from diverse sources, including social media (140 trillion tokens), academic texts, and private data, with sizes ranging from hundreds of terabytes to multiple petabytes. This breadth of training enables LLMs to develop a deep understanding of language, covering diverse patterns, vocabularies, and contexts. Common data sources for LLMs include:

Data Preprocessing

After data collection, the data must be cleaned and structured. Key steps include:

LLM Training Loop

Key training stages include:

Evaluating Your LLM

After training, it is crucial to assess the LLM's performance using industry-standard benchmarks: When fine-tuning LLMs for specific applications, tailor your evaluation metrics to the task. For instance, in healthcare, matching disease descriptions with appropriate codes may be a top priority.

Conclusion

Building a private LLM provides unmatched customization, enhanced data privacy, and optimized performance. From data curation to model evaluation, this guide has outlined the essential steps to create an LLM tailored to your specific needs. Whether you're just starting or seeking to refine your skills, building a private LLM can empower your organization with state-of-the-art AI capabilities. For expert guidance or to kickstart your LLM journey, feel free to contact us for a free consultation.
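The "LLM Training Loop" stage above boils down to next-token prediction with a cross-entropy loss. Here is a deliberately tiny, self-contained PyTorch sketch of that loop using a toy embedding-plus-linear model; a real LLM would use a transformer, a learned tokenizer, and a far larger corpus, so treat every dimension and hyperparameter here as an illustrative assumption.

```python
import torch
import torch.nn as nn

# Toy corpus and character-level "tokenizer" (illustrative only).
text = "building a private llm starts with next-token prediction"
vocab = sorted(set(text))
stoi = {ch: i for i, ch in enumerate(vocab)}
data = torch.tensor([stoi[ch] for ch in text])

class TinyLM(nn.Module):
    def __init__(self, vocab_size, dim=32):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, dim)
        self.proj = nn.Linear(dim, vocab_size)

    def forward(self, idx):
        return self.proj(self.embed(idx))  # (seq_len, vocab_size) logits

model = TinyLM(len(vocab))
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-2)
loss_fn = nn.CrossEntropyLoss()

# Next-token prediction: inputs are positions 0..n-2, targets are 1..n-1.
inputs, targets = data[:-1], data[1:]
for step in range(200):
    logits = model(inputs)
    loss = loss_fn(logits, targets)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    if step % 50 == 0:
        print(f"step {step}: loss {loss.item():.3f}")
```

Pretraining, instruction tuning, and alignment stages all reuse this same predict-and-backpropagate pattern; what changes is the data fed into it and the objective layered on top.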


Generative AI Energy Consumption Rises

Generative AI Energy Consumption Rises, but Impact on ROI Unclear

The energy costs associated with generative AI (GenAI) are often overlooked in enterprise financial planning. However, industry experts suggest that IT leaders should account for the power consumption that comes with adopting this technology. When building a business case for generative AI, some costs are evident, like large language model (LLM) fees and SaaS subscriptions. Other costs, such as preparing data, upgrading cloud infrastructure, and managing organizational changes, are less visible but significant.

Generative AI Energy Consumption Rises

One often overlooked cost is the energy consumption of generative AI. Training LLMs and responding to user requests—whether answering questions or generating images—demands considerable computing power. These tasks generate heat and necessitate sophisticated cooling systems in data centers, which, in turn, consume additional energy. Despite this, most enterprises have not focused on the energy requirements of GenAI. However, the issue is gaining more attention at a broader level. The International Energy Agency (IEA), for instance, has forecast that electricity consumption from data centers, AI, and cryptocurrency could double by 2026. By that time, data centers' electricity use could exceed 1,000 terawatt-hours, equivalent to Japan's total electricity consumption. Goldman Sachs has also flagged the growing energy demand, attributing it partly to AI. The firm projects that global data center electricity use could more than double by 2030, fueled by AI and other factors.

ROI Implications of Energy Costs

The extent to which rising energy consumption will affect GenAI's return on investment (ROI) remains unclear. For now, the perceived benefits of GenAI seem to outweigh concerns about energy costs. Most businesses have not been directly impacted, as these costs tend to affect hyperscalers more. For instance, Google reported a 13% increase in greenhouse gas emissions in 2023, largely due to AI-related energy demands in its data centers. Scott Likens, PwC's global chief AI engineering officer, noted that while energy consumption isn't a barrier to adoption, it should still be factored into long-term strategies. "You don't take it for granted. There's a cost somewhere for the enterprise," he said.

Energy Costs: Hidden but Present

Although energy expenses may not appear on an enterprise's invoice, they are still present. Generative AI's energy consumption is tied to both model training and inference—each time a user makes a query, the system expends energy to generate a response. While the energy used for an individual query is minor, the cumulative effect across millions of users can add up. How these costs are passed to customers is somewhat opaque. Licensing fees for enterprise versions of GenAI products likely include energy costs, spread across the user base. According to PwC's Likens, the costs associated with training models are shared among many users, reducing the burden on individual enterprises. On the inference side, GenAI vendors charge for tokens, which correspond to computational power. Although increased token usage signals higher energy consumption, the financial impact on enterprises has so far been minimal, especially as token costs have decreased. The effect may be similar to buying an EV to save on gas, only to spend hundreds of dollars and lose hours at charging stations.
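A rough back-of-the-envelope calculation shows why per-query energy that looks negligible can still matter at scale. Every number below is a placeholder assumption for illustration only, not a measured figure for any vendor or workload.

```python
# Hypothetical inputs: adjust to your own estimates.
energy_per_query_wh = 0.3         # assumed watt-hours per GenAI query
queries_per_day = 5_000_000       # assumed daily query volume
electricity_price_per_kwh = 0.12  # assumed $ per kWh

daily_kwh = energy_per_query_wh * queries_per_day / 1000
annual_kwh = daily_kwh * 365
annual_cost = annual_kwh * electricity_price_per_kwh

print(f"Daily energy:  {daily_kwh:,.0f} kWh")
print(f"Annual energy: {annual_kwh:,.0f} kWh")
print(f"Annual electricity cost: ${annual_cost:,.0f}")
```

For most enterprises this cost is buried inside vendor token pricing rather than billed directly, which is exactly why it tends to stay invisible in ROI models.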
Energy as an Indirect Concern

While energy costs haven't been top of mind for GenAI adopters, they may address the issue indirectly by focusing on other deployment challenges, such as reducing latency and improving cost efficiency. Newer models, such as OpenAI's GPT-4o mini, are more economical and have helped organizations scale GenAI without prohibitive costs. Organizations may also use smaller, fine-tuned models to decrease latency and energy consumption. By adopting multimodel approaches, enterprises can choose models based on the complexity of a task, optimizing for both speed and energy efficiency.

The Data Center Dilemma

As enterprises consider GenAI's energy demands, data centers face the challenge head-on, investing in more sophisticated cooling systems to handle the heat generated by AI workloads. According to the Dell'Oro Group, the data center physical infrastructure market grew in the second quarter of 2024, signaling the start of the "AI growth cycle" for infrastructure sales, particularly thermal management systems. Liquid cooling, which is more efficient than air cooling, is gaining traction as a way to manage the heat from high-performance computing. This method is expected to see rapid growth in the coming years as demand for AI workloads continues to increase.

Nuclear Power and AI Energy Demands

To meet AI's growing energy demands, some hyperscalers are exploring nuclear energy for their data centers. AWS, Google, and Microsoft are among the companies exploring this option, with AWS acquiring a nuclear-powered data center campus earlier this year. Nuclear power could help these tech giants keep pace with AI's energy requirements while also meeting sustainability goals. It is debatable, though: tying broader AI accessibility to a build-out of nuclear power plants could cost the technology a lot of fans.

As GenAI continues to evolve, both energy costs and efficiency are likely to play a greater role in decision-making. PwC has already begun including carbon impact as part of its GenAI value framework, which assesses the full scope of generative AI deployments. "The cost of carbon is in there, so we shouldn't ignore it," Likens said.


Recent advancements in AI

Recent advancements in AI have been propelled by large language models (LLMs) containing billions to trillions of parameters. Parameters—variables used to train and fine-tune machine learning models—have played a key role in the development of generative AI. As the number of parameters grows, models like ChatGPT can generate human-like content that was unimaginable just a few years ago. Parameters are sometimes referred to as "features" or "feature counts."

While it's tempting to equate the power of AI models with their parameter count, similar to how we think of horsepower in cars, more parameters aren't always better. An increase in parameters can lead to additional computational overhead and even problems like overfitting. There are various ways to increase the number of parameters in AI models, but not all approaches yield the same improvements. For example, Google's Switch Transformers scaled to trillions of parameters, but some of their smaller models outperformed them in certain use cases. Thus, other metrics should be considered when evaluating AI models.

The exact relationship between parameter count and intelligence is still debated. John Blankenbaker, principal data scientist at SSA & Company, notes that larger models tend to replicate their training data more accurately, but the belief that more parameters inherently lead to greater intelligence is often wishful thinking. He points out that while these models may sound knowledgeable, they don't actually possess true understanding.

One challenge is the misunderstanding of what a parameter is. It's not a word, feature, or unit of data but rather a component within the model's computation. Each parameter adjusts how the model processes inputs, much like turning a knob in a complex machine. In contrast to parameters in simpler models like linear regression, which have a clear interpretation, parameters in LLMs are opaque and offer no insight on their own. Christine Livingston, managing director at Protiviti, explains that parameters act as weights that allow flexibility in the model. However, more parameters can lead to overfitting, where the model performs well on training data but struggles with new information.

Adnan Masood, chief AI architect at UST, highlights that parameters influence precision, accuracy, and data management needs. However, due to the size of LLMs, it's impractical to focus on individual parameters. Instead, developers assess models based on their intended purpose, performance metrics, and ethical considerations. Understanding the data sources and pre-processing steps becomes critical in evaluating the model's transparency.

It's important to differentiate between parameters, tokens, and words. A parameter is not a word; rather, it's a value learned during training. Tokens are fragments of words, and LLMs are trained on these tokens, which are transformed into embeddings used by the model.

The number of parameters influences a model's complexity and capacity to learn. More parameters often lead to better performance, but they also increase computational demands. Larger models can be harder to train and operate, leading to slower response times and higher costs. In some cases, smaller models are preferred for domain-specific tasks because they generalize better and are easier to fine-tune. Transformer-based models like GPT-4 dwarf previous generations in parameter count. However, for edge-based applications where resources are limited, smaller models are preferred as they are more adaptable and efficient.
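Since a parameter is just a learned weight inside the model's computation, it can help to count them directly on a small example. The PyTorch snippet below counts parameters for a toy two-layer network; the layer sizes are arbitrary assumptions, but the counting idiom is the same one commonly used to report model size.

```python
import torch.nn as nn

model = nn.Sequential(
    nn.Linear(512, 1024),  # 512*1024 weights + 1024 biases
    nn.ReLU(),
    nn.Linear(1024, 10),   # 1024*10 weights + 10 biases
)

total = sum(p.numel() for p in model.parameters())
print(f"Total parameters: {total:,}")  # 535,562

# Each parameter is one trainable number (a "knob"); none is a word or a token.
for name, p in model.named_parameters():
    print(f"{name:12} shape={tuple(p.shape)} count={p.numel():,}")
```

Scaling the same idiom to a model with billions of parameters changes nothing conceptually; it only changes how much memory and compute those knobs demand.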
Fine-tuning large models for specific domains remains a challenge, often requiring extensive oversight to avoid problems like overfitting. There is also growing recognition that parameter count alone is not the best way to measure a model's performance. Alternatives like Stanford's HELM and benchmarks such as GLUE and SuperGLUE assess models across multiple factors, including fairness, efficiency, and bias.

Three trends are shaping how we think about parameters. First, AI developers are improving model performance without necessarily increasing parameters: a study of 231 models between 2012 and 2023 found that the computational power required for LLMs has halved every eight months, outpacing Moore's Law. Second, new neural network approaches like Kolmogorov-Arnold Networks (KANs) show promise, achieving comparable results to traditional models with far fewer parameters. Lastly, agentic AI frameworks like Salesforce's Agentforce offer a new architecture in which domain-specific AI agents can outperform larger general-purpose models.

As AI continues to evolve, it's clear that while parameter count is an important consideration, it's just one of many factors in evaluating a model's overall capabilities. To stay on the cutting edge of artificial intelligence, contact Tectonic today.


Salesforce LlamaRank

Document ranking remains a critical challenge in information retrieval and natural language processing. Effective document retrieval and ranking are crucial for enhancing the performance of search engines, question-answering systems, and Retrieval-Augmented Generation (RAG) systems. Traditional ranking models often struggle to balance result precision with computational efficiency, especially when dealing with large datasets and diverse query types. This challenge underscores the growing need for advanced models that can provide accurate, contextually relevant results in real time from continuous data streams and increasingly complex queries.

Salesforce AI Research has introduced a cutting-edge reranker named LlamaRank, designed to significantly enhance document ranking and code search tasks across various datasets. Built on the Llama3-8B-Instruct architecture, LlamaRank integrates advanced linear and calibrated scoring mechanisms, achieving both speed and interpretability. The Salesforce AI Research team developed LlamaRank as a specialized tool for document relevancy ranking. Enhanced by iterative feedback from their dedicated RLHF data annotation team, LlamaRank outperforms many leading APIs in general document ranking and sets a new standard for code search performance. The model's training data includes high-quality synthesized data from Llama3-70B and Llama3-405B, along with human-labeled annotations, covering a broad range of domains from topic-based search and document QA to code QA.

In RAG systems, LlamaRank plays a crucial role. Initially, a query is processed using a less precise but cost-effective method, such as semantic search with embeddings, to generate a list of candidate documents. The reranker then refines this list to identify the most relevant documents, ensuring that the language model is prompted with only the most pertinent information, thereby improving accuracy and coherence in the output responses.

LlamaRank's architecture, based on Llama3-8B-Instruct, leverages a diverse training corpus of synthetic and human-labeled data. This extensive dataset enables LlamaRank to excel in various tasks, from general document retrieval to specialized code searches. The model underwent multiple feedback cycles from Salesforce's data annotation team to achieve optimal accuracy and relevance in its scoring predictions. During inference, LlamaRank predicts token probabilities and calculates a numeric relevance score, facilitating efficient reranking.

Demonstrated on several public datasets, LlamaRank has shown impressive performance. On the SQuAD dataset for question answering, it achieved a hit rate of 99.3%; on the TriviaQA dataset, 92.0%. In code search benchmarks, LlamaRank recorded a hit rate of 81.8% on the Neural Code Search dataset and 98.6% on the TrailheadQA dataset. These results highlight LlamaRank's versatility and efficiency across various document types and query scenarios.

LlamaRank's technical specifications further emphasize its advantages. Supporting up to 8,000 tokens per document, it significantly outperforms competitors like Cohere's reranker. It delivers low-latency performance, ranking 64 documents in under 200 ms with a single H100 GPU, compared with approximately 3.13 seconds on Cohere's serverless API. Additionally, LlamaRank features linear scoring calibration, offering clear and interpretable relevance scores.
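The two-stage retrieval pattern described above (cheap candidate retrieval followed by reranking) can be sketched as follows. The rank_documents function here is a hypothetical stand-in, since LlamaRank's client API is not described in this post; only the overall shape of the pipeline is being illustrated.

```python
from typing import Callable, List, Tuple

def rag_answer(
    query: str,
    retrieve: Callable[[str, int], List[str]],                # e.g., embedding search
    rank_documents: Callable[[str, List[str]], List[float]],  # hypothetical reranker call
    generate: Callable[[str], str],                           # LLM completion
    candidates: int = 64,
    keep: int = 5,
) -> str:
    # Stage 1: cheap, recall-oriented retrieval over the whole corpus.
    docs = retrieve(query, candidates)

    # Stage 2: precision-oriented reranking; keep only the top few documents.
    scores = rank_documents(query, docs)
    ranked: List[Tuple[float, str]] = sorted(zip(scores, docs), reverse=True)
    context = "\n\n".join(doc for _, doc in ranked[:keep])

    # Stage 3: generate an answer grounded in the reranked context.
    prompt = f"Answer using only the context below.\n\nContext:\n{context}\n\nQuestion: {query}"
    return generate(prompt)
```

Any embedding retriever, reranker, and LLM client with these rough signatures could be dropped into the three callables; the reranker's job is simply to turn a broad candidate list into a short, high-precision context.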
While LlamaRank's size of 8 billion parameters contributes to its high performance, it is approaching the upper limits of reranking model size. Future research may focus on optimizing model size to balance quality and efficiency. Overall, LlamaRank from Salesforce AI Research marks a significant advancement in reranking technology, promising to greatly enhance RAG systems' effectiveness across a wide range of applications. With its powerful performance, efficiency, and clear scoring, LlamaRank represents a major step forward in document retrieval and search accuracy. The community eagerly anticipates its broader adoption and further development.


Ingest Salesforce Data to Microsoft Fabric

I'm using Dataflow Gen 2 in Microsoft Fabric to ingest data from Salesforce via the Salesforce Objects connector, which is authenticated through an Organizational Account (OAuth 2.0). However, unlike Azure Synapse's SalesforceV2 connector type, this connector doesn't offer fields to input a client ID, client secret, or environment URL. Here are the key concerns:

1. Reauthentication Requirement

Will reauthentication be required regularly (e.g., after access tokens expire), and how often will that occur? What factors contribute to the frequency of reauthentication?

With OAuth 2.0, the system typically issues an access token (short-lived, often around one hour) and a refresh token, which lasts longer. Reauthentication is necessary when both expire. While Dataflow Gen 2 does not allow manual token management, it should handle refreshing access tokens automatically. The reauthentication frequency depends largely on:

2. Cons of Using an Organizational Account

What are the potential downsides of using an Organizational Account for this connection, particularly in a production setting where automation and stability are critical?

Potential drawbacks: To mitigate these risks, consider using a service account (rather than individual user accounts) to centralize and secure access.

3. Workaround for Client Credentials Flow

Is it possible to implement a client credentials flow (i.e., providing a client ID, client secret, and environment URL) to prevent frequent reauthentication, similar to Azure Synapse or Data Factory? If not, what options are available for maintaining a stable, long-term data connection from Salesforce?

Currently, there doesn't appear to be support for the client credentials flow in Dataflow Gen 2. You may want to reach out to Microsoft support for confirmation. As an alternative, you could explore other options for maintaining a stable, long-term connection.
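For background on the token lifecycle discussed under question 1, here is a sketch of how a client refreshes a Salesforce access token outside of Fabric, using Salesforce's documented OAuth 2.0 token endpoint. Dataflow Gen 2 performs this step for you internally; the connected-app credentials and domain below are placeholders, not real values.

```python
import requests

# Placeholder values: supply your connected app's credentials and My Domain.
SALESFORCE_DOMAIN = "https://yourInstance.my.salesforce.com"
CLIENT_ID = "<connected-app-consumer-key>"
CLIENT_SECRET = "<connected-app-consumer-secret>"
REFRESH_TOKEN = "<refresh-token-from-initial-oauth-consent>"

def refresh_access_token() -> dict:
    """Exchange a long-lived refresh token for a new short-lived access token."""
    resp = requests.post(
        f"{SALESFORCE_DOMAIN}/services/oauth2/token",
        data={
            "grant_type": "refresh_token",
            "client_id": CLIENT_ID,
            "client_secret": CLIENT_SECRET,
            "refresh_token": REFRESH_TOKEN,
        },
        timeout=30,
    )
    resp.raise_for_status()
    return resp.json()  # contains access_token, instance_url, etc.

# tokens = refresh_access_token()
# print(tokens["access_token"][:12], "...")
```

Interactive reauthentication in Fabric is only needed when this refresh step can no longer succeed, for example when the refresh token itself expires or is revoked by the org's connected-app policies.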
