Tokens Archives - gettectonic.com - Page 2
AI Confidence Scores

AI Confidence Scores

In this insight, the focus is on exploring the use of confidence scores available through the OpenAI API. The first section delves into these scores and explains their significance using a custom chat interface. The second section demonstrates how to apply confidence scores programmatically in code. Understanding Confidence Scores To begin, it’s important to understand what an LLM (Large Language Model) is doing for each token in its response: However, it’s essential to clarify that the term “probabilities” here is somewhat misleading. While mathematically, they qualify as a “probability distribution” (the values add up to one), they don’t necessarily reflect true confidence or likelihood in the way we might expect. In this sense, these values should be treated with caution. A useful way to think about these values is to consider them as “confidence” scores, though it’s crucial to remember that, much like humans, LLMs can be confident and still be wrong. The values themselves are not inherently meaningful without additional context or validation. Example: Using a Chat Interface An example of exploring these confidence scores can be seen in a chat interface where: In one case, when asked to “pick a number,” the LLM chose the word “choose” despite it having only a 21% chance of being selected. This demonstrates that LLMs don’t always pick the most likely token unless configured to do so. Additionally, this interface shows how the model might struggle with questions that have no clear answer, offering insights into detecting possible hallucinations. For example, when asked to list famous people with an interpunct in their name, the model shows low confidence in its guesses. This behavior indicates uncertainty and can be an indicator of a forthcoming incorrect response. Hallucinations and Confidence Scores The discussion also touches on the question of whether low confidence scores can help detect hallucinations—cases where the model generates false information. While low confidence often correlates with potential hallucinations, it’s not a foolproof indicator. Some hallucinations may come with high confidence, while low-confidence tokens might simply reflect natural variability in language. For instance, when asked about the capital of Kazakhstan, the model shows uncertainty due to the historical changes between Astana and Nur-Sultan. The confidence scores reflect this inconsistency, highlighting how the model can still select an answer despite having conflicting information. Using Confidence Scores in Code The next part of the discussion covers how to leverage confidence scores programmatically. For simple yes/no questions, it’s possible to compress the response into a single token and calculate the confidence score using OpenAI’s API. Key API settings include: Using this setup, one can extract the model’s confidence in its response, converting log probabilities back into regular probabilities using math.exp. Expanding to Real-World Applications The post extends this concept to more complex scenarios, such as verifying whether an image of a driver’s license is valid. By analyzing the model’s confidence in its answer, developers can determine when to flag responses for human review based on predefined confidence thresholds. This technique can also be applied to multiple-choice questions, allowing developers to extract not only the top token but also the top 10 options, along with their confidence scores. Conclusion While confidence scores from LLMs aren’t a perfect solution for detecting accuracy or truthfulness, they can provide useful insights in certain scenarios. With careful application and evaluation, developers can make informed decisions about when to trust the model’s responses and when to intervene. The final takeaway is that confidence scores, while not foolproof, can play a role in improving the reliability of LLM outputs—especially when combined with thoughtful design and ongoing calibration. Like Related Posts Salesforce OEM AppExchange Expanding its reach beyond CRM, Salesforce.com has launched a new service called AppExchange OEM Edition, aimed at non-CRM service providers. Read more The Salesforce Story In Marc Benioff’s own words How did salesforce.com grow from a start up in a rented apartment into the world’s Read more Salesforce Jigsaw Salesforce.com, a prominent figure in cloud computing, has finalized a deal to acquire Jigsaw, a wiki-style business contact database, for Read more Health Cloud Brings Healthcare Transformation Following swiftly after last week’s successful launch of Financial Services Cloud, Salesforce has announced the second installment in its series Read more

Read More
Gen AI and Test Automation

Gen AI and Test Automation

Generative AI has brought transformative advancements across industries, and test automation is no exception. By generating code, test scenarios, and even entire suites, Generative AI enables Software Development Engineers in Test (SDETs) to boost efficiency, expand test coverage, and improve reliability. 1. Enhanced Test Case Generation One of the biggest hurdles in test automation is generating diverse, comprehensive test cases. Traditional methods often miss edge cases or diverse scenarios. Generative AI, however, can analyze existing data and automatically generate extensive test cases, including potential edge cases that may not be apparent to human testers. Example: An SDET can use Generative AI to create test cases for a web application by feeding it requirements and user data. This enables the AI to produce hundreds of test cases, capturing diverse user behaviors and interactions that manual testers may overlook. pythonCopy codeimport openai openai.api_key = ‘YOUR_API_KEY’ def generate_test_cases(application_description): response = openai.Completion.create( engine=”text-davinci-003″, prompt=f”Generate comprehensive test cases for the following application: {application_description}”, max_tokens=500 ) return response.choices[0].text app_description = “An e-commerce platform for browsing products, adding to cart, and checking out.” test_cases = generate_test_cases(app_description) print(test_cases) Sample Output: 2. Intelligent Test Script Creation Writing test scripts manually can be labor-intensive and error-prone. Generative AI can simplify this by generating test scripts based on an application’s flow, ensuring consistency and precision. Example: If an SDET needs to automate tests for a mobile app, they can use Generative AI to generate scripts for various scenarios, significantly reducing manual work. pythonCopy codeimport hypothetical_ai_test_tool ui_description = “”” Login Page: – Username field – Password field – Login button Home Page: – Search bar – Product listings – Add to cart buttons “”” test_scripts = hypothetical_ai_test_tool.generate_selenium_scripts(ui_description) Sample Output for test_login.py: pythonCopy codefrom selenium import webdriver from selenium.webdriver.common.keys import Keys def test_login(): driver = webdriver.Chrome() driver.get(“http://example.com/login”) username_field = driver.find_element_by_name(“username”) password_field = driver.find_element_by_name(“password”) login_button = driver.find_element_by_name(“login”) username_field.send_keys(“testuser”) password_field.send_keys(“password”) login_button.click() assert “Home” in driver.title driver.quit() 3. Automated Maintenance of Test Suites As applications evolve, maintaining test suites is critical. Generative AI can monitor app changes and update test cases automatically, keeping test suites accurate and relevant. Example: In a CI/CD pipeline, an SDET can deploy Generative AI to track code changes and update affected test scripts. This minimizes downtime and ensures tests stay aligned with application updates. pythonCopy codeimport hypothetical_ai_maintenance_tool def maintain_test_suite(): changes = hypothetical_ai_maintenance_tool.analyze_code_changes() updated_scripts = hypothetical_ai_maintenance_tool.update_test_scripts(changes) for script_name, script_content in updated_scripts.items(): with open(script_name, ‘w’) as file: file.write(script_content) maintain_test_suite() Sample Output:“Updating test_login.py with new login flow changes… Test scripts updated successfully.” 4. Natural Language Processing for Test Case Design Generative AI with NLP can interpret human language, enabling SDETs to create test cases from plain-language descriptions, enhancing collaboration across technical and non-technical teams. Example: An SDET can use an NLP-powered tool to translate a feature description from a product manager into test cases. This speeds up the process and ensures that test cases reflect intended functionality. pythonCopy codeimport openai openai.api_key = ‘YOUR_API_KEY’ def create_test_cases(description): response = openai.Completion.create( engine=”text-davinci-003″, prompt=f”Create test cases based on this feature description: {description}”, max_tokens=500 ) return response.choices[0].text feature_description = “Allow users to reset passwords via email to regain account access.” test_cases = create_test_cases(feature_description) print(test_cases) Sample Output: 5. Predictive Analytics for Test Prioritization Generative AI can analyze historical data to prioritize high-risk areas, allowing SDETs to focus testing on critical functionalities. Example: An SDET can use predictive analytics to identify areas with frequent bugs, allocating resources more effectively and ensuring robust testing of high-risk components. pythonCopy codeimport hypothetical_ai_predictive_tool def prioritize_tests(): risk_areas = hypothetical_ai_predictive_tool.predict_risk_areas() prioritized_tests = hypothetical_ai_predictive_tool.prioritize_test_cases(risk_areas) return prioritized_tests prioritized_test_cases = prioritize_tests() print(“Prioritized Test Cases:”) for test in prioritized_test_cases: print(test) Sample Output: Gen AI and Test Automation Generative AI has the potential to revolutionize test automation, offering SDETs tools to enhance efficiency, coverage, and reliability. By embracing Generative AI for tasks like test case generation, script creation, suite maintenance, NLP-based design, and predictive prioritization, SDETs can reduce manual effort and focus on strategic tasks, accelerating testing processes and ensuring robust, reliable software systems. Like Related Posts Salesforce OEM AppExchange Expanding its reach beyond CRM, Salesforce.com has launched a new service called AppExchange OEM Edition, aimed at non-CRM service providers. Read more The Salesforce Story In Marc Benioff’s own words How did salesforce.com grow from a start up in a rented apartment into the world’s Read more Salesforce Jigsaw Salesforce.com, a prominent figure in cloud computing, has finalized a deal to acquire Jigsaw, a wiki-style business contact database, for Read more Health Cloud Brings Healthcare Transformation Following swiftly after last week’s successful launch of Financial Services Cloud, Salesforce has announced the second installment in its series Read more

Read More
Communicating With Machines

Communicating With Machines

For as long as machines have existed, humans have struggled to communicate effectively with them. The rise of large language models (LLMs) has transformed this dynamic, making “prompting” the bridge between our intentions and AI’s actions. By providing pre-trained models with clear instructions and context, we can ensure they understand and respond correctly. As UX practitioners, we now play a key role in facilitating this interaction, helping humans and machines truly connect. The UX discipline was born alongside graphical user interfaces (GUIs), offering a way for the average person to interact with computers without needing to write code. We introduced familiar concepts like desktops, trash cans, and save icons to align with users’ mental models, while complex code ran behind the scenes. Now, with the power of AI and the transformer architecture, a new form of interaction has emerged—natural language communication. This shift has changed the design landscape, moving us from pure graphical interfaces to an era where text-based interactions dominate. As designers, we must reconsider where our focus should lie in this evolving environment. A Mental Shift In the era of command-based design, we focused on breaking down complex user problems, mapping out customer journeys, and creating deterministic flows. Now, with AI at the forefront, our challenge is to provide models with the right context for optimal output and refine the responses through iteration. Shifting Complexity to the Edges Successful communication, whether with a person or a machine, hinges on context. Just as you would clearly explain your needs to a salesperson to get the right product, AI models also need clear instructions. Expecting users to input all the necessary information in their prompts won’t lead to widespread adoption of these models. Here, UX practitioners play a critical role. We can design user experiences that integrate context—some visible to users, others hidden—shaping how AI interacts with them. This ensures that users can seamlessly communicate with machines without the burden of detailed, manual prompts. The Craft of Prompting As designers, our role in crafting prompts falls into three main areas: Even if your team isn’t building custom models, there’s still plenty of work to be done. You can help select pre-trained models that align with user goals and design a seamless experience around them. Understanding the Context Window A key concept for UX designers to understand is the “context window“—the information a model can process to generate an output. Think of it as the amount of memory the model retains during a conversation. Companies can use this to include hidden prompts, helping guide AI responses to align with brand values and user intent. Context windows are measured in tokens, not time, so even if you return to a conversation weeks later, the model remembers previous interactions, provided they fit within the token limit. With innovations like Gemini’s 2-million-token context window, AI models are moving toward infinite memory, which will bring new design challenges for UX practitioners. How to Approach Prompting Prompting is an iterative process where you craft an instruction, test it with the model, and refine it based on the results. Some effective techniques include: Depending on the scenario, you’ll either use direct, simple prompts (for user-facing interactions) or broader, more structured system prompts (for behind-the-scenes guidance). Get Organized As prompting becomes more common, teams need a unified approach to avoid conflicting instructions. Proper documentation on system prompting is crucial, especially in larger teams. This helps prevent errors and hallucinations in model responses. Prompt experimentation may reveal limitations in AI models, and there are several ways to address these: Looking Ahead The UX landscape is evolving rapidly. Many organizations, particularly smaller ones, have yet to realize the importance of UX in AI prompting. Others may not allocate enough resources, underestimating the complexity and importance of UX in shaping AI interactions. As John Culkin said, “We shape our tools, and thereafter, our tools shape us.” The responsibility of integrating UX into AI development goes beyond just individual organizations—it’s shaping the future of human-computer interaction. This is a pivotal moment for UX, and how we adapt will define the next generation of design. Content updated October 2024. Like Related Posts Salesforce OEM AppExchange Expanding its reach beyond CRM, Salesforce.com has launched a new service called AppExchange OEM Edition, aimed at non-CRM service providers. Read more Salesforce Jigsaw Salesforce.com, a prominent figure in cloud computing, has finalized a deal to acquire Jigsaw, a wiki-style business contact database, for Read more Health Cloud Brings Healthcare Transformation Following swiftly after last week’s successful launch of Financial Services Cloud, Salesforce has announced the second installment in its series Read more Top Ten Reasons Why Tectonic Loves the Cloud The Cloud is Good for Everyone – Why Tectonic loves the cloud You don’t need to worry about tracking licenses. Read more

Read More
  • 1
  • 2
gettectonic.com