OpenAI has established itself as a leading force in the generative AI space, with its ChatGPT being one of the most widely recognized AI tools. Powered by the GPT series of large language models (LLMs), as of September 2024, ChatGPT primarily uses GPT-4o and GPT-3.5. This insight provides an Open AI Update.

In August and September 2024, rumors circulated about a new model from OpenAI, codenamed “Strawberry.” Initially, it was unclear if this model would be a successor to GPT-4o or something entirely different. On September 12, 2024, the mystery was resolved with the official launch of OpenAI’s o1 models, including o1-preview and o1-mini.

What is OpenAI o1?

OpenAI o1 is a new family of LLMs optimized for advanced reasoning tasks. Unlike earlier models, o1 is designed to improve problem-solving by reasoning through queries rather than just generating quick responses. This deeper processing aims to produce more accurate answers to complex questions, particularly in fields like STEM (science, technology, engineering, and mathematics).

The o1 models, currently available in preview form, are intended to provide a new type of LLM experience beyond what GPT-4o offers. Like all OpenAI LLMs, the o1 series is built on transformer architecture and can be used for tasks such as content summarization, new content generation, question answering, and writing code.

Key Features of OpenAI o1

The standout feature of the o1 models is their ability to engage in multistep reasoning. By adopting a “chain-of-thought” approach, o1 models break down complex problems and reason through them iteratively. This makes them particularly adept at handling intricate queries that require a more thoughtful response.

The initial September 2024 launch included two models:

  • OpenAI o1-preview: Designed to handle sophisticated reasoning tasks.
  • OpenAI o1-mini: A smaller, more cost-effective version of o1.

Use Cases for OpenAI o1

The o1 models can perform many of the same functions as GPT-4o, such as answering questions, summarizing content, and generating text. However, they are particularly suited for tasks that benefit from enhanced reasoning, including:

  • Advanced reasoning in STEM: Ideal for complex scientific, mathematical, and engineering problems.
  • Brainstorming and ideation: The models excel at generating creative ideas and solutions.
  • Scientific research: Capable of tasks like annotating cell sequencing data and processing complex mathematical equations.
  • Coding: Effective at generating and debugging code, outperforming earlier models in coding benchmarks such as HumanEval and Codeforces.
  • Mathematics: o1 models significantly outperform GPT-4o on math benchmarks, achieving 83% accuracy in the International Mathematics Olympiad qualifying exam compared to GPT-4o’s 13%.
  • Self-fact-checking: The o1 models can improve the accuracy of their responses by cross-referencing information.

Availability and Access

The o1-preview and o1-mini models are available to users of ChatGPT Plus and Team as of September 12, 2024. OpenAI plans to extend access to ChatGPT Enterprise and Education users starting September 19, 2024. While free ChatGPT users do not have access to these models at launch, OpenAI intends to introduce o1-mini to free users in the future. Developers can also access the models through OpenAI’s API, and third-party platforms such as Microsoft Azure AI Studio and GitHub Models offer integration.

Limitations of OpenAI o1

As preview models, o1 comes with certain limitations:

  • Feature gaps: Initially, the models do not support web browsing, image processing, or file uploads.
  • API restrictions: Function calling and streaming are not available, and access to chat completion parameters is limited.
  • Response time: Due to the more thorough reasoning process, the o1 models are slower than previous iterations.
  • Usage limits: At launch, users of o1-preview were limited to 30 messages per week, which was later increased to 50. For o1-mini, users are allowed 50 messages per day.
  • Higher costs: For API users, the o1 models are more expensive than GPT-4o.

Enhancing Safety with OpenAI o1

To ensure safety, OpenAI released a System Card that outlines how the o1 models were evaluated for risks like cybersecurity threats, persuasion, and model autonomy. The o1 models improve safety through:

  • Chain-of-thought reasoning: This allows the models to better recognize mistakes and refine their responses.
  • Jailbreak resistance: The o1 models demonstrate stronger resistance to common attacks, scoring higher on safety benchmarks.
  • Content policy adherence: In evaluations like the Challenging Refusal Evaluation, o1-preview achieved a “not-unsafe” score of 0.934, outperforming GPT-4o’s 0.713.
  • Bias mitigation: The o1 models show improved fairness in decision-making, performing better on evaluations involving race, gender, and age.

GPT-4o vs. OpenAI o1

Here’s a quick comparison between GPT-4o and OpenAI’s new o1 models:

FeatureGPT-4oo1 Models
Release DateMay 13, 2024Sept. 12, 2024
Model VariantsSingle modelTwo variants: o1-preview and o1-mini
Reasoning CapabilitiesGoodEnhanced, especially for STEM fields
Mathematics Olympiad Score13%83%
Context Window128K tokens128K tokens
SpeedFasterSlower due to in-depth reasoning
Cost (per million tokens)Input: $5; Output: $15o1-preview: $15 input, $60 output; o1-mini: $3 input, $12 output
Safety and AlignmentStandardEnhanced safety, better jailbreak resistance

OpenAI’s o1 models bring a new level of reasoning and accuracy, making them a promising advancement in generative AI.

Related Posts
Salesforce OEM AppExchange
Salesforce OEM AppExchange

Expanding its reach beyond CRM, Salesforce.com has launched a new service called AppExchange OEM Edition, aimed at non-CRM service providers. Read more

Salesforce Jigsaw
Salesforce Jigsaw

Salesforce.com, a prominent figure in cloud computing, has finalized a deal to acquire Jigsaw, a wiki-style business contact database, for Read more

Health Cloud Brings Healthcare Transformation
Health Cloud Brings Healthcare Transformation

Following swiftly after last week's successful launch of Financial Services Cloud, Salesforce has announced the second installment in its series Read more

Top Ten Reasons Why Tectonic Loves the Cloud
cloud computing

The Cloud is Good for Everyone - Why Tectonic loves the cloud You don’t need to worry about tracking licenses. Read more

author avatar
wp-shannan