OpenAI has established itself as a leading force in the generative AI space, with its ChatGPT being one of the most widely recognized AI tools. Powered by the GPT series of large language models (LLMs), as of September 2024, ChatGPT primarily uses GPT-4o and GPT-3.5. This insight provides an Open AI Update.
In August and September 2024, rumors circulated about a new model from OpenAI, codenamed “Strawberry.” Initially, it was unclear if this model would be a successor to GPT-4o or something entirely different. On September 12, 2024, the mystery was resolved with the official launch of OpenAI’s o1 models, including o1-preview and o1-mini.
What is OpenAI o1?
OpenAI o1 is a new family of LLMs optimized for advanced reasoning tasks. Unlike earlier models, o1 is designed to improve problem-solving by reasoning through queries rather than just generating quick responses. This deeper processing aims to produce more accurate answers to complex questions, particularly in fields like STEM (science, technology, engineering, and mathematics).
The o1 models, currently available in preview form, are intended to provide a new type of LLM experience beyond what GPT-4o offers. Like all OpenAI LLMs, the o1 series is built on transformer architecture and can be used for tasks such as content summarization, new content generation, question answering, and writing code.
Key Features of OpenAI o1
The standout feature of the o1 models is their ability to engage in multistep reasoning. By adopting a “chain-of-thought” approach, o1 models break down complex problems and reason through them iteratively. This makes them particularly adept at handling intricate queries that require a more thoughtful response.
The initial September 2024 launch included two models:
- OpenAI o1-preview: Designed to handle sophisticated reasoning tasks.
- OpenAI o1-mini: A smaller, more cost-effective version of o1.
Use Cases for OpenAI o1
The o1 models can perform many of the same functions as GPT-4o, such as answering questions, summarizing content, and generating text. However, they are particularly suited for tasks that benefit from enhanced reasoning, including:
- Advanced reasoning in STEM: Ideal for complex scientific, mathematical, and engineering problems.
- Brainstorming and ideation: The models excel at generating creative ideas and solutions.
- Scientific research: Capable of tasks like annotating cell sequencing data and processing complex mathematical equations.
- Coding: Effective at generating and debugging code, outperforming earlier models in coding benchmarks such as HumanEval and Codeforces.
- Mathematics: o1 models significantly outperform GPT-4o on math benchmarks, achieving 83% accuracy in the International Mathematics Olympiad qualifying exam compared to GPT-4o’s 13%.
- Self-fact-checking: The o1 models can improve the accuracy of their responses by cross-referencing information.
Availability and Access
The o1-preview and o1-mini models are available to users of ChatGPT Plus and Team as of September 12, 2024. OpenAI plans to extend access to ChatGPT Enterprise and Education users starting September 19, 2024. While free ChatGPT users do not have access to these models at launch, OpenAI intends to introduce o1-mini to free users in the future. Developers can also access the models through OpenAI’s API, and third-party platforms such as Microsoft Azure AI Studio and GitHub Models offer integration.
Limitations of OpenAI o1
As preview models, o1 comes with certain limitations:
- Feature gaps: Initially, the models do not support web browsing, image processing, or file uploads.
- API restrictions: Function calling and streaming are not available, and access to chat completion parameters is limited.
- Response time: Due to the more thorough reasoning process, the o1 models are slower than previous iterations.
- Usage limits: At launch, users of o1-preview were limited to 30 messages per week, which was later increased to 50. For o1-mini, users are allowed 50 messages per day.
- Higher costs: For API users, the o1 models are more expensive than GPT-4o.
Enhancing Safety with OpenAI o1
To ensure safety, OpenAI released a System Card that outlines how the o1 models were evaluated for risks like cybersecurity threats, persuasion, and model autonomy. The o1 models improve safety through:
- Chain-of-thought reasoning: This allows the models to better recognize mistakes and refine their responses.
- Jailbreak resistance: The o1 models demonstrate stronger resistance to common attacks, scoring higher on safety benchmarks.
- Content policy adherence: In evaluations like the Challenging Refusal Evaluation, o1-preview achieved a “not-unsafe” score of 0.934, outperforming GPT-4o’s 0.713.
- Bias mitigation: The o1 models show improved fairness in decision-making, performing better on evaluations involving race, gender, and age.
GPT-4o vs. OpenAI o1
Here’s a quick comparison between GPT-4o and OpenAI’s new o1 models:
Feature | GPT-4o | o1 Models |
---|---|---|
Release Date | May 13, 2024 | Sept. 12, 2024 |
Model Variants | Single model | Two variants: o1-preview and o1-mini |
Reasoning Capabilities | Good | Enhanced, especially for STEM fields |
Mathematics Olympiad Score | 13% | 83% |
Context Window | 128K tokens | 128K tokens |
Speed | Faster | Slower due to in-depth reasoning |
Cost (per million tokens) | Input: $5; Output: $15 | o1-preview: $15 input, $60 output; o1-mini: $3 input, $12 output |
Safety and Alignment | Standard | Enhanced safety, better jailbreak resistance |
OpenAI’s o1 models bring a new level of reasoning and accuracy, making them a promising advancement in generative AI.