
KEY POINTS

  • The previous GPT-3.5 scored around the bottom 10% of a simulated bar exam
  • GPT-4 is able to handle more nuanced instructions than GPT-3.5, according to OpenAI
  • GPT-4 is also deemed safer and more accurate

OpenAI has revealed that GPT-4, the latest version of its primary large language model, exhibits "human-level performance" on various professional and academic tests, including passing a simulated bar exam in the top 10% of test takers.

The update is a huge improvement from GPT-3.5, which scored around the bottom 10%, OpenAI said in an announcement Tuesday.

GPT-4, which learns its skills by analyzing huge amounts of data culled from the internet, was designed to power artificial intelligence chatbots such as Bing's AI chat and OpenAI's ChatGPT as well as various other systems, from business software to personal online tutors.

OpenAI said in a blog post that the new model is "more creative and collaborative than ever before" and "can solve difficult problems with greater accuracy, thanks to its broader general knowledge and problem-solving abilities."

"The difference comes out when the complexity of the task reaches a sufficient threshold," OpenAI wrote. "GPT-4 is more reliable, creative and able to handle much more nuanced instructions than GPT-3.5."

In addition to the simulated bar exam, GPT-4 outperformed most human test takers on other standardized exams, scoring in the 93rd percentile on an SAT reading exam and the 89th percentile on the SAT math exam, OpenAI claimed.

Additionally, GPT-4 can accept both text and image input, though it can only respond via text, according to the tech company.

OpenAI said it trained the model on Microsoft Azure, in a training run the company described as "unprecedentedly stable." Microsoft has invested billions of dollars in OpenAI's research since 2019, according to Bloomberg.

OpenAI also deems GPT-4 safer and more accurate: it is 82% less likely than GPT-3.5 to respond to requests for content that OpenAI does not allow, and it handles sensitive requests, such as those involving medical advice or self-harm, in accordance with OpenAI's policies 29% more often.

However, GPT-4 is still not perfect and is less capable than humans in many real-world scenarios, according to the company. OpenAI warned that it still tends to make up information (or "hallucinate"), is prone to insisting it is correct when it isn't and, like the previous version, can generate violent and harmful text.

"GPT-4 still has many known limitations that we are working to address, such as social biases, hallucinations and adversarial prompts," OpenAI said.

Content violating usage guidelines is still possible because of "jailbreaks" and malicious prompts.

"As the 'risk per token' of AI systems increases, it will become critical to achieve extremely high degrees of reliability in these interventions; for now it's important to complement these limitations with deployment-time safety techniques like monitoring for abuse," the company said.

On Tuesday, OpenAI started selling GPT-4 access to businesses and other software developers so they could build their own applications on top of the technology, The New York Times reported.

The new model is available to the general public via ChatGPT Plus, OpenAI's $20 monthly ChatGPT subscription. Microsoft also confirmed that Bing's AI-powered search chatbot is running on GPT-4.

Morgan Stanley Wealth Management is using GPT-4 to build a system that will instantly retrieve information from company documents and other records, and serve it up to financial advisers in conversational prose. Khan Academy, an online education company, is using the technology to build an automated tutor.
