Artificial intelligence: GPT-4 shows ‘sparks’ of common sense, human-like reasoning, finds Microsoft

Latest version of ChatGPT shows signs of artificial general intelligence; can solve novel and difficult tasks in various fields in a way strikingly close to human-level performance


Published on: 18 May 2023, 5:01 am

OpenAI’s more powerful version of ChatGPT, GPT-4, can be trained to reason and use common sense like humans, a new study by Microsoft has found.

GPT-4 is a significant step towards artificial general intelligence (AGI) and can reason, plan and learn from experience at a level comparable to that of humans, or possibly above it, the analysis found.

The AI is part of a new cohort of large language models (LLMs), including ChatGPT and Google’s PaLM. LLMs are trained on massive amounts of data and can be fed both images and text to come up with answers.

Microsoft invested billions of dollars in OpenAI and had access to GPT-4 before it was launched publicly. The company recently published a 155-page analysis, Sparks of Artificial General Intelligence: Early experiments with GPT-4.

GPT-4 is also used to power Microsoft’s Bing Chat feature.

The research team discovered that LLMs can be trained to reason and use common sense like humans. They demonstrated GPT-4 can solve complex tasks in several fields without special prompting, including mathematics, vision, medicine, law and psychology.

The system available to the public is not as powerful as the version they tested, Microsoft said.

The paper gave several examples of how the AI seemed to understand concepts, such as what a unicorn is. GPT-4 drew a unicorn in TikZ, a LaTeX package for producing graphics from code. Although the “drawings” were crude, GPT-4 got the concept of a unicorn right.

To demonstrate the difference between true learning and memorisation, researchers asked GPT-4 to ‘Draw a unicorn in TikZ’ three times over the course of one month. The AI showed a clear evolution in the sophistication of the drawings. Source: Microsoft

GPT-4 also exhibited more common sense than previous models, like ChatGPT, OpenAI said. Both GPT-4 and ChatGPT were asked to stack a book, nine eggs, a laptop, a bottle and a nail.

While ChatGPT recommended placing the eggs on top of the nail, the more sophisticated model arranged the items so the eggs would not break.

However, the report acknowledged that the AI still has limitations and biases, and warned users to be careful. GPT-4 is “still not fully reliable” because it “hallucinates” facts and makes reasoning and basic arithmetic errors.

The analysis read:

While GPT-4 is at or beyond human-level for many tasks, overall, its patterns of intelligence are decidedly not human-like. However, GPT-4 is almost certainly only a first step towards a series of increasingly generally intelligent systems, and in fact, GPT-4 itself has improved throughout our time testing it.

The paper also cautioned users about the model’s limitations in areas such as confidence calibration, cognitive fallacies and irrationality, and sensitivity to inputs.

“Great care should be taken when using language model outputs, particularly in high-stakes contexts, with the exact protocol (such as human review, grounding with an additional context or avoiding high-stakes uses altogether) matching the needs of a specific use-case,” it said.
