AI Reasoning and Planning

Until very recently, LLMs had a very hard time with complex, multi-step problems. They lost context, distorted their memory of previous steps, and so forth. This led to unreliable results (including hallucinations) and, consequently, to a lack of trust in the technology.

Recent research has shown that LLMs are, in fact, quite good at reasoning and planning when the right prompts lead them to break the problem into a series of steps. Working through the problem this way greatly improves the accuracy of the LLM’s output.

Chain of Thought

One of the great breakthroughs in the field of LLMs was the discovery that appending “think step by step” to the prompt had a profound effect on the accuracy of the LLM’s output. Prompting the LLM in this way encourages the model to decompose the problem and to generate an “inner monologue” of the steps it is taking to solve it.

This has two significant advantages:

It makes the LLM’s reasoning process more transparent.
It allows the model to check its own work as it goes.

For example, we might ask, “If a plane crashes on the border of the north field and the south field, where will the survivors be buried?”

Without the reasoning prompt, it is entirely possible that the LLM would pick one of the fields at random. However, if we tell it to think step-by-step, it will examine the parts of the question and realize that survivors are not buried at all.
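The difference between the two prompts can be sketched in a few lines of Python. This is a minimal illustration, not a definitive implementation: the `build_prompt` helper and the exact wording of `COT_SUFFIX` are assumptions made for this example, and the call that sends the prompt to a model is left out because it depends on the provider you use.

```python
# Hypothetical cue text; any similar "think step by step" phrasing could be used.
COT_SUFFIX = "Let's think step by step."


def build_prompt(question: str, use_cot: bool = True) -> str:
    """Return the question, optionally followed by a chain-of-thought cue."""
    question = question.strip()
    if not use_cot:
        return question
    # Put the cue on its own paragraph so it reads as an instruction.
    return f"{question}\n\n{COT_SUFFIX}"


question = (
    "If a plane crashes on the border of the north field and the south field, "
    "where will the survivors be buried?"
)

plain_prompt = build_prompt(question, use_cot=False)  # question only
cot_prompt = build_prompt(question)                   # question plus the cue

print(cot_prompt)
```

Sending `plain_prompt` and `cot_prompt` to the same model and comparing the answers is a simple way to see the effect of the cue for yourself.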

Note that today’s LLMs have a degree of chain-of-thought reasoning built into them and are unlikely to get this wrong.

Three key aspects explain why chain-of-thought reasoning works:

It decomposes the problem into smaller intermediate steps.
It lets the model keep track of its work and remember intermediate results.
It typically allocates more tokens to the reasoning, so the model can “think” longer.
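As a loose analogy only (a model does not execute code like this), the first two points can be pictured as a scratchpad: each intermediate result is written down, so later steps can read it rather than reconstruct it. The word problem and the `solve_with_scratchpad` function below are made up for illustration.

```python
def solve_with_scratchpad(
    apples_per_box: int, boxes: int, eaten: int
) -> tuple[int, list[str]]:
    """Solve a two-step word problem, recording each intermediate result."""
    scratchpad: list[str] = []

    # Step 1: a smaller sub-problem whose result is written down.
    total = apples_per_box * boxes
    scratchpad.append(
        f"Step 1: {boxes} boxes x {apples_per_box} apples = {total} apples"
    )

    # Step 2: builds on the recorded result of step 1.
    remaining = total - eaten
    scratchpad.append(
        f"Step 2: {total} apples - {eaten} eaten = {remaining} apples"
    )

    return remaining, scratchpad


answer, trace = solve_with_scratchpad(apples_per_box=12, boxes=3, eaten=5)
for line in trace:
    print(line)
print("Answer:", answer)
```

The written-out steps play the role of the model’s generated reasoning: they take up more space than the bare answer, but every later step has the earlier results in front of it.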

Debugging

Because we’ve asked the LLM to think step-by-step, it shows us each step in its reasoning, and we can examine those steps to see where the LLM went off the rails. This takes a process that might otherwise be opaque and makes it transparent, which makes debugging far easier.
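One way to examine the steps is to split the model’s response into numbered pieces and review them one at a time. The sketch below assumes the response labels its steps as “Step 1:”, “Step 2:”, and so on. That format is an assumption, not something every model produces, and `sample_response` is a hand-written stand-in rather than real model output.

```python
import re

# Matches lines such as "Step 1: ...", "2. ..." or "3) ...".
STEP_PATTERN = re.compile(r"^\s*(?:Step\s+)?(\d+)[.:)]\s*(.+)$", re.IGNORECASE)


def split_steps(response: str) -> list[tuple[int, str]]:
    """Extract (step number, step text) pairs from a step-by-step response."""
    steps = []
    for line in response.splitlines():
        match = STEP_PATTERN.match(line)
        if match:
            steps.append((int(match.group(1)), match.group(2).strip()))
    return steps


# Hand-written example text standing in for a model's reply.
sample_response = """\
Step 1: The plane crashes on the border between two fields.
Step 2: The question asks where the survivors will be buried.
Step 3: Survivors are alive, so they are not buried.
Answer: Nowhere; survivors are not buried.
"""

steps = split_steps(sample_response)
for number, text in steps:
    print(f"[{number}] {text}")
```

With the steps separated like this, a reviewer can point to the first step that is wrong instead of only knowing that the final answer is wrong.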