link
Main idea:
There are two types of self-correction: extrinsic and intrinsic, depending on whether the model relies on external signals (e.g., an ORM or PRM) or not.
Extrinsic self-correction is often not feasible, since at inference time it requires a superior model to judge correctness (and if that model can judge the answer, why not let it generate the answer in the first place?).
Therefore, an intrinsic self-correction mechanism would be ideal.
However, a fundamental question arises:
- If an LLM possesses the ability to self-correct, why doesn’t it simply offer the correct answer in its initial attempt?
This paper argues that current LLMs cannot do intrinsic self-correction, showing that it degrades the initial responses in most cases.
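
For concreteness, here is a minimal sketch of the intrinsic self-correction loop being discussed. The prompts and the `generate` callable are illustrative placeholders I'm assuming, not the paper's actual setup.

```python
from typing import Callable

def intrinsic_self_correct(
    question: str,
    generate: Callable[[str], str],  # any text-in/text-out LLM call you supply
    rounds: int = 2,
) -> str:
    """Intrinsic self-correction: the model critiques and revises its own
    answer, with no external verifier or ground-truth signal."""
    answer = generate(f"Q: {question}\nA:")
    for _ in range(rounds):
        # Ask the model to find problems with its own answer.
        critique = generate(
            f"Q: {question}\nProposed answer: {answer}\n"
            "Review the proposed answer and point out any mistakes."
        )
        # Revise based only on the model's own critique.
        answer = generate(
            f"Q: {question}\nPrevious answer: {answer}\nCritique: {critique}\n"
            "Write an improved final answer."
        )
    return answer  # Per the paper, this often ends up worse than the first attempt.
```
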
Key takeaways:
- If there is an oracle that can check correctness, performance usually improves (similar to the insight of the paper Small Language Models Need Strong Verifiers to Self-Correct Reasoning); see the sketch below.
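
A minimal sketch of what "with an oracle" means here: a ground-truth check gates the revision step, so a correct first answer is never overwritten. Both `generate` and `is_correct` are placeholders I'm assuming, not the paper's code.

```python
from typing import Callable

def oracle_self_correct(
    question: str,
    generate: Callable[[str], str],
    is_correct: Callable[[str], bool],  # oracle: checks against the gold answer
    max_rounds: int = 3,
) -> str:
    answer = generate(f"Q: {question}\nA:")
    for _ in range(max_rounds):
        if is_correct(answer):  # stop as soon as the oracle accepts the answer
            break
        answer = generate(
            f"Q: {question}\nYour previous answer was wrong: {answer}\n"
            "Try again and give a corrected final answer."
        )
    return answer
```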

- Intrinsic self-correction usually degrades performance.


- Multi-agent debate doesn’t outperform self-consistency (see the self-consistency sketch below).

- Personal opinion: I don’t think this means multi-agent debate is inferior to self-consistency. They are simply different approaches, and such a blunt comparison isn’t enough to settle that.
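
For reference, the self-consistency baseline in that comparison just samples several answers independently and majority-votes. A minimal sketch, assuming a sampled `generate` call and skipping answer extraction/normalization:

```python
from collections import Counter
from typing import Callable

def self_consistency(
    question: str,
    generate: Callable[[str], str],  # sampled (temperature > 0) LLM call
    n_samples: int = 5,
) -> str:
    # Draw independent samples, then return the most common answer.
    answers = [generate(f"Q: {question}\nA:") for _ in range(n_samples)]
    # In practice, extract/normalize each final answer before voting.
    return Counter(answers).most_common(1)[0][0]
```
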
- Prompt matters a lot.

- The paper points out that the self-correction in Madaan et al.’s paper showed improvement because its initial prompt provided incomplete information, which was later filled in by the feedback.
- Therefore, the authors tried a more complete prompt and found that standard prompting worked better than self-correction.
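
A hypothetical illustration of that prompt-completeness point (these are not the actual prompts from either paper): if a requirement only shows up in the feedback step, the initial prompt was underspecified, and stating it up front is the fairer baseline.

```python
# Underspecified: the requirement is missing here and would only surface later,
# inside the feedback prompt.
INCOMPLETE_PROMPT = "Write a sentence using the following concepts: {concepts}"

# Complete: the same requirement is stated up front, so a single
# standard-prompting pass can satisfy it directly, with no refine loop.
COMPLETE_PROMPT = (
    "Write a sentence using the following concepts: {concepts}\n"
    "Make sure every listed concept appears in the sentence."
)
```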