link
Main idea:
There are two types of self-correction: extrinsic and intrinsic, depending on whether the model relies on external signals (e.g., an ORM or PRM) or not.
Extrinsic self-correction is often not feasible, since at inference time it requires a superior model to judge correctness (and if that model can judge the answer, why not let it generate the answer in the first place?).
Therefore, an intrinsic self-correction mechanism would be ideal.
However, a fundamental question arises:
- If an LLM possesses the ability to self-correct, why doesn’t it simply offer the correct answer in its initial attempt?
This paper argues that current LLMs cannot do intrinsic self-correction, showing that it degrades the initial responses in most cases.
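
For concreteness, here is a minimal sketch of the intrinsic self-correction loop being discussed. The prompts and the `generate` callable are illustrative placeholders I'm assuming, not the paper's actual setup.

```python
from typing import Callable

def intrinsic_self_correct(
    question: str,
    generate: Callable[[str], str],  # any text-in/text-out LLM call you supply
    rounds: int = 2,
) -> str:
    """Intrinsic self-correction: the model critiques and revises its own
    answer, with no external verifier or ground-truth signal."""
    answer = generate(f"Q: {question}\nA:")
    for _ in range(rounds):
        # Ask the model to find problems with its own answer.
        critique = generate(
            f"Q: {question}\nProposed answer: {answer}\n"
            "Review the proposed answer and point out any mistakes."
        )
        # Revise based only on the model's own critique.
        answer = generate(
            f"Q: {question}\nPrevious answer: {answer}\nCritique: {critique}\n"
            "Write an improved final answer."
        )
    return answer  # Per the paper, this often ends up worse than the first attempt.
```
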
Key takeaways:
- If there is an oracle that can check correctness, performance usually improves (similar to the insight of the paper Small Language Models Need Strong Verifiers to Self-Correct Reasoning); see the sketch below.
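
A minimal sketch of what "with an oracle" means here: a ground-truth check gates the revision step, so a correct first answer is never overwritten. Both `generate` and `is_correct` are placeholders I'm assuming, not the paper's code.

```python
from typing import Callable

def oracle_self_correct(
    question: str,
    generate: Callable[[str], str],
    is_correct: Callable[[str], bool],  # oracle: checks against the gold answer
    max_rounds: int = 3,
) -> str:
    answer = generate(f"Q: {question}\nA:")
    for _ in range(max_rounds):
        if is_correct(answer):  # stop as soon as the oracle accepts the answer
            break
        answer = generate(
            f"Q: {question}\nYour previous answer was wrong: {answer}\n"
            "Try again and give a corrected final answer."
        )
    return answer
```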

- Intrinsic self-correction usually degrades performance.


- Multi-agent debate doesn’t outperform self-consistency (see the self-consistency sketch below).

- Personal opinion: I don’t think this means multi-agent debate is inferior to self-consistency. They are simply different approaches, and such a blunt comparison isn’t enough to settle that.
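
For reference, the self-consistency baseline in that comparison just samples several answers independently and majority-votes. A minimal sketch, assuming a sampled `generate` call and skipping answer extraction/normalization:

```python
from collections import Counter
from typing import Callable

def self_consistency(
    question: str,
    generate: Callable[[str], str],  # sampled (temperature > 0) LLM call
    n_samples: int = 5,
) -> str:
    # Draw independent samples, then return the most common answer.
    answers = [generate(f"Q: {question}\nA:") for _ in range(n_samples)]
    # In practice, extract/normalize each final answer before voting.
    return Counter(answers).most_common(1)[0][0]
```
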
- Prompt matters a lot.

- The paper points out that the self-correction in Madaan et al.’s paper showed improvement because its initial prompt provided incomplete information, which was later filled in by the feedback.
- Therefore, the authors tried a more complete prompt and found that standard prompting worked better than self-correction.
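
A hypothetical illustration of that prompt-completeness point (these are not the actual prompts from either paper): if a requirement only shows up in the feedback step, the initial prompt was underspecified, and stating it up front is the fairer baseline.

```python
# Underspecified: the requirement is missing here and would only surface later,
# inside the feedback prompt.
INCOMPLETE_PROMPT = "Write a sentence using the following concepts: {concepts}"

# Complete: the same requirement is stated up front, so a single
# standard-prompting pass can satisfy it directly, with no refine loop.
COMPLETE_PROMPT = (
    "Write a sentence using the following concepts: {concepts}\n"
    "Make sure every listed concept appears in the sentence."
)
```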