
Main idea:

There are two types of self-correction, extrinsic and intrinsic, depending on whether the model relies on external signals (e.g., an outcome or process reward model, ORM/PRM) or only on itself.

Extrinsic self-correction is infeasible in many cases: at inference time it requires a superior model to judge correctness, and if such a model could reliably judge the answer, why not let it generate the answer in the first place?
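
To make this concrete, here is a minimal sketch of the extrinsic setting, assuming a hypothetical `llm()` text generator and an external `verifier()` (e.g., an ORM/PRM-style scorer); both callables and the prompt wording are placeholders, not any particular API.

```python
from typing import Callable

def extrinsic_self_correct(
    question: str,
    llm: Callable[[str], str],              # placeholder text generator
    verifier: Callable[[str, str], float],  # external signal, e.g. an ORM/PRM score
    threshold: float = 0.8,
    max_rounds: int = 3,
) -> str:
    """Revise the answer until an *external* verifier accepts it."""
    answer = llm(question)
    for _ in range(max_rounds):
        score = verifier(question, answer)  # external feedback
        if score >= threshold:
            break
        # feed the external judgment back to the model and retry
        answer = llm(
            f"{question}\n\nYour previous answer:\n{answer}\n\n"
            f"A verifier scored this answer {score:.2f}, below the acceptance "
            f"threshold of {threshold}. Please give an improved answer."
        )
    return answer
```

The catch is exactly the objection above: `verifier` has to be at least as strong as the generator, which is rarely available at inference time.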

Therefore, an intrinsic self-correction mechanism would be ideal.

However, a fundamental question arises: can current LLMs actually correct their own reasoning without any external feedback?

This paper argues that they cannot: intrinsic self-correction deteriorates the initial responses in most cases.
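
For contrast, a minimal sketch of the intrinsic setting the paper evaluates: the same model critiques and revises its own answer, with no external signal anywhere in the loop. The critique/revise prompts here are illustrative, not the paper's exact wording.

```python
def intrinsic_self_correct(question: str, llm, rounds: int = 1) -> str:
    """Critique-and-revise loop that relies only on the model's own feedback."""
    answer = llm(question)
    for _ in range(rounds):
        critique = llm(
            f"Question: {question}\nYour answer: {answer}\n"
            "Review your answer and point out any problems with it."
        )
        answer = llm(
            f"Question: {question}\nYour answer: {answer}\n"
            f"Critique: {critique}\n"
            "Based on the critique, give your final answer."
        )
    return answer
```

Because nothing external checks the critique, a flawed self-critique can talk the model out of an initially correct answer, which is the degradation the paper reports.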

Key takeaways:

  1. If there is an oracle that can check correctness, performance usually improves (similar to the insight of the paper "Small Language Models Need Strong Verifiers to Self-Correct Reasoning"); see the sketch below.

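A sketch of the oracle setting behind this takeaway (hypothetical helpers; the ground-truth label `gold` is only available on benchmarks): the loop stops as soon as the answer matches the label, so correct answers are never overwritten.

```python
def self_correct_with_oracle(question: str, gold: str, llm, max_rounds: int = 3) -> str:
    """Use the ground-truth label as the stop signal (only possible on benchmarks)."""
    answer = llm(question)
    for _ in range(max_rounds):
        if answer.strip() == gold.strip():  # the "oracle": compare against the label
            break                           # a correct answer is never revised away
        answer = llm(
            f"Question: {question}\nYour previous answer: {answer}\n"
            "There may be a problem with this answer. Please try again."
        )
    return answer
```

With the oracle, accuracy can only go up (correct answers are frozen and wrong ones get more attempts); drop the oracle and the same loop can overwrite correct answers, which is the gap between this takeaway and the next one.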

  2. Intrinsic self-correction usually degrades performance.


  3. Multi-agent debate does not outperform self-consistency (majority voting over several sampled answers); see the sketch below.

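For reference, a minimal sketch of self-consistency, i.e., a majority vote over independently sampled answers; `llm_sample()` is a placeholder for sampling the model at nonzero temperature.

```python
from collections import Counter

def self_consistency(question: str, llm_sample, n_samples: int = 5) -> str:
    """Sample several independent answers and return the most common one."""
    answers = [llm_sample(question).strip() for _ in range(n_samples)]
    majority_answer, _count = Counter(answers).most_common(1)[0]
    return majority_answer
```

Multi-agent debate also spends multiple model calls, but lets the samples influence each other; with a comparable number of responses, the paper finds it does not beat this simple vote.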

  4. The prompt matters a lot: small changes in the self-correction prompt can significantly change the results; see the illustrative prompts below.

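To illustrate how much the prompt can matter, here are two hypothetical review prompts (not the paper's wording): one presupposes an error and can push the model to abandon answers that were already correct, while the other leaves keeping the answer as an explicit option.

```python
# Hypothetical prompt variants for the review step; the point is that the
# wording alone can swing self-correction results substantially.
PRESUPPOSES_ERROR = "Your previous answer is likely wrong. Find the mistake and fix it."
NEUTRAL_REVIEW = (
    "Review your previous answer. If you are confident it is correct, keep it; "
    "otherwise, revise it."
)

def revise(question: str, answer: str, llm, review_prompt: str) -> str:
    """One self-correction step using the given review prompt."""
    return llm(f"Question: {question}\nYour answer: {answer}\n{review_prompt}")
```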