https://arxiv.org/pdf/2406.15673
Overall feeling: Poorly written paper. Painful to read.
Key idea: tries to refute the paper "Large Language Models Cannot Self-Correct Reasoning Yet".
Order of reasoning matters, and the effect depends on temperature

The idea: at high temperature, letting the model decide first and then generate its rationale gives higher variance in the outcome. So the order of decision and rationale matters.
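A toy sketch of why a decision committed first is sensitive to temperature. It assumes the decision is a single choice sampled from a temperature-scaled softmax over hypothetical logits (`LOGITS` and all function names are made up for illustration). It only models the "decide first" order, not the rationale-first order, and is not the paper's setup. A lower agreement rate with the most likely answer means more run-to-run variance.

```python
import math
import random

def softmax_with_temperature(logits, temperature):
    """Convert logits to probabilities; temperature must be > 0."""
    scaled = [x / temperature for x in logits]
    m = max(scaled)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in scaled]
    total = sum(exps)
    return [e / total for e in exps]

def sample_decision(logits, temperature, rng):
    """Sample the committed answer index; temperature 0 means greedy argmax."""
    if temperature == 0:
        return max(range(len(logits)), key=lambda i: logits[i])
    probs = softmax_with_temperature(logits, temperature)
    return rng.choices(range(len(logits)), weights=probs, k=1)[0]

def agreement_rate(logits, temperature, n_runs=2000, seed=0):
    """Fraction of runs whose committed answer equals the most likely answer."""
    rng = random.Random(seed)
    best = max(range(len(logits)), key=lambda i: logits[i])
    hits = sum(sample_decision(logits, temperature, rng) == best for _ in range(n_runs))
    return hits / n_runs

# Hypothetical logits for three candidate answers (e.g. "A", "B", "C").
LOGITS = [2.0, 1.0, 0.5]

if __name__ == "__main__":
    for t in (0, 0.5, 1.0, 2.0):
        print(f"T={t}: agreement with argmax = {agreement_rate(LOGITS, t):.2f}")
```

At T=0 the committed answer never changes; as T grows, the first-sampled decision drifts away from the argmax more often.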
The order effect does not apply to probability-score-based self-verification, since that always picks the option with the higher score.
I really don’t get why they didn’t compare order 1 and order 2 in the same setting.
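For the probability-score case mentioned above, a minimal sketch, assuming each candidate answer comes with a hypothetical probability that a verifier labels it "correct" (the scoring function and the numbers are made up). Selection is a plain argmax, so no sampling, temperature, or generation order is involved.

```python
import math

def verification_score(p_correct):
    """Hypothetical score: log-probability the verifier assigns to 'correct'."""
    return math.log(p_correct)

def select_by_score(candidates):
    """Return the answer with the highest verification score.
    No sampling happens here, so temperature and generation order play no role;
    the same inputs always give the same choice (ties go to the first candidate)."""
    best_answer, _ = max(candidates, key=lambda pair: verification_score(pair[1]))
    return best_answer

# Hypothetical (answer, P(verifier says "correct")) pairs.
CANDIDATES = [("42", 0.71), ("41", 0.22), ("43", 0.07)]
print(select_by_score(CANDIDATES))  # → 42
```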
Bias in the prompt matters
Biased: Find the problem and say you are wrong.
Unbiased: Do you think it is wrong? If so, find the problem.
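The two prompt styles can be sketched as templates for the second (self-correction) turn. The exact wording and the function name `build_review_prompt` are hypothetical paraphrases of the notes, not the paper's prompts. The biased version presupposes an error; the unbiased one leaves room for the model to keep a correct answer.

```python
def build_review_prompt(question, previous_answer, biased):
    """Build a hypothetical self-correction prompt for a second turn.
    biased=True asserts the answer is wrong; biased=False leaves it open."""
    header = f"Question: {question}\nYour previous answer: {previous_answer}\n"
    if biased:
        instruction = "Your answer is wrong. Find the problem and correct it."
    else:
        instruction = ("Do you think your answer is wrong? "
                       "If so, find the problem and correct it; "
                       "otherwise, keep your answer.")
    return header + instruction

print(build_review_prompt("What is 6 * 7?", "42", biased=False))
```

With the biased template, a model that follows instructions is pushed to change even a correct answer, which is the failure mode the unbiased wording avoids.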
