🟢 - carefully read
🟡 - skimmed
🔴 - should read

Both

🟢 Training Language Models to Self-Correct via Reinforcement Learning

🟡 Generating Sequences by Learning to Self-Correct

🟢 Large Language Models Cannot Self-correct Reasoning Yet

Minwu

🟢 Recursive Introspection: Teaching Language Model Agents How to Self-Improve

🟢 GLoRe: When, Where, and How to Improve LLM Reasoning via Global and Local Refinements (FAIR at Meta)

🟢 Small Language Models Need Strong Verifiers to Self-Correct Reasoning

🟡 Self-Refine: Iterative Refinement with Self-Feedback

🟡 Refiner: Reasoning Feedback on Intermediate Representations

🟢 LIMA: Less Is More for Alignment

🟢 Large Language Models Have Intrinsic Self-Correction Ability

🟢 GSM-Symbolic: Understanding the Limitations of Mathematical Reasoning in Large Language Models

🟢 ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search

🔴 Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing

🟡 Enhancing Language Model Factuality via Activation-Based Confidence Calibration and Guided Decoding