🟢 Training Language Models to Self-Correct via Reinforcement Learning
🟡 Generating Sequences by Learning to Self-Correct
🟢 Large Language Models Cannot Self-Correct Reasoning Yet
🟢 Recursive Introspection: Teaching Language Model Agents How to Self-Improve
🟢 Small Language Models Need Strong Verifiers to Self-Correct Reasoning
🟡 Self-Refine: Iterative Refinement with Self-Feedback
🟡 Refiner: Reasoning Feedback on Intermediate Representations
🟢 LIMA: Less Is More for Alignment
🟢 Large Language Models Have Intrinsic Self-Correction Ability
🟢 ReST-MCTS*: LLM Self-Training via Process Reward Guided Tree Search
🔴 Toward Self-Improvement of LLMs via Imagination, Searching, and Criticizing