
For each question, generate answers in a chain-of-thought (CoT) manner.
Split the generated solutions into those that reach the correct answer and those that reach an incorrect answer, then form the Cartesian product of the incorrect and correct sets.
For each pair, the correct solution works as a hint for critiquing the incorrect solution.
Generate the critique (only one iteration in this paper, btw).
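
The pairing steps above can be sketched roughly as follows. This is only an illustration, not the paper's code: `build_critique_pairs`, `critique_prompt`, the `(solution_text, final_answer)` sample format, exact-match answer checking, and the prompt wording are all assumptions made for the example.

```python
from itertools import product


def build_critique_pairs(samples, reference):
    """Pair every incorrect solution with every correct one.

    samples: list of (solution_text, final_answer) tuples for one question
             (hypothetical format).
    reference: the gold answer; exact string match is assumed here.
    """
    correct = [sol for sol, ans in samples if ans == reference]
    incorrect = [sol for sol, ans in samples if ans != reference]
    # Cartesian product: (incorrect, correct) for all combinations.
    return list(product(incorrect, correct))


def critique_prompt(question, incorrect, correct):
    """Build a single-pass critique prompt (wording is illustrative only)."""
    return (
        f"Question: {question}\n"
        f"Incorrect solution: {incorrect}\n"
        f"Hint (a correct solution): {correct}\n"
        "Critique the incorrect solution."
    )


samples = [("s1", "4"), ("s2", "5"), ("s3", "4"), ("s4", "3")]
pairs = build_critique_pairs(samples, reference="4")
prompts = [critique_prompt("What is 2 + 2?", bad, good) for bad, good in pairs]
```

With two correct and two incorrect samples this gives four pairs, so four critique prompts; a question whose samples are all correct or all incorrect gives no pairs.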
