https://arxiv.org/pdf/2406.13230

Key ideas:

  1. Introduce ACTCAB, an activation-based confidence calibration method

  2. Introduce CODEC, a decoding/inference method built on ACTCAB

Literature Review

  1. Verbalization: Ask the LLM to grade its own confidence verbally.
  2. Self-consistency: Generate multiple answers and measure agreement among the samples.
  3. Sequence likelihood: Compute the geometric mean of the token probabilities.
  4. LITCAB: Train a linear layer on top of the LM’s last-layer hidden states to adjust its logits for calibration.
  5. Inference-time intervention: Train probes on attention head outputs and use the resulting directions to adjust activations during inference.
  6. Representation Engineering: Detect truthful directions by comparing representations of truthful and untruthful counterfactuals across layers, apply PCA to isolate the truthful directions, then use them to adjust layer outputs during generation.
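
The sequence-likelihood baseline (item 3) can be sketched in a few lines. This is not the paper's code; it is a minimal illustration assuming we already have per-token log-probabilities from the LM. The geometric mean is computed in log space to avoid underflow on long sequences.

```python
import math

def sequence_confidence(token_logprobs):
    """Confidence = geometric mean of token probabilities.

    Computed as exp(mean of log-probs), which is numerically
    stable even for long sequences.
    """
    if not token_logprobs:
        raise ValueError("empty sequence")
    return math.exp(sum(token_logprobs) / len(token_logprobs))

# Hypothetical per-token log-probabilities for a 4-token answer.
logprobs = [math.log(0.9), math.log(0.8), math.log(0.95), math.log(0.7)]
confidence = sequence_confidence(logprobs)
```

Because it is a per-token average, this score does not penalize longer answers the way a raw product of probabilities would.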
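
Self-consistency (item 2) can likewise be sketched: sample several generations for the same prompt and use the majority answer's vote share as the confidence. The sampling step and answer normalization are abstracted away here; `samples` is a hypothetical list of already-extracted final answers.

```python
from collections import Counter

def self_consistency_confidence(answers):
    """Return (majority answer, fraction of samples that agree with it)."""
    if not answers:
        raise ValueError("no sampled answers")
    counts = Counter(answers)
    answer, votes = counts.most_common(1)[0]
    return answer, votes / len(answers)

# Hypothetical: five sampled generations for one question.
samples = ["42", "42", "41", "42", "40"]
answer, confidence = self_consistency_confidence(samples)  # ("42", 0.6)
```
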