https://arxiv.org/pdf/2406.13230
Key ideas:
- Introduce ACTCAB, an activation-based confidence calibration method (a sketch follows this block):
  - The average of the activations was used as the input feature.
  - Plain MSE rather than cross-entropy was used, since linear regression inherently maps input features to output predictions in a continuous space.
  - Bin-based soft labelling was implemented.

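A minimal PyTorch sketch of this setup, assuming a linear probe over mean-pooled activations trained with MSE against Gaussian-smoothed bin labels; the bin count, smoothing width, pooling choice, and all names are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def soft_bin_labels(conf: torch.Tensor, n_bins: int = 10, sigma: float = 0.1) -> torch.Tensor:
    # Soft label over confidence bins via Gaussian smoothing around the
    # target value; n_bins and sigma are illustrative, not the paper's values.
    centers = (torch.arange(n_bins) + 0.5) / n_bins
    logits = -((conf.unsqueeze(-1) - centers) ** 2) / (2 * sigma ** 2)
    return logits.softmax(dim=-1)


class ActivationCalibrator(nn.Module):
    # Linear probe from mean-pooled hidden activations to a distribution
    # over confidence bins, trained with MSE against the soft labels.
    def __init__(self, hidden_dim: int, n_bins: int = 10):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, n_bins)
        self.register_buffer("centers", (torch.arange(n_bins) + 0.5) / n_bins)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim); average over tokens.
        feat = hidden_states.mean(dim=1)
        return self.linear(feat).softmax(dim=-1)

    def confidence(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Scalar confidence = expectation of the bin centers.
        return self(hidden_states) @ self.centers


# Toy training step on random stand-ins for activations and
# empirical correctness labels.
calib = ActivationCalibrator(hidden_dim=4096)
opt = torch.optim.Adam(calib.parameters(), lr=1e-3)
hidden = torch.randn(8, 32, 4096)   # fake LM activations
labels = soft_bin_labels(torch.rand(8))
loss = F.mse_loss(calib(hidden), labels)
loss.backward()
opt.step()
```
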
- Introduce CODEC, a decoding/inference method based on ACTCAB (a sketch follows this block):
  - Local: greedy decoding based on a weighted sum of token probability and estimated confidence.
  - Global: apply ACTCAB to the full candidate sequence and keep it only if its overall confidence score exceeds that of the plain greedy decode.

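A rough sketch of how the two CODEC modes could compose, assuming an ACTCAB-style confidence function is available; the top-k restriction, the 0.5 weight, and the function names are assumptions for illustration:

```python
import torch


def codec_local_step(logits: torch.Tensor, conf_fn, alpha: float = 0.5, top_k: int = 5) -> int:
    # One greedy step: rescore the top-k next tokens by a weighted sum of
    # token probability and estimated confidence. conf_fn(token_id) stands
    # in for an ACTCAB confidence estimate of the extended sequence.
    probs = logits.softmax(dim=-1)
    cand_probs, cand_ids = probs.topk(top_k)
    scores = [alpha * p.item() + (1 - alpha) * conf_fn(t.item())
              for p, t in zip(cand_probs, cand_ids)]
    return cand_ids[max(range(top_k), key=scores.__getitem__)].item()


def codec_global(candidate, greedy, seq_conf_fn):
    # Global check: keep the CODEC-decoded sequence only if its overall
    # confidence beats the plain greedy decode.
    return candidate if seq_conf_fn(candidate) > seq_conf_fn(greedy) else greedy


# Dummy usage: a uniform confidence makes the local step reduce to greedy decoding.
next_id = codec_local_step(torch.randn(32000), conf_fn=lambda t: 0.5)
```
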
Literature Review
- Verbalization: Ask the LLM to grade its own confidence verbally.
- Self-consistency: Generate multiple answers and measure their agreement.
- Sequence likelihood: Compute the geometric mean of the token probabilities (see the sketch after this list).
- LITCAB: Train a linear layer on top of the LM’s last-layer hidden states to adjust its logits for calibration.
- Inference-time intervention: Train probes on attention head outputs and use these directions to adjust activations during inference.
- Representation Engineering: Detects truthful directions by comparing representations of truthful and untruthful counterfactuals across layers, then applies PCA to isolate truthful directions, which are then used to adjust layer outputs during generation.
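For the sequence-likelihood baseline, the geometric mean is typically computed in log space for numerical stability; a minimal sketch (`sequence_likelihood` is a hypothetical helper name):

```python
import torch


def sequence_likelihood(token_logprobs: torch.Tensor) -> float:
    # Geometric mean of token probabilities, computed in log space:
    # exp((1/N) * sum_i log p_i).
    return token_logprobs.mean().exp().item()


logprobs = torch.log(torch.tensor([0.9, 0.8, 0.95]))
print(sequence_likelihood(logprobs))  # ≈ 0.881
```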