https://arxiv.org/pdf/2406.13230
Key ideas:
- Introduce ACTCAB, an activation-based confidence calibration method (a sketch follows this block):
  - The average of the activations was used as the input feature.
  - Plain MSE rather than cross-entropy was used, since linear regression inherently maps input features to output predictions in a continuous space.
  - Bin-based soft labelling was implemented.

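A minimal PyTorch sketch of this setup, assuming a linear probe over mean-pooled activations trained with MSE against Gaussian-smoothed bin labels; the bin count, smoothing width, pooling choice, and all names are illustrative assumptions, not the paper's exact design:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


def soft_bin_labels(conf: torch.Tensor, n_bins: int = 10, sigma: float = 0.1) -> torch.Tensor:
    # Soft label over confidence bins via Gaussian smoothing around the
    # target value; n_bins and sigma are illustrative, not the paper's values.
    centers = (torch.arange(n_bins) + 0.5) / n_bins
    logits = -((conf.unsqueeze(-1) - centers) ** 2) / (2 * sigma ** 2)
    return logits.softmax(dim=-1)


class ActivationCalibrator(nn.Module):
    # Linear probe from mean-pooled hidden activations to a distribution
    # over confidence bins, trained with MSE against the soft labels.
    def __init__(self, hidden_dim: int, n_bins: int = 10):
        super().__init__()
        self.linear = nn.Linear(hidden_dim, n_bins)
        self.register_buffer("centers", (torch.arange(n_bins) + 0.5) / n_bins)

    def forward(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # hidden_states: (batch, seq_len, hidden_dim); average over tokens.
        feat = hidden_states.mean(dim=1)
        return self.linear(feat).softmax(dim=-1)

    def confidence(self, hidden_states: torch.Tensor) -> torch.Tensor:
        # Scalar confidence = expectation of the bin centers.
        return self(hidden_states) @ self.centers


# Toy training step on random stand-ins for activations and
# empirical correctness labels.
calib = ActivationCalibrator(hidden_dim=4096)
opt = torch.optim.Adam(calib.parameters(), lr=1e-3)
hidden = torch.randn(8, 32, 4096)   # fake LM activations
labels = soft_bin_labels(torch.rand(8))
loss = F.mse_loss(calib(hidden), labels)
loss.backward()
opt.step()
```
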
- Introduce CODEC, a decoding/inference method based on ACTCAB (a sketch follows this block):
  - Local: greedy decoding based on a weighted sum of token probability and estimated confidence.
  - Global: apply ACTCAB to the full candidate sequence and keep it only if its overall confidence score exceeds that of the plain greedy decode.

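A rough sketch of how the two CODEC modes could compose, assuming an ACTCAB-style confidence function is available; the top-k restriction, the 0.5 weight, and the function names are assumptions for illustration:

```python
import torch


def codec_local_step(logits: torch.Tensor, conf_fn, alpha: float = 0.5, top_k: int = 5) -> int:
    # One greedy step: rescore the top-k next tokens by a weighted sum of
    # token probability and estimated confidence. conf_fn(token_id) stands
    # in for an ACTCAB confidence estimate of the extended sequence.
    probs = logits.softmax(dim=-1)
    cand_probs, cand_ids = probs.topk(top_k)
    scores = [alpha * p.item() + (1 - alpha) * conf_fn(t.item())
              for p, t in zip(cand_probs, cand_ids)]
    return cand_ids[max(range(top_k), key=scores.__getitem__)].item()


def codec_global(candidate, greedy, seq_conf_fn):
    # Global check: keep the CODEC-decoded sequence only if its overall
    # confidence beats the plain greedy decode.
    return candidate if seq_conf_fn(candidate) > seq_conf_fn(greedy) else greedy


# Dummy usage: a uniform confidence makes the local step reduce to greedy decoding.
next_id = codec_local_step(torch.randn(32000), conf_fn=lambda t: 0.5)
```
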
Literature Review
- Verbalization: Ask the LLM to grade its own confidence verbally.
- Self-consistency: Generate multiple answers and measure their agreement.
- Sequence likelihood: Compute the geometric mean of the token probabilities (see the sketch after this list).
- LITCAB: Train a linear layer on top of the LM’s last-layer hidden states to adjust its logits for calibration.
- Inference-time intervention: Train probes on attention head outputs and use these directions to adjust activations during inference.
- Representation Engineering: Detects truthful directions by comparing representations of truthful and untruthful counterfactuals across layers, then applies PCA to isolate truthful directions, which are then used to adjust layer outputs during generation.
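For the sequence-likelihood baseline, the geometric mean is typically computed in log space for numerical stability; a minimal sketch (`sequence_likelihood` is a hypothetical helper name):

```python
import torch


def sequence_likelihood(token_logprobs: torch.Tensor) -> float:
    # Geometric mean of token probabilities, computed in log space:
    # exp((1/N) * sum_i log p_i).
    return token_logprobs.mean().exp().item()


logprobs = torch.log(torch.tensor([0.9, 0.8, 0.95]))
print(sequence_likelihood(logprobs))  # ≈ 0.881
```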