Dynamic Quantization
Store the weights as 8-bit integers and compute the activation scale on the fly.
Key Insight
Quantization stores a model's weights as low-precision integers, such as int8, instead of 32-bit floats. Dynamic quantization quantizes the weights ahead of time but computes the scale for each layer's activations at runtime, from the range of the values actually entering the layer, just before the matmul.
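Here is a minimal sketch of a dynamically quantized linear layer. It assumes NumPy and symmetric per-tensor int8 scales. The `DynamicQuantLinear` class and `quantize_symmetric` helper are hypothetical names for illustration, not a library API. The weights are quantized once in the constructor. The activation scale is recomputed from each input's maximum absolute value on every call, right before the integer matmul. Production libraries may differ in details such as asymmetric activation scales, per-channel weight scales, and fused int8 kernels.

```python
import numpy as np


def quantize_symmetric(x: np.ndarray) -> tuple:
    """Map float values to int8 using one symmetric scale for the whole tensor."""
    max_abs = float(np.max(np.abs(x)))
    scale = max_abs / 127.0 if max_abs > 0 else 1.0
    q = np.clip(np.round(x / scale), -127, 127).astype(np.int8)
    return q, scale


class DynamicQuantLinear:
    """Hypothetical linear layer: int8 weights, activation scale chosen per call."""

    def __init__(self, weight: np.ndarray, bias: np.ndarray):
        # Ahead of time: quantize the (out_features, in_features) weights once.
        self.w_q, self.w_scale = quantize_symmetric(weight)
        self.bias = bias.astype(np.float32)

    def __call__(self, x: np.ndarray) -> np.ndarray:
        # At runtime: measure this input's range and derive its scale on the fly.
        x_q, x_scale = quantize_symmetric(x)
        # Integer matmul, accumulated in int32 so sums of int8 products don't overflow.
        acc = x_q.astype(np.int32) @ self.w_q.astype(np.int32).T
        # Rescale the int32 result back to float and add the float bias.
        return acc.astype(np.float32) * (x_scale * self.w_scale) + self.bias


# Usage: compare against the float32 layer on random data.
rng = np.random.default_rng(0)
weight = rng.standard_normal((16, 64)).astype(np.float32)
bias = np.zeros(16, dtype=np.float32)
layer = DynamicQuantLinear(weight, bias)

x = rng.standard_normal((4, 64)).astype(np.float32)
y_float = x @ weight.T + bias
y_int8 = layer(x)
print("max abs error:", float(np.max(np.abs(y_int8 - y_float))))
```

Accumulating in int32 is the important design choice: a single int8 product can reach 127 × 127, and summing even a few dozen of them overflows a 16-bit integer.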
Why This Matters
int8 weights take a quarter of the memory of float32 weights, and int8 matrix multiplies run faster than float32 ones on many CPUs. This helps most with the large linear layers in an LLM. Measuring the quality drop tells you whether the memory and speed gains are worth it.
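To make both sides of the trade-off concrete, the sketch below measures the memory ratio and the output error for a single hypothetical 1024-by-1024 layer with random weights. It assumes NumPy and symmetric per-tensor scales. For a real LLM you would measure the quality drop on the whole model with an evaluation metric such as perplexity or task accuracy, which this sketch does not do. It also says nothing about speed, since NumPy does not use dedicated int8 kernels.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical float32 weights for one large linear layer.
weight = rng.standard_normal((1024, 1024)).astype(np.float32)

# Offline: symmetric per-tensor int8 quantization of the weights.
w_scale = float(np.max(np.abs(weight))) / 127.0
weight_q = np.clip(np.round(weight / w_scale), -127, 127).astype(np.int8)

# Memory: 1 byte per int8 value versus 4 bytes per float32 value.
ratio = weight.nbytes / weight_q.nbytes
print(f"float32: {weight.nbytes / 2**20:.1f} MiB, int8: {weight_q.nbytes / 2**20:.1f} MiB")

# Runtime: dynamic activation scale, int32-accumulated matmul, then rescale.
x = rng.standard_normal((8, 1024)).astype(np.float32)
x_scale = float(np.max(np.abs(x))) / 127.0
x_q = np.clip(np.round(x / x_scale), -127, 127).astype(np.int8)
y_quant = (x_q.astype(np.int32) @ weight_q.astype(np.int32).T).astype(np.float32)
y_quant *= x_scale * w_scale

# Quality: relative error of the quantized outputs against the float32 outputs.
y_float = x @ weight.T
rel_err = float(np.linalg.norm(y_quant - y_float) / np.linalg.norm(y_float))
print(f"memory ratio: {ratio:.0f}x, relative output error: {rel_err:.4f}")
```

A single layer's output error is only a proxy. Errors can compound across many layers, so an end-to-end evaluation is what actually decides whether the speedup is worth it.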