ml-platform/examples/21-llm-serving/03-kv-cache-layout.txt
KV cache block layout example
=============================
engine.block_size = 16 tokens
cache.dtype = fp16
logical sequence A
prompt tokens: 0..47
generated tokens: 48..95
mapped blocks:
block-011 -> tokens 0..15
block-024 -> tokens 16..31
block-031 -> tokens 32..47
block-117 -> tokens 48..63
block-118 -> tokens 64..79
block-140 -> tokens 80..95
logical sequence B
prompt tokens: 0..31
generated tokens: 32..63
mapped blocks:
block-008 -> tokens 0..15
block-010 -> tokens 16..31
block-119 -> tokens 32..47
block-141 -> tokens 48..63
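The two mappings above can be read as per-sequence block tables. The sketch below is illustrative only and is not taken from any engine: the names BLOCK_TABLES, blocks_needed, and locate are hypothetical, and the block ids are copied from the listings above. It shows how a logical token position resolves to a physical block and an offset inside that block, assuming every block holds exactly engine.block_size tokens.

```python
BLOCK_SIZE = 16  # engine.block_size from the example above

# Hypothetical per-sequence block tables, copied from the mappings above.
# Entry i is the physical block that stores tokens i*16 .. i*16+15.
BLOCK_TABLES = {
    "A": [11, 24, 31, 117, 118, 140],
    "B": [8, 10, 119, 141],
}


def blocks_needed(num_tokens, block_size=BLOCK_SIZE):
    """Number of blocks required to hold num_tokens (ceiling division)."""
    return -(-num_tokens // block_size)


def locate(seq, token_pos):
    """Map a logical token position to (physical block id, offset in block)."""
    logical_block, offset = divmod(token_pos, BLOCK_SIZE)
    return BLOCK_TABLES[seq][logical_block], offset


print(locate("A", 50))    # (117, 2): token 50 is the third slot of block-117
print(blocks_needed(96))  # 6: sequence A spans tokens 0..95
```

The lookup never assumes that neighbouring logical blocks are neighbours in memory; it only needs the table.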
free block pool
block-142
block-143
block-144
...
why block mapping matters
-------------------------
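- A sequence's logical token order can be contiguous while its physical blocks need not be.
- When a request finishes, its blocks can be returned to the free pool for new requests to reuse.
- This kind of design is better suited to online generation workloads that mix long and short requests.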
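The points above can be sketched with a toy free-block pool. This is a minimal illustration under assumptions, not any engine's allocator: the class BlockPool, its methods, and the request ids "short" and "long" are all hypothetical, and the pool is seeded with only the three block ids listed in the free block pool above.

```python
from collections import deque


class BlockPool:
    """Toy free-block pool; not any real engine's allocator."""

    def __init__(self, block_ids):
        self.free = deque(block_ids)  # blocks available for reuse
        self.tables = {}              # request id -> list of block ids

    def append_block(self, request_id):
        """Give a request one more block, taken from the front of the pool."""
        if not self.free:
            raise RuntimeError("no free KV cache blocks")
        block = self.free.popleft()
        self.tables.setdefault(request_id, []).append(block)
        return block

    def release(self, request_id):
        """Return all of a finished request's blocks to the pool."""
        self.free.extend(self.tables.pop(request_id, []))


pool = BlockPool([142, 143, 144])
pool.append_block("short")          # takes 142
pool.append_block("long")           # takes 143
pool.append_block("long")           # takes 144
pool.release("short")               # 142 goes back to the pool
reused = pool.append_block("long")  # the long request picks up 142
print(pool.tables["long"])          # [143, 144, 142]
```

The long request ends up with blocks 143, 144, 142: logically ordered, physically out of order, and one of them recycled from a request that finished earlier.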
what this file is not
---------------------
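- This is not a precise description of any particular engine's internal structure.
- It is only meant to help you build engineering intuition for keeping "logical sequences" separate from "physical cache blocks".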