Benchmarks
================================================================================

Purpose: Detect performance regressions and provide profiling baselines.

Models:
    Dense              : Qwen3-0.6B-Q8_0
    MoE                : gemma-4-26B-A4B-it-UD-Q4_K_M
    Hybrid             : Qwen3.6-35B-A3B-UD-Q4_K_M

Shared Configuration:
    ContextWindow      : 32768
    NBatch             : 2048
    NUBatch            : 2048
    CacheTypeK         : F16
    CacheTypeV         : F16
    NSeqMax            : 1
    max_tokens         : 128
    temperature        : 0.0
    System prompt      : ~8k tokens (~39k chars)
    Total conversation : ~23k tokens (~108k chars, 38 messages)
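
The sizes above imply that the full conversation plus the generation budget fits inside the context window with room to spare. The arithmetic can be sketched in Go as follows. The inputs are the approximate figures from the configuration, and the constant and function names are illustrative, not identifiers from the benchmark code.

```go
package main

import "fmt"

// Approximate figures copied from the Shared Configuration block.
// The names are illustrative and do not come from the benchmark source.
const (
	contextWindow      = 32768  // ContextWindow
	conversationTokens = 23000  // ~23k tokens in the full conversation
	conversationChars  = 108000 // ~108k chars in the full conversation
	maxTokens          = 128    // max_tokens generated per request
)

// headroom returns how many context slots remain after the prompt and
// the generation budget are accounted for.
func headroom(window, prompt, gen int) int {
	return window - prompt - gen
}

// charsPerToken returns the average number of characters per token.
func charsPerToken(chars, tokens int) float64 {
	return float64(chars) / float64(tokens)
}

func main() {
	fmt.Printf("headroom: %d tokens\n",
		headroom(contextWindow, conversationTokens, maxTokens))
	fmt.Printf("chars/token: %.2f\n",
		charsPerToken(conversationChars, conversationTokens))
}
```

With these rounded inputs the headroom is 9640 tokens at roughly 4.70 characters per token. The real values depend on the tokenizer of each model.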

Benchmark Matrix:

    Model Type | Cache Mode | Benchmark Name              | What It Catches
    -----------|------------|-----------------------------|----------------------------------------
    Dense      | NonCaching | BenchmarkDense_NonCaching   | Full prefill baseline (no caching)
    Dense      | IMC        | BenchmarkDense_IMC          | All-but-last cached, range-delete cleanup
    MoE        | NonCaching | BenchmarkMoE_NonCaching     | MoE expert routing baseline (no caching)
    MoE        | IMC        | BenchmarkMoE_IMC            | MoE expert routing + IMC, range-delete cleanup
    Hybrid     | NonCaching | BenchmarkHybrid_NonCaching  | Hybrid (linear+full attention) baseline
    Hybrid     | IMC        | BenchmarkHybrid_IMC         | Hybrid + IMC, snapshot/restore cleanup

================================================================================

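🟢 Better   🔴 Worse   ⚪ Neutral, or new (no earlier run to compare against)

Percentages are changes relative to the previous (next-older) entry.
Higher is better for tok/s; lower is better for all other metrics.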
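
Each percentage in the entries below matches the change relative to the next-older entry. For example, Dense NonCaching ns/op moved from 46945175292 (2026-05-20) to 52685298764 (2026-06-09), which is +12.23%. A minimal Go sketch of that calculation and of the marker choice follows. The 1% neutral band is an assumption inferred from the recorded values (every change under 1% is marked ⚪, and the smallest coloured change is 2.79%); the function names are hypothetical and do not come from the reporting tool.

```go
package main

import (
	"fmt"
	"math"
)

// pctChange returns the relative change from prev to cur, in percent.
func pctChange(prev, cur float64) float64 {
	return (cur - prev) / prev * 100
}

// marker picks a legend symbol for a change. higherIsBetter is true for
// tok/s and false for ns/op, B/op, allocs/op, ttft-ms, and total-ms.
// neutralBand is an assumed threshold, not a value taken from the tooling.
func marker(pct float64, higherIsBetter bool, neutralBand float64) string {
	if math.Abs(pct) < neutralBand {
		return "⚪"
	}
	if (pct > 0) == higherIsBetter {
		return "🟢"
	}
	return "🔴"
}

func main() {
	const band = 1.0 // assumed neutral band, in percent

	// Dense NonCaching ns/op: 2026-05-20 -> 2026-06-09 (lower is better).
	ns := pctChange(46945175292, 52685298764)
	fmt.Printf("%s %+.2f%%\n", marker(ns, false, band), ns)

	// Hybrid NonCaching tok/s: 2026-05-20 -> 2026-06-09 (higher is better).
	tps := pctChange(72.04, 79.08)
	fmt.Printf("%s %+.2f%%\n", marker(tps, true, band), tps)
}
```

The two calls reproduce the recorded markers for those rows: 🔴 +12.23% for the slower ns/op and 🟢 +9.77% for the higher tok/s.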

2026-06-09     : dev
llama.cpp      : b9574
--------------------------------------------------------------------------------
goos: darwin
goarch: arm64
cpu: Apple M5 Max

Dense NonCaching  :  52685298764 ns/op 🔴 +12.23%      15993946 B/op 🔴 +8.96%     52265 allocs/op 🔴 +13.82%
Dense IMC         :  13216076639 ns/op 🔴 +11.67%    7798320160 B/op 🔴 +199.01%   58557 allocs/op 🔴 +18.50%

MoE NonCaching    : 140170864986 ns/op 🔴 +32.52%       9925592 B/op 🔴 +15.29%    75362 allocs/op 🔴 +9.19%
MoE IMC           :  29191993069 ns/op 🔴 +22.16%    1499090842 B/op 🟢 -72.39%    80668 allocs/op 🔴 +12.54%

Hybrid NonCaching : 121682850903 ns/op 🔴 +34.78%      10262096 B/op 🔴 +14.75%    53749 allocs/op 🔴 +13.41%
Hybrid IMC        :  26194732292 ns/op 🔴 +12.86%    1401031970 B/op 🔴 +197.07%   60401 allocs/op 🔴 +17.93%

--------------------------------------------------------------------------------
Dense NonCaching          Dense IMC                 
180.80 tok/s    🔴 -4.24%    179.90 tok/s    ⚪ -0.94%
  3241 ttft-ms  🔴 +14.20%      535 ttft-ms  🔴 +19.08%
 53919 total-ms 🔴 +13.14%    13357 total-ms 🔴 +11.10%

MoE NonCaching            MoE IMC                   
 79.27 tok/s    🔴 -10.82%    75.35 tok/s    🔴 -16.78%
  8779 ttft-ms  🔴 +40.62%     1101 ttft-ms  🔴 +28.29%
143800 total-ms 🔴 +37.68%    29251 total-ms 🔴 +24.60%

Hybrid NonCaching         Hybrid IMC                
 79.08 tok/s    🟢 +9.77%     76.28 tok/s    🟢 +8.46%
  7217 ttft-ms  🔴 +36.82%      921 ttft-ms  🔴 +39.11%
120402 total-ms 🔴 +30.22%    26397 total-ms 🔴 +11.95%


================================================================================

2026-05-20     : dev
llama.cpp      : b9247
--------------------------------------------------------------------------------
goos: darwin
goarch: arm64
cpu: Apple M5 Max

Dense NonCaching  :  46945175292 ns/op ⚪ +0.95%       14678621 B/op 🟢 -4.75%     45920 allocs/op 🟢 -3.01%
Dense IMC         :  11834920611 ns/op 🟢 -7.03%     2608010026 B/op 🟢 -88.43%    49414 allocs/op 🟢 -2.79%

MoE NonCaching    : 105772311917 ns/op 🟢 -10.47%       8609504 B/op ⚪ -0.83%     69016 allocs/op ⚪ -0.58%
MoE IMC           :  23896815000 ns/op 🟢 -13.32%    5429249586 B/op 🟢 -88.39%    71679 allocs/op ⚪ -0.69%

Hybrid NonCaching :  90281838681 ns/op 🟢 -13.15%       8942952 B/op ⚪ -0.46%     47393 allocs/op ⚪ -0.76%
Hybrid IMC        :  23209232764 ns/op 🟢 -12.31%     471620765 B/op 🟢 -90.84%    51219 allocs/op ⚪ -0.77%

--------------------------------------------------------------------------------
Dense NonCaching          Dense IMC                 
188.80 tok/s    🟢 +4.08%    181.60 tok/s    ⚪ +0.89%
  2838 ttft-ms  🟢 -3.60%       449 ttft-ms  🟢 -8.05%
 47655 total-ms 🟢 -3.64%     12023 total-ms 🟢 -4.40%

MoE NonCaching            MoE IMC                   
 88.89 tok/s    🟢 +7.43%     90.54 tok/s    🟢 +12.11%
  6243 ttft-ms  🟢 -15.07%      858 ttft-ms  🟢 -16.11%
104448 total-ms 🟢 -14.29%    23476 total-ms 🟢 -13.81%

Hybrid NonCaching         Hybrid IMC                
 72.04 tok/s    🟢 +15.71%    70.33 tok/s    🟢 +14.19%
  5275 ttft-ms  🟢 -13.41%      662 ttft-ms  🟢 -8.80%
 92457 total-ms 🟢 -13.43%    23579 total-ms 🟢 -10.94%


================================================================================

2026-04-26     : dev
llama.cpp      : b8937
--------------------------------------------------------------------------------
goos: darwin
goarch: arm64
cpu: Apple M5 Max

Dense NonCaching  :  46505497625 ns/op ⚪ new          15410741 B/op ⚪ new        47344 allocs/op ⚪ new
Dense IMC         :  12729809986 ns/op ⚪ new       22532774578 B/op ⚪ new        50833 allocs/op ⚪ new

MoE NonCaching    : 118138395194 ns/op ⚪ new           8681885 B/op ⚪ new        69421 allocs/op ⚪ new
MoE IMC           :  27567692833 ns/op ⚪ new       46779733146 B/op ⚪ new        72177 allocs/op ⚪ new

Hybrid NonCaching : 103946375639 ns/op ⚪ new           8984472 B/op ⚪ new        47756 allocs/op ⚪ new
Hybrid IMC        :  26467337750 ns/op ⚪ new        5150030317 B/op ⚪ new        51617 allocs/op ⚪ new

--------------------------------------------------------------------------------
Dense NonCaching          Dense IMC                 
181.40 tok/s    ⚪ new       180.00 tok/s    ⚪ new
  2944 ttft-ms  ⚪ new          488 ttft-ms  ⚪ new
 49455 total-ms ⚪ new        12576 total-ms ⚪ new

MoE NonCaching            MoE IMC                   
 82.74 tok/s    ⚪ new        80.76 tok/s    ⚪ new
  7351 ttft-ms  ⚪ new         1023 ttft-ms  ⚪ new
121863 total-ms ⚪ new        27236 total-ms ⚪ new

Hybrid NonCaching         Hybrid IMC                
 62.26 tok/s    ⚪ new        61.59 tok/s    ⚪ new
  6092 ttft-ms  ⚪ new          726 ttft-ms  ⚪ new
106804 total-ms ⚪ new        26474 total-ms ⚪ new


================================================================================

