
read cached content #750

Open
Mrxiangli opened this issue Jan 24, 2025 · 4 comments

@Mrxiangli

Thank you for the great project. Is there any API that can be used to read the cached content?

@yzh119
Collaborator

yzh119 commented Jan 24, 2025

Hi @Mrxiangli, do you mean the cache for JIT compilation, or the KV cache?

@Mrxiangli
Author

Thank you for the quick response. I mean the KV cache. For example, if I want to gather the KV cache of a request that has finished prefill and migrate it to another GPU, is there an API to help retrieve the related KV cache?

@yzh119
Collaborator

yzh119 commented Jan 24, 2025

Got it. Not yet, but it should be easy to support (e.g., using Triton, or pure PyTorch if you don't care about performance). Would you mind creating a PR to support it? We can put it under https://github.com/flashinfer-ai/flashinfer/blob/main/flashinfer/page.py

The semantics are:

# k_cache: [num_pages, page_size, num_heads, head_dim] under NHD layout
# v_cache: [num_pages, page_size, num_heads, head_dim] under NHD layout
# output_k: [kv_len, num_heads, head_dim]
# output_v: [kv_len, num_heads, head_dim]

for i in range(kv_len[request_idx]):
    # kv_indptr[request_idx] is the start of this request's page list;
    # i // page_size selects the page, i % page_size the slot within it.
    page_idx = kv_page_indices[kv_indptr[request_idx] + i // page_size]
    pos_idx = i % page_size
    output_k[i] = k_cache[page_idx, pos_idx]
    output_v[i] = v_cache[page_idx, pos_idx]
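For reference, the per-token loop above can be vectorized with fancy indexing. This is only an illustrative sketch (NumPy here for clarity; the real implementation would operate on torch tensors), and the function name `gather_kv` and its signature are hypothetical, not an existing flashinfer API:

```python
import numpy as np

def gather_kv(k_cache, v_cache, kv_page_indices, kv_indptr, kv_len,
              request_idx, page_size):
    """Gather one request's K/V entries from a paged cache (NHD layout).

    k_cache / v_cache: [num_pages, page_size, num_heads, head_dim]
    Returns output_k / output_v of shape [kv_len, num_heads, head_dim].
    """
    n = int(kv_len[request_idx])
    positions = np.arange(n)
    # Map each token position to (page, slot-within-page) coordinates,
    # offset by the start of this request's page list in kv_page_indices.
    page_idx = kv_page_indices[kv_indptr[request_idx] + positions // page_size]
    pos_idx = positions % page_size
    # Advanced indexing gathers all tokens at once instead of looping.
    output_k = k_cache[page_idx, pos_idx]
    output_v = v_cache[page_idx, pos_idx]
    return output_k, output_v
```

A tiny usage example: with page_size=2 and a request whose pages are [2, 0], token 0 and 1 come from page 2 and token 2 from slot 0 of page 0.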

@Mrxiangli
Author

Thank you for the answer! I will create a PR for this.
