Add prompt caching and latest-model support to model latency benchmarking tool by evgenisokolov · Pull Request #714 · aws-samples/amazon-bedrock-samples

evgenisokolov · 2026-06-12T13:46:01Z

Summary

Enhances the existing model-latency-benchmarking/ tool. Refs #713.

This adds opt-in prompt caching, support for the latest Bedrock models, and SDK/API modernization, while keeping existing datasets working unchanged. The change is limited to the model-latency-benchmarking/ folder.

Changes

Prompt caching (opt-in, per scenario): new optional fields prompt_caching, cache_ttl (5m/1h), and cached_context, plus a global PROMPT_CACHING default. A Converse cachePoint is inserted after the cached context. Extended 1h TTL is applied only for Anthropic models; other models use the default duration.
Cache metrics: capture cache_read_input_tokens, cache_write_input_tokens, and a derived Cache_Hit_Rate; the analysis reports TTFT split by cached vs uncached. All existing per-invocation columns and aggregated metrics are preserved (cache columns are added, not renamed).
SDK/API: add a minimum boto3 version gate and confirm Converse streaming via the bedrock-runtime client.
Inference config fix: honor the configured TEMPERATURE/TOP_P/TOP_K instead of hardcoded values. Because several current models reject temperature and topP together, the tool now sends a single sampling parameter selected by INFERENCE_SAMPLING (default temperature).
Datasets: refresh the sample dataset to current models (Claude Opus 4.5, Claude Sonnet 4.5, Claude Haiku 4.5, Claude 3.7 Sonnet, Amazon Nova Pro) and add a dedicated caching demo dataset (caching-demo-prompts-for-benchmarking.jsonl) with cached/uncached pairs for Claude Haiku 4.5 and Amazon Nova Pro.
Docs: update readme.md with the new fields, metrics, prerequisites, supported models/regions, the on-demand-only constraint, and the demo dataset.

Testing

Ran the notebook end-to-end in Amazon SageMaker Studio against the caching demo dataset. After the fixes above, cached and uncached scenarios for both Claude Haiku 4.5 and Amazon Nova Pro complete without errors and produce per-invocation CSV and aggregated analysis including the cache columns.
Verified the notebook is valid JSON and all Python cells compile.

Backward compatibility

Datasets that contain only the original fields run unchanged; a scenario without the caching fields behaves exactly as before.

Note on scope

The repository's contribution guide mentions a website markdown mirror under docs/. This tool is not currently published through mkdocs.yml, so a website mirror is intentionally out of scope for this change. Happy to add one if maintainers prefer.

Enhances model-latency-benchmarking/ (refs aws-samples#713): - Add opt-in, per-scenario prompt caching (prompt_caching, cache_ttl, cached_context) with a global PROMPT_CACHING default. Inserts a Converse cachePoint after the cached context; extended 1h TTL is applied only for Anthropic models, others use the default duration. - Capture cache metrics (cache_read_input_tokens, cache_write_input_tokens, Cache_Hit_Rate) and split TTFT into cached vs uncached in the analysis, preserving all existing columns and aggregates. - Add a boto3 minimum-version gate and confirm Converse streaming usage. - Fix inference config so configured TEMPERATURE/TOP_P/TOP_K are honored instead of hardcoded; send a single sampling parameter (INFERENCE_SAMPLING) since several models reject temperature and topP together. - Refresh the sample dataset to current models (Claude Opus/Sonnet/Haiku 4.5, Claude 3.7 Sonnet, Amazon Nova Pro) and add a dedicated caching demo dataset with cached/uncached pairs for Claude Haiku 4.5 and Amazon Nova Pro. - Update readme with the new fields, metrics, prerequisites, and caching notes. Existing datasets without the new fields run unchanged. Change is limited to the model-latency-benchmarking/ folder.

review-notebook-app · 2026-06-12T13:46:07Z

Check out this pull request on

See visual diffs & provide feedback on Jupyter Notebooks.

Powered by ReviewNB

- Remove the large cached-context scenario from the sample dataset; the caching demo dataset already covers caching end to end. - Condense the readme caching docs into a single concise section that does not repeat the dataset field table, aligning with the original style.

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Add prompt caching and latest-model support to model latency benchmarking tool#714

Add prompt caching and latest-model support to model latency benchmarking tool#714
evgenisokolov wants to merge 2 commits into
aws-samples:mainfrom
evgenisokolov:latency-benchmarking-prompt-caching

evgenisokolov commented Jun 12, 2026

Uh oh!

review-notebook-app Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

evgenisokolov commented Jun 12, 2026

Summary

Changes

Testing

Backward compatibility

Note on scope

Uh oh!

review-notebook-app Bot commented Jun 12, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant