# ASCENT: A Benchmark for Evaluating and Advancing Stepwise Diagnostic Reasoning in Large Language Models on Common Clinical Scenarios

| Paper Section | Code Location | Description |
|---|---|---|
| Experimental Settings – Models | `sft.py` | Training details |
| Experimental Settings – Inference Settings | `inference.py` | Inference details |
| Evaluation Metrics | `evaluation.py` | LLM-as-a-Judge evaluation details |
| Evaluation Metrics | `postprocessing.py` | Post-processing of LLM results |
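
The table above pairs each paper section with one script. As a minimal sketch (not part of the repository), the four stages can be chained from Python using the same `--config` names as the commands in this README. `STAGES`, `build_commands`, and `run_pipeline` are hypothetical names, and `sys.executable` stands in for `python3`.

```python
import subprocess
import sys
from typing import List, Sequence, Tuple

# Hypothetical (script, config name) pairs, in the order used in this README.
STAGES: List[Tuple[str, str]] = [
    ("sft.py", "sft"),                    # Experimental Settings – Models
    ("inference.py", "inference"),        # Experimental Settings – Inference Settings
    ("evaluation.py", "evaluation"),      # Evaluation Metrics (LLM-as-a-Judge)
    ("postprocessing.py", "evaluation"),  # Evaluation Metrics (post-processing)
]


def build_commands(stages: Sequence[Tuple[str, str]] = STAGES) -> List[List[str]]:
    """Return one argv list per stage, run with the current Python interpreter."""
    return [[sys.executable, script, "--config", config] for script, config in stages]


def run_pipeline(stages: Sequence[Tuple[str, str]] = STAGES) -> None:
    """Run the stages in order; check=True raises on the first non-zero exit code."""
    for cmd in build_commands(stages):
        print("running:", " ".join(cmd))
        subprocess.run(cmd, check=True)
```

Calling `run_pipeline()` from the repository root would execute the stages one after another. Because `check=True` raises `subprocess.CalledProcessError` on failure, a failed stage stops later stages from running on stale outputs, similar to `&&` chaining in a shell.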

Each main function or class in the code is annotated with a comment naming the corresponding paper section, for example: `# Load configuration as described in the "Experimental Settings – Models" section of the paper.`

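
Because the annotations follow a fixed pattern, a small script can recover which code cites which section. This sketch is hypothetical (`SECTION_RE`, `find_sections`, and `index_sections` are not part of the repository) and assumes every annotation quotes the section title and follows it with "section of the paper", as in the example comment.

```python
import re
from collections import defaultdict
from pathlib import Path
from typing import Dict, List, Tuple

# Assumed pattern: a '#' comment that quotes a section title,
# followed by the words 'section of the paper'.
SECTION_RE = re.compile(r'#.*?"([^"]+)"\s+section of the paper')


def find_sections(source: str) -> List[Tuple[int, str]]:
    """Return (line number, section title) for each annotated line in source."""
    hits = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        match = SECTION_RE.search(line)
        if match:
            hits.append((lineno, match.group(1)))
    return hits


def index_sections(root: str = ".") -> Dict[str, List[str]]:
    """Map each section title to "path:line" locations across *.py files under root."""
    root_path = Path(root)
    index: Dict[str, List[str]] = defaultdict(list)
    for path in sorted(root_path.rglob("*.py")):
        source = path.read_text(encoding="utf-8")
        for lineno, title in find_sections(source):
            rel = path.relative_to(root_path).as_posix()
            index[title].append(f"{rel}:{lineno}")
    return dict(index)
```

Running `index_sections(".")` from the repository root would return a dictionary keyed by section title, which can be compared against the paper-section table.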
The ASCENT dataset contains two tasks:

| Task | Directory | Description |
|---|---|---|
| (1) Impressions (Imp) | `data/ascent` | Generating impressions only |
| (2) Impressions + Rationales (Imp + Reason) | `data/ascent_w_reason` | Generating supporting rationales followed by impressions |
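
The task-to-directory layout above can be captured in a small lookup. This is a hypothetical sketch: the task keys `"imp"` and `"imp_reason"` and the helpers `task_dir` and `list_task_files` are illustrative names, and no particular file format inside the directories is assumed.

```python
from pathlib import Path
from typing import Dict, List

# Hypothetical task keys mapped to the directories listed in the table.
TASK_DIRS: Dict[str, str] = {
    "imp": "data/ascent",                  # (1) impressions only
    "imp_reason": "data/ascent_w_reason",  # (2) rationales followed by impressions
}


def task_dir(task: str, root: str = ".") -> Path:
    """Resolve a task key to its data directory; raise ValueError if unknown."""
    if task not in TASK_DIRS:
        raise ValueError(f"unknown task {task!r}; expected one of {sorted(TASK_DIRS)}")
    return Path(root) / TASK_DIRS[task]


def list_task_files(task: str, root: str = ".") -> List[Path]:
    """List regular files in the task directory (empty if it does not exist)."""
    directory = task_dir(task, root)
    if not directory.is_dir():
        return []
    return sorted(p for p in directory.iterdir() if p.is_file())
```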

To install the dependencies and run each stage with its configuration file:

```bash
pip install -r requirements.txt
python3 sft.py --config sft
python3 inference.py --config inference
python3 evaluation.py --config evaluation
python3 postprocessing.py --config evaluation
```

- All experiments in the paper can be reproduced using this code and the provided configuration files.
- We fixed the random seed (42) for all runs (see the `seed` parameters in the YAML files in the `config/` directory).
- To reproduce all main results, run:

  ```bash
  ./sft.sh && python3 evaluation.py && python3 postprocessing.py
  ```

ASCENT
Copyright (c) 2026-present NAVER Corp.
Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.