ASCENT: A Benchmark for Evaluating and Advancing Stepwise Diagnostic Reasoning in Large Language Models on Common Clinical Scenarios

Paper-to-Code Mapping

Paper Section	Code Location	Description
Experimental Settings - Models	`sft.py`	Training details
Experimental Settings - Inference Settings	`inference.py`	Inference details
Evaluation Metrics	`evaluation.py`	LLM-as-a-Judge evaluation details
Evaluation Metrics	`postprocessing.py`	Post-processing of LLM results

Each main function or class in the code is annotated with the corresponding paper section as a comment.
(e.g., # Load configuration as described in the "Experimental Settings – Models" section of the paper.)

Dataset

The ASCENT dataset contains two tasks:

Task	Directory	Description
(1) Impressions (Imp)	`data/ascent`	Generating impressions only
(2) Impressions + Rationales (Imp + Reason)	`data/ascent_w_reason`	Generating supporting rationales followed by impressions
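
As a small illustration of how the two tasks map to directories, the sketch below resolves a task name to its data directory. The keys `imp` and `imp_reason` and the helper `task_dir` are hypothetical names chosen for this example; only the two directory paths come from the table above, and the file layout inside each directory is not assumed.

```python
from pathlib import Path

# Task key -> data directory. The paths come from the table above;
# the keys "imp" and "imp_reason" are hypothetical labels for this sketch.
TASK_DIRS = {
    "imp": Path("data/ascent"),
    "imp_reason": Path("data/ascent_w_reason"),
}


def task_dir(task):
    """Return the data directory for a task key, or raise on an unknown key."""
    try:
        return TASK_DIRS[task]
    except KeyError:
        known = ", ".join(sorted(TASK_DIRS))
        raise ValueError(f"unknown task {task!r}; expected one of: {known}") from None
```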

Installation

pip install -r requirements.txt

Usage

1. Training

python3 sft.py --config sft

2. Inference

python3 inference.py --config inference

3. Evaluation

python3 evaluation.py --config evaluation

4. Post-Processing

python3 postprocessing.py --config evaluation
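
The four stages above are meant to run in order, each with a named config. A minimal sketch of chaining them from Python is shown below. It is not part of the repository: the helpers `build_commands` and `run_pipeline` and the `dry_run` flag are hypothetical, and only the script names and `--config` values are taken from the commands above.

```python
import subprocess
import sys

# (script, config name) pairs copied from the Usage commands above.
STAGES = [
    ("sft.py", "sft"),
    ("inference.py", "inference"),
    ("evaluation.py", "evaluation"),
    ("postprocessing.py", "evaluation"),
]


def build_commands(python=sys.executable):
    """Return one argv list per stage, in execution order."""
    return [[python, script, "--config", config] for script, config in STAGES]


def run_pipeline(dry_run=True):
    """Print each stage's command and, unless dry_run is set, execute it.

    Execution stops at the first failing stage because check=True
    makes subprocess.run raise CalledProcessError.
    """
    for cmd in build_commands():
        print(" ".join(cmd))
        if not dry_run:
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    # Dry run by default so the sketch is safe to try outside the repository.
    run_pipeline(dry_run=True)
```

Calling `run_pipeline(dry_run=False)` from the repository root would execute the stages for real. Note that post-processing reuses the `evaluation` config, as in the command above.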

Reproducibility

All experiments in the paper can be reproduced using this code and the provided configuration files.
We fixed the random seed (42) for all runs (see the seed parameters in the YAML files in the config/ directory).
To reproduce all main results, run:

./sft.sh && python3 evaluation.py && python3 postprocessing.py
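
For readers unfamiliar with seed fixing, the sketch below shows what applying a seed of 42 commonly involves in a Python training script. It is an assumption about typical practice, not a copy of this repository's code: the helper `set_seed` is a hypothetical name, and the actual seeding is controlled by the seed parameters in the YAML files under config/.

```python
import random


def set_seed(seed=42):
    """Seed Python's RNG, plus NumPy and PyTorch when they are installed."""
    random.seed(seed)
    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed
    try:
        import torch
        torch.manual_seed(seed)
        if torch.cuda.is_available():
            torch.cuda.manual_seed_all(seed)
    except ImportError:
        pass  # PyTorch not installed; nothing to seed


if __name__ == "__main__":
    set_seed(42)
    first = random.random()
    set_seed(42)
    # Re-seeding with the same value replays the same random sequence.
    print(first == random.random())
```

Seeding alone does not guarantee bit-identical results across different hardware or library versions, so small numerical differences from the reported numbers are possible.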

License

ASCENT
Copyright (c) 2026-present NAVER Corp.

Licensed under the Apache License, Version 2.0 (the "License");
you may not use this file except in compliance with the License.
You may obtain a copy of the License at

    http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software
distributed under the License is distributed on an "AS IS" BASIS,
WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
See the License for the specific language governing permissions and
limitations under the License.
