Issue replicating Vlaser-8B on EmbodiedBench EB-ALFRED #5

@estherhan1

Dear Authors,

I hope you are doing well.

I am currently trying to reproduce the reported results for Vlaser-8B. So far, my EB-Habitat result is relatively close to the reported number, but my EB-ALFRED performance is much lower than expected.

For reference, the reported and reproduced results are:

Reported

  • EB-ALFRED average success rate: 0.50
  • EB-Habitat success rate: 0.40

Reproduced (mine)

  • EB-ALFRED average success rate: 0.10
  • EB-Habitat success rate: 0.42

My setup is as follows:

  • Cluster: SLURM + Singularity (H100)
  • Model: OpenGVLab/Vlaser-8B
  • Code: official repository, with only launcher/runtime adaptations for SLURM/Singularity (mainly display, environment, and path handling)

For EB-ALFRED, the detailed task_success results are:

  • base: 0.16
  • common_sense: 0.14
  • complex_instruction: 0.12
  • visual_appearance: 0.10
  • spatial: 0.08
  • long_horizon: 0.00
  • mean over the 6 subsets: 0.10
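
As a sanity check on the aggregation, here is a minimal sketch of how the mean above can be recomputed from the per-subset numbers. It assumes the reported average is an unweighted mean over the six subsets; the variable and function names are hypothetical and not taken from the official evaluation code.

```python
# Per-subset task_success values copied from the list above.
task_success = {
    "base": 0.16,
    "common_sense": 0.14,
    "complex_instruction": 0.12,
    "visual_appearance": 0.10,
    "spatial": 0.08,
    "long_horizon": 0.00,
}

# Average success rate reported for Vlaser-8B on EB-ALFRED (from this issue).
REPORTED_EB_ALFRED = 0.50


def mean_over_subsets(scores):
    """Unweighted mean across subsets (assumed aggregation, not confirmed)."""
    return sum(scores.values()) / len(scores)


mean_success = mean_over_subsets(task_success)
gap = REPORTED_EB_ALFRED - mean_success

print(f"mean task_success over {len(task_success)} subsets: {mean_success:.2f}")
print(f"gap to reported result: {gap:.2f}")
```

If the official script weights subsets by episode count instead, the aggregate could differ slightly, but not enough to explain a gap of this size.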

Other aggregate statistics (mean over 6 subsets) are:

  • task_progress: 0.1678
  • num_invalid_actions: 8.68
  • planner_output_error: 0.46

Could you share the evaluation setup used for the published Vlaser-8B EB-ALFRED result?

Thank you!
