[Code scan] Avoid mutating HDFS backward file lists during download #615

Description

@njzjz

Found by a Codex global repository scan of deepmodeling/dpdispatcher at commit 98a9e08.

Problem
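HDFS download(back_error=True) binds flist to the submission's own backward file lists (task.backward_files and submission.backward_common_files) instead of copies, then extends those lists in place with the absolute paths returned by glob().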

Relevant code

for task in submission.belonging_tasks:
    local_job = os.path.join(self.local_root, task.task_work_path)
    remote_job = os.path.join(gz_dir, task.task_work_path)
    flist = task.backward_files
    if back_error:
        errors = glob(os.path.join(remote_job, "error*"))
        flist.extend(errors)
    for jj in flist:
        rfile = os.path.join(remote_job, jj)
        lfile = os.path.join(local_job, jj)

local_job = self.local_root
remote_job = gz_dir
flist = submission.backward_common_files
if back_error:
    errors = glob(os.path.join(remote_job, "error*"))
    flist.extend(errors)
for jj in flist:
    rfile = os.path.join(remote_job, jj)
    lfile = os.path.join(local_job, jj)

Impact
The submission object is mutated during download. Every call with back_error=True extends the same lists again, so later downloads can try to fetch stale temporary paths left over from earlier calls. In addition, os.path.join(local_job, absolute_path) discards local_job and returns absolute_path unchanged, so error files may be moved to unintended absolute locations or not copied under the task/common output directory.
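Both effects can be reproduced outside dpdispatcher. The snippet below is only an illustrative sketch: collect(), the file names, and the /tmp/gz and /data/local paths are hypothetical stand-ins and are not taken from the repository.

```python
import os

# Hypothetical stand-in for task.backward_files.
backward_files = ["result.out"]


def collect(flist_source, errors):
    """Mimic the reported pattern: alias the list, then extend it in place."""
    flist = flist_source  # alias, not a copy
    flist.extend(errors)
    return flist


# Two "downloads" with the same (hypothetical) absolute glob result.
collect(backward_files, ["/tmp/gz/task0/error.log"])
collect(backward_files, ["/tmp/gz/task0/error.log"])

# The caller's list has grown: one original entry plus two appended paths.
print(len(backward_files))

# Joining onto an absolute path discards the local directory entirely.
local_job = os.path.join("/data/local", "task0")
joined = os.path.join(local_job, "/tmp/gz/task0/error.log")
print(joined)
```

On a POSIX system the joined path is the absolute error path itself, so nothing is placed under local_job.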

Suggested fix
Build local copied lists, for example flist = list(task.backward_files), and append relative paths or basenames for generated error files. Do the same for submission.backward_common_files.
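One possible shape for that fix is sketched below. build_download_list() is a hypothetical helper, not an existing dpdispatcher function, and the choice of os.path.relpath (rather than os.path.basename) and the duplicate check are assumptions about what the maintainers would want.

```python
import os
import tempfile
from glob import glob


def build_download_list(declared_files, remote_job, back_error):
    """Return a fresh list of paths relative to remote_job.

    Hypothetical helper: the caller's declared_files list is never modified.
    """
    flist = list(declared_files)  # copy, so the submission stays untouched
    if back_error:
        for path in sorted(glob(os.path.join(remote_job, "error*"))):
            # Store the path relative to remote_job so that joining it onto
            # local_job later keeps the file under the local directory.
            rel = os.path.relpath(path, remote_job)
            if rel not in flist:
                flist.append(rel)
    return flist


# Usage with a temporary directory standing in for remote_job.
with tempfile.TemporaryDirectory() as remote_job:
    open(os.path.join(remote_job, "error.log"), "w").close()
    declared = ["result.out"]
    first = build_download_list(declared, remote_job, back_error=True)
    second = build_download_list(declared, remote_job, back_error=True)
```

In this sketch, calling the helper twice yields the same list both times and leaves declared unchanged. The same helper could be applied to submission.backward_common_files with remote_job set to gz_dir.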

Metadata

Labels: bug
Status: Todo