A question about the code #13

@Victor-818

In Reconstruction/gsfixer/cogvideo/inference.py, lines 105–118, control_pixel_values is normalized to [-1, 1] via (frames - 0.5) * 2. However, first_image and last_image are then sliced from this tensor and converted with (image * 255).astype(np.uint8) while still in the [-1, 1] range, without first being mapped back to [0, 1]. The scaled values therefore span [-255, 255] rather than [0, 255], so the images passed to VGGT and DINO appear to have incorrect pixel values.

Here is the relevant code snippet:

control_pixel_values = (frames - 0.5) * 2
control_pixel_values = control_pixel_values.permute(1, 0, 2, 3).unsqueeze(0)
ref_first_last_pixel_values = torch.cat([control_pixel_values[:, :, 0, :, :].unsqueeze(2), control_pixel_values[:, :, -1, :, :].unsqueeze(2)], dim=2)

ref_first_last_image_path = []  # for vggt
first_image = control_pixel_values[:, :, 0, :, :].squeeze(0).cpu().clone().permute(1, 2, 0).numpy()
first_image = (first_image * 255).astype(np.uint8)
first_image = Image.fromarray(first_image)
ref_first_last_image_path.append(first_image)
last_image = control_pixel_values[:, :, -1, :, :].squeeze(0).cpu().clone().permute(1, 2, 0).numpy()
last_image = (last_image * 255).astype(np.uint8)
last_image = Image.fromarray(last_image)
ref_first_last_image_path.append(last_image)
vggt_images = load_and_preprocess_images_(ref_first_last_image_path).to(self.opts.device)

dino_latents = self.image_encoder(vggt_images).last_hidden_state[:, 5:, :].to(self.opts.weight_dtype)
output_list, patch_start_idx = self.vggt.aggregator.forward(vggt_images.unsqueeze(0))
vggt_latents = output_list[-1][:, :, patch_start_idx:, :].squeeze(0).to(self.opts.weight_dtype)

The correct inverse transform should be ((image + 1) / 2 * 255) or equivalent. Could you please take a look and help clarify whether this is indeed a bug? Thanks!
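
For illustration, here is a minimal NumPy sketch of the inverse mapping suggested above. It is not taken from the repository: the helper name to_uint8_image and the sample values are hypothetical, and the clipping and rounding steps are my own assumptions about what a safe conversion would look like.

```python
import numpy as np


def to_uint8_image(normalized: np.ndarray) -> np.ndarray:
    """Map a float array in [-1, 1] to uint8 pixel values in [0, 255].

    Hypothetical helper for illustration only, not part of the repository.
    """
    unit = (normalized + 1.0) / 2.0          # undo (frames - 0.5) * 2
    unit = np.clip(unit, 0.0, 1.0)           # guard against small overshoots
    return np.round(unit * 255.0).astype(np.uint8)


# A tiny H x W x C "frame" in [0, 1], normalized as in the snippet above.
frame = np.array([[[0.0, 0.2, 1.0]]], dtype=np.float32)
normalized = (frame - 0.5) * 2               # now in [-1, 1]

# Scaling without the inverse mapping leaves negative values,
# which do not fit into uint8.
scaled_directly = normalized * 255

restored = to_uint8_image(normalized)
print(restored.tolist())
```

With this mapping the normalized frame converts back to the pixel values of the original [0, 1] frame, whereas scaled_directly still contains negative entries before the cast.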
