A question about the code #13

In `Reconstruction/gsfixer/cogvideo/inference.py`, lines 105–118, `control_pixel_values` is normalized to `[-1, 1]` via `(frames - 0.5) * 2`. However, `first_image` and `last_image` are then sliced from this normalized tensor and converted with `(image * 255).astype(np.uint8)`, without first mapping the values back to `[0, 1]`. The product therefore spans `[-255, 255]` rather than `[0, 255]`, so the `uint8` cast appears to produce incorrect pixel values for the images passed to VGGT and DINO.

Here is the relevant code snippet:
```python
control_pixel_values = (frames - 0.5) * 2
control_pixel_values = control_pixel_values.permute(1, 0, 2, 3).unsqueeze(0)
ref_first_last_pixel_values = torch.cat([control_pixel_values[:, :, 0, :, :].unsqueeze(2), control_pixel_values[:, :, -1, :, :].unsqueeze(2)], dim=2)

ref_first_last_image_path = []  # for vggt
first_image = control_pixel_values[:, :, 0, :, :].squeeze(0).cpu().clone().permute(1, 2, 0).numpy()
first_image = (first_image * 255).astype(np.uint8)
first_image = Image.fromarray(first_image)
ref_first_last_image_path.append(first_image)
last_image = control_pixel_values[:, :, -1, :, :].squeeze(0).cpu().clone().permute(1, 2, 0).numpy()
last_image = (last_image * 255).astype(np.uint8)
last_image = Image.fromarray(last_image)
ref_first_last_image_path.append(last_image)
vggt_images = load_and_preprocess_images_(ref_first_last_image_path).to(self.opts.device)

dino_latents = self.image_encoder(vggt_images).last_hidden_state[:, 5:, :].to(self.opts.weight_dtype)
output_list, patch_start_idx = self.vggt.aggregator.forward(vggt_images.unsqueeze(0))
vggt_latents = output_list[-1][:, :, patch_start_idx:, :].squeeze(0).to(self.opts.weight_dtype)
```
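To make the concern concrete, here is a small NumPy sketch. It is not the repository code: the array `frames` below is made-up data standing in for pixel values in `[0, 1]`, and it only shows the value range that reaches the `uint8` cast.

```python
import numpy as np

# Hypothetical stand-in for pixel values in [0, 1] (not real data from the repo).
frames = np.array([0.0, 0.25, 0.5, 0.75, 1.0], dtype=np.float32)

# Same normalization as in the snippet above: [0, 1] -> [-1, 1].
normalized = (frames - 0.5) * 2

# This is the value range that the current code passes to .astype(np.uint8).
scaled = normalized * 255

print(scaled.min(), scaled.max())  # -255.0 255.0
```

Every pixel darker than mid-gray ends up negative before the cast. As far as I understand, `astype(np.uint8)` does not clamp, so the resulting values for those pixels are not something to rely on.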

The correct inverse transform should be `((image + 1) / 2 * 255)` or equivalent. Could you please take a look and help clarify whether this is indeed a bug? Thanks!
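For reference, a minimal sketch of the conversion I would expect, assuming the input is a NumPy array in `[-1, 1]` with shape `(H, W, 3)`. The helper name `to_uint8_image` is hypothetical and not part of the repository, and the clipping and rounding are my own additions on top of the formula above.

```python
import numpy as np


def to_uint8_image(image: np.ndarray) -> np.ndarray:
    """Map an array from [-1, 1] back to uint8 values in [0, 255]."""
    image = (image + 1.0) / 2.0        # [-1, 1] -> [0, 1]
    image = np.clip(image, 0.0, 1.0)   # guard against small overshoots
    return np.round(image * 255.0).astype(np.uint8)


# Made-up 1x3 "image" with three channels per pixel: black, mid-gray, white.
demo = np.array([[[-1.0] * 3, [0.0] * 3, [1.0] * 3]], dtype=np.float32)
print(to_uint8_image(demo)[0, :, 0])  # [  0 128 255]
```

In the snippet above, this would stand in for the two `(... * 255).astype(np.uint8)` lines, for example `first_image = Image.fromarray(to_uint8_image(first_image))`.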
