Fix gpu stats for msm driver #2073

Open
Drakulix wants to merge 5 commits into flightlessmango:master from Drakulix:fix-msm

@Drakulix

Drakulix commented Jun 14, 2026


Adreno GPUs have split render and display nodes. As a result, we sometimes deal with the msm_dpu module (for the display engine) and sometimes with the msm module (for the render engine).

#1704 and other PRs refactored the GPU detection logic so that we no longer set the module to msm. This broke the logic in gpu_fdinfo in several places for this specific driver. This PR tries to fix everything I was able to identify easily.

  • 0b2badb fixes the hwmon sensor detection logic to test for msm_dpu instead of msm. (It seems this logic might previously have applied to the msm_drm driver as well, but I don't have a device to test that.) This fixes the displayed GPU temperature.
  • eba872f adds a missing memory type for msm, which is present in newer kernels and allows PVRAM to be used with this driver.
  • 0be0cac fixes the mismatch between the render driver read from fdinfo and the stored module, which in our case refers to the display-engine driver. I am not sure whether there is a better or cleaner way to do this, but this way we at least avoid re-introducing a separate msm_driver instance member. This makes GPU load and PVRAM actually show up.
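
The driver mismatch from the last bullet can be sketched roughly as follows. This is a minimal Python illustration and not MangoHud's actual C++ code: the alias table, both function names, and the sample fdinfo text are hypothetical. The only parts taken from the discussion are the msm_dpu/msm pairing and the fact that fdinfo reports the render driver; the `drm-driver` key is assumed from the kernel's DRM fdinfo format.

```python
# Hypothetical alias table: module detected on the display node -> driver
# name reported by render-node fdinfo. Only the msm_dpu -> msm pair comes
# from the PR description.
RENDER_DRIVER_ALIASES = {"msm_dpu": "msm"}


def parse_fdinfo(text):
    """Parse the 'key: value' lines of an fdinfo file into a dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def fdinfo_matches_module(fields, module):
    """Return True if this fdinfo entry belongs to the GPU stored as `module`."""
    driver = fields.get("drm-driver")
    if driver is None:
        return False  # not a DRM file descriptor
    return driver == module or driver == RENDER_DRIVER_ALIASES.get(module)


# Illustrative fdinfo content with made-up values (not from a real device).
SAMPLE = "pos:\t0\nflags:\t02400002\ndrm-driver:\tmsm\ndrm-engine-gpu:\t1234567 ns\n"

fields = parse_fdinfo(SAMPLE)
print(fdinfo_matches_module(fields, "msm_dpu"))
print(fdinfo_matches_module(fields, "amdgpu"))
```

Without the alias lookup, a stored module of msm_dpu would never equal the msm driver name found in fdinfo, so every file descriptor would be skipped.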

@Drakulix

Author

Anything I can do to get this reviewed?

@flightlessmango

Owner

ping @17314642

@17314642

Contributor

Regarding eba872f, yeah, it's available since 6.5 (torvalds/linux@3e9757f), I just didn't find it

As for the other two commits, I'll need to write to a friend with whom I tested my previous changes, because I never had any Qualcomm SoCs, and see if anything's broken on his side.

What's your device and what was broken for you, only process vram, temperature and gpu load?

@Drakulix

Drakulix commented Jun 21, 2026

Author

> As for the other two commits, I'll need to write to a friend with whom I tested my previous changes, because I never had any Qualcomm SoCs, and see if anything's broken on his side.

I can also try to point you to the relevant code changes, because most of this is just fixing stuff that #1704 broke.

> What's your device and what was broken for you, only process vram, temperature and gpu load?

The device is an Ayaneo Pocket S2 (SM8650) running postmarketOS. Basically all gpu stats were just showing 0 before. Without 0b2badb mangohud completely skips over every file descriptor here (confirmed by attaching a debugger).

So these are just all the stats I managed to fix up easily. I am not sure whether, e.g., full VRAM load is something that is supposed to work or could work with the info the driver exposes. But with these commits the three mentioned stats (process VRAM, temperature, load) output sensible-looking values again.

@17314642

Contributor

Total vram usage not working is fine in this case
Looks good to me, commits are simple enough to fix in case we missed something

@17314642

Contributor

@flightlessmango

@flightlessmango

Owner

Since mangohud-next is now part of master, my concern is that the two implementations will gradually diverge. Ideally both would eventually use the same metrics pipeline, but that's beyond the scope of this PR

Given that, I think this PR should at least implement the corresponding functionality in mangohud-next as well, so we don't introduce another feature gap that has to be reconciled later

@17314642

17314642 commented Jun 21, 2026

Contributor

The only thing mangohud-next is missing is process VRAM. The hwmon fixes are not necessary there because I changed how fdinfo fds are opened.

Previously it opened every file inside /proc/pid/fdinfo and checked the driver name.
Now it opens /proc/pid/fd, remembers which fds point to the GPU, and opens the same fds inside /proc/pid/fdinfo.
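
As a rough illustration of the fd-based lookup described above (a Python sketch, not the mangohud-next implementation): the function name, the `proc_root` parameter, and the assumption that GPU fds resolve to paths under /dev/dri/ are all hypothetical choices made for this example.

```python
import os


def gpu_fdinfo_paths(pid, proc_root="/proc"):
    """Return fdinfo paths for the fds of `pid` that point at a DRM node.

    `proc_root` is a hypothetical parameter that exists only so the
    function can be exercised against a fake directory tree.
    """
    fd_dir = os.path.join(proc_root, str(pid), "fd")
    try:
        names = os.listdir(fd_dir)
    except OSError:
        return []  # process exited or access was denied

    paths = []
    for name in sorted(names, key=lambda n: int(n) if n.isdigit() else -1):
        try:
            target = os.readlink(os.path.join(fd_dir, name))
        except OSError:
            continue  # fd was closed between listdir and readlink
        # Assumption: GPU fds resolve to /dev/dri/cardN or /dev/dri/renderDN.
        if target.startswith("/dev/dri/"):
            paths.append(os.path.join(proc_root, str(pid), "fdinfo", name))
    return paths
```

Calling `gpu_fdinfo_paths(os.getpid())` lists the fdinfo files worth reading for the current process. Because the filter works on the symlink target and not on the driver name inside fdinfo, it does not depend on which module name was stored for the GPU.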

@17314642

Contributor

@Drakulix I opened a PR in your fork that adds PVRAM support to mangohud-next and also updates the README support table for Qualcomm.

Drakulix#1

@17314642

Contributor

I also refreshed my memory of how my code works, since it's been a while, and I can confirm that those 3 commits should be totally fine.

@Drakulix

Author

Thanks @17314642! I pulled the two commits into the branch.

@17314642

17314642 commented Jun 22, 2026

Contributor

@Drakulix I decided to make legacy mangohud use metrics code from mangohud-next, so that there is no deviation in functionality in the future.

Can you please check it out and tell whether everything is working for you?

https://github.com/17314642/MangoHud
Branch is gpu-fdinfo-use-next

The mangohud-next implementation of qualcomm metrics is located in mangohud-next/server/metrics/gpu/msm

@Drakulix

Author

> Can you please check it out and tell whether everything is working for you?

Sadly not, all three stats report 0 again. I can try to debug this later today.

@17314642

Contributor

Can you send logs with MANGOHUD_LOG_LEVEL=trace so I can also take a look?

@Drakulix

Author

> Can you send logs with MANGOHUD_LOG_LEVEL=trace so I can also take a look?

MangoHud.log

@17314642

Contributor

Are you using gamescope?

If yes, I forgot about it, can you try without it?

@17314642

Contributor

Oh, and you're using something other than my branch; these logs are from the old GPU_fdinfo implementation.

@Drakulix

Drakulix commented Jun 22, 2026

Author

> Oh, and you're using something other than my branch; these logs are from the old GPU_fdinfo implementation.

Yeah sorry, my packaging mistake. It is still not working though. This is now with just vkcube on a normal desktop with mangohud vkcube --present_mode 1:
MangoHud.log

Under gamescope it does now list the GPU with 1% usage, a 0-degree temperature and some PVRAM used, which might be right for Steam, but it doesn't look like it switches to the game process.

@17314642

Contributor

It doesn't switch to the game process because I forgot to add gamescope support. I'll add it.

Does it work without gamescope though?

@Drakulix

Drakulix commented Jun 22, 2026

Author

> Does it work without gamescope though?

No, it still lists everything as 0. See the log.

(And temperature never works.)

@17314642

Contributor

Okay, let's fix GPU load and proc_vram first

Can you send the output of more /proc/GAME_PID/fdinfo/* and ls -l /proc/GAME_PID/fd | grep -E "renderD|dri"
