FROMLIST: misc: fastrpc: create duplicate sessions after all CB probing #1364

Open
quic-vkatoch wants to merge 12 commits into qualcomm-linux:tech/mm/fastrpc from quic-vkatoch:dup-sessions

@quic-vkatoch

Patch: Replaces the qcom,nsessions DT property with a driver-level approach that appends FASTRPC_DUP_SESSIONS copies of the last probed session for the ADSP domain.

Link: https://lore.kernel.org/all/20260609-dup-sessions-v1-1-26934abb9fa3@oss.qualcomm.com/

ekanshibu and others added 12 commits June 15, 2026 12:19
The fdlist is currently part of the meta buffer, computed during
put_args. This leads to code duplication when preparing and reading
critical meta buffer contents used by the FastRPC driver.

Move fdlist to the invoke context structure to improve maintainability
and reduce redundancy. This centralizes its handling and simplifies
meta buffer preparation and reading logic.

Link: https://lore.kernel.org/all/20260215182136.3995111-2-ekansh.gupta@oss.qualcomm.com/
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>

Replace the hardcoded context ID mask (0xFF0) with GENMASK(11, 4) to
improve readability and follow kernel bitfield conventions. Use
FIELD_PREP and FIELD_GET instead of manual shifts for setting and
extracting ctxid values.

Link: https://lore.kernel.org/all/20260215182136.3995111-3-ekansh.gupta@oss.qualcomm.com/
Reviewed-by: Konrad Dybcio <konrad.dybcio@oss.qualcomm.com>
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
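
As an illustration of the mask-based helpers this commit message refers to, the following user-space sketch packs and extracts a context ID with a GENMASK(11, 4)-style mask. The SKETCH_* macros and sketch_* functions are hypothetical stand-ins written for this example; the kernel's own GENMASK(), FIELD_PREP() and FIELD_GET() add compile-time checks that are omitted here.

```c
#include <stdint.h>

/* User-space stand-ins (hypothetical names) for the kernel bitfield helpers. */
#define SKETCH_GENMASK(h, l) \
	((uint32_t)((~UINT32_C(0) << (l)) & (~UINT32_C(0) >> (31 - (h)))))

/* Lowest set bit of a mask; multiplying or dividing by it shifts a field. */
#define SKETCH_LOW_BIT(mask) ((mask) & (~(mask) + 1u))

#define SKETCH_FIELD_PREP(mask, val) \
	(((uint32_t)(val) * SKETCH_LOW_BIT(mask)) & (mask))
#define SKETCH_FIELD_GET(mask, reg) \
	(((uint32_t)(reg) & (mask)) / SKETCH_LOW_BIT(mask))

/* Bits 11..4, equal to the previously hardcoded 0xFF0. */
#define SKETCH_CTXID_MASK SKETCH_GENMASK(11, 4)

/* Place an 8-bit context ID into bits 11..4. */
uint32_t sketch_pack_ctxid(uint32_t id)
{
	return SKETCH_FIELD_PREP(SKETCH_CTXID_MASK, id);
}

/* Recover the context ID, ignoring bits outside the mask. */
uint32_t sketch_unpack_ctxid(uint32_t ctx)
{
	return SKETCH_FIELD_GET(SKETCH_CTXID_MASK, ctx);
}
```

With these definitions, sketch_pack_ctxid(0xAB) yields 0xAB0, and sketch_unpack_ctxid(0xAB5) returns 0xAB because the low four bits fall outside the mask.
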
…support

Current FastRPC context uses a 12-bit mask:
  [ID(8 bits)][PD type(4 bits)] = GENMASK(11, 4)

This works for normal calls but fails for DSP polling mode.
Polling mode expects a 16-bit layout:
  [15:8] = context ID (8 bits)
  [7:5]  = reserved
  [4]    = async mode bit
  [3:0]  = PD type (4 bits)

If async bit (bit 4) is set, DSP disables polling. With current
mask, odd IDs can set this bit, causing DSP to skip poll updates.

Update FASTRPC_CTXID_MASK to GENMASK(15, 8) so IDs occupy upper
byte and lower byte is left for DSP flags and PD type.

Reserved bits remain unused. This change is compatible with
polling mode and does not break non-polling behavior.

Bit layout:
  [15:8] = CCCCCCCC (context ID)
  [7:5]  = xxx (reserved)
  [4]    = A (async mode)
  [3:0]  = PPPP (PD type)

Link: https://lore.kernel.org/all/20260215182136.3995111-4-ekansh.gupta@oss.qualcomm.com/
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
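
The collision described in this commit message can be checked with a little arithmetic. The sketch below builds the context word under both layouts and tests bit 4. It assumes the PD type is ORed into the low four bits, as the bit layout suggests; all sketch_* and SKETCH_* names are hypothetical and not taken from the driver.

```c
#include <stdbool.h>
#include <stdint.h>

#define SKETCH_OLD_CTXID_SHIFT 4u              /* ID in bits 11..4  */
#define SKETCH_NEW_CTXID_SHIFT 8u              /* ID in bits 15..8  */
#define SKETCH_ASYNC_BIT (UINT32_C(1) << 4)    /* async mode flag   */
#define SKETCH_PD_MASK   UINT32_C(0xF)         /* PD type, bits 3..0 */

/* Combine an 8-bit context ID and a PD type into one context word. */
uint32_t sketch_make_ctx(uint32_t id, uint32_t pd, unsigned int shift)
{
	return ((id & 0xFFu) << shift) | (pd & SKETCH_PD_MASK);
}

/* True when the word would be read by the DSP as "async mode". */
bool sketch_async_bit_set(uint32_t ctx)
{
	return (ctx & SKETCH_ASYNC_BIT) != 0;
}
```

For ID 3 and PD type 1, the old layout gives 0x31, which has bit 4 set, while the new layout gives 0x301, which leaves bit 4 clear.
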
For any remote call to the DSP, after sending an invocation message, the
FastRPC driver waits for the glink response, and during this time the CPU
can enter low-power modes. This adds latency to the overall FastRPC call
because CPU wakeup and scheduling latencies are included. Add polling-mode
support, in which the FastRPC driver polls a memory location continuously
after sending a message to the remote subsystem. This eliminates CPU wakeup
and scheduling latencies and reduces FastRPC overhead. Users can enable
poll mode through the FASTRPC_IOCTL_SET_OPTION ioctl with the
FASTRPC_POLL_MODE request ID.

Link: https://lore.kernel.org/all/20260215182136.3995111-5-ekansh.gupta@oss.qualcomm.com/
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
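
A minimal user-space sketch of the polling idea follows. It spins on a shared word until the expected value appears or a spin budget runs out. The function name, the spin budget, and the suggestion that the caller falls back to a blocking wait on failure are assumptions made for illustration; the commit message does not describe the driver's actual loop or timeout handling.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical helper: poll `word` until it equals `expected`.
 * Returns true if the response was observed while polling, false if the
 * spin budget was exhausted (the caller could then use a blocking wait).
 */
bool sketch_poll_for_response(_Atomic uint32_t *word, uint32_t expected,
			      unsigned long max_spins)
{
	unsigned long i;

	for (i = 0; i < max_spins; i++) {
		/* Acquire ordering so data written before the flag is visible. */
		if (atomic_load_explicit(word, memory_order_acquire) == expected)
			return true;
	}
	return false;
}
```

Because the waiting thread never sleeps in the polled path, no wakeup or rescheduling is needed once the remote side writes the word, which is the latency saving the commit message describes.
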
…rocess abort

When a userspace FastRPC client is abruptly terminated, FastRPC
cleanup paths can race with device and session teardown.

This results in kernel panics in different release paths:
- fastrpc_release() when using remote heap, originating from
  fastrpc_buf_free()
- fastrpc_device_release() when using system heap, originating from
  fastrpc_free_map()

In addition, fastrpc_map_put() may trigger refcount use-after-free
due to concurrent cleanup without proper synchronization.

The root cause is that buffer and map cleanup paths may access map
and buf resources after the associated device or session has
already been released.

Fix this by:
- Introducing mutex protection for map and buf lifetime
- Serializing buffer and map cleanup against device teardown
- Skipping buffer and map operations when the device is already gone

These changes ensure cleanup paths are safe against unexpected
process aborts and prevent use-after-free and kernel panic scenarios.

Link: https://lore.kernel.org/all/20260427105310.4056-1-jianping.li@oss.qualcomm.com/
Fixes: c68cfb7 ("misc: fastrpc: Add support for context Invoke method")
Cc: stable@kernel.org
Signed-off-by: Jianping Li <jianping.li@oss.qualcomm.com>

…messages

On some platforms (e.g. QCS615 Talos), fastrpc may temporarily fail
to retrieve DSP attributes during boot, resulting in repeated
"Error: dsp information is incorrect" messages printed on the
console.

These messages are observed continuously during boot when metadata
flashing is enabled as part of RC releases, causing unnecessary
log noise.

Similarly, the absence of reserved DMA memory is a valid
configuration and does not represent an error condition.

Since these scenarios are expected and do not indicate a failure,
downgrade the log level from dev_err/dev_info to dev_dbg to avoid
flooding the console.

No functional change intended.

Link: https://lore.kernel.org/all/20260514062825.50172-1-jianping.li@oss.qualcomm.com/
Signed-off-by: Jianping Li <jianping.li@oss.qualcomm.com>

…eue context

There is a race between fastrpc_device_release() and the workqueue
that processes DSP responses. When the user closes the file descriptor,
fastrpc_device_release() frees the fastrpc_user structure. Concurrently,
an in-flight DSP invocation can complete and fastrpc_rpmsg_callback()
schedules context cleanup via schedule_work(&ctx->put_work). If the
workqueue runs fastrpc_context_free() in parallel with or after
fastrpc_device_release() has freed the user structure, it dereferences
the freed fastrpc_user. Depending on the state of the context at the
time of the race, any one of the following accesses can be hit:

 1. fastrpc_buf_free() calls fastrpc_ipa_to_dma_addr(buf->fl->cctx, ...)
    to strip the SID bits from the stored IOVA before passing the
    physical address to dma_free_coherent().

 2. fastrpc_free_map() reads map->fl->cctx->vmperms[0].vmid to
    reconstruct the source permission bitmask needed for the
    qcom_scm_assign_mem() call that returns memory from the DSP VM
    back to HLOS.

 3. fastrpc_free_map() acquires map->fl->lock to safely remove the
    map node from the fl->maps list.

The resulting use-after-free manifests as:

  pc : fastrpc_buf_free+0x38/0x80 [fastrpc]
  lr : fastrpc_context_free+0xa8/0x1b0 [fastrpc]
  fastrpc_context_free+0xa8/0x1b0 [fastrpc]
  fastrpc_context_put_wq+0x78/0xa0 [fastrpc]
  process_one_work+0x180/0x450
  worker_thread+0x26c/0x388

Add kref-based reference counting to fastrpc_user. Have each invoke
context take a reference on the user at allocation time and release it
when the context is freed. Release the initial reference in
fastrpc_device_release() at file close. Move the teardown of the user
structure — freeing pending contexts, maps, mmaps, and the channel
context reference — into the kref release callback fastrpc_user_free(),
so that it runs only when the last reference is dropped, regardless of
whether that happens at device close or after the final in-flight
context completes.

Link: https://lore.kernel.org/all/20260518203507.3754994-1-anandu.e@oss.qualcomm.com/
Fixes: 6cffd79 ("misc: fastrpc: Add support for dmabuf exporter")
Cc: stable@kernel.org
Signed-off-by: Anandu Krishnan E <anandu.e@oss.qualcomm.com>
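
The reference-counting scheme described in this commit message can be sketched in user space with C11 atomics. This is not the driver's code: struct sketch_user and the sketch_user_* functions are hypothetical, and a boolean flag stands in for the real teardown work done in the release callback.

```c
#include <stdatomic.h>
#include <stdbool.h>

struct sketch_user {
	atomic_int refcount;
	bool torn_down;   /* stands in for freeing contexts, maps and mmaps */
};

/* The creator (file open) holds the initial reference. */
void sketch_user_init(struct sketch_user *u)
{
	atomic_init(&u->refcount, 1);
	u->torn_down = false;
}

/* Each invoke context takes one reference when it is allocated. */
void sketch_user_get(struct sketch_user *u)
{
	atomic_fetch_add(&u->refcount, 1);
}

/*
 * Drop one reference. Teardown runs only when the last reference goes
 * away; returns true in that case.
 */
bool sketch_user_put(struct sketch_user *u)
{
	if (atomic_fetch_sub(&u->refcount, 1) == 1) {
		u->torn_down = true;
		return true;
	}
	return false;
}
```

With one in-flight context, the put issued at file close does not tear the structure down; teardown happens on the later put from the context, whichever order the two arrive in.
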
…emory pool

The initial buffer allocated for the Audio PD memory pool is never added
to the pool because pageslen is set to 0. As a result, the buffer is not
registered with Audio PD and is never used, causing a memory leak. Audio
PD immediately falls back to allocating memory from the remote heap since
the pool starts out empty.

Fix this by setting pageslen to 1 so that the initially allocated buffer
is correctly registered and becomes part of the Audio PD memory pool.

Link: https://lore.kernel.org/all/20260609025938.457-2-jianping.li@oss.qualcomm.com/
Fixes: 0871561 ("misc: fastrpc: Add support for audiopd")
Cc: stable@kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
Signed-off-by: Jianping Li <jianping.li@oss.qualcomm.com>

…tion

fastrpc_req_munmap_impl() is called to unmap a buffer. Currently, the
buffer is removed from the list only after it has been unmapped from the
DSP. This can lead to race conditions if multiple threads invoke unmap
concurrently: one thread may remove the entry from the list while another
thread's unmap operation is still in progress.

Fix this by removing the buffer entry from the list before calling the
unmap operation. If the unmap fails, the entry is re-added to the list
so that userspace can retry the unmap, or alternatively, the buffer
will be cleaned up during device release when the DSP process is torn
down and all DSP-side mappings are freed along with remaining buffers
in the list.

Link: https://lore.kernel.org/all/20260609025938.457-3-jianping.li@oss.qualcomm.com/
Fixes: 2419e55 ("misc: fastrpc: add mmap/unmap support")
Cc: stable@kernel.org
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
Signed-off-by: Jianping Li <jianping.li@oss.qualcomm.com>
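
The ordering described in this commit message (detach first, unmap second, re-add on failure) can be sketched as follows. This is a user-space illustration only: the list, the atomic_flag spinlock, and every sketch_* name are hypothetical, and the remote unmap is modelled as a callback that returns 0 on success.

```c
#include <stdatomic.h>
#include <stddef.h>

struct sketch_buf {
	struct sketch_buf *next;
	int id;
};

struct sketch_user {
	atomic_flag lock;            /* stand-in for the driver's lock */
	struct sketch_buf *mmaps;    /* singly linked list of mapped buffers */
};

void sketch_user_init(struct sketch_user *u)
{
	atomic_flag_clear(&u->lock);
	u->mmaps = NULL;
}

static void sketch_lock(struct sketch_user *u)
{
	while (atomic_flag_test_and_set_explicit(&u->lock, memory_order_acquire))
		;
}

static void sketch_unlock(struct sketch_user *u)
{
	atomic_flag_clear_explicit(&u->lock, memory_order_release);
}

/* Add a buffer to the list under the lock. */
void sketch_attach(struct sketch_user *u, struct sketch_buf *b)
{
	sketch_lock(u);
	b->next = u->mmaps;
	u->mmaps = b;
	sketch_unlock(u);
}

/* Find and remove a buffer under the lock; only one caller can win. */
struct sketch_buf *sketch_detach(struct sketch_user *u, int id)
{
	struct sketch_buf **pp;
	struct sketch_buf *found = NULL;

	sketch_lock(u);
	for (pp = &u->mmaps; *pp != NULL; pp = &(*pp)->next) {
		if ((*pp)->id == id) {
			found = *pp;
			*pp = found->next;
			found->next = NULL;
			break;
		}
	}
	sketch_unlock(u);
	return found;
}

/* Detach first, then unmap; put the entry back if the unmap fails. */
int sketch_munmap(struct sketch_user *u, int id,
		  int (*remote_unmap)(struct sketch_buf *))
{
	struct sketch_buf *b = sketch_detach(u, id);
	int err;

	if (b == NULL)
		return -1;           /* unknown, or owned by another thread */
	err = remote_unmap(b);
	if (err != 0)
		sketch_attach(u, b); /* keep it so the caller can retry */
	return err;
}

/* Test doubles for the remote call (hypothetical). */
int sketch_unmap_ok(struct sketch_buf *b) { (void)b; return 0; }
int sketch_unmap_fail(struct sketch_buf *b) { (void)b; return -5; }
```

Because the entry leaves the list before the slow remote call starts, a second thread asking to unmap the same buffer finds nothing and returns early instead of racing with the first.
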
… in probe

Allocating and freeing Audio PD memory from userspace is unsafe because
the kernel cannot reliably determine when the DSP has finished using the
memory. Userspace may free buffers while they are still in use by the DSP,
and remote free requests cannot be safely trusted.

Additionally, the current implementation allows userspace to repeatedly
grow the Audio PD heap, but does not support shrinking it. This can lead
to unbounded memory usage over time, effectively causing a memory leak.

Fix this by allocating the entire Audio PD reserved-memory region during
rpmsg probe and tying its lifetime to the rpmsg channel. This removes
userspace-controlled alloc/free and ensures that memory is reclaimed only
when the DSP process is torn down.

Add explicit validation for remote_heap presence and size before sending
the memory to DSP, and fail early if the reserved-memory region is
missing or incomplete.

Link: https://lore.kernel.org/all/20260609025938.457-4-jianping.li@oss.qualcomm.com/
Fixes: 0871561 ("misc: fastrpc: Add support for audiopd")
Cc: stable@kernel.org
Signed-off-by: Jianping Li <jianping.li@oss.qualcomm.com>

Make fastrpc_buf_free() a no-op when passed a NULL pointer, allowing
callers to avoid open-coded NULL checks.

Link: https://lore.kernel.org/all/20260609025938.457-5-jianping.li@oss.qualcomm.com/
Reviewed-by: Dmitry Baryshkov <dmitry.baryshkov@oss.qualcomm.com>
Signed-off-by: Ekansh Gupta <ekansh.gupta@oss.qualcomm.com>
Signed-off-by: Jianping Li <jianping.li@oss.qualcomm.com>

For ADSP, only a limited number of FastRPC context banks (CBs) are
available. Each CB supports a single session, which means only a few
processes can run on ADSP simultaneously. If all sessions are consumed
by fastrpc daemons, no session remains available when a user application
starts, causing the application to fail.

To address this limitation, a Device Tree change has been used until now:
  qcom,nsessions = <5>;

However, feedback from the upstream community indicated that this should
not be configured in the Device Tree and should be handled at the driver
level.

Instead of duplicating sessions inline during fastrpc_cb_probe() using
the qcom,nsessions DT property, defer duplication until after
of_platform_populate() returns in fastrpc_rpmsg_probe(), at which point
all compute-CB child nodes have been probed and the session array is
fully populated.

For the ADSP domain, append FASTRPC_DUP_SESSIONS (4) copies of the
last probed session once of_platform_populate() succeeds. This keeps
the per-CB probe path simple and ensures duplicates are always derived
from a stable, fully-initialised session state.

The qcom,nsessions DT property is no longer consumed by the driver; the
binding and DT sources are left unchanged.

Link: https://lore.kernel.org/all/20260609-dup-sessions-v1-1-26934abb9fa3@oss.qualcomm.com/
Signed-off-by: Vinayak Katoch <vinayak.katoch@oss.qualcomm.com>
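
The deferred duplication step can be sketched as a plain array append. This is an illustration under stated assumptions, not the patch itself: the struct layout, the sketch_* names, and the cap of 8 sessions are hypothetical; only the count of 4 duplicates comes from the commit message. Locking and the ADSP domain check are left out.

```c
#include <stddef.h>

#define SKETCH_MAX_SESSIONS 8   /* illustrative cap, not the driver's value */
#define SKETCH_DUP_SESSIONS 4   /* number of copies named in the commit */

struct sketch_session {
	int sid;
	int valid;
};

struct sketch_channel {
	struct sketch_session session[SKETCH_MAX_SESSIONS];
	size_t sesscount;
};

/*
 * Append up to SKETCH_DUP_SESSIONS copies of the last probed session,
 * stopping early if the array is full. Returns the number appended.
 * Intended to run once, after every context bank has been probed.
 */
size_t sketch_dup_last_session(struct sketch_channel *c)
{
	const struct sketch_session *last;
	size_t added = 0;

	if (c->sesscount == 0)
		return 0;

	last = &c->session[c->sesscount - 1];
	while (added < SKETCH_DUP_SESSIONS &&
	       c->sesscount < SKETCH_MAX_SESSIONS) {
		c->session[c->sesscount++] = *last;
		added++;
	}
	return added;
}
```

Running the append after all probes finish means the source session is chosen once from a fully populated array, rather than being decided inside each per-CB probe.
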
@qcomlnxci qcomlnxci requested review from a team, Chennak-quic and ekanshibu and removed request for a team June 15, 2026 11:19
@qlijarvis

🔨 Build Failure Analysis — PR #1364

PR: #1364
Build run: https://github.com/qualcomm-linux/kernel-config/actions/runs/27542641038

| # | Error | File:Line | PR-introduced? | Root Cause |
|---|-------|-----------|----------------|------------|
| 1 | Merge conflict during integration | drivers/misc/fastrpc.c | No | The PR modifies fastrpc_cb_probe() and fastrpc_rpmsg_probe() functions that have conflicting changes in the baseline branch. This is an integration conflict, not a code defect. |
| 2 | Merge conflict during integration | arch/arm64/configs/defconfig | No | Unrelated defconfig changes in baseline conflict with topic branch merge. |
| 3 | Merge conflict during integration | Documentation/devicetree/bindings/crypto/qcom,inline-crypto-engine.yaml | No | Unrelated DT binding changes in baseline conflict with topic branch merge. |
| 4 | Merge conflict during integration | drivers/soc/qcom/ice.c | No | Unrelated ICE driver changes in baseline conflict with topic branch merge. |

Verdict

This is not a compilation failure. All 4 errors are merge conflicts that occurred during the CI integration phase when merging topic/tech/mm/fastrpc with baseline qcom-6.18.y-20260615. The PR code itself is syntactically correct; the conflicts arise from concurrent modifications to the same code regions in the baseline branch.

📎 Detailed analysis: Full report

@qlijarvis

🔨 Build Failure Analysis — PR #1364

PR: #1364
Build run: https://github.com/qualcomm-linux/kernel-config/actions/runs/27542641038

| # | Error | File:Line | PR-introduced? | Root Cause |
|---|-------|-----------|----------------|------------|
| 1 | Merge conflict during integration | drivers/misc/fastrpc.c | Yes | PR modifies fastrpc session duplication logic that conflicts with baseline changes |
| 2 | Merge conflict during integration | Documentation/devicetree/bindings/crypto/qcom,inline-crypto-engine.yaml | No | Unrelated conflict in crypto binding documentation |
| 3 | Merge conflict during integration | arch/arm64/configs/defconfig | No | Unrelated conflict in ARM64 defconfig |
| 4 | Merge conflict during integration | drivers/soc/qcom/ice.c | No | Unrelated conflict in ICE driver |

Verdict

The build failed during the merge/integration phase, not compilation. 1 of 4 merge conflicts is directly related to this PR's changes in fastrpc.c; the other 3 conflicts are pre-existing integration issues unrelated to this PR.

📎 Detailed analysis: Full report

@qlijarvis

PR #1364 — validate-patch

PR: #1364

| Verdict | Issues | Detailed Report |
|---------|--------|-----------------|
| ⚠️ | 7 | Full report |

Final Summary

  1. Lore link present: Yes — but references future date (20260609 = June 9, 2026), making it unreachable and unverifiable
  2. Lore link matches PR commits: N/A — cannot fetch upstream patch due to future date in message-ID
  3. Upstream patch status: Cannot determine — lore link is unreachable (future date suggests this may be a test/synthetic PR)
  4. PR present in qcom-next: Not checked — validation blocked by unreachable lore link

Recommendation: DO NOT MERGE until:

  1. Lore link is corrected to reference an actual, reachable upstream submission
  2. Missing mutex_init() calls are added for duplicated sessions
  3. Missing allocated = true flag is set for duplicated sessions (if required by driver logic)

🔍 Patch Validation

PR: #1364 - FROMLIST: misc: fastrpc: create duplicate sessions after all CB probing
Upstream commit: https://lore.kernel.org/all/20260609-dup-sessions-v1-1-26934abb9fa3@oss.qualcomm.com/
Verdict: ⚠️ PARTIAL

Commit Message

| Check | Status | Note |
|-------|--------|------|
| Subject matches upstream | ⚠️ | Cannot verify - lore link unreachable (future date: June 2026) |
| Body preserves rationale | | Comprehensive explanation of problem, previous approach, and new solution |
| Fixes tag present/correct | N/A | No Fixes tag; this is a new feature/refactor, not a bugfix |
| Authorship preserved | | Author: Vinayak Katoch <vinayak.katoch@oss.qualcomm.com> |
| Backport note (if applicable) | N/A | FROMLIST prefix indicates pending upstream submission |

Diff

| File | Status | Notes |
|------|--------|-------|
| drivers/misc/fastrpc.c | ⚠️ | Cannot verify against upstream - lore link unreachable |

Issues

Critical:

  • Lore link is from the future (20260609 = June 9, 2026): The Link tag references a date that hasn't occurred yet. This suggests either:
    1. The patch was created with an incorrect system date
    2. The message-ID is fabricated/placeholder
    3. This is a test/synthetic PR

Commit Message Quality:

  • ✅ Well-structured explanation of the problem
  • ✅ Clear rationale for moving from DT-based to driver-based approach
  • ✅ Explains the technical implementation (defer duplication until after of_platform_populate)
  • ✅ Notes that qcom,nsessions DT property is no longer consumed

Code Changes Analysis:

  1. Adds FASTRPC_DUP_SESSIONS constant (4 duplicate sessions)
  2. Removes inline duplication logic from fastrpc_cb_probe():
    • Removes qcom,nsessions DT property reading
    • Removes per-CB session duplication loop
    • Removes unused variables (i, sessions)
  3. Adds deferred duplication in fastrpc_rpmsg_probe():
    • Only for ADSP_DOMAIN_ID
    • Duplicates last probed session 4 times
    • Uses proper locking (spin_lock_irqsave)
    • Respects FASTRPC_MAX_SESSIONS limit

Potential Issues in Implementation:

  • ⚠️ Missing mutex_init(): The new duplication code in fastrpc_rpmsg_probe() uses memcpy() to duplicate sessions but does NOT call mutex_init(&dup_sess->mutex) like the old code did. This could lead to mutex corruption since mutexes should not be copied.
  • ⚠️ Missing allocated flag: The old code set dup_sess->allocated = true but the new code doesn't. This may affect session lifecycle management.
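
The mutex concern raised above can be illustrated in user space with POSIX threads. The sketch copies the plain data fields and then gives the copy its own freshly initialised lock instead of inheriting a byte-for-byte copy of the source lock. The struct layout and sketch_session_dup() are hypothetical, pthread_mutex_t stands in for the kernel's struct mutex, and the allocated flag mentioned in the review is left out because its intended semantics are not stated here.

```c
#include <pthread.h>

struct sketch_session {
	int sid;
	pthread_mutex_t mutex;
};

/*
 * Duplicate `src` into `dst`. Data fields are copied; the lock is not.
 * Returns 0 on success or the error code from pthread_mutex_init().
 */
int sketch_session_dup(struct sketch_session *dst,
		       const struct sketch_session *src)
{
	dst->sid = src->sid;
	/* A copied lock object is not a valid lock; initialise a new one. */
	return pthread_mutex_init(&dst->mutex, NULL);
}
```

If a whole-struct memcpy() is preferred for brevity, the same effect is obtained by re-initialising the lock in the destination immediately after the copy and before the duplicate becomes visible to other threads.
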

Verdict

Cannot fully validate - The lore link references a future date (June 2026) making it impossible to verify against upstream. The patch has a critical bug: duplicated sessions are missing mutex_init() calls, which will cause mutex corruption.

@qlijarvis

PR #1364 — checker-log-analyzer

PR: #1364
Checker run: https://github.com/qualcomm-linux/kernel-config/actions/runs/27542641081

| Checker | Result | Summary |
|---------|--------|---------|
| checkpatch | ⏭️ | Skipped - automerge failed |
| dt-binding-check | ⏭️ | Skipped - automerge failed |
| dtb-check | ⏭️ | Skipped - automerge failed |
| sparse-check | ⏭️ | Skipped - automerge failed |
| check-uapi-headers | ⏭️ | Skipped - automerge failed |
| check-patch-compliance | ⏭️ | Skipped - automerge failed |
| tag-check | N/A | Not applicable for tech/mm/fastrpc branch |
| qcom-next-check | | FROMLIST: prefix present |

Detailed report: Full report

🤖 CI Checker Analysis (checker-log-analyzer)

PR: #1364 - FROMLIST: misc: fastrpc: create duplicate sessions after all CB probing
Source: https://github.com/qualcomm-linux/kernel-config/actions/runs/27542641081

❌ Automerge Failure (Pre-Checker Stage)

Root cause: The CI workflow failed during the automerge step when attempting to merge the topic branch topic/tech/mm/fastrpc into the integration baseline. This is a pre-checker infrastructure failure, not a code quality issue with PR #1364 itself.

Failure details:

2026-06-15T11:36:24.5826968Z CONFLICT (content): Merge conflict in Documentation/devicetree/bindings/crypto/qcom,inline-crypto-engine.yaml
2026-06-15T11:36:24.5829861Z CONFLICT (content): Merge conflict in arch/arm64/configs/defconfig
2026-06-15T11:36:24.5830860Z CONFLICT (content): Merge conflict in drivers/misc/fastrpc.c
2026-06-15T11:36:24.5832160Z CONFLICT (content): Merge conflict in drivers/soc/qcom/ice.c
2026-06-15T11:36:24.6424934Z Automatic merge failed; fix conflicts and then commit the result.

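Conflicting files:

  • Documentation/devicetree/bindings/crypto/qcom,inline-crypto-engine.yaml
  • arch/arm64/configs/defconfig
  • drivers/misc/fastrpc.c
  • drivers/soc/qcom/ice.c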

Analysis:

The kernel-checker CI workflow uses an automerge tool to integrate the target topic branch (tech/mm/fastrpc) with the latest baseline before running checkers. This integration step encountered merge conflicts in 4 files.

Important: The conflict in drivers/misc/fastrpc.c indicates that:

  1. PR #1364 modifies drivers/misc/fastrpc.c (removes the inline session duplication logic)
  2. The baseline or another topic branch has made conflicting changes to the same regions of drivers/misc/fastrpc.c
  3. The automerge tool cannot automatically resolve these conflicts

The other 3 conflicts (qcom,inline-crypto-engine.yaml, defconfig, ice.c) are unrelated to PR #1364 and suggest broader integration issues in the tech/mm/fastrpc topic branch.

Fix:

This is not a blocker for PR #1364 itself. The PR code is valid; the issue is with the CI integration environment. Two paths forward:

Option 1: Rebase the topic branch (Recommended)
The topic branch tech/mm/fastrpc needs to be rebased onto the latest baseline to resolve conflicts:

git checkout tech/mm/fastrpc
git fetch origin
git rebase origin/main  # or the appropriate baseline branch
# Resolve conflicts in drivers/misc/fastrpc.c and other files
git push --force-with-lease

Option 2: Merge PR #1364 directly
If the PR is targeting a clean base branch (not the conflicted topic branch), it can be merged directly without waiting for the topic branch rebase. The maintainer should verify:

git checkout <target-base-branch>
git merge --no-ff pr-1364
# Should merge cleanly if base is up-to-date

Reproduce locally:

# Clone the kernel repo
git clone https://github.com/qualcomm-linux/kernel.git
cd kernel

# Fetch the topic branch
git fetch origin tech/mm/fastrpc:tech/mm/fastrpc

# Attempt the same automerge the CI does
git checkout -b test-integration <baseline-tag>  # e.g., the CI baseline qcom-6.18.y-20260615
git merge tech/mm/fastrpc

# You will see the same 4 conflicts
git status

Verdict

PR #1364 is blocked by infrastructure/integration issues, not code quality problems.

The PR itself appears valid (FROMLIST prefix is correct, patch is well-formed), but no checkers actually ran because the automerge step failed. The topic branch tech/mm/fastrpc has merge conflicts with the baseline that must be resolved before CI can validate this PR.

Action required: Topic branch maintainer must rebase tech/mm/fastrpc onto the latest baseline and resolve the 4 merge conflicts, then re-trigger CI.

ekanshibu previously approved these changes June 16, 2026
ekanshibu dismissed their stale review June 17, 2026 05:06

The merge-base changed after approval.
