Would be great to reach a consensus on the high level details before spinning off further series addressing the smaller items.

I would say that attaching debug metadata to the GPU VMA doesn't look like the best design, but if you just do that inside XE it won't affect any other part of the kernel.

It just grew out of convenience of implementation on the VM_BIND side. The other alternative would be to maintain a secondary load map in the kernel, in a data structure separate from the GPU VMA. I was actually going to suggest such a thing as a common DRM feature: a GPU VMA metadata interface, or a parallel "GPU loadmap" interface. It would allow userspace tooling to more easily go from a GPU EU instruction pointer to the module that was loaded at that address. Kind of a step 0 towards a GPU backtrace. Can you elaborate on what your concern is with the VMA metadata attachment?
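To illustrate the alternative, a minimal sketch of what one entry of such a parallel load map could look like; all names and fields here are made up for illustration, this is not an existing DRM interface:

#include <linux/list.h>
#include <linux/types.h>

/*
 * Hypothetical per-VM load map entry, kept in a data structure separate
 * from the GPU VMA tree.  A debugger could walk these records to
 * translate a GPU EU instruction pointer back to the module loaded at
 * that address range.  Illustrative sketch only.
 */
struct gpu_loadmap_entry {
	u64 gpu_va_start;	/* start of the mapped range in the GPU VM */
	u64 gpu_va_end;		/* end of the mapped range (exclusive) */
	char module_name[64];	/* identifier of the module loaded there */
	struct list_head link;	/* chained off the owning GPU VM */
};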
In general we should try to avoid putting data into the kernel that the kernel doesn't need.
In other words we don't put the debug metadata for the CPU process on the CPU VMA either.
You could reduce the amount of data massively if you just attach the source of the debug metadata to the GPU VMA.
E.g. instead of the symbol table itself, just where to find it: a VA in the CPU process, a file system location or something like that.
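As an illustration, the per-VMA attachment could then shrink to little more than a reference; the struct below is a hypothetical sketch, not anything from the XE series:

#include <linux/types.h>

/*
 * Hypothetical "where to find the debug metadata" record attached to a
 * GPU VMA, instead of carrying the metadata (e.g. the symbol table)
 * itself.  Illustrative only.
 */
struct gpu_vma_debug_ref {
	enum { DEBUG_REF_CPU_VA, DEBUG_REF_FILE } type;
	union {
		u64 cpu_va;		/* VA in the owning CPU process */
		const char *path;	/* file system location of the metadata */
	};
	u64 size;			/* size of the referenced metadata blob */
};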
My other concern is the use of ptrace_may_access(); I would still try to avoid that. What if you first grab the DRM render node file descriptor which represents the GPU address space you want to debug with pidfd_getfd(), and then either create the eudebug file descriptor from an IOCTL or implement the necessary IOCTLs on the DRM render node directly? That would make it unnecessary to export ptrace_may_access().

We're prototyping this. At this point there are some caveats recognized:

1. There is a limitation that you don't get a notification when your target PID opens a DRM client, so GDB or another application would have to keep polling for the FDs. I'll have to check with the team how that would fit the GDB side.
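For reference, a rough sketch of that flow (not our actual prototype code; error handling trimmed, requires a kernel/libc with pidfd_getfd(), i.e. Linux 5.6+). Finding which of the target's fd numbers is the render node is the part that needs the /proc/<pid>/fd polling mentioned above:

#include <sys/syscall.h>
#include <unistd.h>

/* Duplicate the target process' DRM render node fd into the debugger. */
static int grab_render_node_fd(pid_t target_pid, int target_fd)
{
	int pidfd, fd;

	/* Obtain a pidfd referring to the target process. */
	pidfd = syscall(SYS_pidfd_open, target_pid, 0);
	if (pidfd < 0)
		return -1;

	/*
	 * Duplicate the target's fd into this process.  The kernel gates
	 * this on PTRACE_MODE_ATTACH_REALCREDS and the LSM hooks, so the
	 * administrator's policy applies, unlike a driver-private check.
	 */
	fd = syscall(SYS_pidfd_getfd, pidfd, target_fd, 0);
	close(pidfd);
	return fd;
}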
Well there is the dnotify/inotify API which allows a process to be notified of certain filesystem events.
I've never used it, but it potentially provides an event when a specific file is opened.
On the other hand, I have no idea what permissions are necessary to use it; potentially root.
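Something along these lines might work, though untested for this particular case; adding the watch only needs read access to the watched path, not root:

#include <sys/inotify.h>
#include <unistd.h>

/* Block until someone opens the given path, e.g. "/dev/dri/renderD128". */
static int wait_for_open(const char *path)
{
	char buf[4096];
	ssize_t n;
	int fd;

	fd = inotify_init1(0);
	if (fd < 0)
		return -1;

	/* Ask for an event whenever the file is opened. */
	if (inotify_add_watch(fd, path, IN_OPEN) < 0) {
		close(fd);
		return -1;
	}

	/* read() returns one or more struct inotify_event records. */
	n = read(fd, buf, sizeof(buf));
	close(fd);
	return n > 0 ? 0 : -1;
}

One gap with this approach: inotify does not report which process did the open, so the debugger would still have to correlate the event against /proc/<pid>/fd of the target.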
2. Debugging multiple DRM clients (to the same GPU) under one PID now requires separate debugger connections. This may break the way the debugger locking is currently implemented for the discovery phase to prevent parallel IOCTLs from running. We'll have to look into it once we have a working prototype.
Well multiple DRM clients would also have multiple GPU VM address spaces, wouldn't they?
So you would need multiple connections, one for each DRM client as well.
3. Last but not least, we'll have to compare which LSM security modules and other conditions are checked on the pidfd_getfd() path for access restrictions. The reason for using ptrace_may_access() was to have a clear 1:1 mapping between a user being allowed to ptrace() a PID to control CPU threads and being allowed to use the debugger IOCTLs to control GPU threads. So if a user can attach to a PID with GDB, they would certainly also be able to debug the GPU portion. If there is divergence, I don't see a big benefit in going to pidfd_getfd(). We're then all the same not exporting ptrace_may_access() but YOLO'ing the access check by comparing euid and such, which is close to what ptrace does but not exactly the same (just like the pidfd_getfd() access checks).
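For comparison, such a "compare euid and such" check is roughly of the following shape (an illustrative sketch, not the actual code); the missing LSM hook and dumpability test are exactly where it diverges from a full ptrace_may_access():

#include <linux/capability.h>
#include <linux/cred.h>
#include <linux/rcupdate.h>
#include <linux/sched.h>

/* Roughly "may the caller debug @task?", based on credentials alone. */
static bool eudebug_may_access(struct task_struct *task)
{
	const struct cred *caller = current_cred();
	const struct cred *tcred;
	bool ok;

	rcu_read_lock();
	tcred = __task_cred(task);
	ok = uid_eq(caller->euid, tcred->euid) &&
	     uid_eq(caller->euid, tcred->uid)  &&
	     gid_eq(caller->egid, tcred->egid) &&
	     gid_eq(caller->egid, tcred->gid);
	rcu_read_unlock();

	/* Unlike ptrace_may_access(): no security_ptrace_access_check(),
	 * no dumpability test, plain capable() instead of ns_capable(). */
	return ok || capable(CAP_SYS_PTRACE);
}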
Well, that argumentation is exactly backwards from the reason I suggested not using ptrace_may_access().
When an administrator uses an LSM to block pidfd_getfd(), a driver that uses a driver-specific IOCTL to bypass that is actually a really bad idea.
However, I believe this would not be a fundamental blocker for the series? If this were the only remaining disagreement, I guess we could just do a CAP_SYS_ADMIN check initially until we can find something agreeable.
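The interim gate would presumably be nothing more than this in the attach IOCTL (handler name made up):

#include <linux/capability.h>
#include <linux/errno.h>
#include <drm/drm_device.h>
#include <drm/drm_file.h>

/* Hypothetical attach ioctl: only a privileged caller may connect. */
static int eudebug_connect_ioctl(struct drm_device *dev, void *data,
				 struct drm_file *file)
{
	if (!capable(CAP_SYS_ADMIN))
		return -EPERM;

	/* ... debugger attach and discovery would go here ... */
	return 0;
}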
Well as soon as anything is merged upstream it becomes UAPI, which in turn means that changing it fundamentally becomes really hard to do.
What you could do is to expose the interface through debugfs and explicitly state that it isn't stable in any way.
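Something like the following (all names made up, and whether the eudebug event stream maps onto a file-like interface at all is an open question), which keeps the interface out of the UAPI contract:

#include <linux/debugfs.h>
#include <linux/fs.h>
#include <linux/module.h>

/* Hypothetical read handler; the real interface shape is undecided. */
static ssize_t eudebug_read(struct file *f, char __user *buf,
			    size_t len, loff_t *ppos)
{
	return 0;	/* no events yet */
}

static const struct file_operations eudebug_fops = {
	.owner	= THIS_MODULE,
	.open	= simple_open,
	.read	= eudebug_read,
};

/* Expose the interface under debugfs, explicitly unstable. */
static void eudebug_debugfs_register(struct dentry *driver_debugfs_root,
				     void *driver_data)
{
	debugfs_create_file("eudebug", 0600, driver_debugfs_root,
			    driver_data, &eudebug_fops);
}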
Regards,
Christian.