Re: [PATCH v3 0/9] AMDGPU Usermode queues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




On 11/04/2023 12:00, Bas Nieuwenhuizen wrote:
On Tue, Apr 11, 2023 at 11:48 AM Shashank Sharma
<shashank.sharma@xxxxxxx> wrote:

On 11/04/2023 11:37, Christian König wrote:
Am 10.04.23 um 16:26 schrieb Shashank Sharma:
On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma
<shashank.sharma@xxxxxxx> wrote:
On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
<Shashank.Sharma@xxxxxxx> wrote:
[AMD Official Use Only - General]

Hello Bas,

This is not the correct interpretation of the code, the
USERQ_IOCTL has both the OPs (create and destroy), but th euser
has to exclusively call  it.

Please see the sample test program in the existing libDRM series
(userq_test.c, it specifically calls amdgpu_free_userq, which
does the destroy_OP

for the IOCTL.
In the presence of crashes the kernel should always be able to clean
this up no? Otherwise there is a resource leak?
The crash handling is the same as any of the existing GPU resource
which
are allocated and freed with IOCTL_OPs.
Most of those are handled in the when the DRM fd gets closed (i.e.
when the process exits):

- buffers through drm_gem_release()
- mappings in amdgpu_vm_fini
- contexts in amdgpu_ctx_mgr_fini

etc.

Why would we do things differently for userspace queues? It doesn't
look complicated looking at the above patch (which does seem to work).
As the code is in initial stage, I have not given much thoughts about
handling resource leak due to app crash, but this seems like a good
suggestion.

I am taking a note and will try to accommodate this in an upcoming
version of the series.
Bas is right, the application doesn't necessary needs to clean up on
exit (but it's still good custody to do so).

See amdgpu_driver_postclose_kms() for how we cleanup (for example) the
ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.

Thanks for the pointers Christian,

I also feel like that its good to have this cleanup for those apps which
did not clean-up themselves (due to crash or coding error).
I think the patch I linked earlier does exactly this: keep the IOCTL,
but on fini goes through the list and destroys the queue:
https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29

Yep, just needs additional check to free only when its not already freed, like doble free check. Will try to reuse most of it.

- Shashank

So something like,

on close_fd,

for_idr_each {

     get_queue()

     if (queue)

         free_queue

}

But we will also keep the queue_free_OP as well, so that if an app
allocate multiple queues, and wants to free some in between, it can do it.

- Shashank

Regards,
Christian.

- Shashank

To be honest a crash handling can be very elaborated and complex one,
and hence only can be done at the driver unload IMO, which doesn't
help
at that stage,

coz anyways driver will re-allocate the resources on next load.

- Shashank

- Shashank

-----Original Message-----
From: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>
Sent: 10 April 2023 11:26
To: Sharma, Shashank <Shashank.Sharma@xxxxxxx>
Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander
<Alexander.Deucher@xxxxxxx>; Kuehling, Felix
<Felix.Kuehling@xxxxxxx>; Koenig, Christian
<Christian.Koenig@xxxxxxx>; Yadav, Arvind <Arvind.Yadav@xxxxxxx>
Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues

Hi Shashank,

I think I found the issue: I wasn't destroying the user queue in
my program and the kernel doesn't clean up any remaining user
queues in the postclose hook. I think we need something like
https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29

?

While running things multiple times now works, I still have
problems doing multiple submissions from the same queue. Looking
forward to the updated test/sample

Thanks,
Bas

On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank
<Shashank.Sharma@xxxxxxx> wrote:
[AMD Official Use Only - General]

Hello Bas,
Thanks for trying this out.

This could be due to the doorbell as you mentioned, Usermode
queue uses doorbell manager internally.
This week, we are planning to publis the latest libDRM sample
code which uses a doorbell object (instead of the doorbell hack
IOCTL), adapting to that should fix your problem in my opinion.
We have tested this full stack (libDRM test + Usermode queue +
doorbell manager) for 500+ consecutive runs, and it worked well
for us.

You can use this integrated kernel stack (1+2) from my gitlab to
build
your kernel:
https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr

ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
changes with doorbell objects.

Regards
Shashank
-----Original Message-----
From: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>
Sent: 10 April 2023 02:37
To: Sharma, Shashank <Shashank.Sharma@xxxxxxx>
Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander
<Alexander.Deucher@xxxxxxx>; Kuehling, Felix
<Felix.Kuehling@xxxxxxx>;
Koenig, Christian <Christian.Koenig@xxxxxxx>
Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues

Hi Shashank,

I tried writing a program to experiment with usermode queues and
I found some weird behavior: The first run of the program works
as expected, while subsequent runs don't seem to do anything
(and I allocate everything in GTT, so it should be zero
initialized consistently). Is this a known issue?

The linked libdrm code for the uapi still does a doorbell ioctl
so it could very well be that I do the doorbell wrong
(especially since the ioctl implementation was never shared
AFAICT?), but it seems like the kernel submissions (i.e. write
wptr in dwords to the wptr va and to the doorbell). Is it
possible to update the test in libdrm?

Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue

Thanks,
Bas

On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma
<shashank.sharma@xxxxxxx> wrote:
This patch series introduces AMDGPU usermode queues for gfx
workloads.
Usermode queues is a method of GPU workload submission into the
graphics hardware without any interaction with kernel/DRM
schedulers.
In this method, a userspace graphics application can create its
own
workqueue and submit it directly in the GPU HW.

The general idea of how this is supposed to work:
- The application creates the following GPU objetcs:
     - A queue object to hold the workload packets.
     - A read pointer object.
     - A write pointer object.
     - A doorbell page.
- The application picks a 32-bit offset in the doorbell page
for this queue.
- The application uses the usermode_queue_create IOCTL
introduced in
     this patch, by passing the the GPU addresses of these
objects (read
     ptr, write ptr, queue base address and 32-bit doorbell
offset from
     the doorbell page)
- The kernel creates the queue and maps it in the HW.
- The application can start submitting the data in the queue as
soon as
     the kernel IOCTL returns.
- After filling the workload data in the queue, the app must
write the
     number of dwords added in the queue into the doorbell
offset, and the
     GPU will start fetching the data.

libDRM changes for this series and a sample DRM test program
can be
found in the MESA merge request here:
https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287

This patch series depends on the doorbell-manager changes,
which are
being reviewed here:
https://patchwork.freedesktop.org/series/115802/

Alex Deucher (1):
     drm/amdgpu: UAPI for user queue management

Arvind Yadav (2):
     drm/amdgpu: add new parameters in v11_struct
     drm/amdgpu: map wptr BO into GART

Shashank Sharma (6):
     drm/amdgpu: add usermode queue base code
     drm/amdgpu: add new IOCTL for usermode queue
     drm/amdgpu: create GFX-gen11 MQD for userqueue
     drm/amdgpu: create context space for usermode queue
     drm/amdgpu: map usermode queue into MES
     drm/amdgpu: generate doorbell index for userqueue

    drivers/gpu/drm/amd/amdgpu/Makefile           | 3 +
    drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +-
    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 2 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 6 +
    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
++++++++++++++++++ .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |
++++++++++++++++++ 230 ++++++++++++++
    .../gpu/drm/amd/include/amdgpu_userqueue.h    | 66 ++++
    drivers/gpu/drm/amd/include/v11_structs.h     | 16 +-
    include/uapi/drm/amdgpu_drm.h                 | 55 ++++
    9 files changed, 677 insertions(+), 9 deletions(-) create mode
100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
    create mode 100644
drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
    create mode 100644
drivers/gpu/drm/amd/include/amdgpu_userqueue.h

--
2.40.0




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux