Re: [PATCH v3 0/9] AMDGPU Usermode queues

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Apr 11, 2023 at 11:48 AM Shashank Sharma
<shashank.sharma@xxxxxxx> wrote:
>
>
> On 11/04/2023 11:37, Christian König wrote:
> > Am 10.04.23 um 16:26 schrieb Shashank Sharma:
> >>
> >> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote:
> >>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma
> >>> <shashank.sharma@xxxxxxx> wrote:
> >>>>
> >>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote:
> >>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank
> >>>>> <Shashank.Sharma@xxxxxxx> wrote:
> >>>>>> [AMD Official Use Only - General]
> >>>>>>
> >>>>>> Hello Bas,
> >>>>>>
> >>>>>> This is not the correct interpretation of the code, the
> >>>>>> USERQ_IOCTL has both the OPs (create and destroy), but th euser
> >>>>>> has to exclusively call  it.
> >>>>>>
> >>>>>> Please see the sample test program in the existing libDRM series
> >>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which
> >>>>>> does the destroy_OP
> >>>>>>
> >>>>>> for the IOCTL.
> >>>>> In the presence of crashes the kernel should always be able to clean
> >>>>> this up no? Otherwise there is a resource leak?
> >>>> The crash handling is the same as any of the existing GPU resource
> >>>> which
> >>>> are allocated and freed with IOCTL_OPs.
> >>> Most of those are handled in the when the DRM fd gets closed (i.e.
> >>> when the process exits):
> >>>
> >>> - buffers through drm_gem_release()
> >>> - mappings in amdgpu_vm_fini
> >>> - contexts in amdgpu_ctx_mgr_fini
> >>>
> >>> etc.
> >>>
> >>> Why would we do things differently for userspace queues? It doesn't
> >>> look complicated looking at the above patch (which does seem to work).
> >>
> >> As the code is in initial stage, I have not given much thoughts about
> >> handling resource leak due to app crash, but this seems like a good
> >> suggestion.
> >>
> >> I am taking a note and will try to accommodate this in an upcoming
> >> version of the series.
> >
> > Bas is right, the application doesn't necessary needs to clean up on
> > exit (but it's still good custody to do so).
> >
> > See amdgpu_driver_postclose_kms() for how we cleanup (for example) the
> > ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists.
> >
> Thanks for the pointers Christian,
>
> I also feel like that its good to have this cleanup for those apps which
> did not clean-up themselves (due to crash or coding error).

I think the patch I linked earlier does exactly this: keep the IOCTL,
but on fini goes through the list and destroys the queue:
https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
>
> So something like,
>
> on close_fd,
>
> for_idr_each {
>
>     get_queue()
>
>     if (queue)
>
>         free_queue
>
> }
>
> But we will also keep the queue_free_OP as well, so that if an app
> allocate multiple queues, and wants to free some in between, it can do it.
>
> - Shashank
>
> > Regards,
> > Christian.
> >
> >>
> >> - Shashank
> >>
> >>>> To be honest a crash handling can be very elaborated and complex one,
> >>>> and hence only can be done at the driver unload IMO, which doesn't
> >>>> help
> >>>> at that stage,
> >>>>
> >>>> coz anyways driver will re-allocate the resources on next load.
> >>>>
> >>>> - Shashank
> >>>>
> >>>>>> - Shashank
> >>>>>>
> >>>>>> -----Original Message-----
> >>>>>> From: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>
> >>>>>> Sent: 10 April 2023 11:26
> >>>>>> To: Sharma, Shashank <Shashank.Sharma@xxxxxxx>
> >>>>>> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander
> >>>>>> <Alexander.Deucher@xxxxxxx>; Kuehling, Felix
> >>>>>> <Felix.Kuehling@xxxxxxx>; Koenig, Christian
> >>>>>> <Christian.Koenig@xxxxxxx>; Yadav, Arvind <Arvind.Yadav@xxxxxxx>
> >>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
> >>>>>>
> >>>>>> Hi Shashank,
> >>>>>>
> >>>>>> I think I found the issue: I wasn't destroying the user queue in
> >>>>>> my program and the kernel doesn't clean up any remaining user
> >>>>>> queues in the postclose hook. I think we need something like
> >>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29
> >>>>>>
> >>>>>> ?
> >>>>>>
> >>>>>> While running things multiple times now works, I still have
> >>>>>> problems doing multiple submissions from the same queue. Looking
> >>>>>> forward to the updated test/sample
> >>>>>>
> >>>>>> Thanks,
> >>>>>> Bas
> >>>>>>
> >>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank
> >>>>>> <Shashank.Sharma@xxxxxxx> wrote:
> >>>>>>> [AMD Official Use Only - General]
> >>>>>>>
> >>>>>>> Hello Bas,
> >>>>>>> Thanks for trying this out.
> >>>>>>>
> >>>>>>> This could be due to the doorbell as you mentioned, Usermode
> >>>>>>> queue uses doorbell manager internally.
> >>>>>>> This week, we are planning to publis the latest libDRM sample
> >>>>>>> code which uses a doorbell object (instead of the doorbell hack
> >>>>>>> IOCTL), adapting to that should fix your problem in my opinion.
> >>>>>>> We have tested this full stack (libDRM test + Usermode queue +
> >>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well
> >>>>>>> for us.
> >>>>>>>
> >>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to
> >>>>>>> build
> >>>>>>> your kernel:
> >>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr
> >>>>>>>
> >>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM
> >>>>>>> changes with doorbell objects.
> >>>>>>>
> >>>>>>> Regards
> >>>>>>> Shashank
> >>>>>>> -----Original Message-----
> >>>>>>> From: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx>
> >>>>>>> Sent: 10 April 2023 02:37
> >>>>>>> To: Sharma, Shashank <Shashank.Sharma@xxxxxxx>
> >>>>>>> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander
> >>>>>>> <Alexander.Deucher@xxxxxxx>; Kuehling, Felix
> >>>>>>> <Felix.Kuehling@xxxxxxx>;
> >>>>>>> Koenig, Christian <Christian.Koenig@xxxxxxx>
> >>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues
> >>>>>>>
> >>>>>>> Hi Shashank,
> >>>>>>>
> >>>>>>> I tried writing a program to experiment with usermode queues and
> >>>>>>> I found some weird behavior: The first run of the program works
> >>>>>>> as expected, while subsequent runs don't seem to do anything
> >>>>>>> (and I allocate everything in GTT, so it should be zero
> >>>>>>> initialized consistently). Is this a known issue?
> >>>>>>>
> >>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl
> >>>>>>> so it could very well be that I do the doorbell wrong
> >>>>>>> (especially since the ioctl implementation was never shared
> >>>>>>> AFAICT?), but it seems like the kernel submissions (i.e. write
> >>>>>>> wptr in dwords to the wptr va and to the doorbell). Is it
> >>>>>>> possible to update the test in libdrm?
> >>>>>>>
> >>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue
> >>>>>>>
> >>>>>>> Thanks,
> >>>>>>> Bas
> >>>>>>>
> >>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma
> >>>>>>> <shashank.sharma@xxxxxxx> wrote:
> >>>>>>>> This patch series introduces AMDGPU usermode queues for gfx
> >>>>>>>> workloads.
> >>>>>>>> Usermode queues is a method of GPU workload submission into the
> >>>>>>>> graphics hardware without any interaction with kernel/DRM
> >>>>>>>> schedulers.
> >>>>>>>> In this method, a userspace graphics application can create its
> >>>>>>>> own
> >>>>>>>> workqueue and submit it directly in the GPU HW.
> >>>>>>>>
> >>>>>>>> The general idea of how this is supposed to work:
> >>>>>>>> - The application creates the following GPU objetcs:
> >>>>>>>>     - A queue object to hold the workload packets.
> >>>>>>>>     - A read pointer object.
> >>>>>>>>     - A write pointer object.
> >>>>>>>>     - A doorbell page.
> >>>>>>>> - The application picks a 32-bit offset in the doorbell page
> >>>>>>>> for this queue.
> >>>>>>>> - The application uses the usermode_queue_create IOCTL
> >>>>>>>> introduced in
> >>>>>>>>     this patch, by passing the the GPU addresses of these
> >>>>>>>> objects (read
> >>>>>>>>     ptr, write ptr, queue base address and 32-bit doorbell
> >>>>>>>> offset from
> >>>>>>>>     the doorbell page)
> >>>>>>>> - The kernel creates the queue and maps it in the HW.
> >>>>>>>> - The application can start submitting the data in the queue as
> >>>>>>>> soon as
> >>>>>>>>     the kernel IOCTL returns.
> >>>>>>>> - After filling the workload data in the queue, the app must
> >>>>>>>> write the
> >>>>>>>>     number of dwords added in the queue into the doorbell
> >>>>>>>> offset, and the
> >>>>>>>>     GPU will start fetching the data.
> >>>>>>>>
> >>>>>>>> libDRM changes for this series and a sample DRM test program
> >>>>>>>> can be
> >>>>>>>> found in the MESA merge request here:
> >>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287
> >>>>>>>>
> >>>>>>>> This patch series depends on the doorbell-manager changes,
> >>>>>>>> which are
> >>>>>>>> being reviewed here:
> >>>>>>>> https://patchwork.freedesktop.org/series/115802/
> >>>>>>>>
> >>>>>>>> Alex Deucher (1):
> >>>>>>>>     drm/amdgpu: UAPI for user queue management
> >>>>>>>>
> >>>>>>>> Arvind Yadav (2):
> >>>>>>>>     drm/amdgpu: add new parameters in v11_struct
> >>>>>>>>     drm/amdgpu: map wptr BO into GART
> >>>>>>>>
> >>>>>>>> Shashank Sharma (6):
> >>>>>>>>     drm/amdgpu: add usermode queue base code
> >>>>>>>>     drm/amdgpu: add new IOCTL for usermode queue
> >>>>>>>>     drm/amdgpu: create GFX-gen11 MQD for userqueue
> >>>>>>>>     drm/amdgpu: create context space for usermode queue
> >>>>>>>>     drm/amdgpu: map usermode queue into MES
> >>>>>>>>     drm/amdgpu: generate doorbell index for userqueue
> >>>>>>>>
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/Makefile           | 3 +
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu.h           | 10 +-
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c       | 2 +
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c       | 6 +
> >>>>>>>>    drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298
> >>>>>>>> ++++++++++++++++++ .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c |
> >>>>>>>> ++++++++++++++++++ 230 ++++++++++++++
> >>>>>>>>    .../gpu/drm/amd/include/amdgpu_userqueue.h    | 66 ++++
> >>>>>>>>    drivers/gpu/drm/amd/include/v11_structs.h     | 16 +-
> >>>>>>>>    include/uapi/drm/amdgpu_drm.h                 | 55 ++++
> >>>>>>>>    9 files changed, 677 insertions(+), 9 deletions(-) create mode
> >>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c
> >>>>>>>>    create mode 100644
> >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c
> >>>>>>>>    create mode 100644
> >>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h
> >>>>>>>>
> >>>>>>>> --
> >>>>>>>> 2.40.0
> >>>>>>>>
> >




[Index of Archives]     [Linux USB Devel]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux