On Tue, Apr 11, 2023 at 11:48 AM Shashank Sharma <shashank.sharma@xxxxxxx> wrote: > > > On 11/04/2023 11:37, Christian König wrote: > > Am 10.04.23 um 16:26 schrieb Shashank Sharma: > >> > >> On 10/04/2023 16:04, Bas Nieuwenhuizen wrote: > >>> On Mon, Apr 10, 2023 at 4:01 PM Shashank Sharma > >>> <shashank.sharma@xxxxxxx> wrote: > >>>> > >>>> On 10/04/2023 15:46, Bas Nieuwenhuizen wrote: > >>>>> On Mon, Apr 10, 2023 at 3:40 PM Sharma, Shashank > >>>>> <Shashank.Sharma@xxxxxxx> wrote: > >>>>>> [AMD Official Use Only - General] > >>>>>> > >>>>>> Hello Bas, > >>>>>> > >>>>>> This is not the correct interpretation of the code, the > >>>>>> USERQ_IOCTL has both the OPs (create and destroy), but th euser > >>>>>> has to exclusively call it. > >>>>>> > >>>>>> Please see the sample test program in the existing libDRM series > >>>>>> (userq_test.c, it specifically calls amdgpu_free_userq, which > >>>>>> does the destroy_OP > >>>>>> > >>>>>> for the IOCTL. > >>>>> In the presence of crashes the kernel should always be able to clean > >>>>> this up no? Otherwise there is a resource leak? > >>>> The crash handling is the same as any of the existing GPU resource > >>>> which > >>>> are allocated and freed with IOCTL_OPs. > >>> Most of those are handled in the when the DRM fd gets closed (i.e. > >>> when the process exits): > >>> > >>> - buffers through drm_gem_release() > >>> - mappings in amdgpu_vm_fini > >>> - contexts in amdgpu_ctx_mgr_fini > >>> > >>> etc. > >>> > >>> Why would we do things differently for userspace queues? It doesn't > >>> look complicated looking at the above patch (which does seem to work). > >> > >> As the code is in initial stage, I have not given much thoughts about > >> handling resource leak due to app crash, but this seems like a good > >> suggestion. > >> > >> I am taking a note and will try to accommodate this in an upcoming > >> version of the series. > > > > Bas is right, the application doesn't necessary needs to clean up on > > exit (but it's still good custody to do so). > > > > See amdgpu_driver_postclose_kms() for how we cleanup (for example) the > > ctx manager by calling amdgpu_ctx_mgr_fini() or the BO lists. > > > Thanks for the pointers Christian, > > I also feel like that its good to have this cleanup for those apps which > did not clean-up themselves (due to crash or coding error). I think the patch I linked earlier does exactly this: keep the IOCTL, but on fini goes through the list and destroys the queue: https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29 > > So something like, > > on close_fd, > > for_idr_each { > > get_queue() > > if (queue) > > free_queue > > } > > But we will also keep the queue_free_OP as well, so that if an app > allocate multiple queues, and wants to free some in between, it can do it. > > - Shashank > > > Regards, > > Christian. > > > >> > >> - Shashank > >> > >>>> To be honest a crash handling can be very elaborated and complex one, > >>>> and hence only can be done at the driver unload IMO, which doesn't > >>>> help > >>>> at that stage, > >>>> > >>>> coz anyways driver will re-allocate the resources on next load. > >>>> > >>>> - Shashank > >>>> > >>>>>> - Shashank > >>>>>> > >>>>>> -----Original Message----- > >>>>>> From: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx> > >>>>>> Sent: 10 April 2023 11:26 > >>>>>> To: Sharma, Shashank <Shashank.Sharma@xxxxxxx> > >>>>>> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander > >>>>>> <Alexander.Deucher@xxxxxxx>; Kuehling, Felix > >>>>>> <Felix.Kuehling@xxxxxxx>; Koenig, Christian > >>>>>> <Christian.Koenig@xxxxxxx>; Yadav, Arvind <Arvind.Yadav@xxxxxxx> > >>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues > >>>>>> > >>>>>> Hi Shashank, > >>>>>> > >>>>>> I think I found the issue: I wasn't destroying the user queue in > >>>>>> my program and the kernel doesn't clean up any remaining user > >>>>>> queues in the postclose hook. I think we need something like > >>>>>> https://github.com/BNieuwenhuizen/linux/commit/e90c8d1185da7353c12837973ceddf55ccc85d29 > >>>>>> > >>>>>> ? > >>>>>> > >>>>>> While running things multiple times now works, I still have > >>>>>> problems doing multiple submissions from the same queue. Looking > >>>>>> forward to the updated test/sample > >>>>>> > >>>>>> Thanks, > >>>>>> Bas > >>>>>> > >>>>>> On Mon, Apr 10, 2023 at 9:32 AM Sharma, Shashank > >>>>>> <Shashank.Sharma@xxxxxxx> wrote: > >>>>>>> [AMD Official Use Only - General] > >>>>>>> > >>>>>>> Hello Bas, > >>>>>>> Thanks for trying this out. > >>>>>>> > >>>>>>> This could be due to the doorbell as you mentioned, Usermode > >>>>>>> queue uses doorbell manager internally. > >>>>>>> This week, we are planning to publis the latest libDRM sample > >>>>>>> code which uses a doorbell object (instead of the doorbell hack > >>>>>>> IOCTL), adapting to that should fix your problem in my opinion. > >>>>>>> We have tested this full stack (libDRM test + Usermode queue + > >>>>>>> doorbell manager) for 500+ consecutive runs, and it worked well > >>>>>>> for us. > >>>>>>> > >>>>>>> You can use this integrated kernel stack (1+2) from my gitlab to > >>>>>>> build > >>>>>>> your kernel: > >>>>>>> https://gitlab.freedesktop.org/contactshashanksharma/userq-amdgpu/-/tr > >>>>>>> > >>>>>>> ee/integrated-db-and-uq-v3 Please stay tuned for updated libDRM > >>>>>>> changes with doorbell objects. > >>>>>>> > >>>>>>> Regards > >>>>>>> Shashank > >>>>>>> -----Original Message----- > >>>>>>> From: Bas Nieuwenhuizen <bas@xxxxxxxxxxxxxxxxxxx> > >>>>>>> Sent: 10 April 2023 02:37 > >>>>>>> To: Sharma, Shashank <Shashank.Sharma@xxxxxxx> > >>>>>>> Cc: amd-gfx@xxxxxxxxxxxxxxxxxxxxx; Deucher, Alexander > >>>>>>> <Alexander.Deucher@xxxxxxx>; Kuehling, Felix > >>>>>>> <Felix.Kuehling@xxxxxxx>; > >>>>>>> Koenig, Christian <Christian.Koenig@xxxxxxx> > >>>>>>> Subject: Re: [PATCH v3 0/9] AMDGPU Usermode queues > >>>>>>> > >>>>>>> Hi Shashank, > >>>>>>> > >>>>>>> I tried writing a program to experiment with usermode queues and > >>>>>>> I found some weird behavior: The first run of the program works > >>>>>>> as expected, while subsequent runs don't seem to do anything > >>>>>>> (and I allocate everything in GTT, so it should be zero > >>>>>>> initialized consistently). Is this a known issue? > >>>>>>> > >>>>>>> The linked libdrm code for the uapi still does a doorbell ioctl > >>>>>>> so it could very well be that I do the doorbell wrong > >>>>>>> (especially since the ioctl implementation was never shared > >>>>>>> AFAICT?), but it seems like the kernel submissions (i.e. write > >>>>>>> wptr in dwords to the wptr va and to the doorbell). Is it > >>>>>>> possible to update the test in libdrm? > >>>>>>> > >>>>>>> Code: https://gitlab.freedesktop.org/bnieuwenhuizen/usermode-queue > >>>>>>> > >>>>>>> Thanks, > >>>>>>> Bas > >>>>>>> > >>>>>>> On Wed, Mar 29, 2023 at 6:05 PM Shashank Sharma > >>>>>>> <shashank.sharma@xxxxxxx> wrote: > >>>>>>>> This patch series introduces AMDGPU usermode queues for gfx > >>>>>>>> workloads. > >>>>>>>> Usermode queues is a method of GPU workload submission into the > >>>>>>>> graphics hardware without any interaction with kernel/DRM > >>>>>>>> schedulers. > >>>>>>>> In this method, a userspace graphics application can create its > >>>>>>>> own > >>>>>>>> workqueue and submit it directly in the GPU HW. > >>>>>>>> > >>>>>>>> The general idea of how this is supposed to work: > >>>>>>>> - The application creates the following GPU objetcs: > >>>>>>>> - A queue object to hold the workload packets. > >>>>>>>> - A read pointer object. > >>>>>>>> - A write pointer object. > >>>>>>>> - A doorbell page. > >>>>>>>> - The application picks a 32-bit offset in the doorbell page > >>>>>>>> for this queue. > >>>>>>>> - The application uses the usermode_queue_create IOCTL > >>>>>>>> introduced in > >>>>>>>> this patch, by passing the the GPU addresses of these > >>>>>>>> objects (read > >>>>>>>> ptr, write ptr, queue base address and 32-bit doorbell > >>>>>>>> offset from > >>>>>>>> the doorbell page) > >>>>>>>> - The kernel creates the queue and maps it in the HW. > >>>>>>>> - The application can start submitting the data in the queue as > >>>>>>>> soon as > >>>>>>>> the kernel IOCTL returns. > >>>>>>>> - After filling the workload data in the queue, the app must > >>>>>>>> write the > >>>>>>>> number of dwords added in the queue into the doorbell > >>>>>>>> offset, and the > >>>>>>>> GPU will start fetching the data. > >>>>>>>> > >>>>>>>> libDRM changes for this series and a sample DRM test program > >>>>>>>> can be > >>>>>>>> found in the MESA merge request here: > >>>>>>>> https://gitlab.freedesktop.org/mesa/drm/-/merge_requests/287 > >>>>>>>> > >>>>>>>> This patch series depends on the doorbell-manager changes, > >>>>>>>> which are > >>>>>>>> being reviewed here: > >>>>>>>> https://patchwork.freedesktop.org/series/115802/ > >>>>>>>> > >>>>>>>> Alex Deucher (1): > >>>>>>>> drm/amdgpu: UAPI for user queue management > >>>>>>>> > >>>>>>>> Arvind Yadav (2): > >>>>>>>> drm/amdgpu: add new parameters in v11_struct > >>>>>>>> drm/amdgpu: map wptr BO into GART > >>>>>>>> > >>>>>>>> Shashank Sharma (6): > >>>>>>>> drm/amdgpu: add usermode queue base code > >>>>>>>> drm/amdgpu: add new IOCTL for usermode queue > >>>>>>>> drm/amdgpu: create GFX-gen11 MQD for userqueue > >>>>>>>> drm/amdgpu: create context space for usermode queue > >>>>>>>> drm/amdgpu: map usermode queue into MES > >>>>>>>> drm/amdgpu: generate doorbell index for userqueue > >>>>>>>> > >>>>>>>> drivers/gpu/drm/amd/amdgpu/Makefile | 3 + > >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu.h | 10 +- > >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_drv.c | 2 + > >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_kms.c | 6 + > >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c | 298 > >>>>>>>> ++++++++++++++++++ .../drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c | > >>>>>>>> ++++++++++++++++++ 230 ++++++++++++++ > >>>>>>>> .../gpu/drm/amd/include/amdgpu_userqueue.h | 66 ++++ > >>>>>>>> drivers/gpu/drm/amd/include/v11_structs.h | 16 +- > >>>>>>>> include/uapi/drm/amdgpu_drm.h | 55 ++++ > >>>>>>>> 9 files changed, 677 insertions(+), 9 deletions(-) create mode > >>>>>>>> 100644 drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue.c > >>>>>>>> create mode 100644 > >>>>>>>> drivers/gpu/drm/amd/amdgpu/amdgpu_userqueue_gfx_v11.c > >>>>>>>> create mode 100644 > >>>>>>>> drivers/gpu/drm/amd/include/amdgpu_userqueue.h > >>>>>>>> > >>>>>>>> -- > >>>>>>>> 2.40.0 > >>>>>>>> > >