On 6/29/22 00:26, Thomas Hellström (Intel) wrote: > > On 5/30/22 15:57, Dmitry Osipenko wrote: >> On 5/30/22 16:41, Christian König wrote: >>> Hi Dmitry, >>> >>> Am 30.05.22 um 15:26 schrieb Dmitry Osipenko: >>>> Hello Christian, >>>> >>>> On 5/30/22 09:50, Christian König wrote: >>>>> Hi Dmitry, >>>>> >>>>> First of all please separate out this patch from the rest of the >>>>> series, >>>>> since this is a complex separate structural change. >>>> I assume all the patches will go via the DRM tree in the end since the >>>> rest of the DRM patches in this series depend on this dma-buf change. >>>> But I see that separation may ease reviewing of the dma-buf changes, so >>>> let's try it. >>> That sounds like you are underestimating a bit how much trouble this >>> will be. >>> >>>>> I have tried this before and failed because catching all the locks in >>>>> the right code paths are very tricky. So expect some fallout from this >>>>> and make sure the kernel test robot and CI systems are clean. >>>> Sure, I'll fix up all the reported things in the next iteration. >>>> >>>> BTW, have you ever posted yours version of the patch? Will be great if >>>> we could compare the changed code paths. >>> No, I never even finished creating it after realizing how much work it >>> would be. >>> >>>>>> This patch introduces new locking convention for dma-buf users. From >>>>>> now >>>>>> on all dma-buf importers are responsible for holding dma-buf >>>>>> reservation >>>>>> lock around operations performed over dma-bufs. >>>>>> >>>>>> This patch implements the new dma-buf locking convention by: >>>>>> >>>>>> 1. Making dma-buf API functions to take the reservation lock. >>>>>> >>>>>> 2. Adding new locked variants of the dma-buf API functions for >>>>>> drivers >>>>>> that need to manage imported dma-bufs under the held lock. >>>>> Instead of adding new locked variants please mark all variants which >>>>> expect to be called without a lock with an _unlocked postfix. >>>>> >>>>> This should make it easier to remove those in a follow up patch set >>>>> and >>>>> then fully move the locking into the importer. >>>> Do we really want to move all the locks to the importers? Seems the >>>> majority of drivers should be happy with the dma-buf helpers handling >>>> the locking for them. >>> Yes, I clearly think so. >>> >>>>>> 3. Converting all drivers to the new locking scheme. >>>>> I have strong doubts that you got all of them. At least radeon and >>>>> nouveau should grab the reservation lock in their ->attach callbacks >>>>> somehow. >>>> Radeon and Nouveau use gem_prime_import_sg_table() and they take resv >>>> lock already, seems they should be okay (?) >>> You are looking at the wrong side. You need to fix the export code path, >>> not the import ones. >>> >>> See for example attach on radeon works like this >>> drm_gem_map_attach->drm_gem_pin->radeon_gem_prime_pin->radeon_bo_reserve->ttm_bo_reserve->dma_resv_lock. >>> >> Yeah, I was looking at the both sides, but missed this one. > > Also i915 will run into trouble with attach. In particular since i915 > starts a full ww transaction in its attach callback to be able to lock > other objects if migration is needed. I think i915 CI would catch this > in a selftest. Seems it indeed it should deadlock. But i915 selftests apparently should've caught it and they didn't, I'll re-check what happened. > Perhaps it's worthwile to take a step back and figure out, if the > importer is required to lock, which callbacks might need a ww acquire > context? I'll take this into account, thanks. > (And off-topic, Since we do a lot of fancy stuff under dma-resv locks > including waiting for fences and other locks, IMO taking these locks > uninterruptible should ring a warning bell) I had the same thought and had a version that used the interruptible locking variant, but then decided to fall back to the uninterruptible, don't remember why. I'll revisit this. -- Best regards, Dmitry