On Tue, Mar 9, 2021 at 8:13 PM Andrey Grodzovsky <andrey.grodzovsky@xxxxxxx> wrote:
>
> On 2021-03-09 1:51 p.m., Christian König wrote:
> > On 09.03.21 at 19:26, Andrey Grodzovsky wrote:
> >> On 2021-03-09 12:47 p.m., Christian König wrote:
> >>> No it won't. Accessing the hardware without the lock is ok as long as
> >>> the write side isn't taken.
> >>
> >> Oh, forgot about the trylock part, sorry...
> >>
> >>> But that approach is illegal anyway because we suspend the hardware
> >>> without proper protection from concurrent access.
> >>
> >> For my understanding, and from looking again at his steps related to
> >> this:
> >>
> >> Step 0: atomic_cmpxchg(&adev->in_gpu_reset, 0, 1) - [AG] protects from
> >> other TDR threads
> >>
> >> Step 1: cancel all delayed work, stop the drm scheduler, complete all
> >> unreceived fences and so on. Call amdgpu_device_pre_asic_reset etc. -
> >> [AG] this is the HW suspend part
> >>
> >> Step 2: call down_write(&adev->reset_sem) to hold the write lock, which
> >> will block the recovery thread until other threads release their read
> >> locks.
> >>
> >> Is the problem here that the HW is suspended while some other threads
> >> that rely on the read side of the lock still access the HW?
> >
> > Exactly that, yes.
> >
> >> Mostly what I am thinking about are IOCTLs - we can't 'wait for them
> >> to complete', but they might be accessing the HW when we start the
> >> suspend.
> >
> > Well, the bigger problem with the IOCTLs is that I can't seem to be
> > able to properly explain the dependency here.
> >
> > Maybe let me describe it from the other side once more to make it more
> > understandable:
> >
> > Core Linux memory management depends on waiting for dma_fences.
> > Amdgpu's dma_fences in turn depend on the GPU scheduler and on the
> > recovery/reset functionality to work properly.
> >
> > So every call to kmalloc() or get_free_page() depends on the amdgpu GPU
> > recover/reset function, and with it on every lock taken in that
> > function.
> >
> > Because of this you can't take any lock under which memory allocation
> > might happen while holding the reset lock.
> >
> > This is a hard and by now very well documented limitation, and Dennis'
> > changes here don't magically remove it.
> >
> > Because of this, taking the reset lock or any other lock used in the
> > GPU reset functionality at the beginning of our IOCTLs will never work
> > correctly.
> >
> > What you need to do is to have all locks which depend on MM taken
> > before you take the reset lock - e.g. dma_resv, mmap_sem, VM locks
> > etc. - as well as not call functions like kmalloc(), get_free_page(),
> > dma_fence_wait() etc.
> >
> > I hope that somehow makes it more clear now. We should probably try to
> > merge Daniel's patch set for teaching lockdep to complain about such
> > things sooner or later.
>
> It is now more clear than in the previous discussion. One thing that
> would also be helpful is to see the kernel documentation about this
> limitation - can you maybe point to such documentation in the kernel
> docs, or even in the code?

For the annotations:

https://dri.freedesktop.org/docs/drm/driver-api/dma-buf.html?highlight=dma_fence#dma-fence-signalling-annotations

I need to brush up my drm/scheduler annotation patches, those should hit
all the important cases.

Wrt documenting the overall locking hierarchy, the best place is probably:

https://cgit.freedesktop.org/drm/drm/tree/drivers/dma-buf/dma-resv.c#n96

Meaning there's no explicit (kerneldoc) documentation about the nesting,
but we at least try to teach lockdep about it directly.
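In code, the annotation pattern from that first link looks roughly like
the sketch below. To be clear, this is only an illustration:
my_driver_gpu_recover() and my_driver_hw_reset() are made-up placeholder
names, not the real amdgpu reset entry points;
dma_fence_begin_signalling()/dma_fence_end_signalling() are the actual
annotation primitives.

#include <linux/dma-fence.h>

/* Sketch only: mark the recovery path as a fence signalling critical
 * section. my_driver_hw_reset() stands in for the actual suspend/reset
 * work and is not a real function.
 */
static int my_driver_gpu_recover(struct amdgpu_device *adev)
{
	bool fence_cookie;
	int r;

	/* Between the begin/end annotations lockdep flags GFP_KERNEL
	 * allocations, dma_fence_wait() calls and any lock that depends
	 * on either of them - i.e. exactly the dependency chain
	 * described above.
	 */
	fence_cookie = dma_fence_begin_signalling();

	r = my_driver_hw_reset(adev);

	dma_fence_end_signalling(fence_cookie);
	return r;
}

The nice part is that the splat then shows up on the first run through
the annotated path, instead of only in some rare actual deadlock.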
Cheers, Daniel

> Andrey
>
>
> > With those patches merged lockdep starts to complain about the
> > problems as soon as you try to run a kernel with that enabled.
> >
> > Regards,
> > Christian.
> >
> >>
> >> Andrey
> >>
> >>
> >>>
> >>> Christian.
> >>>
> >>> On 09.03.21 at 17:40, Andrey Grodzovsky wrote:
> >>>> Because he takes the write side lock post amdgpu_pre_asic_reset -
> >>>> where the HW suspend sequence happens (touching registers) - so I
> >>>> think it will assert.
> >>>>
> >>>> Andrey
> >>>>
> >>>> On 2021-03-09 7:56 a.m., Christian König wrote:
> >>>>> Hi Dennis,
> >>>>>
> >>>>> why do you think that this will always assert in the reset thread?
> >>>>>
> >>>>> In the reset thread, while we are holding the write side of the
> >>>>> reset lock, lockdep_assert_held() should be satisfied and not
> >>>>> cause any splat in the system log.
> >>>>>
> >>>>> Regards,
> >>>>> Christian.
> >>>>>
> >>>>> On 09.03.21 at 03:03, Li, Dennis wrote:
> >>>>>> [AMD Official Use Only - Internal Distribution Only]
> >>>>>>
> >>>>>> Hi, Christian,
> >>>>>>       amdgpu_device_skip_hw_access will always assert in the
> >>>>>> reset thread, which does not seem like a good idea.
> >>>>>>
> >>>>>> Best Regards
> >>>>>> Dennis Li
> >>>>>> -----Original Message-----
> >>>>>> From: Christian König <ckoenig.leichtzumerken@xxxxxxxxx>
> >>>>>> Sent: Tuesday, March 9, 2021 2:07 AM
> >>>>>> To: amd-gfx@xxxxxxxxxxxxxxxxxxxxx
> >>>>>> Cc: Grodzovsky, Andrey <Andrey.Grodzovsky@xxxxxxx>; Li, Dennis <Dennis.Li@xxxxxxx>
> >>>>>> Subject: [PATCH] drm/amdgpu: capture invalid hardware access v2
> >>>>>>
> >>>>>> From: Dennis Li <Dennis.Li@xxxxxxx>
> >>>>>>
> >>>>>> When the recovery thread has begun a GPU reset, no other threads
> >>>>>> should access the hardware, otherwise the system randomly hangs.
> >>>>>>
> >>>>>> v2 (chk): rewritten from scratch, use trylock and lockdep instead
> >>>>>> of hand wiring the logic.
> >>>>>>
> >>>>>> Signed-off-by: Dennis Li <Dennis.Li@xxxxxxx>
> >>>>>> Signed-off-by: Christian König <christian.koenig@xxxxxxx>
> >>>>>> ---
> >>>>>>  drivers/gpu/drm/amd/amdgpu/amdgpu_device.c | 74 +++++++++++++++++-----
> >>>>>>  1 file changed, 57 insertions(+), 17 deletions(-)
> >>>>>>
> >>>>>> diff --git a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>>> index e247c3a2ec08..c990af6a43ca 100644
> >>>>>> --- a/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>>> +++ b/drivers/gpu/drm/amd/amdgpu/amdgpu_device.c
> >>>>>> @@ -326,6 +326,34 @@ void amdgpu_device_vram_access(struct amdgpu_device *adev, loff_t pos,
> >>>>>>  /*
> >>>>>>   * register access helper functions.
> >>>>>>   */
> >>>>>> +
> >>>>>> +/* Check if hw access should be skipped because of hotplug or device error */
> >>>>>> +static bool amdgpu_device_skip_hw_access(struct amdgpu_device *adev)
> >>>>>> +{
> >>>>>> +	if (adev->in_pci_err_recovery)
> >>>>>> +		return true;
> >>>>>> +
> >>>>>> +#ifdef CONFIG_LOCKDEP
> >>>>>> +	/*
> >>>>>> +	 * This is a bit complicated to understand, so worth a comment. What we assert
> >>>>>> +	 * here is that the GPU reset is not running on another thread in parallel.
> >>>>>> +	 *
> >>>>>> +	 * For this we trylock the read side of the reset semaphore, if that succeeds
> >>>>>> +	 * we know that the reset is not running in parallel.
> >>>>>> +	 *
> >>>>>> +	 * If the trylock fails we assert that we are either already holding the read
> >>>>>> +	 * side of the lock or are the reset thread itself and hold the write side of
> >>>>>> +	 * the lock.
> >>>>>> +	 */
> >>>>>> +	if (down_read_trylock(&adev->reset_sem))
> >>>>>> +		up_read(&adev->reset_sem);
> >>>>>> +	else
> >>>>>> +		lockdep_assert_held(&adev->reset_sem);
> >>>>>> +#endif
> >>>>>> +
> >>>>>> +	return false;
> >>>>>> +}
> >>>>>> +
> >>>>>>  /**
> >>>>>>   * amdgpu_device_rreg - read a memory mapped IO or indirect register
> >>>>>>   *
> >>>>>> @@ -340,7 +368,7 @@ uint32_t amdgpu_device_rreg(struct amdgpu_device *adev,
> >>>>>>  {
> >>>>>>  	uint32_t ret;
> >>>>>>
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return 0;
> >>>>>>
> >>>>>>  	if ((reg * 4) < adev->rmmio_size) {
> >>>>>> @@ -377,7 +405,7 @@ uint32_t amdgpu_device_rreg(struct amdgpu_device *adev,
> >>>>>>   */
> >>>>>>  uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return 0;
> >>>>>>
> >>>>>>  	if (offset < adev->rmmio_size)
> >>>>>> @@ -402,7 +430,7 @@ uint8_t amdgpu_mm_rreg8(struct amdgpu_device *adev, uint32_t offset)
> >>>>>>   */
> >>>>>>  void amdgpu_mm_wreg8(struct amdgpu_device *adev, uint32_t offset, uint8_t value)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return;
> >>>>>>
> >>>>>>  	if (offset < adev->rmmio_size)
> >>>>>> @@ -425,7 +453,7 @@ void amdgpu_device_wreg(struct amdgpu_device *adev,
> >>>>>>  			uint32_t reg, uint32_t v,
> >>>>>>  			uint32_t acc_flags)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return;
> >>>>>>
> >>>>>>  	if ((reg * 4) < adev->rmmio_size) {
> >>>>>> @@ -452,7 +480,7 @@ void amdgpu_device_wreg(struct amdgpu_device *adev,
> >>>>>>  void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev,
> >>>>>>  			     uint32_t reg, uint32_t v)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return;
> >>>>>>
> >>>>>>  	if (amdgpu_sriov_fullaccess(adev) &&
> >>>>>> @@ -475,7 +503,7 @@ void amdgpu_mm_wreg_mmio_rlc(struct amdgpu_device *adev,
> >>>>>>   */
> >>>>>>  u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return 0;
> >>>>>>
> >>>>>>  	if ((reg * 4) < adev->rio_mem_size)
> >>>>>> @@ -497,7 +525,7 @@ u32 amdgpu_io_rreg(struct amdgpu_device *adev, u32 reg)
> >>>>>>   */
> >>>>>>  void amdgpu_io_wreg(struct amdgpu_device *adev, u32 reg, u32 v)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return;
> >>>>>>
> >>>>>>  	if ((reg * 4) < adev->rio_mem_size)
> >>>>>> @@ -519,7 +547,7 @@ void amdgpu_io_wreg(struct amdgpu_device *adev, u32 reg, u32 v)
> >>>>>>   */
> >>>>>>  u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return 0;
> >>>>>>
> >>>>>>  	if (index < adev->doorbell.num_doorbells) {
> >>>>>> @@ -542,7 +570,7 @@ u32 amdgpu_mm_rdoorbell(struct amdgpu_device *adev, u32 index)
> >>>>>>   */
> >>>>>>  void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return;
> >>>>>>
> >>>>>>  	if (index < adev->doorbell.num_doorbells) {
> >>>>>> @@ -563,7 +591,7 @@ void amdgpu_mm_wdoorbell(struct amdgpu_device *adev, u32 index, u32 v)
> >>>>>>   */
> >>>>>>  u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return 0;
> >>>>>>
> >>>>>>  	if (index < adev->doorbell.num_doorbells) {
> >>>>>> @@ -586,7 +614,7 @@ u64 amdgpu_mm_rdoorbell64(struct amdgpu_device *adev, u32 index)
> >>>>>>   */
> >>>>>>  void amdgpu_mm_wdoorbell64(struct amdgpu_device *adev, u32 index, u64 v)
> >>>>>>  {
> >>>>>> -	if (adev->in_pci_err_recovery)
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>>  		return;
> >>>>>>
> >>>>>>  	if (index < adev->doorbell.num_doorbells) {
> >>>>>> @@ -610,10 +638,13 @@ u32 amdgpu_device_indirect_rreg(struct amdgpu_device *adev,
> >>>>>>  				u32 pcie_index, u32 pcie_data,
> >>>>>>  				u32 reg_addr)
> >>>>>>  {
> >>>>>> -	unsigned long flags;
> >>>>>> -	u32 r;
> >>>>>>  	void __iomem *pcie_index_offset;
> >>>>>>  	void __iomem *pcie_data_offset;
> >>>>>> +	unsigned long flags;
> >>>>>> +	u32 r;
> >>>>>> +
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>> +		return 0;
> >>>>>>
> >>>>>>  	spin_lock_irqsave(&adev->pcie_idx_lock, flags);
> >>>>>>  	pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4;
> >>>>>> @@ -641,10 +672,13 @@ u64 amdgpu_device_indirect_rreg64(struct amdgpu_device *adev,
> >>>>>>  				  u32 pcie_index, u32 pcie_data,
> >>>>>>  				  u32 reg_addr)
> >>>>>>  {
> >>>>>> -	unsigned long flags;
> >>>>>> -	u64 r;
> >>>>>>  	void __iomem *pcie_index_offset;
> >>>>>>  	void __iomem *pcie_data_offset;
> >>>>>> +	unsigned long flags;
> >>>>>> +	u64 r;
> >>>>>> +
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>> +		return 0;
> >>>>>>
> >>>>>>  	spin_lock_irqsave(&adev->pcie_idx_lock, flags);
> >>>>>>  	pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4;
> >>>>>> @@ -677,9 +711,12 @@ void amdgpu_device_indirect_wreg(struct amdgpu_device *adev,
> >>>>>>  				 u32 pcie_index, u32 pcie_data,
> >>>>>>  				 u32 reg_addr, u32 reg_data)
> >>>>>>  {
> >>>>>> -	unsigned long flags;
> >>>>>>  	void __iomem *pcie_index_offset;
> >>>>>>  	void __iomem *pcie_data_offset;
> >>>>>> +	unsigned long flags;
> >>>>>> +
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>> +		return;
> >>>>>>
> >>>>>>  	spin_lock_irqsave(&adev->pcie_idx_lock, flags);
> >>>>>>  	pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4;
> >>>>>> @@ -706,9 +743,12 @@ void amdgpu_device_indirect_wreg64(struct amdgpu_device *adev,
> >>>>>>  				   u32 pcie_index, u32 pcie_data,
> >>>>>>  				   u32 reg_addr, u64 reg_data)
> >>>>>>  {
> >>>>>> -	unsigned long flags;
> >>>>>>  	void __iomem *pcie_index_offset;
> >>>>>>  	void __iomem *pcie_data_offset;
> >>>>>> +	unsigned long flags;
> >>>>>> +
> >>>>>> +	if (amdgpu_device_skip_hw_access(adev))
> >>>>>> +		return;
> >>>>>>
> >>>>>>  	spin_lock_irqsave(&adev->pcie_idx_lock, flags);
> >>>>>>  	pcie_index_offset = (void __iomem *)adev->rmmio + pcie_index * 4;
> >>>>>> --
> >>>>>> 2.25.1
> >>>>>>
> >>>>>
> >>>
>

--
Daniel Vetter
Software Engineer, Intel Corporation
http://blog.ffwll.ch
_______________________________________________
amd-gfx mailing list
amd-gfx@xxxxxxxxxxxxxxxxxxxxx
https://lists.freedesktop.org/mailman/listinfo/amd-gfx