Re: [PATCH] drm/radeon: Fix screen corruption (v2)

Luben Tuikov <luben.tuikov@xxxxxxx> · Thu, 15 Dec 2022 05:19:43 -0500

On 2022-12-15 04:46, Christian König wrote:
> Am 15.12.22 um 10:08 schrieb Luben Tuikov:
>> On 2022-12-15 03:07, Christian König wrote:
>>> Am 15.12.22 um 00:08 schrieb Robin Murphy:
>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@xxxxxxx>
>>>>> wrote:
>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>
>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>> AGP chip,
>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>> 0x7FFFFFF, and
>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
>>>>>>> false. As such the result of this static inline isn't suitable for
>>>>>>> the last
>>>>>>> argument to ttm_device_init()--it simply needs to now whether to
>>>>>>> use GFP_DMA32
>>>>>>> when allocating DMA buffers.
>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it
>>>>>> clearly can't have anything to with the actual DMA address size. Not to
>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>> going to the wrong place, that is also equally consistent with a
>>>>>> loss of
>>>>>> cache coherency.
>>>>>>
>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>> alias
>>>>>> memory to a second physical address, so I could well believe that
>>>>>> something somewhere in the driver stack needs to perform some cache
>>>>>> maintenance to avoid coherency issues, and that in these particular
>>>>>> setups whatever that is might be assuming the memory is direct-mapped
>>>>>> and thus going wrong for highmem pages.
>>>>>>
>>>>>> So as I said before, I really think this is not about using
>>>>>> GFP_DMA32 at
>>>>>> all, but about *not* using GFP_HIGHUSER.
>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>> uncached memory.  The aperture basically just provides a remapping of
>>>>> physical pages into a linear aperture that you point the GPU at.  TTM
>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>> HIGHMEM.  Can you get uncached HIGHMEM?
>>>> I guess in principle yes, if you're careful not to use regular
>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>> slipping up.
>>> I theory we should do exactly that in TTM, but we have very few users
>>> who actually still exercise that functionality.
>>>
>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>> paths in TTM where manipulating the caching state is skipped for
>>>> highmem pages, but I wouldn't even know where to start looking for
>>>> whether the right state is propagated to all the places where they
>>>> might eventually be mapped somewhere.
>>> The tt object has the caching state for the pages and
>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>> userspace/vmalloc mappings.
>>>
>> The point of this patch is that dma_addressing_limited() is unsuitable as
>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
> 
> Well I would rather say that dma_addressing_limited() works, but the 
> default value from dma_get_required_mask() is broken.
> 

dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.
While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().

> 32 bits only work with bounce buffers and we can't use those on graphics 
> hardware.
> 
>> Is there an objection to this patch, if it fixes the screen corruption?
> 
> Not from my side, but fixing the underlying issues would be better I think.
> 

Have they been identified?

>> Or does TTM need fixing, in that what we really need is to specify whether
>> caching is desired and/or DMA32 when we allocate a TTM pool (ttm_pool_init(),
>> called from ttm_device_init(), called from radeon_ttm_init.c)?
> 
> Could be, but it's more likely that the problem is in the DMA layer 
> because we fail to recognize that the device can't access all of the memory.
> 

Right, I agree. Ideally, setting dev->{dma_mask, coherent_dma_mask, bus_dma_limit},
should be sufficient to tell the DMA layer what kind of memory the device
can handle.

But this patch doesn't change non-local behaviour and as such is local and safe
to apply.

Regards,
Luben