On 2022-12-15 04:46, Christian König wrote:
> On 15.12.22 at 10:08, Luben Tuikov wrote:
>> On 2022-12-15 03:07, Christian König wrote:
>>> On 15.12.22 at 00:08, Robin Murphy wrote:
>>>> On 2022-12-14 22:02, Alex Deucher wrote:
>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@xxxxxxx>
>>>>> wrote:
>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote:
>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips.
>>>>>>>
>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an
>>>>>>> AGP chip, the device's DMA mask is 0xFFFFFFFF, but the memory mask is
>>>>>>> 0x7FFFFFFF, and subsequently dma_addressing_limited() evaluates
>>>>>>> 0xFFFFFFFF < 0x7FFFFFFF, i.e. false. As such, the result of this
>>>>>>> static inline isn't suitable for the last argument to
>>>>>>> ttm_device_init()--it simply needs to know whether to use GFP_DMA32
>>>>>>> when allocating DMA buffers.
>>>>>>
>>>>>> This sounds wrong to me. If the issues happen on systems without PAE,
>>>>>> it clearly can't have anything to do with the actual DMA address size.
>>>>>> Not to mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the
>>>>>> reported symptoms initially sounded like they could be caused by DMA
>>>>>> going to the wrong place, that is also equally consistent with a loss
>>>>>> of cache coherency.
>>>>>>
>>>>>> My (limited) understanding of AGP is that the GART can effectively
>>>>>> alias memory to a second physical address, so I could well believe
>>>>>> that something somewhere in the driver stack needs to perform some
>>>>>> cache maintenance to avoid coherency issues, and that in these
>>>>>> particular setups whatever that is might be assuming the memory is
>>>>>> direct-mapped and thus going wrong for highmem pages.
>>>>>>
>>>>>> So as I said before, I really think this is not about using GFP_DMA32
>>>>>> at all, but about *not* using GFP_HIGHUSER.
>>>>>
>>>>> One of the wonderful features of AGP is that it has to be used with
>>>>> uncached memory. The aperture basically just provides a remapping of
>>>>> physical pages into a linear aperture that you point the GPU at. TTM
>>>>> has to jump through quite a few hoops to get uncached memory in the
>>>>> first place, so it's likely that that somehow isn't compatible with
>>>>> HIGHMEM. Can you get uncached HIGHMEM?
>>>>
>>>> I guess in principle yes, if you're careful not to use regular
>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for
>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for
>>>> slipping up.
>>>
>>> In theory we should do exactly that in TTM, but we have very few users
>>> who actually still exercise that functionality.
>>>
>>>> Working backwards from primitives like set_memory_uc(), I see various
>>>> paths in TTM where manipulating the caching state is skipped for
>>>> highmem pages, but I wouldn't even know where to start looking for
>>>> whether the right state is propagated to all the places where they
>>>> might eventually be mapped somewhere.
>>>
>>> The tt object has the caching state for the pages and
>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the
>>> userspace/vmalloc mappings.
>>>
>> The point of this patch is that dma_addressing_limited() is unsuitable
>> as the last parameter to ttm_pool_init(), since if it is "false"--as it
>> is in this particular case--then TTM ends up using GFP_HIGHUSER
>> (gfp_flags |= GFP_HIGHUSER in ttm_pool_alloc()), and we get the screen
>> corruption.
>
> Well I would rather say that dma_addressing_limited() works, but the
> default value from dma_get_required_mask() is broken.

dma_get_required_mask() for his setup of 1.5 GiB of memory returns
0x7FFFFFFF, while the DMA mask is 0xFFFFFFFF, as set in
radeon_device_init() in radeon_device.c.
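For reference, here is the logic involved, paraphrased from the v6.1-era
sources (include/linux/dma-mapping.h and drivers/gpu/drm/ttm/ttm_pool.c);
the *_sketch names are mine, not kernel symbols, and the concrete values
in the comments are from the report above:

#include <linux/dma-mapping.h>
#include <linux/gfp.h>
#include <linux/minmax.h>

/* Paraphrase of dma_addressing_limited(): true only when the device's
 * DMA mask is smaller than what the installed memory requires.
 */
static inline bool dma_addressing_limited_sketch(struct device *dev)
{
	/* On the reported system: dma_get_mask() == 0xFFFFFFFF (radeon's
	 * mask), bus_dma_limit == 0, and dma_get_required_mask() ==
	 * 0x7FFFFFFF for 1.5 GiB of RAM, so 0xFFFFFFFF < 0x7FFFFFFF is
	 * false--"not limited".
	 */
	return min_not_zero(dma_get_mask(dev), dev->bus_dma_limit) <
			dma_get_required_mask(dev);
}

/* Sketch of the GFP flag selection in ttm_pool_alloc(): with
 * use_dma32 == false, the pool allocates with GFP_HIGHUSER, so highmem
 * pages are fair game for the AGP path.
 */
static gfp_t ttm_pool_gfp_flags_sketch(bool use_dma32)
{
	gfp_t gfp_flags = GFP_USER;

	if (use_dma32)
		gfp_flags |= GFP_DMA32;
	else
		gfp_flags |= GFP_HIGHUSER;

	return gfp_flags;
}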
> 32 bits only work with bounce buffers and we can't use those on
> graphics hardware.
>
>> Is there an objection to this patch, if it fixes the screen corruption?
>
> Not from my side, but fixing the underlying issues would be better I
> think.

Have they been identified?

>> Or does TTM need fixing, in that what we really need is to specify
>> whether caching is desired and/or DMA32 when we allocate a TTM pool
>> (ttm_pool_init(), called from ttm_device_init(), called from
>> radeon_ttm_init())?
>
> Could be, but it's more likely that the problem is in the DMA layer,
> because we fail to recognize that the device can't access all of the
> memory.

Right, I agree. Ideally, setting dev->{dma_mask, coherent_dma_mask,
bus_dma_limit} should be sufficient to tell the DMA layer what kind of
memory the device can handle. But this patch doesn't change any
non-local behaviour; it is local and safe to apply.

Regards,
Luben
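P.S.: For context, the change boils down to what radeon passes as the
final use_dma32 argument of ttm_device_init(). A minimal sketch of the
call site in radeon_ttm_init() (drivers/gpu/drm/radeon/radeon_ttm.c),
with an explicit mask check standing in for dma_addressing_limited();
the predicate shown is illustrative, not necessarily the posted patch
verbatim:

	/* Tell TTM to use GFP_DMA32 whenever the device cannot address
	 * more than 32 bits, instead of relying on
	 * dma_addressing_limited(), which is false on the reported
	 * 1.5 GiB system.
	 */
	r = ttm_device_init(&rdev->mman.bdev, &radeon_bo_driver, rdev->dev,
			    rdev->ddev->anon_inode->i_mapping,
			    rdev->ddev->vma_offset_manager,
			    rdev->need_swiotlb,
			    dma_get_mask(rdev->dev) <= DMA_BIT_MASK(32));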