On 2022-12-15 06:27, Christian König wrote: > Am 15.12.22 um 11:19 schrieb Luben Tuikov: >> On 2022-12-15 04:46, Christian König wrote: >>> Am 15.12.22 um 10:08 schrieb Luben Tuikov: >>>> On 2022-12-15 03:07, Christian König wrote: >>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy: >>>>>> On 2022-12-14 22:02, Alex Deucher wrote: >>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@xxxxxxx> >>>>>>> wrote: >>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote: >>>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips. >>>>>>>>> >>>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an >>>>>>>>> AGP chip, >>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is >>>>>>>>> 0x7FFFFFF, and >>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF, >>>>>>>>> false. As such the result of this static inline isn't suitable for >>>>>>>>> the last >>>>>>>>> argument to ttm_device_init()--it simply needs to now whether to >>>>>>>>> use GFP_DMA32 >>>>>>>>> when allocating DMA buffers. >>>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it >>>>>>>> clearly can't have anything to with the actual DMA address size. Not to >>>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so >>>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the >>>>>>>> reported symptoms initially sounded like they could be caused by DMA >>>>>>>> going to the wrong place, that is also equally consistent with a >>>>>>>> loss of >>>>>>>> cache coherency. >>>>>>>> >>>>>>>> My (limited) understanding of AGP is that the GART can effectively >>>>>>>> alias >>>>>>>> memory to a second physical address, so I could well believe that >>>>>>>> something somewhere in the driver stack needs to perform some cache >>>>>>>> maintenance to avoid coherency issues, and that in these particular >>>>>>>> setups whatever that is might be assuming the memory is direct-mapped >>>>>>>> and thus going wrong for highmem pages. >>>>>>>> >>>>>>>> So as I said before, I really think this is not about using >>>>>>>> GFP_DMA32 at >>>>>>>> all, but about *not* using GFP_HIGHUSER. >>>>>>> One of the wonderful features of AGP is that it has to be used with >>>>>>> uncached memory. The aperture basically just provides a remapping of >>>>>>> physical pages into a linear aperture that you point the GPU at. TTM >>>>>>> has to jump through quite a few hoops to get uncached memory in the >>>>>>> first place, so it's likely that that somehow isn't compatible with >>>>>>> HIGHMEM. Can you get uncached HIGHMEM? >>>>>> I guess in principle yes, if you're careful not to use regular >>>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for >>>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for >>>>>> slipping up. >>>>> I theory we should do exactly that in TTM, but we have very few users >>>>> who actually still exercise that functionality. >>>>> >>>>>> Working backwards from primitives like set_memory_uc(), I see various >>>>>> paths in TTM where manipulating the caching state is skipped for >>>>>> highmem pages, but I wouldn't even know where to start looking for >>>>>> whether the right state is propagated to all the places where they >>>>>> might eventually be mapped somewhere. >>>>> The tt object has the caching state for the pages and >>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the >>>>> userspace/vmalloc mappings. >>>>> >>>> The point of this patch is that dma_addressing_limited() is unsuitable as >>>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this >>>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption. >>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc()) >>> Well I would rather say that dma_addressing_limited() works, but the >>> default value from dma_get_required_mask() is broken. >>> >> dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF. > > This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB > addressable memory (27 bits set)? Or is there another F missing? Yeah, I'm missing an F--it is correctly described at the top of the thread above, i.e. in the commit of v2 patch. 0x7FFF_FFFF, which seems correct, no? >> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init(). >> >>> 32 bits only work with bounce buffers and we can't use those on graphics >>> hardware. >>> >>>> Is there an objection to this patch, if it fixes the screen corruption? >>> Not from my side, but fixing the underlying issues would be better I think. >>> >> Have they been identified? > > I'm not 100% sure. I think by using GFP_DMA32 we just work around the > issue somehow. Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM code trying to understand what we do when GFP_DMA32 is not set, and the immediate thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the device struct. (Then I got down to the caching attributes...) It's be nice if we can find the actual issue--what else would it show us that needs fixing...? So what do we do with this patch? Shouldn't leave it in a limbo--some OSes ship their kernel with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") wholly reverted. Regards, Luben