On 2022-12-15 06:53, Robin Murphy wrote: > On 2022-12-15 11:40, Luben Tuikov wrote: >> On 2022-12-15 06:27, Christian König wrote: >>> Am 15.12.22 um 11:19 schrieb Luben Tuikov: >>>> On 2022-12-15 04:46, Christian König wrote: >>>>> Am 15.12.22 um 10:08 schrieb Luben Tuikov: >>>>>> On 2022-12-15 03:07, Christian König wrote: >>>>>>> Am 15.12.22 um 00:08 schrieb Robin Murphy: >>>>>>>> On 2022-12-14 22:02, Alex Deucher wrote: >>>>>>>>> On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@xxxxxxx> >>>>>>>>> wrote: >>>>>>>>>> On 2022-12-12 02:08, Luben Tuikov wrote: >>>>>>>>>>> Fix screen corruption on older 32-bit systems using AGP chips. >>>>>>>>>>> >>>>>>>>>>> On older systems with little memory, for instance 1.5 GiB, using an >>>>>>>>>>> AGP chip, >>>>>>>>>>> the device's DMA mask is 0xFFFFFFFF, but the memory mask is >>>>>>>>>>> 0x7FFFFFF, and >>>>>>>>>>> subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF, >>>>>>>>>>> false. As such the result of this static inline isn't suitable for >>>>>>>>>>> the last >>>>>>>>>>> argument to ttm_device_init()--it simply needs to now whether to >>>>>>>>>>> use GFP_DMA32 >>>>>>>>>>> when allocating DMA buffers. >>>>>>>>>> This sounds wrong to me. If the issues happen on systems without PAE it >>>>>>>>>> clearly can't have anything to with the actual DMA address size. Not to >>>>>>>>>> mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so >>>>>>>>>> GFP_DMA32 would be functionally meaningless anyway. Although the >>>>>>>>>> reported symptoms initially sounded like they could be caused by DMA >>>>>>>>>> going to the wrong place, that is also equally consistent with a >>>>>>>>>> loss of >>>>>>>>>> cache coherency. >>>>>>>>>> >>>>>>>>>> My (limited) understanding of AGP is that the GART can effectively >>>>>>>>>> alias >>>>>>>>>> memory to a second physical address, so I could well believe that >>>>>>>>>> something somewhere in the driver stack needs to perform some cache >>>>>>>>>> maintenance to avoid coherency issues, and that in these particular >>>>>>>>>> setups whatever that is might be assuming the memory is direct-mapped >>>>>>>>>> and thus going wrong for highmem pages. >>>>>>>>>> >>>>>>>>>> So as I said before, I really think this is not about using >>>>>>>>>> GFP_DMA32 at >>>>>>>>>> all, but about *not* using GFP_HIGHUSER. >>>>>>>>> One of the wonderful features of AGP is that it has to be used with >>>>>>>>> uncached memory. The aperture basically just provides a remapping of >>>>>>>>> physical pages into a linear aperture that you point the GPU at. TTM >>>>>>>>> has to jump through quite a few hoops to get uncached memory in the >>>>>>>>> first place, so it's likely that that somehow isn't compatible with >>>>>>>>> HIGHMEM. Can you get uncached HIGHMEM? >>>>>>>> I guess in principle yes, if you're careful not to use regular >>>>>>>> kmap()/kmap_atomic(), and always use pgprot_noncached() for >>>>>>>> userspace/vmalloc mappings, but clearly that leaves lots of scope for >>>>>>>> slipping up. >>>>>>> I theory we should do exactly that in TTM, but we have very few users >>>>>>> who actually still exercise that functionality. >>>>>>> >>>>>>>> Working backwards from primitives like set_memory_uc(), I see various >>>>>>>> paths in TTM where manipulating the caching state is skipped for >>>>>>>> highmem pages, but I wouldn't even know where to start looking for >>>>>>>> whether the right state is propagated to all the places where they >>>>>>>> might eventually be mapped somewhere. >>>>>>> The tt object has the caching state for the pages and >>>>>>> ttm_prot_from_caching() then uses pgprot_noncached() and co for the >>>>>>> userspace/vmalloc mappings. >>>>>>> >>>>>> The point of this patch is that dma_addressing_limited() is unsuitable as >>>>>> the last parameter to ttm_pool_init(), since if it is "false"--as it is in this >>>>>> particular case--then TTM ends up using HIGHUSER, and we get the screen corruption. >>>>>> (gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc()) >>>>> Well I would rather say that dma_addressing_limited() works, but the >>>>> default value from dma_get_required_mask() is broken. >>>>> >>>> dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF. >>> >>> This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB >>> addressable memory (27 bits set)? Or is there another F missing? >> >> Yeah, I'm missing an F--it is correctly described at the top of the thread above, >> i.e. in the commit of v2 patch. >> >> 0x7FFF_FFFF, which seems correct, no? >> >>>> While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init(). >>>> >>>>> 32 bits only work with bounce buffers and we can't use those on graphics >>>>> hardware. >>>>> >>>>>> Is there an objection to this patch, if it fixes the screen corruption? >>>>> Not from my side, but fixing the underlying issues would be better I think. >>>>> >>>> Have they been identified? >>> >>> I'm not 100% sure. I think by using GFP_DMA32 we just work around the >>> issue somehow. >> >> Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM >> code trying to understand what we do when GFP_DMA32 is not set, and the immediate >> thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the device struct. >> (Then I got down to the caching attributes...) >> >> It's be nice if we can find the actual issue--what else would it show us that needs fixing...? >> >> So what do we do with this patch? >> >> Shouldn't leave it in a limbo--some OSes ship their kernel >> with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") wholly >> reverted. > > Removing dma_addressing_limited() is still wrong, for the reasons given > in that commit. What we need is an *additional* condition that > encapsulates "also pass use_dma32 for AGP devices because it avoids some > weird coherency issue with 32-bit highmem that isn't worth trying to > debug further". Yes, you had a patch earlier which did exactly that--why not push that patch? Q: If host memory is 1.5 GiB, i.e. mask of 0x7FFF_FFFF, but the device's mask is 0xFFFF_FFFF, shouldn't we use GFP_DMA32, instead of GFP_HIGHUSER? Regards, Luben