On 2022-12-15 11:40, Luben Tuikov wrote:
On 2022-12-15 06:27, Christian König wrote:
Am 15.12.22 um 11:19 schrieb Luben Tuikov:
On 2022-12-15 04:46, Christian König wrote:
Am 15.12.22 um 10:08 schrieb Luben Tuikov:
On 2022-12-15 03:07, Christian König wrote:
Am 15.12.22 um 00:08 schrieb Robin Murphy:
On 2022-12-14 22:02, Alex Deucher wrote:
On Wed, Dec 14, 2022 at 4:54 PM Robin Murphy <robin.murphy@xxxxxxx>
wrote:
On 2022-12-12 02:08, Luben Tuikov wrote:
Fix screen corruption on older 32-bit systems using AGP chips.
On older systems with little memory, for instance 1.5 GiB, using an
AGP chip,
the device's DMA mask is 0xFFFFFFFF, but the memory mask is
0x7FFFFFF, and
subsequently dma_addressing_limited() returns 0xFFFFFFFF < 0x7FFFFFFF,
false. As such the result of this static inline isn't suitable for
the last
argument to ttm_device_init()--it simply needs to now whether to
use GFP_DMA32
when allocating DMA buffers.
This sounds wrong to me. If the issues happen on systems without PAE it
clearly can't have anything to with the actual DMA address size. Not to
mention that AFAICS 32-bit x86 doesn't even have ZONE_DMA32, so
GFP_DMA32 would be functionally meaningless anyway. Although the
reported symptoms initially sounded like they could be caused by DMA
going to the wrong place, that is also equally consistent with a
loss of
cache coherency.
My (limited) understanding of AGP is that the GART can effectively
alias
memory to a second physical address, so I could well believe that
something somewhere in the driver stack needs to perform some cache
maintenance to avoid coherency issues, and that in these particular
setups whatever that is might be assuming the memory is direct-mapped
and thus going wrong for highmem pages.
So as I said before, I really think this is not about using
GFP_DMA32 at
all, but about *not* using GFP_HIGHUSER.
One of the wonderful features of AGP is that it has to be used with
uncached memory. The aperture basically just provides a remapping of
physical pages into a linear aperture that you point the GPU at. TTM
has to jump through quite a few hoops to get uncached memory in the
first place, so it's likely that that somehow isn't compatible with
HIGHMEM. Can you get uncached HIGHMEM?
I guess in principle yes, if you're careful not to use regular
kmap()/kmap_atomic(), and always use pgprot_noncached() for
userspace/vmalloc mappings, but clearly that leaves lots of scope for
slipping up.
I theory we should do exactly that in TTM, but we have very few users
who actually still exercise that functionality.
Working backwards from primitives like set_memory_uc(), I see various
paths in TTM where manipulating the caching state is skipped for
highmem pages, but I wouldn't even know where to start looking for
whether the right state is propagated to all the places where they
might eventually be mapped somewhere.
The tt object has the caching state for the pages and
ttm_prot_from_caching() then uses pgprot_noncached() and co for the
userspace/vmalloc mappings.
The point of this patch is that dma_addressing_limited() is unsuitable as
the last parameter to ttm_pool_init(), since if it is "false"--as it is in this
particular case--then TTM ends up using HIGHUSER, and we get the screen corruption.
(gfp_flags |= GFP_HIGHUSER in in ttm_pool_alloc())
Well I would rather say that dma_addressing_limited() works, but the
default value from dma_get_required_mask() is broken.
dma_get_required_mask() for his setup of 1.5 GiB of memory returns 0x7FFFFFF.
This 0x7FFFFFF mask looks fishy to me. That would only be 128MiB
addressable memory (27 bits set)? Or is there another F missing?
Yeah, I'm missing an F--it is correctly described at the top of the thread above,
i.e. in the commit of v2 patch.
0x7FFF_FFFF, which seems correct, no?
While the dma mask is 0xFFFFFFFF, as set in radeon_device.c in radeon_device_init().
32 bits only work with bounce buffers and we can't use those on graphics
hardware.
Is there an objection to this patch, if it fixes the screen corruption?
Not from my side, but fixing the underlying issues would be better I think.
Have they been identified?
I'm not 100% sure. I think by using GFP_DMA32 we just work around the
issue somehow.
Right. Using GFP_DMA32, we don't touch high-mem. I was looking at the DRM
code trying to understand what we do when GFP_DMA32 is not set, and the immediate
thing I see is that we set GFP_HIGHUSER when use_dma32 is unset in the device struct.
(Then I got down to the caching attributes...)
It's be nice if we can find the actual issue--what else would it show us that needs fixing...?
So what do we do with this patch?
Shouldn't leave it in a limbo--some OSes ship their kernel
with 33b3ad3788ab ("drm/radeon: handle PCIe root ports with addressing limitations") wholly
reverted.
Removing dma_addressing_limited() is still wrong, for the reasons given
in that commit. What we need is an *additional* condition that
encapsulates "also pass use_dma32 for AGP devices because it avoids some
weird coherency issue with 32-bit highmem that isn't worth trying to
debug further".
Thanks,
Robin.