Re: Screen corruption using radeon kernel driver

Luben Tuikov <luben.tuikov@xxxxxxx> · Sun, 11 Dec 2022 00:52:10 -0500

On 2022-12-10 10:32, Mikhail Krylov wrote:
> On Wed, Nov 30, 2022 at 11:07:32AM -0500, Alex Deucher wrote:
>> On Wed, Nov 30, 2022 at 10:42 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
>>>
>>> On 2022-11-30 14:28, Alex Deucher wrote:
>>>> On Wed, Nov 30, 2022 at 7:54 AM Robin Murphy <robin.murphy@xxxxxxx> wrote:
>>>>>
>>>>> On 2022-11-29 17:11, Mikhail Krylov wrote:
>>>>>> On Tue, Nov 29, 2022 at 11:05:28AM -0500, Alex Deucher wrote:
>>>>>>> On Tue, Nov 29, 2022 at 10:59 AM Mikhail Krylov <sqarert@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> On Tue, Nov 29, 2022 at 09:44:19AM -0500, Alex Deucher wrote:
>>>>>>>>> On Mon, Nov 28, 2022 at 3:48 PM Mikhail Krylov <sqarert@xxxxxxxxx> wrote:
>>>>>>>>>>
>>>>>>>>>> On Mon, Nov 28, 2022 at 09:50:50AM -0500, Alex Deucher wrote:
>>>>>>>>>>
>>>>>>>>>>>>> [excessive quoting removed]
>>>>>>>>>>
>>>>>>>>>>>> So, is there any progress on this issue? I do understand it's not a high
>>>>>>>>>>>> priority one, and today I've checked it on the 6.0 kernel, and
>>>>>>>>>>>> unfortunately, it still persists...
>>>>>>>>>>>>
>>>>>>>>>>>> I'm considering writing a patch that will allow the user to override
>>>>>>>>>>>> the need_dma32/dma_bits setting with a module parameter. I'll have some
>>>>>>>>>>>> time after the New Year for that.
>>>>>>>>>>>>
>>>>>>>>>>>> Is it at all possible that such a patch will be merged into the kernel?
>>>>>>>>>>>>
>>>>>>>>>>> On Mon, Nov 28, 2022 at 9:31 AM Mikhail Krylov <sqarert@xxxxxxxxx> wrote:
>>>>>>>>>>> Unless someone familiar with HIMEM can figure out what is going wrong
>>>>>>>>>>> we should just revert the patch.
>>>>>>>>>>>
>>>>>>>>>>> Alex
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> Okay, I was suggesting that mostly because
>>>>>>>>>>
>>>>>>>>>> a) it works for me with dma_bits = 40 (I understand that's what it is
>>>>>>>>>> without the original patch applied);
>>>>>>>>>>
>>>>>>>>>> b) there's a hint of uncertainty on this line
>>>>>>>>>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/gpu/drm/radeon/radeon_device.c#n1359
>>>>>>>>>> saying that for AGP dma_bits = 32 is the safest option, so apparently there are
>>>>>>>>>> setups, unlike mine, where dma_bits = 32 is better than 40.
>>>>>>>>>>
>>>>>>>>>> But I'm in no position to argue, just wanted to make myself clear.
>>>>>>>>>> I'm okay with rebuilding the kernel for my machine until the original
>>>>>>>>>> patch is reverted or any other fix is applied.
>>>>>>>>>
>>>>>>>>> What GPU do you have and is it AGP?  If it is AGP, does setting
>>>>>>>>> radeon.agpmode=-1 also fix it?
>>>>>>>>>
>>>>>>>>> Alex
>>>>>>>>
>>>>>>>> That is an ATI Radeon X1950, and unfortunately radeon.agpmode=-1 doesn't
>>>>>>>> help; it just makes 3D acceleration in games such as OpenArena stop
>>>>>>>> working.
>>>>>>>
>>>>>>> Just to confirm, is the board AGP or PCIe?
>>>>>>>
>>>>>>> Alex
>>>>>>
>>>>>> It is AGP. That's an old machine.
>>>>>
>>>>> Can you check whether dma_addressing_limited() is actually returning the
>>>>> expected result at the point of radeon_ttm_init()? Disabling highmem is
>>>>> presumably just hiding whatever problem exists, by throwing away all
>>>>>   >32-bit RAM such that use_dma32 doesn't matter.
>>>>
>>>> The device in question only supports a 32 bit DMA mask so
>>>> dma_addressing_limited() should return true.  Bounce buffers are not
>>>> really usable on GPUs because they map so much memory.  If
>>>> dma_addressing_limited() returns false, that would explain it.
>>>
>>> Right, it appears to be the only part of the offending commit that
>>> *could* reasonably make any difference, so I'm primarily wondering if
>>> dma_get_required_mask() somehow gets confused.
>>
>> Mikhail,
>>
>> Can you see what dma_addressing_limited() and dma_get_required_mask()
>> return in this case?
>>
>> Alex
>>
>>
>>>
>>> Thanks,
>>> Robin.
> 
> Hello again. By adding printk() calls to the functions and recompiling
> the kernel, I was able to confirm that dma_addressing_limited() returns
> *false* on the kernel with the bug.
> 
> And dma_get_required_mask() returns 0x7fffffff, as I said before.

Yes, dma_addressing_limited() evaluates to "false" in your case,
and this is the correct answer according to the function's comment:
"Return %true if the devices DMA mask is too small to address all
 memory in the system, else %false."

In this case the device's DMA mask is 0xFFFFFFFF and the required mask
for 1.5 GiB of memory is 0x7FFFFFFF. Since the device mask is not smaller
than the required mask, the static inline returns "false".
(dma_direct_get_required_mask() returns 0x7FFFFFFF for your memory size.)

It would appear that dma_addressing_limited() isn't answering the question
that the last parameter of ttm_device_init(), "use GFP_DMA32", wants
answered. Perhaps we should use another method to make sure the
parameter is set in the scenario in question.

Regards,
Luben