SDMA out-of-bounds write access of tiled surface

deathsimple@xxxxxxxxxxx (Christian König) · Wed, 22 Jun 2016 09:53:28 +0200

Hi Nicolai,

If we don't already have an option for this, try doubling the size of
the VM area allocated for each BO in userspace.

That should give you a nice hole between each BO and so should help to 
catch cases when somebody writes over the end of a BO.
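
To make the idea concrete, here is a minimal sketch in Python. It is a toy
bump allocator, not the actual userspace VM management code; the names
PaddedVaAllocator, Mapping, PAGE_SIZE and the base address are all
hypothetical and chosen only for illustration. Each BO gets twice the
address range it needs, and only the lower half is treated as mapped, so
an access just past the end of a BO lands in a hole instead of in the
next BO.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed page granularity, for illustration only


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


@dataclass
class Mapping:
    start: int     # first mapped address of the BO
    size: int      # mapped bytes (page aligned)
    reserved: int  # total VA bytes reserved, including the trailing hole


class PaddedVaAllocator:
    """Toy bump allocator that reserves twice the VA space each BO needs."""

    def __init__(self, base: int = 0x100000):
        self.next_free = base
        self.mappings = []

    def alloc(self, bo_size: int) -> Mapping:
        size = align_up(bo_size, PAGE_SIZE)
        # Reserve 2x the size; the upper half stays unmapped as a guard hole.
        mapping = Mapping(self.next_free, size, 2 * size)
        self.next_free += mapping.reserved
        self.mappings.append(mapping)
        return mapping

    def classify(self, addr: int) -> str:
        """Return 'mapped', 'hole' or 'unallocated' for an address."""
        for m in self.mappings:
            if m.start <= addr < m.start + m.size:
                return "mapped"
            if m.start + m.size <= addr < m.start + m.reserved:
                return "hole"
        return "unallocated"


va = PaddedVaAllocator()
first = va.alloc(10000)   # rounded up to 3 pages, 6 pages reserved
second = va.alloc(4096)   # starts after the first BO's hole
```

With this layout, va.classify(first.start + first.size) reports a hole
rather than the start of the second BO, which is the case a VM fault
would then catch.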

Regards,
Christian.

On 22.06.2016 at 09:50, Nicolai Hähnle wrote:
> Hi Mads,
>
> setting R600_DEBUG=nodma in the X server should work around your 
> problem for now.
>
> Marek, perhaps an out-of-bounds check for tiled texture memory access 
> similar to the linear access check is necessary? I wonder if you've 
> seen something about that in the docs.
>
> I've annotated the sDMA IB dump. It's a linear-to-display-tiled copy 
> on Carrizo. I tried to reproduce with the attached patch, but failed 
> to do so even with amdgpu.vm_debug=1. With the patch, I get DMA copies 
> that are identical to the one that causes the VM fault except for a 
> different bank_height and macro_tile_aspect, so the issue is likely 
> related to those.
>
> Nicolai
>
> On 21.06.2016 19:32, Nicolai Hähnle wrote:
>> On 21.06.2016 19:16, Mads wrote:
>>> I sent this 1.5 hours ago, but since it hasn't arrived on the
>>> mailing list yet, I'll try again...
>>
>> It arrived, no worries :)
>>
>> I'll take a look later.
>>
>> Nicolai
>>
>>>
>>> On 2016-06-21 17:48, Mads wrote:
>>>
>>>> On 2016-06-21 10:12, Mads wrote:
>>>>
>>>> On 2016-06-21 09:39, Nicolai Hähnle wrote:
>>>>
>>>> Thanks. However, I still don't think this is going to help. Your
>>>> earlier trace experiments showed that the problematic SDMA commands
>>>> came from the X server, _not_ from plasmashell.
>>>>
>>>> So what we see here is likely just the first set of GPU commands sent
>>>> by plasmashell after the VM fault occurred. Since the plasmashell
>>>> process is unable to tell who caused the VM fault, it takes the blame
>>>> incorrectly. Are you sure the X server is using your self-compiled
>>>> radeonsi_dri.so and has the environment variable set? If it creates a
>>>> ddebug_dump, it might be somewhere else (it's based off the HOME
>>>> environment variable, which may be different).
>>>> I'll take a second look to see if there's an X dump there too, but
>>>> unfortunately it'll be about 8 hours before I have the machine at
>>>> hand again.
>>>>
>>>> And yes, I'm sure, everything is built through portage, so there is no
>>>> "self-compiled" on the system per se. There's always just one lib
>>>> available at any time :)
>>>
>>> You were right! X didn't have R600_DEBUG=check_vm in environment (no
>>> login shell/sourcing of /etc/profile).
>>>
>>> Here's what I ran:
>>>
>>>> $ XAUTHORITY=.Xauthority DISPLAY=:0 LIBGL_DEBUG=verbose dolphin
>>>> libGL: pci id for fd 9: 1002:9874, driver radeonsi
>>>> libGL: OpenDriver: trying /usr/lib64/dri/tls/radeonsi_dri.so
>>>> libGL: OpenDriver: trying /usr/lib64/dri/radeonsi_dri.so
>>>> si_vm_fault_occured: failed to parse line ' Either
>>>> enable ECC checking or force module loading by setting
>>>> 'ecc_enable_override'.
>>>> '
>>>> libGL: Using DRI3 for screen 0
>>>> Trying to convert empty KLocalizedString to QString.
>>>> Cannot creat accessible child interface for object:
>>>> PlacesView(0x118d670)  index:  5
>>>> QPixmap::scaled: Pixmap is a null pixmap
>>>> QPixmap::scaled: Pixmap is a null pixmap
>>>> (... etc ...)
>>>> The X11 connection broke (error 1). Did the X11 server die?
>>>
>>> Attaching dmesg and ddebug_dump.
>>>
>>> - Mads
