SDMA out-of-bounds write access of tiled surface

Hi Nicolai,

If we don't already have an option for this, try doubling the size of
the VM area allocated for each BO in userspace.

That should leave a nice hole after each BO, which should help to
catch cases where somebody writes past the end of a BO.
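
A minimal sketch of the idea, assuming a simple bump allocator for GPU
virtual addresses. All names here (BumpVaAllocator, VaRange, is_mapped,
the 4 KiB page size) are hypothetical and only illustrate the padding
scheme; this is not the libdrm or radeonsi allocator.

```python
from dataclasses import dataclass

PAGE_SIZE = 4096  # assumed GPU page size for this sketch


def align_up(value: int, alignment: int) -> int:
    """Round value up to the next multiple of alignment."""
    return (value + alignment - 1) // alignment * alignment


@dataclass
class VaRange:
    start: int     # first virtual address of the BO mapping
    size: int      # bytes actually mapped for the BO
    reserved: int  # bytes of address space reserved (mapping plus hole)


class BumpVaAllocator:
    """Hypothetical allocator that follows every BO with an unmapped hole."""

    def __init__(self, base: int, pad_factor: int = 2):
        self.next_free = base
        self.pad_factor = pad_factor

    def alloc(self, bo_size: int) -> VaRange:
        size = align_up(bo_size, PAGE_SIZE)
        # Reserve pad_factor times the VA area, but map only `size` bytes.
        reserved = size * self.pad_factor
        va = VaRange(self.next_free, size, reserved)
        self.next_free += reserved
        return va


def is_mapped(ranges, addr: int) -> bool:
    """True if addr falls inside the mapped part of any range."""
    return any(r.start <= addr < r.start + r.size for r in ranges)
```

With this layout, an access just past the end of a BO lands in the
unmapped hole, so it shows up as a VM fault instead of silently
corrupting the neighbouring BO.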

Regards,
Christian.

On 22.06.2016 at 09:50, Nicolai Hähnle wrote:
> Hi Mads,
>
> setting R600_DEBUG=nodma in the X server should work around your 
> problem for now.
>
> Marek, perhaps an out-of-bounds check for tiled texture memory access 
> similar to the linear access check is necessary? I wonder if you've 
> seen something about that in the docs.
>
> I've annotated the sDMA IB dump. It's a linear-to-display-tiled copy 
> on Carrizo. I tried to reproduce with the attached patch, but failed 
> to do so even with amdgpu.vm_debug=1. With the patch, I get DMA copies 
> that are identical to the one that causes the VM fault except for a 
> different bank_height and macro_tile_aspect, so the issue is likely 
> related to those.
>
> Nicolai
>
> On 21.06.2016 19:32, Nicolai Hähnle wrote:
>> On 21.06.2016 19:16, Mads wrote:
>>> I sent this 1.5 hours ago, but since it hasn't arrived on the
>>> mailing list yet, I'll try again...
>>
>> It arrived, no worries :)
>>
>> I'll take a look later.
>>
>> Nicolai
>>
>>>
>>> On 2016-06-21 17:48, Mads wrote:
>>>
>>>> On 2016-06-21 10:12, Mads wrote:
>>>>
>>>> On 2016-06-21 09:39, Nicolai Hähnle wrote:
>>>>
>>>> Thanks. However, I still don't think this is going to help. Your
>>>> earlier trace experiments showed that the problematic SDMA commands
>>>> came from the X server, _not_ from plasmashell.
>>>>
>>>> So what we see here is likely just the first set of GPU commands sent
>>>> by plasmashell after the VM fault occurred. Since the plasmashell
>>>> process is unable to tell who caused the VM fault, it takes the blame
>>>> incorrectly. Are you sure the X server is using your self-compiled
>>>> radeonsi_dri.so and has the environment variable set? If it creates a
>>>> ddebug_dump, it might be somewhere else (it's based off the HOME
>>>> environment variable, which may be different).
>>>> I'll take a second look to see if there's an X dump there too, but
>>>> unfortunately it'll be about 8 hours before I have the machine at
>>>> hand again.
>>>>
>>>> And yes, I'm sure, everything is built through portage, so there is no
>>>> "self-compiled" on the system per se. There's always just one lib
>>>> available at any time :)
>>>
>>> You were right! X didn't have R600_DEBUG=check_vm in its environment
>>> (no login shell/sourcing of /etc/profile).
>>>
>>> Here's what I ran:
>>>
>>>> $ XAUTHORITY=.Xauthority DISPLAY=:0 LIBGL_DEBUG=verbose dolphin
>>>> libGL: pci id for fd 9: 1002:9874, driver radeonsi
>>>> libGL: OpenDriver: trying /usr/lib64/dri/tls/radeonsi_dri.so
>>>> libGL: OpenDriver: trying /usr/lib64/dri/radeonsi_dri.so
>>>> si_vm_fault_occured: failed to parse line ' Either
>>>> enable ECC checking or force module loading by setting
>>>> 'ecc_enable_override'.
>>>> '
>>>> libGL: Using DRI3 for screen 0
>>>> Trying to convert empty KLocalizedString to QString.
>>>> Cannot creat accessible child interface for object:
>>>> PlacesView(0x118d670)  index:  5
>>>> QPixmap::scaled: Pixmap is a null pixmap
>>>> QPixmap::scaled: Pixmap is a null pixmap
>>>> (... etc ...)
>>>> The X11 connection broke (error 1). Did the X11 server die?
>>>
>>> Attaching dmesg and ddebug_dump.
>>>
>>> - Mads
