On 22.06.2016 09:53, Christian König wrote:
> Hi Nicolai,
>
> If we don't already have an option for this, try to double the size of
> the VM area allocated for each BO in userspace.
>
> That should give you a nice hole between each BO and so should help to
> catch cases when somebody writes over the end of a BO.

Tried that (+ forcing the buffer cache to re-use BOs only with the exact
size), but no change in observed behavior.

Cheers,
Nicolai

>
> Regards,
> Christian.
>
> On 22.06.2016 at 09:50, Nicolai Hähnle wrote:
>> Hi Mads,
>>
>> setting R600_DEBUG=nodma in the X server should work around your
>> problem for now.
>>
>> Marek, perhaps an out-of-bounds check for tiled texture memory access
>> similar to the linear access check is necessary? I wonder if you've
>> seen something about that in the docs.
>>
>> I've annotated the sDMA IB dump. It's a linear-to-display-tiled copy
>> on Carrizo. I tried to reproduce with the attached patch, but failed
>> to do so even with amdgpu.vm_debug=1. With the patch, I get DMA copies
>> that are identical to the one that causes the VM fault except for a
>> different bank_height and macro_tile_aspect, so the issue is likely
>> related to those.
>>
>> Nicolai
>>
>> On 21.06.2016 19:32, Nicolai Hähnle wrote:
>>> On 21.06.2016 19:16, Mads wrote:
>>>> I sent this 1.5 hours ago, but since it hasn't arrived on the
>>>> mailing list yet, I'll try again...
>>>
>>> It arrived, no worries :)
>>>
>>> I'll take a look later.
>>>
>>> Nicolai
>>>
>>>>
>>>> On 2016-06-21 17:48, Mads wrote:
>>>>
>>>>> On 2016-06-21 10:12, Mads wrote:
>>>>>
>>>>> On 2016-06-21 09:39, Nicolai Hähnle wrote:
>>>>>
>>>>> Thanks. However, I still don't think this is going to help. Your
>>>>> earlier trace experiments showed that the problematic SDMA commands
>>>>> came from the X server, _not_ from plasmashell.
>>>>>
>>>>> So what we see here is likely just the first set of GPU commands sent
>>>>> by plasmashell after the VM fault occurred. Since the plasmashell
>>>>> process is unable to tell who caused the VM fault, it takes the blame
>>>>> incorrectly. Are you sure the X server is using your self-compiled
>>>>> radeonsi_dri.so and has the environment variable set? If it creates a
>>>>> ddebug_dump, it might be somewhere else (it's based off the HOME
>>>>> environment variable, which may be different).
>>>>> I'll take a second look to see if there's an X dump there too, but
>>>>> unfortunately it'll be about 8 hours before I have the machine at
>>>>> hand again.
>>>>>
>>>>> And yes, I'm sure, everything is built through portage, so there is no
>>>>> "self-compiled" on the system per se. There's always just one lib
>>>>> available at any time :)
>>>>
>>>> You were right! X didn't have R600_DEBUG=check_vm in its environment
>>>> (no login shell/sourcing of /etc/profile).
>>>>
>>>> Here's what I ran:
>>>>
>>>>> $ XAUTHORITY=.Xauthority DISPLAY=:0 LIBGL_DEBUG=verbose dolphin
>>>>> libGL: pci id for fd 9: 1002:9874, driver radeonsi
>>>>> libGL: OpenDriver: trying /usr/lib64/dri/tls/radeonsi_dri.so
>>>>> libGL: OpenDriver: trying /usr/lib64/dri/radeonsi_dri.so
>>>>> si_vm_fault_occured: failed to parse line ' Either
>>>>> enable ECC checking or force module loading by setting
>>>>> 'ecc_enable_override'.
>>>>> '
>>>>> libGL: Using DRI3 for screen 0
>>>>> Trying to convert empty KLocalizedString to QString.
>>>>> Cannot creat accessible child interface for object:
>>>>> PlacesView(0x118d670) index: 5
>>>>> QPixmap::scaled: Pixmap is a null pixmap
>>>>> QPixmap::scaled: Pixmap is a null pixmap
>>>>> (... etc ...)
>>>>> The X11 connection broke (error 1). Did the X11 server die?
>>>>
>>>> Attaching dmesg and ddebug_dump.
>>>>
>>>> - Mads
>>
>> _______________________________________________
>> amd-gfx mailing list
>> amd-gfx at lists.freedesktop.org
>> https://lists.freedesktop.org/mailman/listinfo/amd-gfx
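
For anyone hitting the same "variable is set in my shell but not in X" situation described in the quoted text: on Linux, one way to check what a running process actually has in its environment is to read its /proc entry. The following is only a rough sketch; `proc_env` is a made-up helper name, not something shipped with Mesa or X, and it assumes you have permission to read the target's environ file (usually root for the X server).

```shell
# Hypothetical helper (not part of Mesa or X): print the value of an
# environment variable as seen by an already-running process.
# Assumes Linux /proc and read permission on the target's environ file.
proc_env() {
    pid=$1
    name=$2
    # /proc/PID/environ holds NUL-separated NAME=value entries as they
    # were when the process was exec'd; split them into lines and print
    # only the value of the requested variable.
    tr '\0' '\n' < "/proc/$pid/environ" | sed -n "s/^$name=//p"
}

# Example (the process name is an assumption; it may be X or Xorg):
#   proc_env "$(pgrep -o Xorg)" R600_DEBUG
```

Empty output means the variable is either unset or set to an empty string for that process, which for this purpose amounts to the same thing.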