On Wed, Jun 22, 2016 at 8:21 AM, Marek Olšák <maraeo at gmail.com> wrote:
> I don't think so.
>
> The VM faults can only occur when accessing the linear texture, and
> the Mesa code should use the correct workarounds already.
>
> The tiled texture is just a collection of 1D tiles (8x8 pixels) and
> SDMA operates on those 1D tiles. It doesn't access memory outside of
> 1D tile boundaries it's supposed to access. 2D tiling is just a
> different ordering of 1D tiles with greater alignment requirements.
> The 2D tile parameters such as bank_height and macro_tile_aspect only
> affect that ordering. 1D tiles are always the same regardless of the
> higher tile mode. Given that, I don't see how SDMA can behave
> differently here.
>
> There are 2 possible explanations for VM faults from tiled access:
> - The tile parameters passed to SDMA don't agree with the parameters
> determined by addrlib. (or there can be a bug in passing those between
> processes)
> - Unknown or undiscovered SDMA bug.
>
> Note that no docs describe the VM fault bug from linear access.
>
> If you both have Carrizo, you should get the same 2D tile parameters.
> If you don't, it's weird.

The row size varies based on the memory configuration and the number
of banks populated. It might be worth adjusting the row size in
gfx_v8_0_gpu_early_init() to see if that helps reproduce the issue.

Alex

>
> Marek
>
> On Wed, Jun 22, 2016 at 9:50 AM, Nicolai Hähnle <nhaehnle at gmail.com> wrote:
>> Hi Mads,
>>
>> setting R600_DEBUG=nodma in the X server should work around your problem for
>> now.
>>
>> Marek, perhaps an out-of-bounds check for tiled texture memory access
>> similar to the linear access check is necessary? I wonder if you've seen
>> something about that in the docs.
>>
>> I've annotated the sDMA IB dump. It's a linear-to-display-tiled copy on
>> Carrizo. I tried to reproduce with the attached patch, but failed to do so
>> even with amdgpu.vm_debug=1.
>> With the patch, I get DMA copies that are
>> identical to the one that causes the VM fault except for a different
>> bank_height and macro_tile_aspect, so the issue is likely related to those.
>>
>> Nicolai
>>
>> On 21.06.2016 19:32, Nicolai Hähnle wrote:
>>>
>>> On 21.06.2016 19:16, Mads wrote:
>>>>
>>>> I sent this 1.5 hours ago, but since it hasn't arrived on the
>>>> mailing list yet, I'll try again...
>>>
>>>
>>> It arrived, no worries :)
>>>
>>> I'll take a look later.
>>>
>>> Nicolai
>>>
>>>>
>>>> On 2016-06-21 17:48, Mads wrote:
>>>>
>>>>> On 2016-06-21 10:12, Mads wrote:
>>>>>
>>>>> On 2016-06-21 09:39, Nicolai Hähnle wrote:
>>>>>
>>>>> Thanks. However, I still don't think this is going to help. Your
>>>>> earlier trace experiments showed that the problematic SDMA commands
>>>>> came from the X server, _not_ from plasmashell.
>>>>>
>>>>> So what we see here is likely just the first set of GPU commands sent
>>>>> by plasmashell after the VM fault occurred. Since the plasmashell
>>>>> process is unable to tell who caused the VM fault, it takes the blame
>>>>> incorrectly. Are you sure the X server is using your self-compiled
>>>>> radeonsi_dri.so and has the environment variable set? If it creates a
>>>>> ddebug_dump, it might be somewhere else (it's based off the HOME
>>>>> environment variable, which may be different).
>>>>> I'll take a second look to see if there's an X dump there too, but
>>>>> unfortunately it'll be about 8 hours before I have the machine at
>>>>> hand again.
>>>>>
>>>>> And yes, I'm sure, everything is built through portage, so there is no
>>>>> "self-compiled" on the system per se. There's always just one lib
>>>>> available at any time :)
>>>>
>>>>
>>>> You were right! X didn't have R600_DEBUG=check_vm in its environment (no
>>>> login shell/sourcing of /etc/profile).
>>>>
>>>> Here's what I ran:
>>>>
>>>>> $ XAUTHORITY=.Xauthority DISPLAY=:0 LIBGL_DEBUG=verbose dolphin
>>>>> libGL: pci id for fd 9: 1002:9874, driver radeonsi
>>>>> libGL: OpenDriver: trying /usr/lib64/dri/tls/radeonsi_dri.so
>>>>> libGL: OpenDriver: trying /usr/lib64/dri/radeonsi_dri.so
>>>>> si_vm_fault_occured: failed to parse line ' Either
>>>>> enable ECC checking or force module loading by setting
>>>>> 'ecc_enable_override'.
>>>>> '
>>>>> libGL: Using DRI3 for screen 0
>>>>> Trying to convert empty KLocalizedString to QString.
>>>>> Cannot creat accessible child interface for object:
>>>>> PlacesView(0x118d670) index: 5
>>>>> QPixmap::scaled: Pixmap is a null pixmap
>>>>> QPixmap::scaled: Pixmap is a null pixmap
>>>>> (... etc ...)
>>>>> The X11 connection broke (error 1). Did the X11 server die?
>>>>
>>>>
>>>> Attaching dmesg and ddebug_dump.
>>>>
>>>> - Mads
>>
>>
> _______________________________________________
> amd-gfx mailing list
> amd-gfx at lists.freedesktop.org
> https://lists.freedesktop.org/mailman/listinfo/amd-gfx