> Date: Sun, 3 Nov 2013 19:43:48 +0000 > From: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > On Sun, Nov 03, 2013 at 01:22:52PM +0100, Mark Kettenis wrote: > > I ran into a "regression" in xf86-video-intel master. X would spin > > for several seconds and eventually I'd see a message like: > > > > [ 170.724] kgem_bo_write: failed to write 3600 bytes into BO handle=175: 14 > > > > in Xorg.0.log > > > > Bisected it down to the following commit: > > > > > > commit 4f41bf3de059c4e0a03fb161fb2e78d94be69e3f > > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Date: Tue Oct 29 09:56:10 2013 +0000 > > > > sna: Try harder to complete writes > > > > Expunge our caches if we fail to write into a bo (presuming that > > allocation failure is the likely fixable cause). > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > > > > It's obviously trying really really hard to write into that bo now ;). > > But failing eventually anyway. So I started digging deeper. The > > pwrite is failing with EFAULT. Some kernel debugging revealed that > > the fault happens when copying data from user space. Adding some > > debugging printf's shows that > > > > pwrite.data_ptr = 0x1cc19d9831f0 > > pwrite.size = 3648 > > > > and that the fault happens at address 0x1cc19d984000. This is the > > same 3600 byte buffer from the message. Obviously pwrite is reading > > beyond the end of the buffer, running into the next page, which isn't > > there. > > > > Now I'm seeing this on OpenBSD. I'm guessing this is actually a > > malloc()'ed buffer. And on OpenBSD malloc() is extremely nasty. It > > tries to align the allocated space such that the end lies right at the > > end of a page and inserts a guard page. It does this to catch buffer > > overruns like this. You're much less likely to hit something like > > this on Linux, unless you're using a special debug malloc library. > > > > This problem was introduced with the following commit: > > > > > > commit 95f4da647a4055545b09cae0834df0fa2127a458 > > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > Date: Wed Nov 30 11:59:31 2011 +0000 > > > > sna: Align pwrite to transfer whole cachelines > > > > Daniel claims that this is will be faster, or will be once he has > > completed rewriting pwrite! > > > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > > > > I fear that optimization simply isn't safe. > > It was. All the local allocations are page-aligned, the issue are the > couple of places where we upload from other buffers. > > diff --git a/src/sna/kgem.c b/src/sna/kgem.c > index 264a379..1217367 100644 > --- a/src/sna/kgem.c > +++ b/src/sna/kgem.c > @@ -482,7 +482,7 @@ bool kgem_bo_write(struct kgem *kgem, struct kgem_bo *bo, > > assert(length <= bytes(bo)); > retry: > - if (gem_write(kgem->fd, bo->handle, 0, length, data)) { > + if (__gem_write(kgem->fd, bo->handle, 0, length, data)) { > int err = errno; > > assert(err != EINVAL); Fixed it the same way here, and things seem stable enough. So I guess that's a: Tested-by: Mark Kettenis <kettenis@xxxxxxxxxxx> Reviewed-by: Mark Kettenis <kettenis@xxxxxxxxxxx> Thanks, Mark _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx