On Sun, Nov 03, 2013 at 01:22:52PM +0100, Mark Kettenis wrote: > I ran into a "regression" in xf86-video-intel master. X would spin > for several seconds and eventually I'd see a message like: > > [ 170.724] kgem_bo_write: failed to write 3600 bytes into BO handle=175: 14 > > in Xorg.0.log > > Bisected it down to the following commit: > > > commit 4f41bf3de059c4e0a03fb161fb2e78d94be69e3f > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Date: Tue Oct 29 09:56:10 2013 +0000 > > sna: Try harder to complete writes > > Expunge our caches if we fail to write into a bo (presuming that > allocation failure is the likely fixable cause). > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > It's obviously trying really really hard to write into that bo now ;). > But failing eventually anyway. So I started digging deeper. The > pwrite is failing with EFAULT. Some kernel debugging revealed that > the fault happens when copying data from user space. Adding some > debugging printf's shows that > > pwrite.data_ptr = 0x1cc19d9831f0 > pwrite.size = 3648 > > and that the fault happens at address 0x1cc19d984000. This is the > same 3600 byte buffer from the message. Obviously pwrite is reading > beyond the end of the buffer, running into the next page, which isn't > there. > > Now I'm seeing this on OpenBSD. I'm guessing this is actually a > malloc()'ed buffer. And on OpenBSD malloc() is extremely nasty. It > tries to align the allocated space such that the end lies right at the > end of a page and inserts a guard page. It does this to catch buffer > overruns like this. You're much less likely to hit something like > this on Linux, unless you're using a special debug malloc library. > > This problem was introduced with the following commit: > > > commit 95f4da647a4055545b09cae0834df0fa2127a458 > Author: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > Date: Wed Nov 30 11:59:31 2011 +0000 > > sna: Align pwrite to transfer whole cachelines > > Daniel claims that this is will be faster, or will be once he has > completed rewriting pwrite! > > Signed-off-by: Chris Wilson <chris@xxxxxxxxxxxxxxxxxx> > > > I fear that optimization simply isn't safe. It was. All the local allocations are page-aligned, the issue are the couple of places where we upload from other buffers. diff --git a/src/sna/kgem.c b/src/sna/kgem.c index 264a379..1217367 100644 --- a/src/sna/kgem.c +++ b/src/sna/kgem.c @@ -482,7 +482,7 @@ bool kgem_bo_write(struct kgem *kgem, struct kgem_bo *bo, assert(length <= bytes(bo)); retry: - if (gem_write(kgem->fd, bo->handle, 0, length, data)) { + if (__gem_write(kgem->fd, bo->handle, 0, length, data)) { int err = errno; assert(err != EINVAL); -Chris -- Chris Wilson, Intel Open Source Technology Centre _______________________________________________ Intel-gfx mailing list Intel-gfx@xxxxxxxxxxxxxxxxxxxxx http://lists.freedesktop.org/mailman/listinfo/intel-gfx