On Thu, Apr 05, 2007 at 05:00:44AM +0200, Nick Piggin wrote:
> On Thu, Apr 05, 2007 at 12:43:50PM +1000, David Chinner wrote:
> > On Thu, Apr 05, 2007 at 10:10:18AM +1000, David Chinner wrote:
> > > On Mon, Apr 02, 2007 at 02:09:34PM +0200, Nick Piggin wrote:
> > > > Updated aops patchset against 2.6.21-rc5.
> > > >
> > > > http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/
> > > >
> > > > Files/dirs are 2.6.21-rc5-new-aops*
> > > >
> > > > Contains numerous fixes from Mark and myself -- I'd say the core code is
> > > > getting reasonably stable at this point.
> > > ......
> > > > (compile only) patches for NFS, XFS, FUSE, eCryptfs. OK, they're untested,
> > >
> > > Failed to compile in UDF and reiser for me, but no doubt you know
> > > that already. Don't have time to look at why - I just disabled them
> > > so I get some QA done on the XFS and core changes.
> >
> > Nick - the XFSQA randholes test locks up. As you can probably guess,
> > it creates holey files and does stuff to them ;)
> >
> > Both pdflush and a write are blocked waiting for a page to become
> > unlocked. They are both waiting on the same page (0xa0007fffffe8bd68).
> > [...]
> >
> > So the page is locked and it has a delalloc buffer on it that used to
> > be a hole. The problem page is at EOF....
> >
> > No idea what is going wrong yet.....
>
> OK, thanks for testing. This is with the full patchset applied, I guess.

Yes.

> No real ideas here (brute force we put a last-locked stack trace into
> struct page :P), but I wonder if it may be any of my earlier patches?
> (ie. not the new aops path or XFS conversion itself).

It's not. I've had a look at the files created by the test - it's the
third file in the test that is causing the problem:

_do_test 1 50 "-l 5000000 -c 50 -b $pgsize"
_do_test 2 100 "-l 10000000 -c 100 -b $pgsize"
_do_test 3 100 "-l 10000000 -c 100 -b 512"	# test partial pages

i.e. the partial page test.
This is the effective command that locks it up:

# randholes -v -l 10000000 -c 100 -b 512 holes.out

Ah, now I see what the I/O pattern is that is triggering this:

budgie:/mnt/test # ~/dgc/xfstests/src/randholes -v -v -v -l 10000000 -c 100 -b 512 dgc.holes.out
randholes: Seed = 1175749165 (use "-s 1175749165" to re-execute this test)
randholes: blocksize=512, filesize=268435456, seed=1175749165
randholes: count=100, offset=0, extsize=0
randholes: verbose=3, wsync=0, direct=0, rt=0, alloconly=0, preserve=0, test=0
write .writing data at offset=acbd800, value 0xacbd800 and 0xacbd800
writing data at offset=5c09600, value 0x5c09600 and 0x5c09600
writing data at offset=97bde00, value 0x97bde00 and 0x97bde00

And we're writing to offset:

0xa0000001004611f0 xfs_write+0xc10
	args (0xe00000347895e750, 0xe00000347aa1fd40, 0xe00000347aa1fd30, 0x200, 0xe00000347aa1fdc0)
.....
[0]kdb> md8c1 0xe00000347aa1fdc0
0xe00000347aa1fdc0 000000000c885400                    .T......

Which is well past the last EOF. So when we extend the file past the
EOF, the old EOF had a partial block (and page) that needed to have
the remaining chunk zeroed. This will have found the page already in
the cache, but it is locked. That tends to imply that it wasn't
unlocked correctly on the first call to xfs_iozero, when it was
originally written...

Ah, found it. pagecache_write_begin() returns zero on success. So this:

	status = pagecache_write_begin(NULL, mapping, pos, bytes,
				AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
	if (!status)
		break;

will abort on success, never zero the range of the page it's supposed
to, and leave without calling ->write_end(). Then, when we extend the
file again....

Hmmm - an interesting question just came up. The return from
->write_end() will either be equal to 'copied' or zero. What do you do
if it returns zero? It indicates a failure of some kind, but what?
In xfs_iozero, because we loop based on the number of bytes we get
returned from pagecache_write_end(), a return of zero bytes will
result in trying the same operation again. If it fails repeatedly in
the same manner (likely), then we've got an endless loop. I can break
out of the loop, but I've got no idea what the error was or how to
handle it. Any thoughts on this?

Also, when pagecache_write_begin() (and ->write_begin()) returns zero,
it indicates success. When pagecache_write_end() (and ->write_end())
returns zero, it indicates some kind of failure occurred. Can you make
the return values more consistent, Nick?

Patch below.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/linux-2.6/xfs_lrw.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_lrw.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_lrw.c	2007-04-05 15:07:54.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_lrw.c	2007-04-05 15:41:44.912532457 +1000
@@ -151,18 +151,17 @@ xfs_iozero(
 		status = pagecache_write_begin(NULL, mapping, pos, bytes,
 					AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
-		if (!status)
+		if (status)
 			break;
 
 		memclear_highpage_flush(page, offset, bytes);
 
 		status = pagecache_write_end(NULL, mapping, pos, bytes, bytes,
 					page, fsdata);
-		if (status < 0)
-			break;
-		bytes = status;
-		pos += bytes;
-		count -= bytes;
+		WARN_ON(status <= 0); /* can't return less than zero! */
+		pos += status;
+		count -= status;
+		status = 0;
 	} while (count);
 
 	return (-status);