On Thu, Apr 05, 2007 at 05:00:44AM +0200, Nick Piggin wrote:
> On Thu, Apr 05, 2007 at 12:43:50PM +1000, David Chinner wrote:
> > On Thu, Apr 05, 2007 at 10:10:18AM +1000, David Chinner wrote:
> > > On Mon, Apr 02, 2007 at 02:09:34PM +0200, Nick Piggin wrote:
> > > > Updated aops patchset against 2.6.21-rc5.
> > > >
> > > > http://www.kernel.org/pub/linux/kernel/people/npiggin/patches/new-aops/
> > > >
> > > > Files/dirs are 2.6.21-rc5-new-aops*
> > > >
> > > > Contains numerous fixes from Mark and myself -- I'd say the core code is
> > > > getting reasonably stable at this point.
> > > ......
> > > > (compile only) patches for NFS, XFS, FUSE, eCryptfs. OK, they're untested,
> > >
> > > Failed to compile in UDF and reiser for me, but no doubt you know
> > > that already. Don't have time to look at why - I just disabled them
> > > so I get some QA done on the XFS and core changes.
> >
> > Nick - the XFSQA randholes test locks up. As you can probably guess,
> > it creates holey files and does stuff to them ;)
> >
> > Both pdflush and a write are blocked waiting for a page to become
> > unlocked. They are both waiting on the same page (0xa0007fffffe8bd68).
> > [...]
> >
> > So the page is locked and it has a delalloc buffer on it that used to
> > be a hole. The problem page is at EOF....
> >
> > No idea what is going wrong yet.....
>
> OK, thanks for testing. This is with the full patchset applied, I guess.

Yes.

> No real ideas here (brute force we put a last-locked stack trace into
> struct page :P), but I wonder if it may be any of my earlier patches?
> (ie. not the new aops path or XFS conversion itself).

It's not. I've had a look at the files created by the test - it's the
third file in the test that is causing the problem:

_do_test 1 50 "-l 5000000 -c 50 -b $pgsize"
_do_test 2 100 "-l 10000000 -c 100 -b $pgsize"
_do_test 3 100 "-l 10000000 -c 100 -b 512"	# test partial pages

i.e. the partial page test.
This is the effective command that locks it up:

# randholes -v -l 10000000 -c 100 -b 512 holes.out

Ah, now I see what the I/O pattern is that is triggering this:

budgie:/mnt/test # ~/dgc/xfstests/src/randholes -v -v -v -l 10000000 -c 100 -b 512 dgc.holes.out
randholes: Seed = 1175749165 (use "-s 1175749165" to re-execute this test)
randholes: blocksize=512, filesize=268435456, seed=1175749165
randholes: count=100, offset=0, extsize=0
randholes: verbose=3, wsync=0, direct=0, rt=0, alloconly=0, preserve=0, test=0
write .writing data at offset=acbd800, value 0xacbd800 and 0xacbd800
writing data at offset=5c09600, value 0x5c09600 and 0x5c09600
writing data at offset=97bde00, value 0x97bde00 and 0x97bde00

And we're writing to offset:

0xa0000001004611f0 xfs_write+0xc10
	args (0xe00000347895e750, 0xe00000347aa1fd40, 0xe00000347aa1fd30, 0x200, 0xe00000347aa1fdc0)
.....
[0]kdb> md8c1 0xe00000347aa1fdc0
0xe00000347aa1fdc0 000000000c885400                    .T......

Which is well past the last EOF. So when we extend the file past the
EOF, the old EOF had a partial block (and page) that needed to have
the remaining chunk zeroed. This will have found the page already in
the cache, but it is locked. That tends to imply that it wasn't
unlocked correctly on the first call to xfs_iozero, when it was
originally written...

Ah, found it. pagecache_write_begin() returns zero on success. So this:

	status = pagecache_write_begin(NULL, mapping, pos, bytes,
				AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
	if (!status)
		break;

will abort on success, never zero the range of the page it's supposed
to, and leave without calling ->write_end(). Then, when we extend the
file again....

Hmmm - an interesting question just came up. The return from
->write_end() will either be equal to 'copied' or zero. What do you do
if it returns zero? It indicates a failure of some kind, but what?
In xfs_iozero, because we loop based on the number of bytes we get
returned from pagecache_write_end(), a return of zero bytes will
result in trying the same operation again. If it fails repeatedly in
the same manner (likely), then we've got an endless loop. I can break
out of the loop, but I've got no idea what the error was or how to
handle it. Any thoughts on this?

Also, when pagecache_write_begin() (and ->write_begin()) returns zero,
it indicates success. When pagecache_write_end() (and ->write_end())
returns zero, it indicates some kind of failure occurred. Can you make
the return values more consistent, Nick?

Patch below.

Cheers,

Dave.
-- 
Dave Chinner
Principal Engineer
SGI Australian Software Group

---
 fs/xfs/linux-2.6/xfs_lrw.c |   11 +++++------
 1 file changed, 5 insertions(+), 6 deletions(-)

Index: 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_lrw.c
===================================================================
--- 2.6.x-xfs-new.orig/fs/xfs/linux-2.6/xfs_lrw.c	2007-04-05 15:07:54.000000000 +1000
+++ 2.6.x-xfs-new/fs/xfs/linux-2.6/xfs_lrw.c	2007-04-05 15:41:44.912532457 +1000
@@ -151,18 +151,17 @@ xfs_iozero(
 		status = pagecache_write_begin(NULL, mapping, pos, bytes,
 					AOP_FLAG_UNINTERRUPTIBLE, &page, &fsdata);
-		if (!status)
+		if (status)
 			break;
 
 		memclear_highpage_flush(page, offset, bytes);
 
 		status = pagecache_write_end(NULL, mapping, pos, bytes, bytes,
 					page, fsdata);
-		if (status < 0)
-			break;
-		bytes = status;
-		pos += bytes;
-		count -= bytes;
+		WARN_ON(status <= 0); /* can't return less than zero! */
+		pos += status;
+		count -= status;
+		status = 0;
 	} while (count);
 
 	return (-status);