On Wed, Nov 24, 2010 at 11:50:03AM +1100, Nick Piggin wrote:
> On Wed, Nov 24, 2010 at 07:58:04AM +1100, Dave Chinner wrote:
> > On Tue, Nov 23, 2010 at 11:24:49PM +1100, Nick Piggin wrote:
> > > Hi,
> > >
> > > Running parallel fs_mark (0 size inodes, fsync on close) on a ramdisk
> > > ends up with XFS in funny patterns.
> > >
> > > procs -----------memory---------- ---swap-- -----io---- -system-- ----cpu----
> > >  r  b   swpd   free   buff  cache   si   so    bi    bo    in     cs us sy id wa
> > > 24  1   6576 166396    252 393676  132  140 16900 80666 21308 104333  1 84 14  1
> > > 21  0   6712 433856    256 387080  100  224  9152 53487 13677  53732  0 55 45  0
> > >  2  0   7068 463496    248 389100    0  364  2940 17896  4485  26122  0 33 65  2
> > >  1  0   7068 464340    248 388928    0    0     0     0    66    207  0  0 100 0
> > >  0  0   7068 464340    248 388928    0    0     0     0    79    200  0  0 100 0
> > >  0  0   7068 464544    248 388928    0    0     0     0    65    199  0  0 100 0
> > >  1  0   7068 464748    248 388928    0    0     0     0    79    201  0  0 100 0
> > >  0  0   7068 465064    248 388928    0    0     0     0    66    202  0  0 100 0
> > >  0  0   7068 465312    248 388928    0    0     0     0    80    200  0  0 100 0
> > >  0  0   7068 465500    248 388928    0    0     0     0    65    199  0  0 100 0
> > >  0  0   7068 465500    248 388928    0    0     0     0    80    202  0  0 100 0
> > >  1  0   7068 465500    248 388928    0    0     0     0    66    203  0  0 100 0
> > >  0  0   7068 465500    248 388928    0    0     0     0    79    200  0  0 100 0
> > > 23  0   7068 460332    248 388800    0    0  1416  8896  1981   7142  0  1 99  0
> > >  6  0   6968 360248    248 403736   56    0 15568 95171 19438 110825  1 79 21  0
> > > 23  0   6904 248736    248 419704  392    0 17412 118270 20208 111396 1 82 17  0
> > >  9  0   6884 266116    248 435904  128    0 14956 79756 18554 118020  1 76 23  0
> > >  0  0   6848 219640    248 445760  212    0  9932 51572 12622  76491  0 60 40  0
> > >
> > > Got a dump of sleeping tasks. Any ideas?
> >
> > It is stuck waiting for log space to be freed up. Generally this is
> > caused by log IO completion not occurring or an unflushable object
> > preventing the tail from being moved forward. What:
>
> Yeah it's strange, it seems like it hits some timeout or gets kicked
> along by background writeback or something. Missed wakeup somewhere?

No idea yet.

> >   - is the output of mkfs.xfs?
>
> meta-data=/dev/ram0              isize=256    agcount=16, agsize=65536 blks
>          =                       sectsz=512   attr=2
> data     =                       bsize=4096   blocks=1048576, imaxpct=25
>          =                       sunit=0      swidth=0 blks
> naming   =version 2              bsize=4096   ascii-ci=0
> log      =internal log           bsize=4096   blocks=16384, version=2
>          =                       sectsz=512   sunit=0 blks, lazy-count=1
> realtime =none                   extsz=4096   blocks=0, rtextents=0

Ok, small log, small AGs.

> >   - are your mount options?
>
> mount -o delaylog,logbsize=262144,nobarrier /dev/ram0 mnt
>
> >   - is the fs_mark command line?
>
> ../fs_mark -S1 -k -n 1000 -L 100 -s 0 -d scratch/0 -d scratch/1 -d
> scratch/2 -d scratch/3 -d scratch/4 -d scratch/5 -d scratch/6 -d
> scratch/7 -d scratch/8 -d scratch/9 -d scratch/10 -d scratch/11 -d
> scratch/12 -d scratch/13 -d scratch/14 -d scratch/15 -d scratch/16 -d
> scratch/17 -d scratch/18 -d scratch/19 -d scratch/20 -d scratch/21 -d
> scratch/22 -d scratch/23
> for f in scratch/* ; do rm -rf $f & done ; wait

Ok, so you are effectively doing a concurrent synchronous create of
2.4M zero byte files (24 directories x 1000 files x 100 iterations).
BTW, how many CPU cores does your machine have? If it's more than 8,
then you're probably getting a fair bit of serialisation on the per-ag
structures. I normally use agcount = num_cpus * 2 for scalability
testing when running one load thread per CPU - see the mkfs sketch
further down.

> Ran it again, and yes it has locked up for a long long time, it seems
> to be in the rm phase, but I think I've seen a similar stall (although
> not so long) in the fs_mark phase too.
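FWIW, on the agcount point above, this is roughly the setup I mean -
an untested sketch, and using nproc here is just one way of picking up
the CPU count:

# size agcount to 2x the number of CPUs for scalability testing
ncpus=$(nproc)
mkfs.xfs -f -d agcount=$((ncpus * 2)) /dev/ram0
mount -o delaylog,logbsize=262144,nobarrier /dev/ram0 mnt
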
Ok, I've just reproduced a couple of short hangs (a few seconds)
during the rm phase so I should be able to track it down.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx