On Wed, Sep 01, 2010 at 01:30:41AM +0200, Michael Monnerie wrote:
> I'm just trying the delaylog mount option on a filesystem (LVM over
> 2x 2TB 4K sector drives), and I see this while running 8 processes
> of "rm -r * 2>/dev/null &":
> 
> Device:  rrqm/s  wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sdc        2,80   33,40  125,00   64,60   720,00   939,30    17,50     0,55   2,91   1,71  32,40
> sdd        0,00   25,60  122,80   63,40   662,40   874,40    16,51     0,52   2,77   1,96  36,54
> dm-0       0,00    0,00  250,60  123,00  1382,40  1941,70    17,79     1,64   4,39   1,74  65,08
> 
> Then I issue "sync", and utilisation increases:
> 
> Device:  rrqm/s  wrqm/s     r/s     w/s    rkB/s    wkB/s avgrq-sz avgqu-sz  await  svctm  %util
> sdc        0,00    0,20   15,80  175,40    84,00  2093,30    22,78     0,62   3,26   2,93  55,94
> sdd        0,00    1,00   13,40  177,60    79,20  2114,10    22,97     0,69   3,63   3,34  63,80
> dm-0       0,00    0,00   29,20  101,20   163,20  4207,40    67,03     1,11   8,51   7,56  98,60
> 
> This is reproducible.

You're probably getting RMW cycles on inode writeback. I've been
noticing this lately with my benchmarking - the VM is being _very
aggressive_ about reclaiming page cache pages versus inode caches,
and as a result the inode buffers used for IO are being reclaimed
between the time the inodes are created and the time they are
written back. Hence you get lots of reads occurring during inode
writeback.

By issuing a sync, you flush out all the pending inode writeback,
and all the RMW cycles go away. As a result, there is more disk
throughput available for the unlink processes. There is a good
chance this is what is happening, as the number of reads after the
sync drops by an order of magnitude...

> Now it can be that the sync just causes more writes and stalls
> reads, so overall it's slower, but I'm wondering why none of the
> devices shows "100% util", which should be the case on deletes. Or
> is this again the "mistake" in the utilisation calculation where
> writes do not really show up?

You're probably CPU bound, not IO bound.

> I know I should have benchmarked and tested; I just wanted to get
> some eyes on this as it's possible there's something to optimize.
> 
> Another strange thing: after the 8 "rm -r" finished, there were
> some subdirs left over that hadn't been deleted - running one
> "rm -r" cleaned them out. Could that be a problem with "delaylog"?

Unlikely - files not being deleted is not a function of the way
transactions are written to disk. It's a function of whether the
operation was performed or not.

> Or can that happen when several "rm" compete in the same dirs?

Most likely.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
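
P.S. Both theories are easy to check from userspace while the test
is running. A rough sketch using the standard sysstat tools (the
directory names are just placeholders for however your test dirs are
laid out):

    # terminal 1: the 8 parallel unlinkers, as in your test
    for i in 1 2 3 4 5 6 7 8; do rm -r "dir$i" 2>/dev/null & done

    # terminal 2: extended per-device stats every 5s - the r/s
    # column shows the reads caused by inode buffer RMW
    iostat -x -k 5

    # terminal 3: per-CPU utilisation - rm processes pegging their
    # CPUs (mostly in %sys) means CPU bound, not IO bound
    mpstat -P ALL 5

    # flush the dirty inodes; if RMW is the cause, r/s in iostat
    # should drop by an order of magnitude afterwards
    sync

If the reads vanish after the sync while %util stays well short of
100%, that points at inode buffer RMW plus a CPU-bound workload
rather than a disk bottleneck.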