Hi folks,

Several months ago we got a user report about skipping the quota check on
the first mount/boot; the original discussion thread can be found at:

  http://oss.sgi.com/archives/xfs/2013-06/msg00170.html

As per Dave's suggestion, it should be possible to perform the quota check
in parallel, and this patch series follows up on that idea. Sorry for the
long delay: I had to spend most of my time dealing with personal things
over the last few months and was afraid I could not keep up with the
review process. Now that nightmare is over, so it's time to revive this
task.

Also, my previous test results on my laptop and a modest desktop could not
convince me that performing the quota check in parallel really brings a
benefit over the current single-threaded check, as both machines shipped
with slow disks. I even observed a small performance regression with
millions of small files (e.g., 100 bytes each), since quota check is IO
bound and is additionally affected by seek time differences. Now, with a
Macbook Air I bought recently, the difference is significant.

tests:
- create files via fs_mark (empty files / 100-byte small files):
  fs_mark -k -S 0 -n 100000 -D 100 -N 1000 -d /xfs -t [10|20|30|50] -s [0|100]
- mount -o uquota,pquota /dev/sdaX /storage
- run each test 5 times and take the average value

test environment:
- laptop: i5-3320M CPU 4 cores, 8G ram, normal SATA disk

results of empty files via time:

  # of files (million)  default            patched
  1                     real 1m12.066s     real 1m8.328s
                        user 0m0.000s      user 0m0.000s
                        sys  0m43.692s     sys  0m0.048s
  2                     real 1m43.907s     real 1m16.221s
                        user 0m0.004s      user 0m0.000s
                        sys  1m32.968s     sys  0m0.065s
  3                     real 2m36.632s     real 1m48.011s
                        user 0m0.000s      user 0m0.002s
                        sys  2m23.501s     sys  0m0.094s
  5                     real 4m20.266s     real 3m0.145s
                        user 0m0.000s      user 0m0.002s
                        sys  3m56.264s     sys  0m0.092s

results of 100-byte files via time:

  # of files (million)  default            patched
  1                     real 1m34.492s     real 1m51.268s
                        user 0m0.008s      user 0m0.008s
                        sys  0m54.432s     sys  0m0.236s
  3                     real 3m26.687s     real 3m16.152s
                        user 0m0.000s      user 0m0.000s
                        sys  2m23.144s     sys  0m0.088s

So with empty files the performance looks good, but with small files this
change introduces a small regression on very slow storage. I guess this is
caused by disk seeks, since the allocated data blocks are spread over the
disk. To get more representative results, I asked a friend to run this
test on a server; the results are shown below.

test environment:
- server: 16 cores, 25G ram, normal SATA disk, but the XFS filesystem
  resides on a loop device

results of 100-byte files via time:

  # of files (million)  default            patched
  1                     real 0m19.015s     real 0m16.238s
                        user 0m0.004s      user 0m0.002s
                        sys  0m4.358s      sys  0m0.030s
  2                     real 0m34.106s     real 0m28.300s
                        user 0m0.012s      user 0m0.002s
                        sys  0m8.820s      sys  0m0.035s
  3                     real 0m53.716s     real 0m46.390s
                        user 0m0.002s      user 0m0.005s
                        sys  0m13.396s     sys  0m0.023s
  5                     real 2m26.361s     real 2m17.415s
                        user 0m0.004s      user 0m0.004s
                        sys  0m22.188s     sys  0m0.023s

In this case there is no regression, although there is no noticeable
improvement either. :(

test environment:
- Macbook Air: i7-4650U with SSD, 8G ram

  # of files (million)  default            patched
  1                     real 0m6.367s      real 0m1.972s
                        user 0m0.008s      user 0m0.000s
                        sys  0m2.614s      sys  0m0.008s
  2                     real 0m3.772s      real 0m15.221s
                        user 0m0.000s      user 0m0.000s
                        sys  0m0.007s      sys  0m6.269s
  5                     real 0m36.036s     real 0m8.902s
                        user 0m0.000s      user 0m0.002s
                        sys  0m14.025s     sys  0m0.006s
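To make the idea concrete, here is a rough sketch of the per-AG fan-out I
have in mind. It is illustrative only, not the actual patches:
xfs_qm_quotacheck_parallel() and xfs_qm_quotacheck_worker() are made-up
names for this mail, xfs_qm_dqusage_adjust_perag() is the per-AG dquot
usage adjuster discussed below, and error sign conventions and dquot
locking details are elided.

	/*
	 * Sketch only: fan the quotacheck out with one work item per
	 * AG.  Each worker walks the inodes of its AG and adjusts the
	 * in-core dquot usage counters; the dquot locks serialize the
	 * actual accounting between workers.
	 */
	struct xfs_qc_work {
		struct work_struct	work;
		struct xfs_mount	*mp;
		xfs_agnumber_t		agno;
		int			error;
	};

	static void
	xfs_qm_quotacheck_worker(
		struct work_struct	*work)
	{
		struct xfs_qc_work	*qcw;

		qcw = container_of(work, struct xfs_qc_work, work);
		qcw->error = xfs_qm_dqusage_adjust_perag(qcw->mp, qcw->agno);
	}

	static int
	xfs_qm_quotacheck_parallel(
		struct xfs_mount	*mp)
	{
		struct workqueue_struct	*wq;
		struct xfs_qc_work	*works;
		xfs_agnumber_t		agno;
		int			error = 0;

		wq = alloc_workqueue("xfs_quotacheck", WQ_UNBOUND, 0);
		if (!wq)
			return ENOMEM;

		works = kcalloc(mp->m_sb.sb_agcount, sizeof(*works), GFP_KERNEL);
		if (!works) {
			destroy_workqueue(wq);
			return ENOMEM;
		}

		for (agno = 0; agno < mp->m_sb.sb_agcount; agno++) {
			works[agno].mp = mp;
			works[agno].agno = agno;
			INIT_WORK(&works[agno].work, xfs_qm_quotacheck_worker);
			queue_work(wq, &works[agno].work);
		}

		/* wait for all the per-AG walkers to finish */
		flush_workqueue(wq);
		destroy_workqueue(wq);

		for (agno = 0; agno < mp->m_sb.sb_agcount; agno++)
			if (works[agno].error && !error)
				error = works[agno].error;

		kfree(works);
		return error;
	}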
Btw, the current implementation of this series ([PATCH 0/4] xfs: implement
parallel quota check at mount time) has a defect in the form of duplicated
code. Maybe it's better to introduce a new function xfs_bulkstat_ag()
which can be used to bulkstat inodes per AG; it could then be shared by
the patches above when adjusting dquot usage per AG, i.e., in
xfs_qm_dqusage_adjust_perag(). A rough sketch of that interface is in the
P.S. below.

As usual, criticism and comments are both welcome!

Thanks,
-Jeff
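P.S. For concreteness, here is a rough, hypothetical sketch of what the
xfs_bulkstat_ag() interface could look like. It is layered on top of the
existing xfs_bulkstat() purely to show the shape; a real implementation
would walk a single AG's inodes directly and stop exactly at the AG
boundary instead of checking after each batch, and the batch size here is
arbitrary.

	/*
	 * Sketch only: bulkstat all inodes of a single AG, feeding each
	 * one to @formatter, so quotacheck and other per-AG walkers can
	 * share the same loop.
	 */
	int
	xfs_bulkstat_ag(
		struct xfs_mount	*mp,
		xfs_agnumber_t		agno,
		bulkstat_one_pf		formatter)
	{
		/* start from the first possible inode in this AG */
		xfs_ino_t		lastino = XFS_AGINO_TO_INO(mp, agno, 0);
		int			count, done = 0, error = 0;

		while (!error && !done &&
		       XFS_INO_TO_AGNO(mp, lastino) == agno) {
			count = 1024;	/* arbitrary batch size */
			error = xfs_bulkstat(mp, &lastino, &count, formatter,
					     1, NULL, &done); /* no user buffer */
		}
		return error;
	}

	/* ... and the per-AG dquot usage adjuster becomes a thin wrapper: */
	STATIC int
	xfs_qm_dqusage_adjust_perag(
		struct xfs_mount	*mp,
		xfs_agnumber_t		agno)
	{
		return xfs_bulkstat_ag(mp, agno, xfs_qm_dqusage_adjust);
	}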