On Wed, Jan 03, 2018 at 05:27:48PM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@xxxxxxxxxx>
>
> When running a "create millions inodes in a directory" test
> recently, I noticed we were spending a huge amount of time
> converting freespace block headers from disk format to in-memory
> format:
>
>  31.47%  [kernel]  [k] xfs_dir2_node_addname
>  17.86%  [kernel]  [k] xfs_dir3_free_hdr_from_disk
>   3.55%  [kernel]  [k] xfs_dir3_free_bests_p
>
> We shouldn't be hitting the best free block scanning code so hard
> when doing sequential directory creates, and it turns out there's
> a highly suboptimal loop searching the best free array in
> the freespace block - it decodes the block header before checking
> each entry inside a loop, instead of decoding the header once before
> running the entry search loop.
>
> This makes a massive difference to create rates. Profile now looks
> like this:
>
>  13.15%  [kernel]  [k] xfs_dir2_node_addname
>   3.52%  [kernel]  [k] xfs_dir3_leaf_check_int
>   3.11%  [kernel]  [k] xfs_log_commit_cil
>
> And the wall time/average file create rate differences are
> just as stark:
>
>              create time(sec) / rate (files/s)
> File count       vanilla            patched
>   10k          0.54 / 18.5k       0.53 / 18.9k
>   20k          1.10 / 18.1k       1.05 / 19.0k
>  100k          4.21 / 23.8k       3.91 / 25.6k
>  200k          9.66 / 20.7k       7.37 / 27.1k
>    1M         86.61 / 11.5k      48.26 / 20.7k
>    2M        206.13 /  9.7k     129.71 / 15.4k
>
> The larger the directory, the bigger the performance improvement.
>

Interesting..

> Signed-Off-By: Dave Chinner <dchinner@xxxxxxxxxx>
> ---
>  fs/xfs/libxfs/xfs_dir2_node.c | 30 +++++++++++++++---------------
>  1 file changed, 15 insertions(+), 15 deletions(-)
>
> diff --git a/fs/xfs/libxfs/xfs_dir2_node.c b/fs/xfs/libxfs/xfs_dir2_node.c
> index 682e2bf370c7..bcf0d43cd6a8 100644
> --- a/fs/xfs/libxfs/xfs_dir2_node.c
> +++ b/fs/xfs/libxfs/xfs_dir2_node.c
> @@ -1829,24 +1829,24 @@ xfs_dir2_node_addname_int(
>  		 */
>  		bests = dp->d_ops->free_bests_p(free);
>  		dp->d_ops->free_hdr_from_disk(&freehdr, free);
> -		if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> -		    be16_to_cpu(bests[findex]) >= length)
> -			dbno = freehdr.firstdb + findex;
> -		else {
> -			/*
> -			 * Are we done with the freeblock?
> -			 */
> -			if (++findex == freehdr.nvalid) {
> -				/*
> -				 * Drop the block.
> -				 */
> -				xfs_trans_brelse(tp, fbp);
> -				fbp = NULL;
> -				if (fblk && fblk->bp)
> -					fblk->bp = NULL;

Ok, so we're adding a dir entry to a node dir and walking the free space
blocks to see if we have space somewhere to insert the entry without
growing the dir. The current code reads the free block, converts the
header, checks bests[findex], then bumps findex or invalidates the free
block if we're done with it. The updated code reads the free block,
converts the header, iterates the free index range then invalidates the
block when complete (assuming we don't find suitable free space). The
end result is that we don't convert the block header over and over for
each index in the individual block.

Seems reasonable to me, just a couple nits...

> +		do {
> +

Extra space above.

> +			if (be16_to_cpu(bests[findex]) != NULLDATAOFF &&
> +			    be16_to_cpu(bests[findex]) >= length) {
> +				dbno = freehdr.firstdb + findex;
> +				break;
>  			}
> +		} while (++findex < freehdr.nvalid);
> +
> +		/* Drop the block if we done with the freeblock */

"... if we're done ..."

Also FWIW, according to the comment it looks like the only reason the
freehdr conversion is elevated to this scope is to accommodate gcc
foolishness.
If so, I'm wondering if a simple NULL init of bests at the top of the
function would avoid that problem and allow us to move the code to where
it was apparently intended to be in the first place. Hm?

Brian

> +		if (findex == freehdr.nvalid) {
> +			xfs_trans_brelse(tp, fbp);
> +			fbp = NULL;
> +			if (fblk)
> +				fblk->bp = NULL;
>  		}
>  	}
> +
>  	/*
>  	 * If we don't have a data block, we need to allocate one and make
>  	 * the freespace entries refer to it.
> --
> 2.15.0
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-xfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html
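
For illustration only, the restructuring discussed in this review can be modelled in a small userspace sketch. This is not the kernel code: every name below (the demo_* types and helpers, decode_calls, the 16-entry bests capacity) is a hypothetical stand-in, and the on-disk layout is simplified to two big-endian header fields plus a bests array. It only aims to show why converting the header once before the scan loop does the same search with far fewer conversions.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define DEMO_NULLDATAOFF 0xffffU	/* stand-in for NULLDATAOFF */
#define DEMO_NBESTS	 16		/* hypothetical bests[] capacity */

/* Hypothetical on-disk layout: every field is stored big-endian. */
struct demo_free_block {
	uint8_t firstdb[4];
	uint8_t nvalid[4];
	uint8_t bests[DEMO_NBESTS][2];
};

/* Hypothetical in-core (CPU-endian) copy of the header. */
struct demo_free_hdr {
	uint32_t firstdb;
	uint32_t nvalid;
};

unsigned long decode_calls;	/* counts header conversions */

uint32_t demo_get_be32(const uint8_t *p)
{
	return ((uint32_t)p[0] << 24) | ((uint32_t)p[1] << 16) |
	       ((uint32_t)p[2] << 8) | (uint32_t)p[3];
}

uint16_t demo_get_be16(const uint8_t *p)
{
	return (uint16_t)((p[0] << 8) | p[1]);
}

void demo_put_be32(uint8_t *p, uint32_t v)
{
	p[0] = (uint8_t)(v >> 24);
	p[1] = (uint8_t)(v >> 16);
	p[2] = (uint8_t)(v >> 8);
	p[3] = (uint8_t)v;
}

/* Build a block whose bests[] entries all start out as "no data block". */
void demo_init(struct demo_free_block *blk, uint32_t firstdb, uint32_t nvalid)
{
	assert(nvalid >= 1 && nvalid <= DEMO_NBESTS);
	memset(blk->bests, 0xff, sizeof(blk->bests));
	demo_put_be32(blk->firstdb, firstdb);
	demo_put_be32(blk->nvalid, nvalid);
}

void demo_set_best(struct demo_free_block *blk, uint32_t idx, uint16_t v)
{
	blk->bests[idx][0] = (uint8_t)(v >> 8);
	blk->bests[idx][1] = (uint8_t)(v & 0xff);
}

void demo_hdr_from_disk(struct demo_free_hdr *to,
			const struct demo_free_block *from)
{
	decode_calls++;
	to->firstdb = demo_get_be32(from->firstdb);
	to->nvalid = demo_get_be32(from->nvalid);
}

/* Old shape: the header is converted again for every entry examined. */
long scan_decode_per_entry(const struct demo_free_block *blk, uint16_t length)
{
	struct demo_free_hdr hdr;
	uint32_t findex = 0;
	uint16_t best;

	for (;;) {
		demo_hdr_from_disk(&hdr, blk);
		best = demo_get_be16(blk->bests[findex]);
		if (best != DEMO_NULLDATAOFF && best >= length)
			return (long)(hdr.firstdb + findex);
		if (++findex == hdr.nvalid)
			return -1;	/* done with this free block */
	}
}

/* New shape: convert the header once, then scan the entries. */
long scan_decode_once(const struct demo_free_block *blk, uint16_t length)
{
	struct demo_free_hdr hdr;
	uint32_t findex = 0;
	uint16_t best;

	demo_hdr_from_disk(&hdr, blk);
	do {
		best = demo_get_be16(blk->bests[findex]);
		if (best != DEMO_NULLDATAOFF && best >= length)
			return (long)(hdr.firstdb + findex);
	} while (++findex < hdr.nvalid);

	return -1;	/* done with this free block */
}
```

Both functions return the same block number for the same input; the difference is only in how often the header is converted. With 8 valid entries and the first fit at index 6, the per-entry variant converts the header seven times and the hoisted variant once, which is the cost the profile above was showing, scaled up to however many bests entries the real free block holds.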