On Wed, Apr 23, 2014 at 04:02:16PM -0400, Brian Foster wrote:
> On Wed, Apr 23, 2014 at 04:24:43PM +1000, Dave Chinner wrote:
> > On Thu, Apr 10, 2014 at 12:11:07PM -0400, Brian Foster wrote:
> > > Add the findfirst_free_inode_rec() and next_free_ino_rec() helpers
> > > to assist scanning the in-core inode records for records with at
> > > least one free inode. These will be used to determine what records
> > > are included in the free inode btree.
> > > 
> > > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > > ---
> > >  repair/incore.h | 27 +++++++++++++++++++++++++++
> > >  1 file changed, 27 insertions(+)
> > > 
> > > diff --git a/repair/incore.h b/repair/incore.h
> > > index 5419884..5f8c188 100644
> > > --- a/repair/incore.h
> > > +++ b/repair/incore.h
> > > @@ -381,6 +381,33 @@ void clear_uncertain_ino_cache(xfs_agnumber_t agno);
> > >  	((ino_tree_node_t *) ((ino_node_ptr)->avl_node.avl_forw))
> > >  
> > >  /*
> > > + * finobt helpers
> > > + */
> > > +static inline ino_tree_node_t *
> > > +findfirst_free_inode_rec(xfs_agnumber_t agno)
> > > +{
> > > +	ino_tree_node_t	*ino_rec;
> > > +
> > > +	ino_rec = findfirst_inode_rec(agno);
> > > +
> > > +	while (ino_rec && !ino_rec->ir_free)
> > > +		ino_rec = next_ino_rec(ino_rec);
> > > +
> > > +	return ino_rec;
> > > +}
> > > +
> > > +static inline ino_tree_node_t *
> > > +next_free_ino_rec(ino_tree_node_t *ino_rec)
> > > +{
> > > +	ino_rec = next_ino_rec(ino_rec);
> > > +
> > > +	while (ino_rec && !ino_rec->ir_free)
> > > +		ino_rec = next_ino_rec(ino_rec);
> > > +
> > > +	return ino_rec;
> > > +}
> > 
> > That looks a bit inefficient - walking the list of inode records to
> > find the next one with free inodes. Perhaps we would be better
> > served by adding a new list for inode records with free entries and
> > walking that instead?
> > 
> > I know, it's a memory vs speed tradeoff, but if we've got millions of
> > inodes and very few free, the difference could be very significant.
> > 
> 
> Hmm, perhaps. The ino_tree_node_t structure is 104 bytes according to
> pahole. With 64 inodes per node, 1 billion inodes would turn into around
> 15 million of these structures. If I write a dumb little program to
> allocate, initialize and iterate 15 million ~100-byte structures
> (accessing the structure beginning and end), it uses ~1.8GB RAM and runs
> in under 5 seconds. If I increase that to 30 million, it allocates
> ~3.5GB RAM, triggers kswapd (4GB RAM VM) and seems to run anywhere from
> 30s to 1m30s.
> 
> All of that probably doesn't mean that much in the context of repair, I
> guess. ;)

*nod* Repair does more than this between searches (so TLB pressure
will be different), some CPUs have prefetchers that detect pointer
chasing like this, and some systems have remote (hence slower) memory.
Hence testing in isolation doesn't necessarily give an accurate
picture...

> I'll have to experiment a bit more with this and, at minimum,
> I'll include some data in my next rev. FWIW, if it is a problem I'd
> probably approach this as an additional optimization patch rather than
> rework this one. I also note that we already do multiple scans of the
> inode record list (e.g., cursor initialization), so we might be able to
> buy back some of whatever the finobt degradation is by condensing the
> cursor initialization scans.

Yup, doing it as an additional patch if it is necessary sounds like
a good plan to me.

Cheers,

Dave.
-- 
Dave Chinner
david@xxxxxxxxxxxxx
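
For illustration only, here is a minimal sketch of the separate-list
alternative discussed above: records with at least one free inode are
threaded onto a per-AG list as they are built, so the finobt scan only
visits records it will actually emit. The ir_free_next field, the
free_ino_list array and add_free_ino_rec() are assumptions invented for
this sketch; they do not exist in repair/incore.h or in the patch under
review.

/*
 * Sketch only: assumes a hypothetical ir_free_next pointer added to
 * ino_tree_node_t and a hypothetical per-AG list head array that is
 * allocated with one entry per AG at incore init time.
 */
static ino_tree_node_t	**free_ino_list;

/*
 * Thread a record onto its AG's free list the first time it gains a
 * free inode (caller is assumed to check it isn't already linked).
 */
static inline void
add_free_ino_rec(xfs_agnumber_t agno, ino_tree_node_t *ino_rec)
{
	ino_rec->ir_free_next = free_ino_list[agno];
	free_ino_list[agno] = ino_rec;
}

/* the finobt scan then degenerates into a plain list walk */
static inline ino_tree_node_t *
findfirst_free_inode_rec(xfs_agnumber_t agno)
{
	return free_ino_list[agno];
}

static inline ino_tree_node_t *
next_free_ino_rec(ino_tree_node_t *ino_rec)
{
	return ino_rec->ir_free_next;
}

The cost is one extra pointer per record plus list maintenance at the
points where repair marks inodes free or in use, and a record that later
becomes fully allocated would have to be unlinked or skipped during the
walk; that is exactly the memory vs speed tradeoff discussed in the
thread. Note also that the finobt build needs records in startino order,
so a real implementation would have to link records in tree order or
keep the list sorted rather than pushing onto the head as shown here.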