On Wed, Apr 23, 2014 at 04:24:43PM +1000, Dave Chinner wrote:
> On Thu, Apr 10, 2014 at 12:11:07PM -0400, Brian Foster wrote:
> > Add the findfirst_free_inode_rec() and next_free_ino_rec() helpers
> > to assist scanning the in-core inode records for records with at
> > least one free inode. These will be used to determine which records
> > are included in the free inode btree.
> >
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> >  repair/incore.h | 27 +++++++++++++++++++++++++++
> >  1 file changed, 27 insertions(+)
> >
> > diff --git a/repair/incore.h b/repair/incore.h
> > index 5419884..5f8c188 100644
> > --- a/repair/incore.h
> > +++ b/repair/incore.h
> > @@ -381,6 +381,33 @@ void clear_uncertain_ino_cache(xfs_agnumber_t agno);
> >  	((ino_tree_node_t *) ((ino_node_ptr)->avl_node.avl_forw))
> >
> >  /*
> > + * finobt helpers
> > + */
> > +static inline ino_tree_node_t *
> > +findfirst_free_inode_rec(xfs_agnumber_t agno)
> > +{
> > +	ino_tree_node_t	*ino_rec;
> > +
> > +	ino_rec = findfirst_inode_rec(agno);
> > +
> > +	while (ino_rec && !ino_rec->ir_free)
> > +		ino_rec = next_ino_rec(ino_rec);
> > +
> > +	return ino_rec;
> > +}
> > +
> > +static inline ino_tree_node_t *
> > +next_free_ino_rec(ino_tree_node_t *ino_rec)
> > +{
> > +	ino_rec = next_ino_rec(ino_rec);
> > +
> > +	while (ino_rec && !ino_rec->ir_free)
> > +		ino_rec = next_ino_rec(ino_rec);
> > +
> > +	return ino_rec;
> > +}
>
> That looks a bit inefficient - walking the list of inode records to
> find the next one with free inodes. Perhaps we would be better
> served by adding a new list for inode records with free entries and
> walking that instead?
>
> I know, it's a memory vs speed tradeoff, but if we've got millions of
> inodes and very few free, the difference could be very significant.
>

Hmm, perhaps. The ino_tree_node_t structure is 104 bytes according to
pahole. With 64 inodes per node, 1 billion inodes would turn into
around 15 million of these structures. (A rough sketch of what such a
free-record list might look like is appended below.)

If I write a dumb little program to allocate, initialize and iterate
15 million ~100 byte structures (accessing the structure beginning and
end), it uses ~1.8GB RAM and runs in under 5 seconds. If I increase
that to 30 million, it allocates ~3.5GB RAM, triggers kswapd (4GB RAM
VM) and seems to run anywhere from 30s to 1m30s. (A sketch of that
sort of program is also appended below.)

All of that probably doesn't mean that much in the context of repair,
I guess. ;) I'll have to experiment a bit more with this and, at
minimum, I'll include some data in my next rev.

FWIW, if it is a problem I'd probably approach this as an additional
optimization patch rather than rework this one. I also note that we
already do multiple scans of the inode record list (e.g., cursor
initialization), so we might be able to buy back some of whatever the
finobt degradation is by condensing the cursor initialization scans.

Brian

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
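
As a rough illustration of the separate-list idea Dave raises, here is
one way it might look. This is purely a sketch, not part of the posted
patch: the ir_free_next field and the add_free_ino_rec() /
first_free_ino_rec() / next_free_ino_rec_fast() helpers are
hypothetical names invented for illustration, and the existing
ino_tree_node_t fields are elided.

/*
 * Sketch: thread records that contain free inodes onto their own
 * singly-linked list so the finobt scan only visits those records.
 * Field and helper names here are hypothetical.
 */
typedef struct ino_tree_node {
	/* ... existing fields elided ... */
	unsigned long long	ir_free;	/* free inode bitmask */
	struct ino_tree_node	*ir_free_next;	/* next rec with free inodes */
} ino_tree_node_t;

/* hypothetical per-AG head of the free-record list */
static ino_tree_node_t	*free_ino_rec_head;

/* call this wherever ir_free first becomes non-zero for a record */
static inline void
add_free_ino_rec(ino_tree_node_t *ino_rec)
{
	ino_rec->ir_free_next = free_ino_rec_head;
	free_ino_rec_head = ino_rec;
}

static inline ino_tree_node_t *
first_free_ino_rec(void)
{
	return free_ino_rec_head;	/* O(1) instead of a scan */
}

static inline ino_tree_node_t *
next_free_ino_rec_fast(ino_tree_node_t *ino_rec)
{
	return ino_rec->ir_free_next;	/* walk only free records */
}

One wrinkle with a singly-linked list is that a record whose inodes
all become allocated again would need to be unlinked, which is O(n)
here; a real implementation would likely want a doubly-linked list for
that, which adds to the per-record memory cost Dave mentions.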
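
And a guess at what the "dumb little program" above might look like.
The structure layout is invented, sized to match the 104 bytes pahole
reports for ino_tree_node_t; the numbers quoted above came from
Brian's own version, not this one, and this sketch uses one contiguous
allocation (per-structure malloc would add allocator overhead on top
of the raw 15M * 104 bytes ~= 1.5GB).

#include <stdio.h>
#include <stdlib.h>

#define NRECS	15000000UL

/* ~104 bytes, roughly matching ino_tree_node_t */
struct fake_rec {
	unsigned long	first;		/* field at the start */
	char		pad[88];	/* filler */
	unsigned long	last;		/* field at the end */
};

int
main(void)
{
	struct fake_rec	*recs;
	unsigned long	i, sum = 0;

	recs = malloc(NRECS * sizeof(*recs));
	if (!recs) {
		perror("malloc");
		return 1;
	}

	/* initialize: touch both ends of every structure */
	for (i = 0; i < NRECS; i++) {
		recs[i].first = i;
		recs[i].last = i;
	}

	/* iterate: read both ends of every structure */
	for (i = 0; i < NRECS; i++)
		sum += recs[i].first + recs[i].last;

	printf("sum %lu, %lu recs of %zu bytes\n", sum, NRECS,
	       sizeof(*recs));
	free(recs);
	return 0;
}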