On Wed, Apr 23, 2014 at 04:24:43PM +1000, Dave Chinner wrote:
> On Thu, Apr 10, 2014 at 12:11:07PM -0400, Brian Foster wrote:
> > Add the findfirst_free_inode_rec() and next_free_ino_rec() helpers
> > to assist scanning the in-core inode records for records with at
> > least one free inode. These will be used to determine which records
> > are included in the free inode btree.
> >
> > Signed-off-by: Brian Foster <bfoster@xxxxxxxxxx>
> > ---
> >  repair/incore.h | 27 +++++++++++++++++++++++++++
> >  1 file changed, 27 insertions(+)
> >
> > diff --git a/repair/incore.h b/repair/incore.h
> > index 5419884..5f8c188 100644
> > --- a/repair/incore.h
> > +++ b/repair/incore.h
> > @@ -381,6 +381,33 @@ void clear_uncertain_ino_cache(xfs_agnumber_t agno);
> >  	((ino_tree_node_t *) ((ino_node_ptr)->avl_node.avl_forw))
> >
> >  /*
> > + * finobt helpers
> > + */
> > +static inline ino_tree_node_t *
> > +findfirst_free_inode_rec(xfs_agnumber_t agno)
> > +{
> > +	ino_tree_node_t	*ino_rec;
> > +
> > +	ino_rec = findfirst_inode_rec(agno);
> > +
> > +	while (ino_rec && !ino_rec->ir_free)
> > +		ino_rec = next_ino_rec(ino_rec);
> > +
> > +	return ino_rec;
> > +}
> > +
> > +static inline ino_tree_node_t *
> > +next_free_ino_rec(ino_tree_node_t *ino_rec)
> > +{
> > +	ino_rec = next_ino_rec(ino_rec);
> > +
> > +	while (ino_rec && !ino_rec->ir_free)
> > +		ino_rec = next_ino_rec(ino_rec);
> > +
> > +	return ino_rec;
> > +}
>
> That looks a bit inefficient - walking the list of inode records to
> find the next one with free inodes. Perhaps we would be better
> served by adding a new list for inode records with free entries and
> walking that instead?
>
> I know, it's a memory vs speed tradeoff, but if we've got millions of
> inodes and very few free, the difference could be very significant.
>

Hmm, perhaps. The ino_tree_node_t structure is 104 bytes according to
pahole. With 64 inodes per node, 1 billion inodes would turn into
around 15 million of these structures. (A rough sketch of what such a
free-record list might look like is appended below.)

If I write a dumb little program to allocate, initialize and iterate
15 million ~100 byte structures (accessing the structure beginning and
end), it uses ~1.8GB RAM and runs in under 5 seconds. If I increase
that to 30 million, it allocates ~3.5GB RAM, triggers kswapd (4GB RAM
VM) and seems to run anywhere from 30s to 1m30s. (A sketch of that
sort of program is also appended below.)

All of that probably doesn't mean that much in the context of repair,
I guess. ;) I'll have to experiment a bit more with this and, at
minimum, I'll include some data in my next rev.

FWIW, if it is a problem I'd probably approach this as an additional
optimization patch rather than rework this one. I also note that we
already do multiple scans of the inode record list (e.g., cursor
initialization), so we might be able to buy back some of whatever the
finobt degradation is by condensing the cursor initialization scans.

Brian

> Cheers,
>
> Dave.
> --
> Dave Chinner
> david@xxxxxxxxxxxxx
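
As a rough illustration of the separate-list idea Dave raises, here is
one way it might look. This is purely a sketch, not part of the posted
patch: the ir_free_next field and the add_free_ino_rec() /
first_free_ino_rec() / next_free_ino_rec_fast() helpers are
hypothetical names invented for illustration, and the existing
ino_tree_node_t fields are elided.

/*
 * Sketch: thread records that contain free inodes onto their own
 * singly-linked list so the finobt scan only visits those records.
 * Field and helper names here are hypothetical.
 */
typedef struct ino_tree_node {
	/* ... existing fields elided ... */
	unsigned long long	ir_free;	/* free inode bitmask */
	struct ino_tree_node	*ir_free_next;	/* next rec with free inodes */
} ino_tree_node_t;

/* hypothetical per-AG head of the free-record list */
static ino_tree_node_t	*free_ino_rec_head;

/* call this wherever ir_free first becomes non-zero for a record */
static inline void
add_free_ino_rec(ino_tree_node_t *ino_rec)
{
	ino_rec->ir_free_next = free_ino_rec_head;
	free_ino_rec_head = ino_rec;
}

static inline ino_tree_node_t *
first_free_ino_rec(void)
{
	return free_ino_rec_head;	/* O(1) instead of a scan */
}

static inline ino_tree_node_t *
next_free_ino_rec_fast(ino_tree_node_t *ino_rec)
{
	return ino_rec->ir_free_next;	/* walk only free records */
}

One wrinkle with a singly-linked list is that a record whose inodes
all become allocated again would need to be unlinked, which is O(n)
here; a real implementation would likely want a doubly-linked list for
that, which adds to the per-record memory cost Dave mentions.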
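
And a guess at what the "dumb little program" above might look like.
The structure layout is invented, sized to match the 104 bytes pahole
reports for ino_tree_node_t; the numbers quoted above came from
Brian's own version, not this one, and this sketch uses one contiguous
allocation (per-structure malloc would add allocator overhead on top
of the raw 15M * 104 bytes ~= 1.5GB).

#include <stdio.h>
#include <stdlib.h>

#define NRECS	15000000UL

/* ~104 bytes, roughly matching ino_tree_node_t */
struct fake_rec {
	unsigned long	first;		/* field at the start */
	char		pad[88];	/* filler */
	unsigned long	last;		/* field at the end */
};

int
main(void)
{
	struct fake_rec	*recs;
	unsigned long	i, sum = 0;

	recs = malloc(NRECS * sizeof(*recs));
	if (!recs) {
		perror("malloc");
		return 1;
	}

	/* initialize: touch both ends of every structure */
	for (i = 0; i < NRECS; i++) {
		recs[i].first = i;
		recs[i].last = i;
	}

	/* iterate: read both ends of every structure */
	for (i = 0; i < NRECS; i++)
		sum += recs[i].first + recs[i].last;

	printf("sum %lu, %lu recs of %zu bytes\n", sum, NRECS,
	       sizeof(*recs));
	free(recs);
	return 0;
}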