Re: [PATCH 2/2] exportfs: fix 32-bit nfsd handling of 64-bit inode numbers

"J. Bruce Fields" <bfields@xxxxxxxxxxxx> · Fri, 11 Oct 2013 17:53:51 -0400

On Fri, Oct 11, 2013 at 09:28:07AM +1100, Dave Chinner wrote:
> On Wed, Oct 09, 2013 at 10:53:20AM -0400, J. Bruce Fields wrote:
> > On Wed, Oct 09, 2013 at 11:16:31AM +1100, Dave Chinner wrote:
> > > On Tue, Oct 08, 2013 at 05:56:56PM -0400, J. Bruce Fields wrote:
> > > > On Fri, Oct 04, 2013 at 06:15:22PM -0400, J. Bruce Fields wrote:
> > > > > On Fri, Oct 04, 2013 at 06:12:16PM -0400, bfields wrote:
> > > > > > On Wed, Oct 02, 2013 at 05:28:14PM -0400, J. Bruce Fields wrote:
> > > > > > > @@ -268,6 +268,16 @@ static int get_name(const struct path *path, char *name, struct dentry *child)
> > > > > > >  	if (!dir->i_fop)
> > > > > > >  		goto out;
> > > > > > >  	/*
> > > > > > > +	 * inode->i_ino is unsigned long, kstat->ino is u64, so the
> > > > > > > +	 * former would be insufficient on 32-bit hosts when the
> > > > > > > +	 * filesystem supports 64-bit inode numbers.  So we need to
> > > > > > > +	 * actually call ->getattr, not just read i_ino:
> > > > > > > +	 */
> > > > > > > +	error = vfs_getattr_nosec(path, &stat);
> > > > > > 
> > > > > > Doh, "path" here is for the parent....  The following works better!
> > > > > 
> > > > > By the way, I'm testing this with:
> > > > > 
> > > > > 	- create a bunch of nested subdirectories, use
> > > > > 	  name_to_handle_at to get a handle for the bottom directory.
> > > > > 	- echo 2 >/proc/sys/vm/drop_caches
> > > > > 	- open_by_handle_at on the filehandle
> > > > > 
> > > > > But this only actually exercises the reconnect path on the first run
> > > > > after boot.  Is there something obvious I'm missing here?
> > > > 
> > > > Looking at the code....  OK, most of the work of drop_caches is done by
> > > > shrink_slab_node, which doesn't actually try to free every single thing
> > > > that it could free--in particular, it won't try to free anything if it
> > > > thinks there are less than shrinker->batch_size (1024 in the
> > > > super_block->s_shrink case) objects to free.
> > 
> > (Oops, sorry, that should have been "less than half of
> > shrinker->batch_size", see below.)
> > 
> > > That's not quite right. Yes, the shrinker won't be called if the
> > > calculated scan count is less than the batch size, but the left over
> > > is added back to the shrinker scan count to carry over to the next call
> > > to the shrinker. Hence if you repeatedly call the shrinker on a small
> > > cache with a large batch size, it will eventually aggregate the scan
> > > counts to over the batch size and trim the cache....
> > 
> > No, in shrink_slab_node, we do this:
> > 
> > 	if (total_scan > max_pass * 2)
> > 		total_scan = max_pass * 2;
> > 
> > 	while (total_scan >= batch_size) {
> > 		...
> > 	}
> > 
> > where max_pass is the value returned from count_objects.  So as long as
> > count_objects returns less than half batch_size, nothing ever happens.
> 
> Ah, right - I hadn't considered what that does to small caches - the
> intended purpose of that is to limit the scan size when caches are
> extremely large and lots of deferral has occurred. Perhaps we need
> to consider the batch size in this? e.g.:
> 
> 	total_scan = min(total_scan, max(max_pass * 2, batch_size));
> 
> Hence for small caches (max_pass <<< batch_size), it evaluates as:
> 
> 	total_scan = min(total_scan, batch_size);
> 
> and hence once aggregation of repeated calls pushes us over the
> batch size we run the shrinker.
> 
> For large caches (max_pass >>> batch_size), it evaluates as:
> 
> 	total_scan = min(total_scan, max_pass * 2);
> 
> which gives us the same behaviour as the current code.
> 
> I'll write up a patch to do this...
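
A rough userspace model of the two clamps, just to check the small-cache
case.  This is a sketch and not the kernel code: "delta" stands in for
whatever scan count a single call adds, the carry-over is simplified to
"whatever was not scanned", and the helper names (clamp_current,
clamp_proposed, calls_until_scan) are made up for illustration.

```c
typedef unsigned long (*clamp_fn)(unsigned long total_scan,
				  unsigned long max_pass,
				  unsigned long batch_size);

/* Clamp as in the code quoted above: never more than 2 * max_pass. */
static unsigned long clamp_current(unsigned long total_scan,
				   unsigned long max_pass,
				   unsigned long batch_size)
{
	(void)batch_size;	/* not considered by the current clamp */
	if (total_scan > max_pass * 2)
		total_scan = max_pass * 2;
	return total_scan;
}

/* Proposed: min(total_scan, max(max_pass * 2, batch_size)). */
static unsigned long clamp_proposed(unsigned long total_scan,
				    unsigned long max_pass,
				    unsigned long batch_size)
{
	unsigned long limit = max_pass * 2;

	if (limit < batch_size)
		limit = batch_size;
	return total_scan < limit ? total_scan : limit;
}

/*
 * Model repeated shrinker calls.  Each call adds "delta" to the
 * deferred count (an assumption: the real delta depends on pages
 * scanned, seeks and LRU size), applies the clamp, and scans only if
 * at least one full batch is available.  Returns the 1-based number of
 * the first call that scans, or 0 if none does within max_calls.
 */
static int calls_until_scan(clamp_fn clamp, unsigned long max_pass,
			    unsigned long batch_size, unsigned long delta,
			    int max_calls)
{
	unsigned long deferred = 0;
	int call;

	for (call = 1; call <= max_calls; call++) {
		unsigned long total_scan;

		total_scan = clamp(deferred + delta, max_pass, batch_size);
		if (total_scan >= batch_size)
			return call;
		deferred = total_scan;	/* carried over to the next call */
	}
	return 0;
}
```

With max_pass = 100, batch_size = 1024 and delta = 200, the current
clamp pins total_scan at 200 on every call, so calls_until_scan()
returns 0 however many calls are allowed; with the proposed clamp the
carry-over accumulates and the sixth call scans.  For max_pass much
larger than batch_size the two clamps behave the same.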

It all feels a bit ad-hoc, but OK.

drop_caches could still end up leaving some small caches alone, right?

I hadn't expected that, but then again maybe I don't really understand
what drop_caches is for.

--b.