Re: [PATCH v10 01/10] fs: Allow fine-grained control of folio sizes

"Darrick J. Wong" <djwong@xxxxxxxxxx> · Wed, 17 Jul 2024 08:25:22 -0700

On Wed, Jul 17, 2024 at 03:12:51PM +0000, Pankaj Raghav (Samsung) wrote:
> > >>
> > >> This is really too much.  It's something that will never happen.  Just
> > >> delete the message.
> > >>
> > >>> +	if (max > MAX_PAGECACHE_ORDER) {
> > >>> +		VM_WARN_ONCE(1,
> > >>> +	"max order > MAX_PAGECACHE_ORDER. Setting max_order to MAX_PAGECACHE_ORDER");
> > >>> +		max = MAX_PAGECACHE_ORDER;
> > >>
> > >> Absolutely not.  If the filesystem declares it can support a block size
> > >> of 4TB, then good for it.  We just silently clamp it.
> > > 
> > > Hmm, but you raised the point about clamping in the previous patches[1]
> > > after Ryan pointed out that we should not silently clamp the order.
> > > 
> > > ```
> > >> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
> > >> whatever values are passed in are a hard requirement? So wouldn't want them to
> > >> be silently reduced. (Especially given the recent change to reduce the size of
> > >> MAX_PAGECACHE_ORDER to less than PMD size in some cases).
> > > 
> > > Hm, yes.  We should probably make this return an errno.  Including
> > > returning an errno for !IS_ENABLED() and min > 0.
> > > ```
> > > 
> > > It was not clear from the conversation in the previous patches that we
> > > decided to just clamp the order (like it was done before).
> > > 
> > > So let's just stick with how it was done before where we clamp the
> > > values if min and max > MAX_PAGECACHE_ORDER?
> > > 
> > > [1] https://lore.kernel.org/linux-fsdevel/Zoa9rQbEUam467-q@xxxxxxxxxxxxxxxxxxxx/
> > 
> > The way I see it, there are 2 approaches we could take:
> > 
> > 1. Implement mapping_max_folio_size_supported(), write a headerdoc for
> > mapping_set_folio_order_range() that says min must be lte max, max must be lte
> > mapping_max_folio_size_supported(). Then emit VM_WARN() in
> > mapping_set_folio_order_range() if the constraints are violated, and clamp to
> > make it safe (from page cache's perspective). The VM_WARN()s can just be inline
> 
> Inlining with the `if` is not possible since:
> 91241681c62a ("include/linux/mmdebug.h: make VM_WARN* non-rvals")
> 
> > in the if statements to keep them clean. The FS is responsible for checking
> > mapping_max_folio_size_supported() and ensuring min and max meet requirements.
> 
> This is sort of what is done here but IIUC willy's reply to the patch,
> he prefers silent clamping over having WARNINGS. I think because we check
> the constraints during the mount time, so it should be safe to call
> this I guess?

That's my read of the situation, but I'll ask about it at the next THP
meeting if that helps.
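
For illustration only, here is a userspace sketch of what "silently
clamp" could look like.  This is not the code from the patch:
sketch_clamp_order_range(), struct sketch_order_range and the value of
SKETCH_MAX_PAGECACHE_ORDER are hypothetical stand-ins, and raising max
to min when max < min is an assumption about how the range is kept
consistent.

```c
#include <assert.h>

/*
 * Hypothetical stand-in.  The real MAX_PAGECACHE_ORDER depends on the
 * kernel configuration; 11 is only a placeholder for this sketch.
 */
#define SKETCH_MAX_PAGECACHE_ORDER 11u

struct sketch_order_range {
	unsigned int min;
	unsigned int max;
};

/*
 * Clamp both orders to what the page cache can handle.  No warning is
 * emitted and no errno is returned; out-of-range requests are reduced
 * quietly.
 */
static struct sketch_order_range
sketch_clamp_order_range(unsigned int min, unsigned int max)
{
	struct sketch_order_range r;

	if (min > SKETCH_MAX_PAGECACHE_ORDER)
		min = SKETCH_MAX_PAGECACHE_ORDER;
	if (max > SKETCH_MAX_PAGECACHE_ORDER)
		max = SKETCH_MAX_PAGECACHE_ORDER;
	/* Assumption: keep the range well-formed by raising max to min. */
	if (max < min)
		max = min;

	r.min = min;
	r.max = max;

	/* Postcondition: a well-formed range within the supported limit. */
	assert(r.min <= r.max);
	assert(r.max <= SKETCH_MAX_PAGECACHE_ORDER);
	return r;
}
```

With these placeholder values, a request of (2, 30) comes back as
(2, 11).  Quiet clamping like this is only reasonable if the
filesystem has already rejected impossible block sizes at mount time.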

> > 
> > 2. Return an error from mapping_set_folio_order_range() (and the other functions
> > that set min/max). No need for warning. No state changed if error is returned.
> > FS can emit warning on error if it wants.
> 
> I think Chinner was not happy with this approach because this is done
> per inode and basically we would just shutdown the filesystem in the
> first inode allocation instead of refusing the mount as we know about
> the MAX_PAGECACHE_ORDER even during the mount phase anyway.

I agree.  Filesystem-wide properties (e.g. fs blocksize) should cause
the mount to fail if the pagecache cannot possibly handle any file
blocks.  Inode-specific properties (e.g. the forcealign+notears write
work John Garry is working on) could error out of open() with -EIO, but
that's a specialty file property.
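
Roughly, the split I have in mind looks like the userspace sketch
below.  Every name in it is hypothetical (sketch_*), the page size and
maximum order are placeholders, and the choice of -EINVAL for the
mount failure is an assumption; only the -EIO from open() comes from
the paragraph above.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Placeholders; the real values depend on architecture and config. */
#define SKETCH_PAGE_SIZE           4096UL
#define SKETCH_MAX_PAGECACHE_ORDER 11u

/* Largest folio size the (modelled) page cache can handle, in bytes. */
static size_t sketch_max_folio_size_supported(void)
{
	return SKETCH_PAGE_SIZE << SKETCH_MAX_PAGECACHE_ORDER;
}

/*
 * Filesystem-wide property: checked once, at mount time.  A block size
 * the page cache cannot possibly handle fails the mount instead of
 * surfacing later on the first inode allocation.
 */
static int sketch_validate_blocksize_at_mount(size_t blocksize)
{
	/* Block sizes are assumed to be non-zero powers of two. */
	if (blocksize == 0 || (blocksize & (blocksize - 1)) != 0)
		return -EINVAL;
	if (blocksize > sketch_max_folio_size_supported())
		return -EINVAL;
	return 0;
}

/*
 * Inode-specific property: checked at open() time.  Only this one file
 * is refused; the mount itself is unaffected.
 */
static int sketch_check_inode_at_open(unsigned int required_min_order)
{
	if (required_min_order > SKETCH_MAX_PAGECACHE_ORDER)
		return -EIO;
	return 0;
}
```

With these placeholder limits, a 16k block size mounts fine and a
16M one is refused up front, while a file that needs an order the page
cache cannot provide fails only its own open().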

--D

> --
> Pankaj
>