Re: [PATCH v8 01/10] fs: Allow fine-grained control of folio sizes

Ryan Roberts <ryan.roberts@xxxxxxx> · Fri, 5 Jul 2024 10:03:31 +0100

On 05/07/2024 05:32, Dave Chinner wrote:
> On Fri, Jul 05, 2024 at 12:56:28AM +0100, Matthew Wilcox wrote:
>> On Fri, Jul 05, 2024 at 08:06:51AM +1000, Dave Chinner wrote:
>>>>> It seems strange to silently clamp these? Presumably for the bs>ps usecase,
>>>>> whatever values are passed in are a hard requirement? So wouldn't want them to
>>>>> be silently reduced. (Especially given the recent change to reduce the size of
>>>>> MAX_PAGECACHE_ORDER to less then PMD size in some cases).
>>>>
>>>> Hm, yes.  We should probably make this return an errno.  Including
>>>> returning an errno for !IS_ENABLED() and min > 0.
>>>
>>> What are callers supposed to do with an error? In the case of
>>> setting up a newly allocated inode in XFS, the error would be
>>> returned in the middle of a transaction and so this failure would
>>> result in a filesystem shutdown.
>>
>> I suggest you handle it better than this.  If the device is asking for a
>> blocksize > PMD_SIZE, you should fail to mount it.

A detail, but MAX_PAGECACHE_ORDER may be smaller than PMD_SIZE even on systems
with CONFIG_TRANSPARENT_HUGEPAGE as of a fix that is currently in mm-unstable:

	#ifdef CONFIG_TRANSPARENT_HUGEPAGE
	#define PREFERRED_MAX_PAGECACHE_ORDER	HPAGE_PMD_ORDER
	#else
	#define PREFERRED_MAX_PAGECACHE_ORDER	8
	#endif

	/*
	 * xas_split_alloc() does not support arbitrary orders. This implies no
	 * 512MB THP on ARM64 with 64KB base page size.
	 */
	#define MAX_XAS_ORDER		(XA_CHUNK_SHIFT * 2 - 1)
	#define MAX_PAGECACHE_ORDER	min(MAX_XAS_ORDER,
					    PREFERRED_MAX_PAGECACHE_ORDER)

But that also implies that the page cache can handle up to order-8 without
CONFIG_TRANSPARENT_HUGEPAGE so sounds like there isn't a dependcy on
CONFIG_TRANSPARENT_HUGEPAGE in this respect?

> 
> That's my point: we already do that.
> 
> The largest block size we support is 64kB and that's way smaller
> than PMD_SIZE on all platforms and we always check for bs > ps 
> support at mount time when the filesystem bs > ps.
> 
> Hence we're never going to set the min value to anything unsupported
> unless someone makes a massive programming mistake. At which point,
> we want a *hard, immediate fail* so the developer notices their
> mistake immediately. All filesystems and block devices need to
> behave this way so the limits should be encoded as asserts in the
> function to trigger such behaviour.
> 
>> If the device is
>> asking for a blocksize > PAGE_SIZE and CONFIG_TRANSPARENT_HUGEPAGE is
>> not set, you should also decline to mount the filesystem.
> 
> What does CONFIG_TRANSPARENT_HUGEPAGE have to do with filesystems
> being able to use large folios?
> 
> If that's an actual dependency of using large folios, then we're at
> the point where the mm side of large folios needs to be divorced
> from CONFIG_TRANSPARENT_HUGEPAGE and always supported.
> Alternatively, CONFIG_TRANSPARENT_HUGEPAGE needs to selected by the
> block layer and also every filesystem that wants to support
> sector/blocks sizes larger than PAGE_SIZE.  IOWs, large folio
> support needs to *always* be enabled on systems that say
> CONFIG_BLOCK=y.
> 
> I'd much prefer the former occurs, because making the block layer
> and filesystems dependent on an mm feature they don't actually use
> is kinda weird...
> 
> -Dave.
>