On 05/07/2024 05:32, Dave Chinner wrote: > On Fri, Jul 05, 2024 at 12:56:28AM +0100, Matthew Wilcox wrote: >> On Fri, Jul 05, 2024 at 08:06:51AM +1000, Dave Chinner wrote: >>>>> It seems strange to silently clamp these? Presumably for the bs>ps usecase, >>>>> whatever values are passed in are a hard requirement? So wouldn't want them to >>>>> be silently reduced. (Especially given the recent change to reduce the size of >>>>> MAX_PAGECACHE_ORDER to less then PMD size in some cases). >>>> >>>> Hm, yes. We should probably make this return an errno. Including >>>> returning an errno for !IS_ENABLED() and min > 0. >>> >>> What are callers supposed to do with an error? In the case of >>> setting up a newly allocated inode in XFS, the error would be >>> returned in the middle of a transaction and so this failure would >>> result in a filesystem shutdown. >> >> I suggest you handle it better than this. If the device is asking for a >> blocksize > PMD_SIZE, you should fail to mount it. A detail, but MAX_PAGECACHE_ORDER may be smaller than PMD_SIZE even on systems with CONFIG_TRANSPARENT_HUGEPAGE as of a fix that is currently in mm-unstable: #ifdef CONFIG_TRANSPARENT_HUGEPAGE #define PREFERRED_MAX_PAGECACHE_ORDER HPAGE_PMD_ORDER #else #define PREFERRED_MAX_PAGECACHE_ORDER 8 #endif /* * xas_split_alloc() does not support arbitrary orders. This implies no * 512MB THP on ARM64 with 64KB base page size. */ #define MAX_XAS_ORDER (XA_CHUNK_SHIFT * 2 - 1) #define MAX_PAGECACHE_ORDER min(MAX_XAS_ORDER, PREFERRED_MAX_PAGECACHE_ORDER) But that also implies that the page cache can handle up to order-8 without CONFIG_TRANSPARENT_HUGEPAGE so sounds like there isn't a dependcy on CONFIG_TRANSPARENT_HUGEPAGE in this respect? > > That's my point: we already do that. > > The largest block size we support is 64kB and that's way smaller > than PMD_SIZE on all platforms and we always check for bs > ps > support at mount time when the filesystem bs > ps. > > Hence we're never going to set the min value to anything unsupported > unless someone makes a massive programming mistake. At which point, > we want a *hard, immediate fail* so the developer notices their > mistake immediately. All filesystems and block devices need to > behave this way so the limits should be encoded as asserts in the > function to trigger such behaviour. > >> If the device is >> asking for a blocksize > PAGE_SIZE and CONFIG_TRANSPARENT_HUGEPAGE is >> not set, you should also decline to mount the filesystem. > > What does CONFIG_TRANSPARENT_HUGEPAGE have to do with filesystems > being able to use large folios? > > If that's an actual dependency of using large folios, then we're at > the point where the mm side of large folios needs to be divorced > from CONFIG_TRANSPARENT_HUGEPAGE and always supported. > Alternatively, CONFIG_TRANSPARENT_HUGEPAGE needs to selected by the > block layer and also every filesystem that wants to support > sector/blocks sizes larger than PAGE_SIZE. IOWs, large folio > support needs to *always* be enabled on systems that say > CONFIG_BLOCK=y. > > I'd much prefer the former occurs, because making the block layer > and filesystems dependent on an mm feature they don't actually use > is kinda weird... > > -Dave. >