On Tue, Feb 27, 2024 at 11:42:01AM +0000, Daniel Gomez wrote:
> On Tue, Feb 20, 2024 at 01:39:05PM +0100, Jan Kara wrote:
> > On Tue 20-02-24 10:26:48, Daniel Gomez wrote:
> > > On Mon, Feb 19, 2024 at 02:15:47AM -0800, Hugh Dickins wrote:
> > > I'm uncertain when we may want to be more elastic. In the case of XFS with iomap
> > > and support for large folios, for instance, we are 'less' elastic than here. So,
> > > what exactly is the rationale behind wanting shmem to be 'more elastic'?
> >
> > Well, but if you allocated space in larger chunks - as is the case with
> > ext4 and bigalloc feature, you will be similarly 'elastic' as tmpfs with
> > large folio support... So simply the granularity of allocation of
> > underlying space is what matters here. And for tmpfs the underlying space
> > happens to be the page cache.
>
> But it seems like the underlying space 'behaves' differently when we talk about
> large folios and huge pages. Is that correct? And this is reflected in the fstat
> st_blksize. The first one is always based on the host base page size, regardless
> of the order we get. The second one is always based on the host huge page size
> configured (at the moment I've tested 2MiB, and 1GiB for x86-64 and 2MiB, 512
> MiB and 16GiB for ARM64).

Apologies, I was mixing the values available in HugeTLB and those supported in
THP (pmd-size only). Thus, it is 2MiB for x86-64, and 2MiB, 32 MiB and 512 MiB
for ARM64 with 4k, 16k and 64k Base Page Size, respectively.

>
> If that is the case, I'd agree this is not needed for huge pages but only when
> we adopt large folios. Otherwise, we won't have a way to determine the step/
> granularity for seeking data/holes as it could be anything from order-0 to
> order-9. Note: order-1 support currently in LBS v1 thread here [1].
>
> Regarding large folios adoption, we have the following implementations [2] being
> sent to the mailing list. Would it make sense then, to have this block tracking
> for the large folios case? Notice that my last attempt includes a partial
> implementation of block tracking discussed here.
>
> [1] https://lore.kernel.org/all/20240226094936.2677493-2-kernel@xxxxxxxxxxxxxxxx/
>
> [2] shmem: high order folios support in write path
> v1: https://lore.kernel.org/all/20230915095042.1320180-1-da.gomez@xxxxxxxxxxx/
> v2: https://lore.kernel.org/all/20230919135536.2165715-1-da.gomez@xxxxxxxxxxx/
> v3 (RFC): https://lore.kernel.org/all/20231028211518.3424020-1-da.gomez@xxxxxxxxxxx/
>
> >
> > > If we ever move shmem to large folios [1], and we use them in an oportunistic way,
> > > then we are going to be more elastic in the default path.
> > >
> > > [1] https://lore.kernel.org/all/20230919135536.2165715-1-da.gomez@xxxxxxxxxxx
> > >
> > > In addition, I think that having this block granularity can benefit quota
> > > support and the reclaim path. For example, in the generic/100 fstest, around
> > > ~26M of data are reported as 1G of used disk when using tmpfs with huge pages.
> >
> > And I'd argue this is a desirable thing. If 1G worth of pages is attached
> > to the inode, then quota should be accounting 1G usage even though you've
> > written just 26MB of data to the file. Quota is about constraining used
> > resources, not about "how much did I write to the file".
>
> But these are two separate values. I get that the system wants to track how many
> pages are attached to the inode, so is there a way to report (in addition) the
> actual use of these pages being consumed?
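
To make the difference between those two values concrete, here is a minimal
userspace sketch (illustrative only, not from any of the series above; the
mount point and the huge=always setting are assumptions). With a PMD-size
folio attached for a small write, stat() can account roughly 2MiB in
st_blocks while SEEK_HOLE still reports the data extent ending right after
the few KiB that were actually written:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	/* Assumed: tmpfs mounted at /mnt/tmpfs with huge=always. */
	const char *path = "/mnt/tmpfs/blocks-demo";
	char buf[4096];
	struct stat st;
	off_t hole;
	int fd;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Write 4KiB of data at offset 0. */
	memset(buf, 'x', sizeof(buf));
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
		perror("write");
		return 1;
	}

	if (fstat(fd, &st) < 0) {
		perror("fstat");
		return 1;
	}
	/* First hole after offset 0 marks the end of the data extent. */
	hole = lseek(fd, 0, SEEK_HOLE);

	printf("st_blksize      : %ld\n", (long)st.st_blksize);
	printf("st_blocks * 512 : %lld bytes\n", (long long)st.st_blocks * 512);
	printf("data ends at    : %lld\n", (long long)hole);

	close(fd);
	return 0;
}

So the "pages attached" number is already what st_blocks exposes; reporting
the actually used amount in addition is the kind of information the block
tracking discussed here could provide.
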
>
> >
> > 								Honza
> > --
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR
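
On the step/granularity point above: a sparse-aware tool does not pick a
probe size of its own, it simply walks whatever data/hole boundaries the
filesystem reports, roughly like the sketch below (illustrative only, not
taken from any series here). That is why the granularity at which shmem
tracks data, anywhere from an order-0 page up to a large folio, directly
changes how much of a file such a loop sees as data:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Print the data extents the filesystem reports for the given file. */
static void scan_extents(int fd)
{
	struct stat st;
	off_t off = 0, data, hole;

	if (fstat(fd, &st) < 0) {
		perror("fstat");
		return;
	}

	while (off < st.st_size) {
		data = lseek(fd, off, SEEK_DATA);
		if (data < 0)
			break;		/* no more data past off */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			break;
		printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
		off = hole;
	}
}

int main(int argc, char **argv)
{
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	scan_extents(fd);
	close(fd);
	return 0;
}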