On Tue, Feb 27, 2024 at 11:42:01AM +0000, Daniel Gomez wrote:
> On Tue, Feb 20, 2024 at 01:39:05PM +0100, Jan Kara wrote:
> > On Tue 20-02-24 10:26:48, Daniel Gomez wrote:
> > > On Mon, Feb 19, 2024 at 02:15:47AM -0800, Hugh Dickins wrote:
> > > I'm uncertain when we may want to be more elastic. In the case of XFS with iomap
> > > and support for large folios, for instance, we are 'less' elastic than here. So,
> > > what exactly is the rationale behind wanting shmem to be 'more elastic'?
> >
> > Well, but if you allocated space in larger chunks - as is the case with
> > ext4 and bigalloc feature, you will be similarly 'elastic' as tmpfs with
> > large folio support... So simply the granularity of allocation of
> > underlying space is what matters here. And for tmpfs the underlying space
> > happens to be the page cache.
>
> But it seems like the underlying space 'behaves' differently when we talk about
> large folios and huge pages. Is that correct? And this is reflected in the fstat
> st_blksize. The first one is always based on the host base page size, regardless
> of the order we get. The second one is always based on the host huge page size
> configured (at the moment I've tested 2MiB, and 1GiB for x86-64 and 2MiB, 512
> MiB and 16GiB for ARM64).

Apologies, I was mixing the values available in HugeTLB and those supported in
THP (pmd-size only). Thus, it is 2MiB for x86-64, and 2MiB, 32 MiB and 512 MiB
for ARM64 with 4k, 16k and 64k Base Page Size, respectively.

>
> If that is the case, I'd agree this is not needed for huge pages but only when
> we adopt large folios. Otherwise, we won't have a way to determine the step/
> granularity for seeking data/holes as it could be anything from order-0 to
> order-9. Note: order-1 support currently in LBS v1 thread here [1].
>
> Regarding large folios adoption, we have the following implementations [2] being
> sent to the mailing list. Would it make sense then, to have this block tracking
> for the large folios case? Notice that my last attempt includes a partial
> implementation of block tracking discussed here.
>
> [1] https://lore.kernel.org/all/20240226094936.2677493-2-kernel@xxxxxxxxxxxxxxxx/
>
> [2] shmem: high order folios support in write path
> v1: https://lore.kernel.org/all/20230915095042.1320180-1-da.gomez@xxxxxxxxxxx/
> v2: https://lore.kernel.org/all/20230919135536.2165715-1-da.gomez@xxxxxxxxxxx/
> v3 (RFC): https://lore.kernel.org/all/20231028211518.3424020-1-da.gomez@xxxxxxxxxxx/
>
> >
> > > If we ever move shmem to large folios [1], and we use them in an oportunistic way,
> > > then we are going to be more elastic in the default path.
> > >
> > > [1] https://lore.kernel.org/all/20230919135536.2165715-1-da.gomez@xxxxxxxxxxx
> > >
> > > In addition, I think that having this block granularity can benefit quota
> > > support and the reclaim path. For example, in the generic/100 fstest, around
> > > ~26M of data are reported as 1G of used disk when using tmpfs with huge pages.
> >
> > And I'd argue this is a desirable thing. If 1G worth of pages is attached
> > to the inode, then quota should be accounting 1G usage even though you've
> > written just 26MB of data to the file. Quota is about constraining used
> > resources, not about "how much did I write to the file".
>
> But these are two separate values. I get that the system wants to track how many
> pages are attached to the inode, so is there a way to report (in addition) the
> actual use of these pages being consumed?
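
To make the difference between those two values concrete, here is a minimal
userspace sketch (illustrative only, not from any of the series above; the
mount point and the huge=always setting are assumptions). With a PMD-size
folio attached for a small write, stat() can account roughly 2MiB in
st_blocks while SEEK_HOLE still reports the data extent ending right after
the few KiB that were actually written:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

int main(void)
{
	/* Assumed: tmpfs mounted at /mnt/tmpfs with huge=always. */
	const char *path = "/mnt/tmpfs/blocks-demo";
	char buf[4096];
	struct stat st;
	off_t hole;
	int fd;

	fd = open(path, O_RDWR | O_CREAT | O_TRUNC, 0600);
	if (fd < 0) {
		perror("open");
		return 1;
	}

	/* Write 4KiB of data at offset 0. */
	memset(buf, 'x', sizeof(buf));
	if (write(fd, buf, sizeof(buf)) != (ssize_t)sizeof(buf)) {
		perror("write");
		return 1;
	}

	if (fstat(fd, &st) < 0) {
		perror("fstat");
		return 1;
	}
	/* First hole after offset 0 marks the end of the data extent. */
	hole = lseek(fd, 0, SEEK_HOLE);

	printf("st_blksize      : %ld\n", (long)st.st_blksize);
	printf("st_blocks * 512 : %lld bytes\n", (long long)st.st_blocks * 512);
	printf("data ends at    : %lld\n", (long long)hole);

	close(fd);
	return 0;
}

So the "pages attached" number is already what st_blocks exposes; reporting
the actually used amount in addition is the kind of information the block
tracking discussed here could provide.
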
>
> >
> > 								Honza
> > --
> > Jan Kara <jack@xxxxxxxx>
> > SUSE Labs, CR
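
On the step/granularity point above: a sparse-aware tool does not pick a
probe size of its own, it simply walks whatever data/hole boundaries the
filesystem reports, roughly like the sketch below (illustrative only, not
taken from any series here). That is why the granularity at which shmem
tracks data, anywhere from an order-0 page up to a large folio, directly
changes how much of a file such a loop sees as data:

#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <sys/stat.h>
#include <unistd.h>

/* Print the data extents the filesystem reports for the given file. */
static void scan_extents(int fd)
{
	struct stat st;
	off_t off = 0, data, hole;

	if (fstat(fd, &st) < 0) {
		perror("fstat");
		return;
	}

	while (off < st.st_size) {
		data = lseek(fd, off, SEEK_DATA);
		if (data < 0)
			break;		/* no more data past off */
		hole = lseek(fd, data, SEEK_HOLE);
		if (hole < 0)
			break;
		printf("data: [%lld, %lld)\n", (long long)data, (long long)hole);
		off = hole;
	}
}

int main(int argc, char **argv)
{
	int fd;

	if (argc != 2) {
		fprintf(stderr, "usage: %s <file>\n", argv[0]);
		return 1;
	}
	fd = open(argv[1], O_RDONLY);
	if (fd < 0) {
		perror("open");
		return 1;
	}
	scan_extents(fd);
	close(fd);
	return 0;
}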