On Thu, 17 Feb 2022, Matthew Wilcox wrote:
> On Wed, Feb 16, 2022 at 05:25:17PM -0800, Hugh Dickins wrote:
> > On Wed, 16 Feb 2022, Mike Kravetz wrote:
> > > On 2/14/22 23:37, cgel.zte@xxxxxxxxx wrote:
...
> > > > @@ -39,8 +40,12 @@ static void memfd_tag_pins(struct xa_state *xas)
> > > >  	xas_for_each(xas, page, ULONG_MAX) {
> > > >  		if (xa_is_value(page))
> > > >  			continue;
> > > > +
> > > >  		page = find_subpage(page, xas->xa_index);
> > > > -		if (page_count(page) - page_mapcount(page) > 1)
> > > > +		count = page_count(page);
> > > > +		if (PageTransCompound(page))
> > >
> > > PageTransCompound() is true for hugetlb pages as well as THP.  And, hugetlb
> > > pages will not have a ref per subpage as THP does.  So, I believe this will
> > > break hugetlb seal usage.
> >
> > Yes, I think so too; and that is not the only issue with the patch
> > (I don't think page_mapcount is enough, I had to use total_mapcount).

Mike, we had the same instinctive reaction to seeing a PageTransCompound
check in code also exposed to PageHuge pages; but in fact that seems to
have worked correctly - those hugetlbfs pages are hard to predict!
But it was not working on pte maps of THPs.

> > It's a good find, and thank you WangYong for the report.
> > I found the same issue when testing my MFD_HUGEPAGE patch last year,
> > and devised a patch to fix it (and keep MFD_HUGETLB working) then; but
> > never sent that in because there wasn't time to re-present MFD_HUGEPAGE.
> >
> > I'm currently retesting my patch: just found something failing which
> > I thought should pass; but maybe I'm confused, or maybe the xarray is
> > working differently now.  I'm rushing to reply now because I don't want
> > others to waste their own time on it.
>
> I did change how the XArray works for THP recently.
>
> Kirill's original patch stored:
>
> 512: p
> 513: p+1
> 514: p+2
> ...
> 1023: p+511
>
> A couple of years ago, I changed it to store:
>
> 512: p
> 513: p
> 514: p
> ...
> 1023: p
>
> And in January, Linus merged the commit which changes it to:
>
> 512-575: p
> 576-639: (sibling of 512)
> 640-703: (sibling of 512)
> ...
> 960-1023: (sibling of 512)
>
> That is, I removed a level of the tree and store sibling entries
> rather than duplicate entries.  That wasn't for fun; I needed to do
> that in order to make msync() work with large folios.  Commit
> 6b24ca4a1a8d has more detail and hopefully can inspire whatever
> changes you need to make to your patch.

Matthew, thanks for the very detailed info, you shouldn't have taken
so much trouble over it: I knew you had done something of that kind,
and yes, that's where my suspicion lay at the time of writing.  But
you'll be relieved to know that the patch I wrote before your changes
turned out to be unaffected, and just as valid after your changes.

"just found something failing which I thought should pass" was me
forgetting, again and again, just how limited are the allowed
possibilities for F_SEAL_WRITE when mmaps are outstanding.  One thinks
that a PROT_READ, MAP_SHARED mapping would be allowed; but of course
all the memfds are automatically O_RDWR, so mprotect (which has no
sealing hook) allows it to be changed to PROT_READ|PROT_WRITE, so
F_SEAL_WRITE is forbidden on any MAP_SHARED mapping: only allowed
on MAP_PRIVATEs.

I'll now re-read the commit message I wrote before, update if
necessary, and then send to Andrew, asking him to replace the one
in this thread.

Hugh
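
P.S. For anyone wanting to see that MAP_SHARED restriction in action,
here is a minimal userspace sketch (a hypothetical demo, not part of
the patch under discussion; it assumes the EBUSY return that
memfd_create(2) documents for sealing against existing shared
writable mappings):

#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
	size_t len = 4096;
	/* memfds are always opened O_RDWR, whether or not we ever write */
	int fd = memfd_create("seal-demo", MFD_ALLOW_SEALING);

	ftruncate(fd, len);

	/*
	 * PROT_READ looks harmless, but MAP_SHARED on an O_RDWR fd
	 * could later be mprotect'ed to PROT_READ|PROT_WRITE (mprotect
	 * has no sealing hook), so F_SEAL_WRITE must be refused while
	 * this mapping exists.
	 */
	void *shr = mmap(NULL, len, PROT_READ, MAP_SHARED, fd, 0);

	if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) < 0)
		printf("MAP_SHARED:  sealing refused: %s\n", strerror(errno));
	munmap(shr, len);

	/* a MAP_PRIVATE mapping cannot write back, so sealing succeeds */
	void *prv = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);

	if (fcntl(fd, F_ADD_SEALS, F_SEAL_WRITE) == 0)
		printf("MAP_PRIVATE: sealed OK\n");
	munmap(prv, len);

	close(fd);
	return 0;
}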