Re: Splitting an iomap_page

Matthew Wilcox <willy@xxxxxxxxxxxxx> · Tue, 8 Sep 2020 12:43:28 +0100

On Mon, Sep 07, 2020 at 02:33:24PM +0300, Kirill A. Shutemov wrote:
> On Fri, Sep 04, 2020 at 04:37:24AM +0100, Matthew Wilcox wrote:
> > Kirill, do I have the handling of split_huge_page() failure correct?
> > It seems reasonable to me -- unlock the page and drop the reference,
> > hoping that somebody else will not have a reference to the page by the
> > next time we try to split it.  Or they will split it for us.  There's a
> > livelock opportunity here, but I'm not sure it's worse than the one in
> > a holepunch scenario.
> 
> The worst case scenario is when the page is referenced (directly or
> indirectly) by the caller. It this case we would end up with endless loop.
> I'm not sure how we can guarantee that this will never happen.

I don't see a way for that to happen at the moment.  We're pretty
careful not to take references on multiple pages at once in these paths.
I've fixed the truncate paths to only take one reference per THP too.

I was thinking that if livelock becomes a problem, we could (ab)use the
THP destructor mechanism somewhat like this:

Add
	[TRANSHUGE_PAGE_SPLIT] = split_transhuge_page,
to the compound_page_dtors array.

New function split_huge_page_wait() which, if !can_split_huge_page()
first checks if the dtor is already set to TRANSHUGE_PAGE_SPLIT.  If so,
it returns to its caller, reporting failure (so that it will put its
reference to the page).  Then it sets the dtor to TRANSHUGE_PAGE_SPLIT
and sets page refcount to 1.  It goes to sleep on the page.

split_transhuge_page() first has to check if the refcount went to 0
due to mapcount being reduced.  If so, it resets the refcount to 1 and
returns to the caller.  If not, it freezes the page and wakes the task
above which is sleeping in split_huge_page_wait().

It's complicated and I don't love it.  But it might solve livelock, should
we need to do it.  It wouldn't prevent us from an indefinite wait if the
caller of split_huge_page_wait() has more than one reference to this page.
That's better than a livelock though.