Re: [PATCH v6 4/5] mm/migrate: skip migrating folios under writeback with AS_WRITEBACK_INDETERMINATE mappings

Joanne Koong <joannelkoong@xxxxxxxxx> · Fri, 27 Dec 2024 10:25:56 -0800

On Thu, Dec 26, 2024 at 2:44 PM Bernd Schubert
<bernd.schubert@xxxxxxxxxxx> wrote:
>
> On 12/23/24 20:00, Joanne Koong wrote:
> > On Sat, Dec 21, 2024 at 1:59 PM Bernd Schubert
> > <bernd.schubert@xxxxxxxxxxx> wrote:
> >>
> >>
> >>
> >> On 12/21/24 17:25, David Hildenbrand wrote:
> >>> On 20.12.24 22:01, Joanne Koong wrote:
> >>>> On Fri, Dec 20, 2024 at 6:49 AM David Hildenbrand <david@xxxxxxxxxx>
> >>>> wrote:
> >>>>>
> >>>>>>> I'm wondering if there would be a way to just "cancel" the
> >>>>>>> writeback and
> >>>>>>> mark the folio dirty again. That way it could be migrated, but not
> >>>>>>> reclaimed. At least we could avoid the whole
> >>>>>>> AS_WRITEBACK_INDETERMINATE
> >>>>>>> thing.
> >>>>>>>
> >>>>>>
> >>>>>> That is what I basically meant with short timeouts. Obviously it is not
> >>>>>> that simple to cancel the request and to retry - it would add in quite
> >>>>>> some complexity, if all the issues that arise can be solved at all.
> >>>>>
> >>>>> At least it would keep that out of core-mm.
> >>>>>
> >>>>> AS_WRITEBACK_INDETERMINATE really has weird smell to it ... we should
> >>>>> try to improve such scenarios, not acknowledge and integrate them, then
> >>>>> work around using timeouts that must be manually configured, and ca
> >>>>> likely no be default enabled because it could hurt reasonable use
> >>>>> cases :(
> >>>>>
> >>>>> Right now we clear the writeback flag immediately, indicating that data
> >>>>> was written back, when in fact it was not written back at all. I suspect
> >>>>> fsync() currently handles that manually already, to wait for any of the
> >>>>> allocated pages to actually get written back by user space, so we have
> >>>>> control over when something was *actually* written back.
> >>>>>
> >>>>>
> >>>>> Similar to your proposal, I wonder if there could be a way to request
> >>>>> fuse to "abort" a writeback request (instead of using fixed timeouts per
> >>>>> request). Meaning, when we stumble over a folio that is under writeback
> >>>>> on some paths, we would tell fuse to "end writeback now", or "end
> >>>>> writeback now if it takes longer than X". Essentially hidden inside
> >>>>> folio_wait_writeback().
> >>>>>
> >>>>> When aborting a request, as I said, we would essentially "end writeback"
> >>>>> and mark the folio as dirty again. The interesting thing is likely how
> >>>>> to handle user space that wants to process this request right now (stuck
> >>>>> in fuse_send_writepage() I assume?), correct?
> >>>>
> >>>> This would be fine if the writeback request hasn't been sent yet to
> >>>> userspace but if it has and the pages are spliced
> >>>
> >>> Can you point me at the code where that splicing happens?
> >>
> >> fuse_dev_splice_read()
> >>   fuse_dev_do_read()
> >>     fuse_copy_args()
> >>       fuse_copy_page
> >>
> >>
> >> Btw, for the non splice case, disabling migration should be
> >> only needed while it is copying to the userspace buffer?
> >
> > I don't think so. We don't currently disable migration when copying
> > to/from the userspace buffer for reads.
>
>
> Sorry for my late reply. I'm confused about "reads". This discussions
> is about writeback?

Whether we need to disable migration for copying to/from the userspace
buffers for non-tmp pages should be the same between handling reads or
writes, no? That's why I brought up reads, but looking more at how
fuse handles readahead and read_folio(), it looks like the folio's
lock is held while it's being copied out, and IIUC that's enough to
disable migration since migration will wait on the lock. So if we end
writeback on the non-tmp, it seems like we'd probably need to do
something similar first.

> Without your patches we have tmp-pages - migration disabled on these.
> With your patches we have AS_WRITEBACK_INDETERMINATE - migration
> also disabled?
>
> I think we have two code paths
>
> a) fuse_dev_read - does a full buffer copy. Why do we need tmp-pages
> for these at all? The only time migration must not run on these pages
> while it is copying to the userspace buffer?

The tmp pages were originally introduced for avoiding deadlock on
reclaim and avoiding hanging sync()s as well.

[1] https://lore.kernel.org/linux-kernel/bd49fcba-3eb6-4e84-a0f0-e73bce31ddb2@xxxxxxxxxxxxxxxxx/

>
> b) fuse_dev_splice_read - isn't this our real problem, as we don't
> know when pages in the pipe are getting consumed?

Yes, the splice case nixes the idea unfortunately. Everything else we
could find a workaround for, but there's no way I can see to avoid
this for splice

Thanks,
Joanne
>
>
> Thanks,
> Bernd
>