On Tue, 28 Apr 2020 12:43:45 -0700
Dave Hansen <dave.hansen@xxxxxxxxx> wrote:

> On 4/21/20 2:31 PM, Dave Hansen wrote:
> > On 4/16/20 12:02 PM, Dave Hansen wrote:
> >> On 4/16/20 9:34 AM, Claudio Imbrenda wrote:
> >>>> Ahh, so this is *just* intended to precede I/O done on the page,
> >>>> when a non-host entity is touching the memory?
> >>> yep
> >>
> >> OK, so we've got to do an action that precedes *all* I/O to a page.
> >> That's not too bad.
> >>
> >> I still don't understand how this could work generally, though.
> >> There are lots of places where I/O is done to a page without
> >> either going through __test_set_page_writeback() or gup() with
> >> FOLL_PIN set.
> >>
> >> sendfile() is probably the best example of this:
> >>
> >> 	fd = open("/normal/ext4/file", O_RDONLY);
> >> 	sendfile(socket_fd, fd, &off, count);
> >>
> >> There's no gup in sight since the file doesn't have an address and
> >> it's not being written to so there's no writeback.
> >>
> >> How does sendfile work?
> >
> > Did you manage to see if sendfile works (or any other operation that
> > DMAs file-backed data without being preceded by a gup)?
>
> It's been a couple of weeks with no response on this.

Sorry, I've been busy with other things.

> From where I'm standing, we have a hook in the core VM that can't
> possibly work with some existing kernel functionality and has
> virtually no chance of getting used on a second architecture.

It seems to work at least for us, so it does possibly work :)

Regarding second architectures: when we started sending these patches
around, there was interest from some other architectures, so just
because nobody else needs them right now doesn't mean nobody ever
will. Moreover, this is the only way for us to implement this
reasonably (see below).

> It sounds like there may need to be some additional work here, but
> should these hooks stay in for 5.7?  Or, should we revert this patch
> and try again for 5.8?

I don't see why we should revert a patch that works as intended and
adds no overhead for non-users, when reverting it would break working
functionality.

Now let me elaborate a little on the DMA API.

There are some issues with some of the bus types used on s390 when it
comes to the DMA API. Most I/O instructions on s390 need small control
blocks for each operation, and those have to reside below 2GB because
the hardware accesses them directly. The actual I/O data buffers have
no such restriction and can be anywhere in the 64-bit address space.
Setting the device's DMA mask to 2GB would therefore mean that almost
all data buffers get bounced, which is unacceptable. Especially since
no bounce buffers are set up on s390x hosts anyway; they are set up
only in protected guests (not even in normal guests), and that support
was introduced quite recently.

Also note that, until now, there has been no actual need for the DMA
API in most s390 device drivers, which is why it is not used there. I
know you are used to needing the DMA API for DMA operations, otherwise
Bad Things™ happen, but that is not the case on s390 (at least for
non-PCI devices).

So until the DMA API is fixed, there is no way to convert all the
drivers to it (which would be quite a lot of work in itself, but
that's not the point here). A fix for the DMA API was actually
proposed by my colleague Halil several months ago, but it did not go
through.
His proposal was to let architectures override the GFP flags used for
DMA allocations, so that some buffers could be allocated from one
memory area and other buffers from another; I'll sketch the idea
below.

I hope this clarifies the matter a little :)
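To make the constraint a bit more concrete: today an s390 channel I/O
driver would typically allocate its hardware-visible control block
from ZONE_DMA (which on s390 means below 2GB), while the data buffer
can be allocated normally. Roughly like this (purely illustrative,
"cb" and "buf" are made-up names):

	/* control block is accessed directly by the hardware: must be <2GB */
	cb = kzalloc(sizeof(*cb), GFP_KERNEL | GFP_DMA);

	/* data buffer: no addressing restriction, can live anywhere */
	buf = kmalloc(buf_len, GFP_KERNEL);

And the GFP-override idea would, roughly, let the architecture steer
only the allocations that need it instead of clamping the whole device
with a 2GB DMA mask. Again just a sketch of what I mean, not Halil's
actual patch, and the hook name is made up:

	/* hypothetical arch hook; by default the flags are left untouched */
	#ifndef arch_dma_alloc_gfp
	static inline gfp_t arch_dma_alloc_gfp(struct device *dev, gfp_t gfp)
	{
		return gfp;
	}
	#endif

	/* somewhere in the generic DMA allocation path */
	gfp = arch_dma_alloc_gfp(dev, gfp);

That way the small control blocks could come from below 2GB while the
bulk of the I/O buffers would not get bounced.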