On Tue, 28 Apr 2020 12:43:45 -0700
Dave Hansen <dave.hansen@xxxxxxxxx> wrote:

> On 4/21/20 2:31 PM, Dave Hansen wrote:
> > On 4/16/20 12:02 PM, Dave Hansen wrote:
> >> On 4/16/20 9:34 AM, Claudio Imbrenda wrote:
> >>>> Ahh, so this is *just* intended to precede I/O done on the page,
> >>>> when a non-host entity is touching the memory?
> >>> yep
> >>
> >> OK, so we've got to do an action that precedes *all* I/O to a page.
> >> That's not too bad.
> >>
> >> I still don't understand how this could work generally, though.
> >> There are lots of places where I/O is done to a page without
> >> either going through __test_set_page_writeback() or gup() with
> >> FOLL_PIN set.
> >>
> >> sendfile() is probably the best example of this:
> >>
> >> 	fd = open("/normal/ext4/file", O_RDONLY);
> >> 	sendfile(socket_fd, fd, &off, count);
> >>
> >> There's no gup in sight since the file doesn't have an address and
> >> it's not being written to so there's no writeback.
> >>
> >> How does sendfile work?
> >
> > Did you manage to see if sendfile works (or any other operation that
> > DMAs file-backed data without being preceded by a gup)?
>
> It's been a couple of weeks with no response on this.

Sorry, I've been busy with other things.

> From where I'm standing, we have a hook in the core VM that can't
> possibly work with some existing kernel functionality and has
> virtually no chance of getting used on a second architecture.

It seems to work at least for us, so it does possibly work :)

Regarding second architectures: when we started sending these patches
around, there was interest from some other architectures, so just
because nobody else needs them right now doesn't mean nobody ever
will. Moreover, this is the only way for us to implement this
reasonably (see below).

> It sounds like there may need to be some additional work here, but
> should these hooks stay in for 5.7?  Or, should we revert this patch
> and try again for 5.8?

I don't see why we should revert a patch that works as intended and
adds no overhead for non-users, when reverting it would break working
functionality.

Now let me elaborate a little on the DMA API.

There are some issues with some of the bus types used on s390 when it
comes to the DMA API. Most I/O instructions on s390 need small control
blocks for each operation, and those have to reside below 2GB because
the hardware accesses them directly. The actual I/O data buffers have
no such restriction and can be anywhere in the 64-bit address space.
Setting the device's DMA mask to 2GB would therefore mean that almost
all data buffers get bounced, which is unacceptable. Especially since
no bounce buffers are set up on s390x hosts anyway; they are set up
only in protected guests (not even in normal guests), and that support
was introduced quite recently.

Also note that, until now, there has been no actual need for the DMA
API in most s390 device drivers, which is why it is not used there. I
know you are used to needing the DMA API for DMA operations, otherwise
Bad Things™ happen, but that is not the case on s390 (at least for
non-PCI devices).

So until the DMA API is fixed, there is no way to convert all the
drivers to it (which would be quite a lot of work in itself, but
that's not the point here). A fix for the DMA API was actually
proposed by my colleague Halil several months ago, but it did not go
through.
His proposal was to let architectures override the GFP flags used for
DMA allocations, so that some buffers could be allocated from one
memory area and other buffers from another; I'll sketch the idea
below.

I hope this clarifies the matter a little :)
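To make the constraint a bit more concrete: today an s390 channel I/O
driver would typically allocate its hardware-visible control block
from ZONE_DMA (which on s390 means below 2GB), while the data buffer
can be allocated normally. Roughly like this (purely illustrative,
"cb" and "buf" are made-up names):

	/* control block is accessed directly by the hardware: must be <2GB */
	cb = kzalloc(sizeof(*cb), GFP_KERNEL | GFP_DMA);

	/* data buffer: no addressing restriction, can live anywhere */
	buf = kmalloc(buf_len, GFP_KERNEL);

And the GFP-override idea would, roughly, let the architecture steer
only the allocations that need it instead of clamping the whole device
with a 2GB DMA mask. Again just a sketch of what I mean, not Halil's
actual patch, and the hook name is made up:

	/* hypothetical arch hook; by default the flags are left untouched */
	#ifndef arch_dma_alloc_gfp
	static inline gfp_t arch_dma_alloc_gfp(struct device *dev, gfp_t gfp)
	{
		return gfp;
	}
	#endif

	/* somewhere in the generic DMA allocation path */
	gfp = arch_dma_alloc_gfp(dev, gfp);

That way the small control blocks could come from below 2GB while the
bulk of the I/O buffers would not get bounced.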