On 15 May 2018 at 09:22, Christoph Hellwig <hch@xxxxxx> wrote:
> On Tue, May 15, 2018 at 11:11:47AM +1000, Dave Chinner wrote:
>> I'm not sure adding a per-page filesystem callout abstraction is
>> what we want in the iomap code. The commit message does not explain
>> why gfs2 needs per-page callouts, nor why the per-page work cannot
>> be done/avoided by running code in the ->iomap_begin callout
>> (e.g. by unstuffing the inode so nothing needs to be done per-page).
>
> Agreed.
>
>> My main concern is that the callouts to the filesystem in iomap are
>> - by design - per IO, not per page. The whole point of using iomap
>> was to get away from doing per-page filesystem operations on
>> multi-page IOs - it's highly inefficient when we only need to call
>> into the filesystem on a per-extent basis, and we simplified the
>> code a *lot* by avoiding such behaviours.
>
> Yes. And from a quick look we can easily handle "stuffed" aka inline
> data with the existing architecture. We just need a new IOMAP_INLINE
> flag for the extent type. Details will need to be worked out for how
> to pass that data; either a struct page or a virtual address should
> probably do it.

This is about data journaling; it has nothing to do with "stuffed"
files per se. With journaled data writes, we need to do some special
processing for each page so that the page ends up in the journal
before it gets written back to its proper on-disk location. It just
happens that since we need these per-page operations anyway, the
"unstuffing" and "re-stuffing" fit nicely into those operations as
well.

> The other thing gfs2 seems to be doing is handling of journaled
> data. I'm not quite sure how to fit that into the grand overall
> scheme of things, but at least that would only be after the write.

I'm not sure what you mean by "but at least that would only be after
the write".

>> That is, Christoph's patchset pushes us further away from
>> filesystems doing their own per-page processing of page state. It
>> converts the iomap IO path to store its own per-page state tracking
>> information in page->private, greatly reducing memory overhead (16
>> bytes on a 4k page instead of 104 bytes per bufferhead) and
>
> Or in fact zero bytes for block size == PAGE_SIZE.
>
>> streamlining the code. This, however, essentially makes the
>> buffered IO path through the iomap infrastructure incompatible with
>> existing bufferhead based filesystems.
>>
>> Despite this, we can still provide limited support for buffered
>> writes on bufferhead based filesystems through the iomap
>> infrastructure, but I only see that as a mechanism for filesystems
>> moving away from dependencies on bufferheads. It would require a
>> different refactoring of the iomap code - I'd much prefer to keep
>> bufferhead dependent IO paths completely separate (e.g. via a new
>> actor function) to minimise impact and dependencies on the internal
>> bufferhead free iomap IO path....
>>
>> Let's see what Christoph thinks first, though....
>
> gfs2 seems to indeed depend pretty heavily on buffer_heads. This is
> a bit of a bummer, but I'd need to take a deeper look at how to offer
> iomap functionality to them without them moving away from
> buffer_heads, which isn't really going to be a quick project. In a
> way the write_begin/write_end ops from Andreas facilitate that, as
> the rest of iomap_file_buffered_write and iomap_write_actor are
> untouched. So at least temporarily his callouts are what we need,
> even if the actual inline data functionality should be moved out of
> them for sure,

Been there. An earlier version of the patches did have a separate
stuffed write path in which the unstuffing and re-stuffing didn't
happen in those per-page operations. The result was much messier
overall.
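For concreteness, a rough sketch of how the IOMAP_INLINE idea
Christoph describes above might look from the gfs2 side, passing the
inline data by virtual address. This is only an illustration: the
IOMAP_INLINE extent type and the iomap->inline_data field are assumed
extensions that don't exist yet, and gfs2_iomap_get_extent() is a
made-up stand-in for the normal mapping path. The dibh reference would
have to be held and dropped in ->iomap_end, which is omitted here.

static int gfs2_iomap_begin(struct inode *inode, loff_t pos,
			    loff_t length, unsigned flags,
			    struct iomap *iomap)
{
	struct gfs2_inode *ip = GFS2_I(inode);
	struct buffer_head *dibh;
	int ret;

	if (gfs2_is_stuffed(ip)) {
		ret = gfs2_meta_inode_buffer(ip, &dibh);
		if (ret)
			return ret;
		iomap->type = IOMAP_INLINE;	/* assumed new extent type */
		iomap->offset = 0;
		iomap->length = i_size_read(inode);
		/* pass the data by virtual address, as suggested above */
		iomap->inline_data = dibh->b_data +
				     sizeof(struct gfs2_dinode);
		return 0;
	}

	/* normal extent lookup for unstuffed inodes (stand-in name) */
	return gfs2_iomap_get_extent(ip, pos, length, flags, iomap);
}

With something like this, a stuffed inode never needs per-page work in
the write path: ->iomap_begin reports a single inline extent and the
generic code copies to and from the inline buffer directly.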
> but only with a long term plan to move gfs2 away from buffer heads.

I hope we'll get rid of buffer heads in additional parts of gfs2, but
that's not going to happen soon. The main point of those per-page
operations is still to support data journaling, though.

> Btw, does gfs2 support block sizes smaller than PAGE_SIZE?

Yes.

Thanks,
Andreas
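To make the data-journaling point concrete, here is a minimal sketch
of what such per-page operations could look like. The ops structure
and hook names are assumptions in the spirit of the
write_begin/write_end callouts discussed above, not an existing API;
gfs2_page_add_databufs() is gfs2's existing helper (its signature
simplified here) that attaches a page's buffers to the running
transaction so they reach the journal before ordinary writeback.

struct iomap_page_ops {			/* assumed per-page callouts */
	int (*page_prepare)(struct inode *inode, loff_t pos,
			    unsigned len);
	void (*page_done)(struct inode *inode, loff_t pos,
			  unsigned copied, struct page *page);
};

static int gfs2_iomap_page_prepare(struct inode *inode, loff_t pos,
				   unsigned len)
{
	unsigned int blocks = DIV_ROUND_UP(len, i_blocksize(inode));

	/* reserve journal space for the buffers this page may dirty */
	return gfs2_trans_begin(GFS2_SB(inode), RES_DINODE + blocks, 0);
}

static void gfs2_iomap_page_done(struct inode *inode, loff_t pos,
				 unsigned copied, struct page *page)
{
	struct gfs2_inode *ip = GFS2_I(inode);

	/* for journaled data, the page must hit the journal first */
	if (page && gfs2_is_jdata(ip))
		gfs2_page_add_databufs(ip, page, offset_in_page(pos),
				       copied);
	gfs2_trans_end(GFS2_SB(inode));
}

The point of this shape is that the iomap write actor stays per-extent
for everyone else; only a filesystem that actually needs per-page work
(here, getting each written page into the journal) pays the per-page
cost by filling in these hooks.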