Re: [PATCH] Documentation: document the design of iomap and how to port

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jun 11, 2024 at 09:12:13AM -0700, Christoph Hellwig wrote:
> On Sun, Jun 09, 2024 at 08:55:06AM -0700, Darrick J. Wong wrote:
> > HTML version here, text version below.
> 
> That is so much nicer than all the RST stuff..
> 
> >    iomap is a filesystem library for handling various filesystem
> >    operations that involves mapping of file's logical offset ranges
> >    to physical extents. This origins of this library is the file I/O
> >    path that XFS once used; it has now been extended to cover several
> >    other operations. The library provides various APIs for
> >    implementing various file and pagecache operations, such as:
> 
> Does anyone care about the origin?

I do; occasionally people who are totally new to iomap wonder why
suchandsuch works in the odd way it does, and I can point them at its
XFS origins.

> > 
> >        * Pagecache reads and writes
> > 
> >        * Folio write faults to the pagecache
> > 
> >        * Writeback of dirty folios
> > 
> >        * Direct I/O reads and writes
> > 
> >        * FIEMAP
> > 
> >        * lseek SEEK_DATA and SEEK_HOLE
> > 
> >        * swapfile activation
> 
> One useful bit might be that there are two layer in iomap.
> 
>  1) the very simply underlying layer in iter.c that just provides
>     a nicer iteration over logical file offsets
>  2) anything built on top.  That's the things mentioned above plus
>     DAX.
> 
> What is also kinda interesting as it keeps confusing people is that
> nothing in the iterator is block device specific.  In fact the DAX
> code now has no block device dependencies, as does the lseek and
> FIEMAP code.

<nod>

> Because of that it might make sense to split this document up a bit
> for the different layers and libraries.  Or maybe not if too many
> documents are too confusing.

Hmm.  The internal design is about ~400 lines of text.  The actual
operations iomap implements are another ~600 LoT, and the porting
guidelines at the end are about 140 LoT.  Maybe that's a reasonable
length for splitting?

> >          2. For each sub-unit of work...
> > 
> >               1. Revalidate the mapping and go back to (1) above, if
> >                  necessary
> 
> That's something only really done in the buffered write path.

Yeah.

> >    Each iomap operation will be covered in more detail below. This
> >    library was covered previously by an LWN article and a
> >    KernelNewbies page.
> 
> Maybe these are links in other formats, but if not this information
> isn't very useful.  Depending on how old that information is it
> probably isn't even with links.

There are links, which are visible in the HTML version.  Maybe I
should've run links in whichever mode spits out all the links as
footnotes.  (rst2html -> links is how I made the text version in the
first place).

> >    The filesystem returns the mappings via the following structure.
> >    For documentation purposes, the structure has been reordered to
> >    group fields that go together logically.
> 
> I don't think putting a different layout in here is a good idea.
> In fact duplicating the definition means it will be out of sync
> rather sooner than later.  Given that we have to deal with RST anyway
> we might as well want to pull this in as kerneldoc comments.
> And maybe reorder the actual definition while we're at it,
> as the version below still packs nicely.

Ok, I'll copy struct iomap as is.

> >      struct block_device          *bdev;
> >      struct dax_device            *dax_dev;
> >      void                         *inline_data;
> 
> Note: The could become a union these days.  I tried years ago
> before fully decoupling the DAX code and that didn't work,
> but we should be fine now.

You and Ritesh have both suggested that today.

> >        * type describes the type of the space mapping:
> > 
> >             * IOMAP_HOLE: No storage has been allocated. This type
> >               must never be returned in response to an IOMAP_WRITE
> >               operation because writes must allocate and map space,
> >               and return the mapping. The addr field must be set to
> >               IOMAP_NULL_ADDR. iomap does not support writing
> >               (whether via pagecache or direct I/O) to a hole.
> 
> ...
> 
> These should probably also be kerneldoc comments instead of being
> away from the definitions?

I don't like how kerneldoc makes it hard to associate iomap::type with
the IOMAP_* constants that go in it.  This would probably be ok for
::type if we turned it into an actual enum, but as C doesn't actually
have a bitset type, the only way to tell the reader which flags go where
is either strong namespacing (we blew it on that) or writing linking
text into the kerneldoc.

With this format I can lay out the document with relevant topics
adjacent and indented, so the association is obvious.  The oneline
comments in the header file can jog readers' memories, without us
needing to stuff a whole ton of documentation into a C header.

Besides, kerneldoc only tells the reader what the interfaces are, not
how all those pieces fit together.

> >             * IOMAP_F_XATTR: The mapping is for extended attribute
> >               data, not regular file data. This is only useful for
> >               FIEMAP.
> 
> .. and only used inside XFS.  Maybe we should look into killing it.

Yeah.

> >    These struct kiocb flags are significant for buffered I/O with
> >    iomap:
> > 
> >        * IOCB_NOWAIT: Only proceed with the I/O if mapping data are
> >          already in memory, we do not have to initiate other I/O, and
> >          we acquire all filesystem locks without blocking. Neither
> >          this flag nor its definition RWF_NOWAIT actually define what
> >          this flag means, so this is the best the author could come
> >          up with.
> 
> I don't think that's true.  But if it feels true to you submitting
> a patch to describe it better is probably more helpful than this.

I think Dave just told me off for this, so I'll probably replace the
whole section with what he and Jan wrote.

> >    iomap internally tracks two state bits per fsblock:
> > 
> >        * uptodate: iomap will try to keep folios fully up to date. If
> >          there are read(ahead) errors, those fsblocks will not be
> >          marked uptodate. The folio itself will be marked uptodate
> >          when all fsblocks within the folio are uptodate.
> > 
> >        * dirty: iomap will set the per-block dirty state when
> >          programs write to the file. The folio itself will be marked
> >          dirty when any fsblock within the folio is dirty.
> > 
> >    iomap also tracks the amount of read and write disk IOs that are
> >    in flight. This structure is much lighter weight than struct
> >    buffer_head.
> 
> Is this really something that should go into an API documentation?

Strictly speaking, no.  It should be in a separate internals document.

> Note that the structure not only is lighter weight than a buffer_head,
> but more importantly there are a lot less of them as there is only
> one per folio and not one per FSB.
> 
> >   Why Convert to iomap?
> 
> Make this a separate document?

I was pondering splitting these into two pieces:

Documentation/iomap/{design,porting}.rst

Though the porting guide is 10% of the document.  Maybe that's worth it.

--D




[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [NTFS 3]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [NTFS 3]     [Samba]     [Device Mapper]     [CEPH Development]

  Powered by Linux