On 2023/2/9 0:04, Jan Kara wrote:
> On Sun 29-01-23 05:06:47, Matthew Wilcox wrote:
>> On Sat, Jan 28, 2023 at 08:46:45PM -0800, Luis Chamberlain wrote:
>>> I'm hoping this *might* be useful to some, but I fear it may leave quite
>>> a bit of folks with more questions than answers as it did for me. And
>>> hence I figured that *this aspect of this topic* perhaps might be a good
>>> topic for LSF. The end goal would hopefully then be finally enabling us
>>> to document IOMAP API properly and helping with the whole conversion
>>> effort.
>>
>> +1 from me.
>>
>> I've made a couple of abortive efforts to try and convert a "trivial"
>> filesystem like ext2/ufs/sysv/jfs to iomap, and I always get hung up on
>> what the semantics are for get_block_t and iomap_begin().
>
> Yeah, I'd be also interested in this discussion. In particular as a
> maintainer of part of these legacy filesystems (ext2, udf, isofs).
>
>>> Perhaps fs/buffers.c could be converted to folios only, and be done
>>> with it. But would we be losing out on something? What would that be?
>>
>> buffer_heads are inefficient for multi-page folios because some of the
>> algorithms are O(n^2) for n being the number of buffers in a folio.
>> It's fine for 8x 512b buffers in a 4k page, but for 512x 4kb buffers in
>> a 2MB folio, it's pretty sticky. Things like "Read I/O has completed on
>> this buffer, can I mark the folio as Uptodate now?" For iomap, that's a
>> scan of a 64 byte bitmap up to 512 times; for BHs, it's a loop over 512
>> allocations, looking at one bit in each BH before moving on to the next.
>> Similarly for writeback, iirc.
>>
>> So +1 from me for a "How do we convert 35-ish block based filesystems
>> from BHs to iomap for their buffered & direct IO paths". There's maybe a
>> separate discussion to be had for "What should the API be for filesystems
>> to access metadata on the block device" because I don't believe the
>> page-cache based APIs are easy for fs authors to use.
>
> Yeah, so the actual data paths should be relatively easy for these old
> filesystems as they usually don't do anything special (those that do - like
> reiserfs - are deprecated and to be removed). But for metadata we do need
> some convenience functions like - give me block of metadata at this block
> number, make it dirty / clean / uptodate (block granularity dirtying &
> uptodate state is an absolute must for metadata, otherwise we'll have data
> corruption issues). From the more complex functionality we need stuff like:
> lock a particular block of metadata (equivalent of buffer lock), track that
> this block is metadata for a given inode so that it can be written on
> fsync(2). Then more fancy filesystems like ext4 also need to attach more
> private state to each metadata block but that needs to be dealt with on
> a case-by-case basis anyway.
>

Hello, all.

I am also interested in this topic, especially the iomap conversion of the
buffered IO paths in ext4. I am interested in the discussion of the metadata
APIs as well: the current buffer_heads can lead to many potential problems
and bring a lot of quality challenges to our products. I look forward to
more discussion if I can attend offline.

Thanks,
Yi.

>> Maybe some related topics are
>> "What testing should we require for some of these ancient filesystems?"
>> "Whose job is it to convert these 35 filesystems anyway, can we just
>> delete some of them?"
>
> I would certainly not miss some more filesystems - like minix, sysv, ...
> But before really threatening to remove some of these ancient and long
> untouched filesystems, we should convert at least those we do care about.
> When there's precedent for what a simple filesystem conversion looks like,
> it is easier to argue about what to do with the ones we don't care about
> so much.
>
>> "Is there a lower-performance but easier-to-implement API than iomap
>> for old filesystems that only exist for compatibility reasons?"
>
> As I wrote above, for metadata there ought to be something as otherwise it
> will be real pain (and no gain really). But I guess the concrete API only
> materializes once we attempt a conversion of some filesystem like ext2.
> I'll try to have a look into that, at least the obvious preparatory steps
> like converting the data paths to iomap.
>
> 								Honza
>
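
To make the cost argument quoted above a bit more concrete, here is a
minimal userspace sketch of the "can I mark the folio Uptodate now?" check
done both ways. It is a toy model, not kernel code: every toy_* name is
hypothetical, and it assumes a 2MB folio with 4KB blocks whose reads
complete in block order. It only counts how many bitmap words or per-block
structures each style of check has to touch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_BLOCKS_PER_FOLIO 512                        /* assumed: 2MB folio, 4KB blocks */
#define TOY_BITMAP_WORDS     (TOY_BLOCKS_PER_FOLIO / 64) /* 8 words = 64 bytes */

/* Hypothetical iomap-style state: one uptodate bit per block, one bitmap per folio. */
struct toy_folio_state {
	uint64_t uptodate[TOY_BITMAP_WORDS];
};

/* Hypothetical buffer_head-style state: one allocation per block, circularly linked. */
struct toy_buffer {
	bool uptodate;
	struct toy_buffer *next;
};

/* Work counters: how many objects each check had to look at in total. */
struct toy_cost {
	size_t bitmap_words;
	size_t buffer_visits;
};

static void toy_bitmap_set_uptodate(struct toy_folio_state *s, size_t block)
{
	assert(block < TOY_BLOCKS_PER_FOLIO);
	s->uptodate[block / 64] |= UINT64_C(1) << (block % 64);
}

/* Bitmap check: compare whole words, stop at the first word that is not full. */
static bool toy_bitmap_all_uptodate(const struct toy_folio_state *s, size_t *words)
{
	for (size_t i = 0; i < TOY_BITMAP_WORDS; i++) {
		(*words)++;
		if (s->uptodate[i] != UINT64_MAX)
			return false;
	}
	return true;
}

/* Buffer walk: look at one flag per buffer, stop at the first stale buffer. */
static bool toy_buffers_all_uptodate(const struct toy_buffer *head, size_t *visits)
{
	const struct toy_buffer *b = head;

	do {
		(*visits)++;
		if (!b->uptodate)
			return false;
		b = b->next;
	} while (b != head);
	return true;
}

/* Complete the blocks one by one and rerun both checks after each completion. */
static struct toy_cost toy_simulate_read_completion(void)
{
	static struct toy_buffer bufs[TOY_BLOCKS_PER_FOLIO];
	struct toy_folio_state state;
	struct toy_cost cost = { 0, 0 };

	memset(&state, 0, sizeof(state));
	for (size_t i = 0; i < TOY_BLOCKS_PER_FOLIO; i++) {
		bufs[i].uptodate = false;
		bufs[i].next = &bufs[(i + 1) % TOY_BLOCKS_PER_FOLIO];
	}

	for (size_t i = 0; i < TOY_BLOCKS_PER_FOLIO; i++) {
		toy_bitmap_set_uptodate(&state, i);
		bufs[i].uptodate = true;

		bool by_bitmap = toy_bitmap_all_uptodate(&state, &cost.bitmap_words);
		bool by_walk = toy_buffers_all_uptodate(&bufs[0], &cost.buffer_visits);

		/* Both representations must agree on the answer. */
		assert(by_bitmap == by_walk);
	}
	return cost;
}
```

Calling toy_simulate_read_completion() returns the two counters. Under the
in-order assumption the buffer walk grows quadratically with the number of
blocks, while the bitmap check reads at most 8 words per completion. The
counts say nothing about real kernel timings; they only illustrate why a
per-folio bitmap scales better than one structure per block.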