On Tue, Nov 10, 2020 at 10:55:06AM -0800, Darrick J. Wong wrote:
> When we're wanting to use a ZONE_APPEND command, the @iomap structure
> has to have IOMAP_F_ZONE_APPEND set in iomap->flags, iomap->type is set
> to IOMAP_MAPPED, but what should iomap->addr be set to?
>
> I gather from what I see in zonefs and the relevant NVME proposal that
> iomap->addr should be set to the (byte) address of the zone we want to
> append to?  And if we do that, then bio->bi_iter.bi_sector will be set
> to sector address of iomap->addr, right?

Yes.

> Then when the IO completes, the block layer sets bio->bi_iter.bi_sector
> to wherever the drive told it that it actually wrote the bio, right?

Yes.

> If that's true, then that implies that need_zeroout must always be false
> for an append operation, right?  Does that also mean that the directio
> request has to be aligned to an fs block and not just the sector size?

I think so, yes.

> Can userspace send a directio append that crosses a zone boundary?  If
> so, what happens if a direct append to a lower address fails but a
> direct append to a higher address succeeds?

Userspace doesn't know about zone boundaries.  It can send I/O larger
than a zone, but the file system has to split it into multiple I/Os,
just like when it has to cross an AG boundary in XFS.

> I'm also vaguely wondering how to communicate the write location back to
> the filesystem when the bio completes?  btrfs handles the bio completion
> completely so it doesn't have a problem, but for other filesystems
> (cough future xfs cough) either we'd have to add a new callback for
> append operations; or I guess everyone could hook the bio endio.
>
> Admittedly that's not really your problem, and for all I know hch is
> already working on this.

I think any non-trivial file system needs to override the bio completion
handler for writes anyway, so this seems reasonable.  It might be worth
documenting, though.
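
To illustrate the splitting mentioned in the zone boundary answer, here is
a rough userspace sketch, not kernel code.  It assumes a fixed zone size
and a request that maps linearly onto the zoned address space; struct
append_chunk, split_at_zone_boundaries() and byte_to_sector() are
hypothetical names made up for this example.  Each chunk carries the byte
address of the start of its zone, which is what iomap->addr would hold for
an append, and the sector helper shows the 512-byte conversion that gives
the initial bio->bi_iter.bi_sector.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 512-byte sectors, as used by the block layer. */
#define EXAMPLE_SECTOR_SHIFT 9

/* Hypothetical descriptor for one per-zone I/O; not a kernel structure. */
struct append_chunk {
	uint64_t zone_start;	/* byte address of the zone to append to */
	uint64_t pos;		/* byte position of this chunk in the request */
	uint64_t len;		/* chunk length in bytes, never crosses a zone */
};

/* Convert a byte address to a 512-byte sector number. */
uint64_t byte_to_sector(uint64_t addr)
{
	return addr >> EXAMPLE_SECTOR_SHIFT;
}

/*
 * Split the byte range [pos, pos + len) so that no chunk crosses a zone
 * boundary.  Assumes all zones have the same size (an assumption of this
 * sketch).  Returns the number of chunks written to out, at most
 * max_chunks.
 */
size_t split_at_zone_boundaries(uint64_t pos, uint64_t len,
				uint64_t zone_size,
				struct append_chunk *out, size_t max_chunks)
{
	size_t n = 0;

	assert(zone_size > 0);

	while (len > 0 && n < max_chunks) {
		uint64_t zone_start = pos - (pos % zone_size);
		uint64_t room = zone_start + zone_size - pos;
		uint64_t chunk = len < room ? len : room;

		out[n].zone_start = zone_start;
		out[n].pos = pos;
		out[n].len = chunk;
		n++;

		pos += chunk;
		len -= chunk;
	}
	return n;
}
```

With a 1024-byte zone size, a 2048-byte request starting at byte 512 comes
out as three chunks, targeting the zones at byte addresses 0, 1024 and
2048.  The sketch does not say what to do when one of the chunks fails
while a later one succeeds; that policy is left to the file system.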