On Sun, Jan 20, 2019 at 07:45:02AM -0500, Brian Foster wrote: > 4. Kind of a nit, but the comment update in xfs_bmapi_write() that > describes the ilock and associated race window and whatnot should really > be split between there and xfs_iomap_write_allocate(). The former should > just explain what exactly XFS_BMAPI_DELALLOC does (i.e., skip holes, > real extents, over a range..). The latter should explain how > the use of XFS_BMAPI_DELALLOC helps us deal with the race window in the > writeback code. Ok. > One option that comes to mind is to perhaps split off XFS_BMAPI_DELALLOC > from xfs_bmapi_write() into a new xfs_bmapi_delalloc() function that > facilitates new semantics (I'm not terribly comfortable with overloading > the semantics of xfs_bmapi_write()). Instead of passing a range to > xfs_bmapi_delalloc(), just pass the offset we care about and expect that > this function will attempt to allocate the entire extent currently > backing offset. (Alternatively, we could perhaps pass a range by value > and allow xfs_bmapi_delalloc() to update the range in the event of > delalloc discontiguity.) Either way, the extent returned may not cover > the offset (due to fragmentation, as today) and thus the caller needs to > iterate until that happens. The larger point is that we'd lookup the > current extent _at offset_ on each iteration and thus shouldn't ever > contend with new delalloc reservations. Thoughts? I considered splitting it off and even had an early prototype. I got something wrong and it didn't work, and it created a little too much duplication for my taste so I gave up on it for now. But fundamentally having the delalloc conversion separate from xfs_bmapi_write is the right thing. I'll just have to find some time for it or pass this work off to you..