On Mon, Mar 16, 2020 at 08:59:54PM +0000, Ober, Frank wrote:
> Hi, Intel is looking into does it make sense to take an existing,
> popular filesystem and patch it for write atomicity at the sector
> count level. Meaning we would protect a configured number of sectors
> using parameters that each layer in the kernel would synchronize on.
> We could use a parameter(s) for this that comes from the NVMe
> specification such as awun or awunpf

<gesundheit>

Oh, that was an acronym...

> that set across the (affected)
> layers to a user space program such as innodb/MySQL which would
> benefit as would other software. The MySQL target is a strong use
> case, as its InnoDB has a double write buffer that could be removed if
> write atomicity was protected at 16KiB for the file opens and with
> fsync().

We probably need a better elaboration of the exact usecases of atomic
writes since I haven't been to LSF in a couple of years (and probably
not this year either).  I can think of a couple of access modes off the
top of my head:

1) atomic directio write where either you stay under the hardware atomic
write limit and we use it, or...

2) software atomic writes where we use the xfs copy-on-write mechanism
to stage the new blocks and later map them back into the inode, where
"later" is either an explicit fsync or an O_SYNC write or something...

3) ...or a totally separate interface where userspace does something
along the lines of:

	write_fd = stage_writes(fd);

which creates an O_TMPFILE and reflinks all of fd's content to it

	write(write_fd...);

	err = commit_writes(write_fd, fd);

which then uses extent remapping to push all the changed blocks back to
the original file if it hasn't changed.  Bonus: other threads don't see
the new data until commit_writes() finishes, and we can introduce new
log items to make sure that once we start committing we can finish it
even if the system goes down.
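
To make the calling convention in (3) concrete, here is a rough userspace
emulation, as a sketch only. stage_writes() and commit_writes() are
hypothetical names taken from the outline above; no such kernel interface
exists. This version assumes a POSIX system, makes a full copy where the
real thing would reflink into an O_TMPFILE, and publishes with rename()
where the real thing would remap extents, so the inode changes on commit.
The "has the original changed" test is just an (mtime, size) comparison,
and it is racy against concurrent writers.

```python
import os
import shutil
import tempfile


def stage_writes(path):
    """Hypothetical helper: make a private staging copy of `path`.

    Returns (staged_path, snapshot), where snapshot records the
    original's (mtime_ns, size) at staging time.
    """
    st = os.stat(path)
    directory = os.path.dirname(os.path.abspath(path))
    # Stage in the same directory so the later rename cannot cross
    # filesystems.
    fd, staged = tempfile.mkstemp(dir=directory, prefix=".staged-")
    os.close(fd)
    shutil.copyfile(path, staged)  # full copy; a reflink would share blocks
    shutil.copymode(path, staged)  # keep the original permission bits
    return staged, (st.st_mtime_ns, st.st_size)


def commit_writes(staged, path, snapshot):
    """Hypothetical helper: publish `staged` over `path`.

    Returns 0 on success, or -1 if `path` changed since staging (the
    staged copy is discarded in that case).
    """
    st = os.stat(path)
    # Assumption: (mtime, size) is a good enough change detector for a
    # sketch. A writer could still slip in between this check and the
    # rename below.
    if (st.st_mtime_ns, st.st_size) != snapshot:
        os.unlink(staged)
        return -1
    with open(staged, "r+b") as f:
        os.fsync(f.fileno())  # staged data must be durable before the switch
    os.replace(staged, path)  # a later open() sees old or new, never a mix
    dir_fd = os.open(os.path.dirname(os.path.abspath(path)), os.O_RDONLY)
    try:
        os.fsync(dir_fd)  # persist the rename itself
    finally:
        os.close(dir_fd)
    return 0
```

A caller would do staged, snap = stage_writes("ibdata"), write into the
staged file as usual, then call commit_writes(staged, "ibdata", snap) and
retry from the top if it returns -1 ("ibdata" is only a placeholder
name).  Processes that already hold the old file open keep seeing the old
contents, which is one of the ways this differs from the extent remapping
idea.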

> My question is why hasn't xfs write atomicity advanced further, as it
> seems in 3.x kernel time a few years ago this was tried but nothing
> committed. as documented here:
>
> http://git.infradead.org/users/hch/vfs.git/shortlog/refs/heads/O_ATOMIC
>
> Is xfs write atomicity still being pursued , and with what design
> objective. There is a long thread here,
> https://lwn.net/Articles/789600/ on write atomicity, but with no
> progress, lots of ideas in there but not any progress, but I am
> unclear.
>
> Is my design idea above simply too simplistic, to try and protect a
> configured block size (sector count) through the filesystem and block
> layers, and what really is not making it attainable?

Lack of developer time, AFAICT.

--D

> Thanks for the feedback
> Frank Ober