On Tue, Feb 28, 2017 at 03:09:40PM -0800, Darrick J. Wong wrote: > By the way, the copy on write code remembers the extents it has > allocated for CoW staging in the refcount btree so that it can free them > after a crash, which means that O_ATOMIC requires reflink to be enabled. Yeah. > There doesn't seem to be any explicit checking that reflink is even > enabled, which will probably just lead to weird crashes on a pre-reflink > xfs. True. I had this earlier when I hat basic O_ATOMIC validity checking, but that was dropped from the series I posted. > > FWIW I didn't see any checking anywhere (vfs or xfs) that the filesystem > can actually support O_ATOMIC. If the FS doesn't support atomic writes, > shouldn't the kernel send EINVAL or something back to userspace? Older kernels can't check it, so having new ones check it creates even more of a mess. I'm still not feeling very well about O_ATOMIC - either we need an open2 that checks for unknown flags, or I need to change this to a per-op flag - RWF_ATOMIC for write (pwritev2 actually), and MAP_ATOMIC for mmap. But given that pwritev2 isn't really supported in common userland yet that might be rather painful. > At the start of xfs_reflink.c is a long block comment describing how the > copy on write mechanism works. Since O_ATOMIC is a variant on CoW (it's > basically CoW with remapping deferred until fsync), please update the > comment so that the comments capture the details of how atomic writes > work. > > (IOWs: Dave asked me to leave the big comment, so I'm going to try to > keep it fairly up to date.) I'll add some information to it. > I suppose it goes without saying that userspace will have to coordinate > its O_ATOMIC writes to the file. It does - but if you have multiple writers to a file they really need to be coordinated anyway. If you have threads whose updates race you'd need something like open(O_TMPFILE) clone file (or range) into tempfile update tempfile clone region you want atomically inserted back into the original file. We can actually do that with existing primitives, but it's a bit more heavyweight. We could opimize this a bit by checking if an extent already points to the same physical blocks before replacing it in clone_file_range. > > + if (file->f_flags & O_ATOMIC) > > + printk_ratelimited("O_ATOMIC!\n"); > > Per above, > > if (file->f_flags & O_ATOMIC) { > if (!xfs_sb_version_hasreflink(...)) > return -EPROTONOSUPPORT; Yeah. > printk_ratelimited("EXPERIMENTAL atomic writes feature in use!\n"); And that should just go away - it was a local debug aid :)