New link flags to request "atomic" link.

Signed-off-by: Amir Goldstein <amir73il@xxxxxxxxx>
---

Hi Guys,

Following our discussions on LSF/MM and beyond [1][2], here is
an RFC documentation patch.

Ted, I know we discussed limiting the API for linking an O_TMPFILE
to avert the hardlinks issue, but I decided it would be better to
document the hardlinks non-guarantee instead. This will allow me to
replicate the same semantics and documentation to renameat(2).
Let me know how that works out for you.

I also decided to try out two separate flags for data and metadata.
I do not find either of those flags very useful without the other, but
documenting them separately was easier, because of the fsync/fdatasync
reference. In the end, we are trying to solve a social engineering
problem, so this is the least confusing way I could think of to describe
the new API.

First implementation of AT_ATOMIC_METADATA is expected to be noop for
xfs/ext4 and probably fsync for btrfs.

First implementation of AT_ATOMIC_DATA is expected to be
filemap_write_and_wait() for xfs/ext4 and probably fdatasync for btrfs.

Thoughts?

Amir.

[1] https://lwn.net/Articles/789038/
[2] https://lore.kernel.org/linux-fsdevel/CAOQ4uxjZm6E2TmCv8JOyQr7f-2VB0uFRy7XEp8HBHQmMdQg+6w@xxxxxxxxxxxxxx/

 man2/link.2 | 51 +++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 51 insertions(+)

diff --git a/man2/link.2 b/man2/link.2
index 649ba00c7..15c24703e 100644
--- a/man2/link.2
+++ b/man2/link.2
@@ -184,6 +184,57 @@ See
 .BR openat (2)
 for an explanation of the need for
 .BR linkat ().
+.TP
+.BR AT_ATOMIC_METADATA " (since Linux 5.x)"
+By default, a link operation followed by a system crash may result in the
+new file name being linked with old inode metadata, such as outdated
+timestamps or missing extended attributes.
+One way to prevent this is to call
+.BR fsync (2)
+before linking the inode, but that involves flushing of volatile disk caches.
+
+A filesystem that accepts this flag will guarantee that old inode metadata
+will not be exposed in the new linked name.
+Some filesystems may internally perform
+.BR fsync (2)
+before linking the inode to provide this guarantee,
+but often, filesystems will have a more efficient method to provide this
+guarantee without flushing volatile disk caches.
+
+A filesystem that accepts this flag does
+.BR NOT
+guarantee that the new file name will exist after a system crash, nor that the
+current inode metadata is persisted to disk.
+Specifically, if a file has hardlinks, the existence of the linked name after
+a system crash does
+.BR NOT
+guarantee that any of the other file names exist, nor that the last observed
+value of
+.I st_nlink
+(see
+.BR stat (2))
+has persisted.
+.TP
+.BR AT_ATOMIC_DATA " (since Linux 5.x)"
+By default, a link operation followed by a system crash may result in the
+new file name being linked with old data or missing data.
+One way to prevent this is to call
+.BR fdatasync (2)
+before linking the inode, but that involves flushing of volatile disk caches.
+
+A filesystem that accepts this flag will guarantee that old data
+will not be exposed in the new linked name.
+Some filesystems may internally perform
+.BR fsync (2)
+before linking the inode to provide this guarantee,
+but often, filesystems will have a more efficient method to provide this
+guarantee without flushing volatile disk caches.
+
+A filesystem that accepts this flag does
+.BR NOT
+guarantee that the new file name will exist after a system crash, nor that the
+current inode data is persisted to disk.
+.TP
 .SH RETURN VALUE
 On success, zero is returned.
 On error, \-1 is returned, and
-- 
2.17.1
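
For reference, the existing workaround that the proposed text mentions (sync the file, then link it) might look roughly like the sketch below. It is only an illustration: the helper name link_after_sync() is hypothetical and not part of the patch, and since AT_ATOMIC_DATA and AT_ATOMIC_METADATA are only proposed in this RFC and are not defined in current headers, the sketch passes 0 as the linkat() flags and marks in a comment where they would presumably go.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical helper, for illustration only: create tmp_path, write
 * buf into it, flush it, then hard-link it to new_path.
 * Returns 0 on success, -1 with errno set on failure.
 */
static int link_after_sync(const char *tmp_path, const char *new_path,
                           const void *buf, size_t len)
{
    const char *p = buf;
    int fd, saved;

    fd = open(tmp_path, O_WRONLY | O_CREAT | O_EXCL, 0644);
    if (fd < 0)
        return -1;

    /* Write the whole buffer, retrying on short writes and EINTR. */
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR)
                continue;
            goto fail;
        }
        p += n;
        len -= (size_t)n;
    }

    /*
     * Today's workaround: persist data and metadata before the new
     * name can become visible. This is the step that involves
     * flushing volatile disk caches.
     */
    if (fsync(fd) < 0)
        goto fail;

    /*
     * Assumption: with the flags proposed in this RFC, the fsync()
     * above would be dropped and the last argument would become
     * AT_ATOMIC_DATA | AT_ATOMIC_METADATA. Those flags do not exist
     * in current headers, so 0 is passed here.
     */
    if (linkat(AT_FDCWD, tmp_path, AT_FDCWD, new_path, 0) < 0)
        goto fail;

    return close(fd);

fail:
    saved = errno;
    close(fd);
    unlink(tmp_path);   /* do not leave a half-written temp file behind */
    errno = saved;
    return -1;
}
```

On success the caller would normally unlink(tmp_path) afterwards, leaving new_path as the only name. Note that fsync() here gives a stronger guarantee than the proposed flags are documented to give: it persists the current contents, whereas the flags only promise that stale data or metadata is never exposed under the new name.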