On Fri, Oct 14, 2022 at 04:41:35PM +0800, Wu Guanghao wrote: > We noticed that systemd has an issue about symlink unreliable caused by > formatting filesystem and systemd operating on same device. > Issue Link: https://github.com/systemd/systemd/issues/23746 > > According to systemd doc, a BSD flock needs to be acquired before > formatting the device. > Related Link: https://systemd.io/BLOCK_DEVICE_LOCKING/ TLDR: udevd wants fs utilities to use advisory file locking to coordinate (re)writes to block devices to avoid collisions between mkfs and all the udev magic. Critically, udev calls flock(LOCK_SH | LOCK_NB) to trylock the device in shared mode to avoid blocking on fs utilities; if the trylock fails, they'll move on and try again later. The old O_EXCL-on-blockdevs trick will not work for that usecase (I guess) because it's not a shared reader lock. It's also not the file locking API. > So we acquire flock after opening the device but before > writing superblock. xfs_db and xfs_repair can write to the filesystem too; shouldn't this locking apply to them as well? > Signed-off-by: wuguanghao <wuguanghao3@xxxxxxxxxx> > --- > mkfs/xfs_mkfs.c | 26 ++++++++++++++++++++++++++ > 1 file changed, 26 insertions(+) > > diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c > index 9dd0e79c..b83cb043 100644 > --- a/mkfs/xfs_mkfs.c > +++ b/mkfs/xfs_mkfs.c > @@ -13,6 +13,7 @@ > #include "libfrog/crc32cselftest.h" > #include "proto.h" > #include <ini.h> > +#include <sys/file.h> > > #define TERABYTES(count, blog) ((uint64_t)(count) << (40 - (blog))) > #define GIGABYTES(count, blog) ((uint64_t)(count) << (30 - (blog))) > @@ -2758,6 +2759,30 @@ _("log stripe unit (%d bytes) is too large (maximum is 256KiB)\n" > > } > > +static void > +lock_device(dev_t dev, int flag, char *name) > +{ > + int fd = libxfs_device_to_fd(dev); > + int readonly = flag & LIBXFS_ISREADONLY; > + > + if (!readonly && fd > 0) > + if (flock(fd, LOCK_EX) != 0) { > + fprintf(stderr, "%s: failed to get lock.\n", name); > + exit(1); > + } So yes, this belongs in libxfs_device_open. If we're opening the bdevs in readonly mode, shouldn't we take LOCK_SH to prevent mkfs from colliding with (say) xfs_metadump? Bonus question: Shouldn't the /kernel/ also effectively be taking LOCK_SH when it opens the bdevs to mount the filesystem? --D > +} > + > +static void > +lock_devices(struct libxfs_xinit *xi) > +{ > + if (!xi->disfile) > + lock_device(xi->ddev, xi->dcreat, xi->dname); > + if (xi->logdev && !xi->lisfile) > + lock_device(xi->logdev, xi->lcreat, xi->logname); > + if (xi->rtdev && !xi->risfile) > + lock_device(xi->rtdev, xi->rcreat, xi->rtname); > +} > + > static void > open_devices( > struct mkfs_params *cfg, > @@ -4208,6 +4233,7 @@ main( > * Open and validate the device configurations > */ > open_devices(&cfg, &xi); > + lock_devices(&xi); > validate_overwrite(dfile, force_overwrite); > validate_datadev(&cfg, &cli); > validate_logdev(&cfg, &cli, &logfile); > -- > 2.27.0