Re: [PATCH] mkfs: acquire flock before modifying the device superblock

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




在 2022/10/14 23:38, Darrick J. Wong 写道:
> On Fri, Oct 14, 2022 at 04:41:35PM +0800, Wu Guanghao wrote:
>> We noticed that systemd has an issue about symlink unreliable caused by
>> formatting filesystem and systemd operating on same device.
>> Issue Link: https://github.com/systemd/systemd/issues/23746
>>
>> According to systemd doc, a BSD flock needs to be acquired before
>> formatting the device.
>> Related Link: https://systemd.io/BLOCK_DEVICE_LOCKING/
> 
> TLDR: udevd wants fs utilities to use advisory file locking to
> coordinate (re)writes to block devices to avoid collisions between mkfs
> and all the udev magic.
> 
> Critically, udev calls flock(LOCK_SH | LOCK_NB) to trylock the device in
> shared mode to avoid blocking on fs utilities; if the trylock fails,
> they'll move on and try again later.  The old O_EXCL-on-blockdevs trick
> will not work for that usecase (I guess) because it's not a shared
> reader lock.  It's also not the file locking API.
> 
>> So we acquire flock after opening the device but before
>> writing superblock.
> 
> xfs_db and xfs_repair can write to the filesystem too; shouldn't this
> locking apply to them as well?
> 
xfs_db is an interactive operation.If a lock is added, the lock may be held
for too long. xfs_repair only repairs the data inside the file system ,so it's
unlikely to conflict with systemd. So these two commands aren't locked.

>> Signed-off-by: wuguanghao <wuguanghao3@xxxxxxxxxx>
>> ---
>>  mkfs/xfs_mkfs.c | 26 ++++++++++++++++++++++++++
>>  1 file changed, 26 insertions(+)
>>
>> diff --git a/mkfs/xfs_mkfs.c b/mkfs/xfs_mkfs.c
>> index 9dd0e79c..b83cb043 100644
>> --- a/mkfs/xfs_mkfs.c
>> +++ b/mkfs/xfs_mkfs.c
>> @@ -13,6 +13,7 @@
>>  #include "libfrog/crc32cselftest.h"
>>  #include "proto.h"
>>  #include <ini.h>
>> +#include <sys/file.h>
>>
>>  #define TERABYTES(count, blog) ((uint64_t)(count) << (40 - (blog)))
>>  #define GIGABYTES(count, blog) ((uint64_t)(count) << (30 - (blog)))
>> @@ -2758,6 +2759,30 @@ _("log stripe unit (%d bytes) is too large (maximum is 256KiB)\n"
>>
>>  }
>>
>> +static void
>> +lock_device(dev_t dev, int flag, char *name)
>> +{
>> +       int fd = libxfs_device_to_fd(dev);
>> +       int readonly = flag & LIBXFS_ISREADONLY;
>> +
>> +       if (!readonly && fd > 0)
>> +               if (flock(fd, LOCK_EX) != 0) {
>> +                       fprintf(stderr, "%s: failed to get lock.\n", name);
>> +                       exit(1);
>> +               }
> 
> So yes, this belongs in libxfs_device_open.
> 
> If we're opening the bdevs in readonly mode, shouldn't we take LOCK_SH
> to prevent mkfs from colliding with (say) xfs_metadump?
> 
> Bonus question: Shouldn't the /kernel/ also effectively be taking
> LOCK_SH when it opens the bdevs to mount the filesystem?

Systemd normally uses "watch" to monitor disks, only in special cases
will the monitoring be released. During the time from the release of
monitoring to the re-opening of monitoring, the flock is used to
ensure that the disk won't be written to by others.
So if the disk isn't modified or the modified content won't trigger
the udev rule, then it should be OK not to lock.

There is still a problem with this solution, systemd only lock the main
block device, not the partition device. So if we're operating on a
partitioned device, the lock won't work. Currently we are still
communicating with systemd.

> --D
> 
>> +}
>> +
>> +static void
>> +lock_devices(struct libxfs_xinit *xi)
>> +{
>> +       if (!xi->disfile)
>> +               lock_device(xi->ddev, xi->dcreat, xi->dname);
>> +       if (xi->logdev && !xi->lisfile)
>> +               lock_device(xi->logdev, xi->lcreat, xi->logname);
>> +       if (xi->rtdev && !xi->risfile)
>> +               lock_device(xi->rtdev, xi->rcreat, xi->rtname);
>> +}
>> +
>>  static void
>>  open_devices(
>>         struct mkfs_params      *cfg,
>> @@ -4208,6 +4233,7 @@ main(
>>          * Open and validate the device configurations
>>          */
>>         open_devices(&cfg, &xi);
>> +       lock_devices(&xi);
>>         validate_overwrite(dfile, force_overwrite);
>>         validate_datadev(&cfg, &cli);
>>         validate_logdev(&cfg, &cli, &logfile);
>> --
>> 2.27.0
> .
> 



[Index of Archives]     [XFS Filesystem Development (older mail)]     [Linux Filesystem Development]     [Linux Audio Users]     [Yosemite Trails]     [Linux Kernel]     [Linux RAID]     [Linux SCSI]


  Powered by Linux