Re: [PATCH 1/2] btrfs: Introduce the virtual_fsid feature

Goffredo Baroncelli <kreijack@xxxxxxxxx> · Sat, 6 May 2023 19:30:24 +0200

On 06/05/2023 00.31, Qu Wenruo wrote:

On 2023/5/6 01:34, Goffredo Baroncelli wrote:
On 05/05/2023 09.21, Qu Wenruo wrote:

I would prefer a much simpler but more explicit method.

Just introduce a new compat_ro feature, maybe call it SINGLE_DEV.

It is not clear to me if we need that.

I don't understand in what checking for SINGLE_DEV is different from
btrfs_super_block.disks_num == 1.

Because disks_num == 1 doesn't exclude the ability to add new disks in the future.

Without that new SINGLE_DEV compat_ro, we should still do the regular device scan.

Let me to argument:

I see two scenarios:
1) mount two different fs with the same UUID NOT at the same time:
This could be done now with small change in the kernel:
- we need to NOT store the data of a filesystem when a disk is
   scanned IF it is composed by only one disk
- after the unmount we need to discard the data too (checking again
   that the filesystem is composed by only one disk)

No limit is needed to add/replace a disk. Of course after a disk is
added a filesystem with the same UUID cannot be mounted without a
full cycle of --forget.

The problem is, what if:

- Both btrfs have single disk
- Both btrfs have the same fsid
- Both btrfs have been mounted
- Then one of btrfs is going to add a new disk

Why the user should be prevented to add a disk. It may
a aware user that want to do that, knowing the possible consequence.

[...]

- Scan and record the fsid/device at device add time
   This means we should reject the device add.
   This can sometimes cause confusion to the end user, just because they
   have mounted another fs, now they can not add a new device.

I agree about the confusion. But not about the cause.
The confusion is due to the poor communication between the kernel (where the error is
detected) and the user. Now the only solution is to look at dmesg.

Allowing to mount two filesystem with the same UUID is technically possible.
There are some constraints bat are well defined; there are some corner case
but are well defined (like add a device to a single device filesystem).

However when we hit one of these corner case, now it is difficult to inform
the user about the problem. Because now the user has to look at the dmesg
to understand what is the problem.

This is the real problem. The communication. And we have a lot of these
problem (like mount a multi device filesystem without some disk, or with a
brain slip problem, or better inform the user if it is possible the
mount -o degraded).

Look this in another way; what if we had a mount.btrfs helper that:

- look for the devices which compose the filesystem at mounting time
- check if these devices are consistent:
	- if the fs is one-device, we don't need further check; otherwise check
	- if all the devices are present
	- if all the device have the same transaction id
	- if ...
  if any of the check above fails, write an error message; otherwise
- register the device(s) in the kernel or (better) pass it in the mount command
  line
- finally mount the filesystem

No need of strange flag; all the corner case can be handle safely and avoid
any confusion to the user.

   And this is going to change device add code path quite hugely.
   We currently expects all device scan/trace thing done way before
   mount.
   Such huge change can lead to hidden bugs.

To me, neither is good to the end users.

A SINGLE_DEV feature would reject the corner case in a way more user-friendly and clear way.

   With SINGLE_DEV feature, just no dev add/replace/delete no matter
   what.

I have to point out that this problem would be easily solved in
userspace if we switch from the current model where the disks are
scanned asynchronously (udev which call btrfs dev scan) to a model
where the disk are scanned at mount time by a mount.btrfs helper.

A mount.btrfs helper, also could be a place to put some more clear error
message like "we cannot mount this filesystem because one disk of a
raid5 is missing, try passing -o degraded"
or "we cannot mount this filesystem because we detect a brain split
problem" ....

2) mount two different fs with the same UUID at the SAME time:
This is a bit more complicated; we need to store a virtual UUID
somewhere.

However sometime we need to use the real fsid (during a write),
and sometime we need to use the virtual_uuid (e.g. for /sys/fs/btrfs/<uuid>)

Another thing is, we already have too many uuids.

Some are unavoidable like fsid and device uuid.

But I still prefer not to add a new layer of unnecessary uuids.

Thanks,
Qu

Both in 1) and 2) we need to/it is enough to have btrfs_super_block.disks_num == 1
In the case 2) using a virtual_uuid mount option will prevent
to add a disk.

--
gpg @keyserver.linux.it: Goffredo Baroncelli <kreijackATinwind.it>
Key fingerprint BBF5 1610 0B64 DAC6 5F7D  17B2 0EDA 9B37 8B82 E0B5