Re: [PATCH 0/2] Supporting same fsid filesystems mounting on btrfs

Vivek Dasmohapatra <vivek@xxxxxxxxxxxxx> · Fri, 5 May 2023 19:15:49 +0100

On 05/05/2023 17:27, Guilherme G. Piccoli wrote:
> On 05/05/2023 02:16, Anand Jain wrote:
>
> [cut]
>
> I'll defer a more detailed response for John / Vivek / Ludovico, that
> are aware of the use case in a detail level I'm not, since they designed
> the installation / update path from the ground up.

The OS images are entirely independent. The goal is that you could
completely corrupt slot A and it would have no impact on the bootability
of slot B.

So, yes, we sacrifice space, but as a trade-off we get robustness, which
is more important to us.

=========================================================================

When a new OS image is delivered, the normal flow is this (simplified):

While booted on slot A (for example) the update process is started.

Our client fetches the most recent image from the update server.

This is delivered as a block level diff between the image you
have and the image you want.

The partitions that are allocated to slot B have the new data written
into them.

As a final step, the root fs of the new slot is mounted and a couple of
initialisation steps are completed (mostly writing config into the
common boot partition; the contents of the slot B partitions are not
modified as a result of this).

The system is rebooted. If all goes well slot B is booted and becomes
the primary (current) image.

If it fails for some reason, the bootloader will (either automatically
or by user intervention) go back to booting slot A.

Note that other than the final mount to update the common boot partition
with information about the new image we don't care at all about the
contents or even the type of the filesystems we have delivered (and even
then all we care about is that we _can_ mount it, not what it is).

=========================================================================
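
For readers who prefer code, the flow above can be modelled roughly as
follows. This is only a toy sketch: the Device class, the boot_config
keys ("try" / "fallback") and the function names are made up for
illustration and are not taken from our actual updater or bootloader.

```python
from dataclasses import dataclass, field


@dataclass
class Device:
    """Toy model of a two-slot device; every name here is hypothetical."""
    slots: dict = field(default_factory=lambda: {"A": "image-1", "B": None})
    active: str = "A"
    # Stands in for the config kept on the common boot partition.
    boot_config: dict = field(default_factory=dict)


def other(slot):
    """Return the slot that is not `slot`."""
    return "B" if slot == "A" else "A"


def apply_update(dev, new_image):
    """Write new_image into the inactive slot and record it for the next boot."""
    target = other(dev.active)
    dev.slots[target] = new_image  # the block-level write into the inactive slot
    # Final step: only the shared boot config changes, not the slot contents.
    dev.boot_config = {"try": target, "fallback": dev.active}
    return target


def reboot(dev, boots_ok):
    """Boot the slot under trial; fall back to the previous slot on failure."""
    trial = dev.boot_config.get("try", dev.active)
    if boots_ok(dev.slots[trial]):
        dev.active = trial
    else:
        dev.active = dev.boot_config.get("fallback", dev.active)
    return dev.active
```

The point of the model is that apply_update() never touches the active
slot, so a failed boot of the new slot leaves the old one intact.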

Now normally this is not a problem: If the new image is not the same as
the current one we will have written entirely new filesystems into
the B partitions and there is no conflict.

However if the user wishes or needs to reinstall a fresh copy of the
_current_ image (for whatever reason: maybe the current image is damaged
in some way and they need to do a factory reset) then with btrfs in the
mix this breaks down:

Since btrfs won't (at present) tolerate a second fs with the same fsid
we have to check that the user is not installing the same image on both
slots.
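
To illustrate what such a check could look like (a sketch, not our
actual code, which this mail does not describe): the primary btrfs
superblock sits at 64 KiB, with the 16-byte fsid at offset 0x20 and the
magic "_BHRfS_M" at offset 0x40 inside it. The function names below are
hypothetical, and only the superblock fsid field is compared.

```python
import uuid

BTRFS_SUPERBLOCK_OFFSET = 0x10000  # primary superblock lives at 64 KiB
BTRFS_FSID_OFFSET = 0x20           # fsid follows the 32-byte checksum
BTRFS_MAGIC_OFFSET = 0x40          # magic follows fsid, bytenr and flags
BTRFS_MAGIC = b"_BHRfS_M"


def read_btrfs_fsid(dev):
    """Return the superblock fsid as a uuid.UUID, or None if dev is not btrfs.

    dev is any seekable binary file object (block device or image file).
    """
    needed = BTRFS_MAGIC_OFFSET + len(BTRFS_MAGIC)
    dev.seek(BTRFS_SUPERBLOCK_OFFSET)
    sb = dev.read(needed)
    if len(sb) < needed:
        return None  # too small to hold a btrfs superblock
    if sb[BTRFS_MAGIC_OFFSET:needed] != BTRFS_MAGIC:
        return None  # some other filesystem, or no filesystem at all
    return uuid.UUID(bytes=sb[BTRFS_FSID_OFFSET:BTRFS_FSID_OFFSET + 16])


def slots_conflict(dev_a, dev_b):
    """True if both devices hold btrfs filesystems with the same fsid."""
    fsid_a = read_btrfs_fsid(dev_a)
    fsid_b = read_btrfs_fsid(dev_b)
    return fsid_a is not None and fsid_a == fsid_b
```

Something like slots_conflict(open("/dev/sdX3", "rb"), open("/dev/sdX5", "rb"))
would be the usage, with the device paths being placeholders.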

If the user has a broken image which is also the latest release and
needs to recover, we have to artificially select an _older_ image and
put that on slot B. The user then needs to boot into that and upgrade
_again_ to get a repaired A slot.

This sort of works but isn't a great user experience and introduces an
artificial restriction - suddenly the images _do_ affect one another.
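
The workaround described above amounts to something like the following
sketch. The release names and both function names are invented for the
example, and it assumes the device starts out booted on slot A with
releases listed oldest to newest.

```python
def pick_image_for_other_slot(current, releases):
    """Return the newest release that differs from the booted one.

    Mirrors the restriction that both slots must not hold the same image.
    """
    candidates = [r for r in releases if r != current]
    if not candidates:
        raise ValueError("no release other than the current one is available")
    return candidates[-1]


def recovery_plan(current, releases):
    """List the (slot, image) installs needed to reach a good latest image."""
    latest = releases[-1]
    if current != latest:
        # Ordinary update: the new image differs, so one install is enough.
        return [("B", latest)]
    # Current image is already the latest: detour through an older image
    # on slot B, then upgrade again from there to rewrite slot A.
    older = pick_image_for_other_slot(current, releases)
    return [("B", older), ("A", latest)]
```

With releases ["v1", "v2", "v3"] and a damaged "v3" on slot A, the plan
is two installs instead of one, which is the extra round trip we would
like to get rid of.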

If the user subverts our safety checks (or we mess up and put the same
image on both slots) then suddenly the whole system becomes unbootable
which is less than ideal.

Hope that clarifies the situation and explains why we care.