On 05/05/2023 17:27, Guilherme G. Piccoli wrote:
On 05/05/2023 02:16, Anand Jain wrote:
[cut]
I'll defer a more detailed response to John / Vivek / Ludovico, who know the use case at a level of detail I don't, since they designed the installation / update path from the ground up.
The OS images are entirely independent. The goal is that you could completely corrupt slot A and it would have no impact on the bootability of slot B. So, yes, we sacrifice space, but as a trade-off we get robustness, which is more important to us.

=========================================================================
When a new OS image is delivered, the normal flow is this (simplified):

- While booted on slot A (for example) the update process is started.
- Our client fetches the most recent image from the update server. This is delivered as a block level diff between the image you have and the image you want.
- The partitions that are allocated to slot B have the new data written into them.
- As a final step, the root fs of the new slot is mounted and a couple of initialisation steps are completed (mostly writing config into the common boot partition; the contents of the slot B partitions are not modified as a result of this).
- The system is rebooted. If all goes well slot B is booted and becomes the primary (current) image.
- If it fails for some reason, the bootloader will (either automatically or by user intervention) go back to booting slot A.

Note that other than the final mount to update the common boot partition with information about the new image, we don't care at all about the contents or even the type of the filesystems we have delivered (and even then all we care about is that we _can_ mount it, not what it is).
=========================================================================

Now normally this is not a problem: if the new image is not the same as the current one, we will have written entirely new filesystems into the B partitions and there is no conflict.
However, if the user wishes or needs to reinstall a fresh copy of the _current_ image (for whatever reason: maybe the current image is damaged in some way and they need to do a factory reset), then with btrfs in the mix this breaks down. Since btrfs won't (at present) tolerate a second fs with the same fsuuid, we have to check that the user is not installing the same image on both slots.

If the user has a broken image which is also the latest release and needs to recover, we have to artificially select an _older_ image and put that on slot B; the user then needs to boot into that and upgrade _again_ to get a repaired A slot. This sort of works, but it isn't a great user experience and it introduces an artificial restriction: suddenly the images _do_ affect one another.

If the user subverts our safety checks (or we mess up and put the same image on both slots) then the whole system becomes unbootable, which is less than ideal.

Hope that clarifies the situation and explains why we care.
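
As a footnote to the "same image on both slots" check mentioned above, here is a minimal Python sketch of what such a comparison could look like. It is purely illustrative and not the actual updater code: the function names are hypothetical, and it assumes that comparing the fsid field of the primary btrfs superblock (at 64 KiB into the partition or image) is enough to detect the conflict. Anything that is not btrfs is treated as "no conflict", in line with not caring about the filesystem type.

```python
import uuid

# btrfs primary superblock layout (offsets relative to the superblock start)
BTRFS_SUPER_OFFSET = 0x10000   # primary superblock lives at 64 KiB
BTRFS_FSID_OFFSET = 0x20       # 16-byte fsid follows the 32-byte checksum
BTRFS_MAGIC_OFFSET = 0x40      # 8-byte magic string
BTRFS_MAGIC = b"_BHRfS_M"


def read_btrfs_fsid(f):
    """Return the fsid from a binary file object, or None if it is not btrfs."""
    f.seek(BTRFS_SUPER_OFFSET)
    sb = f.read(BTRFS_MAGIC_OFFSET + len(BTRFS_MAGIC))
    if len(sb) < BTRFS_MAGIC_OFFSET + len(BTRFS_MAGIC):
        return None  # too short to contain a superblock
    if sb[BTRFS_MAGIC_OFFSET:BTRFS_MAGIC_OFFSET + 8] != BTRFS_MAGIC:
        return None  # some other filesystem (or no filesystem at all)
    return uuid.UUID(bytes=sb[BTRFS_FSID_OFFSET:BTRFS_FSID_OFFSET + 16])


def would_conflict(current_root, new_root):
    """Hypothetical safety check: True if both slots carry the same btrfs fsid."""
    a = read_btrfs_fsid(current_root)
    b = read_btrfs_fsid(new_root)
    return a is not None and a == b
```

A caller would open the current root partition and the image (or partition) destined for the other slot in "rb" mode and refuse the update when would_conflict() returns True. That refusal is exactly the artificial restriction described above.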