Re: [PATCH] dm-verity: restart or panic on an I/O error

Sami Tolvanen <samitolvanen@xxxxxxxxxx> · Thu, 26 Sep 2024 10:32:47 -0700

On Tue, Sep 24, 2024 at 11:35 PM Milan Broz <gmazyland@xxxxxxxxx> wrote:
>
> On 9/25/24 8:09 AM, Maxim Suhanov wrote:
> > Hello.
> >
> >> This is a very strange reasoning. I can say that restarting on an IO error
> >> (that can happen in normal situations) could cause another security issue,
> >> such as DoS. EIO is not a data integrity error; it can happen even higher
> >> in the storage stack... and the application should handle it anyway.
> >
> > In the default mode of operation, there should be no panic/reboot on
> > an I/O error.
> >
> > The issue and the proposed patch affect the non-default modes:
> > panic_on_corruption and restart_on_corruption.
> > Users relying on one of those two modes expect the system to crash if
> > something goes wrong [1]. So, DoS is expected here. Returning I/O
> > errors to the reader isn't something that is expected here (see [1]).
> >
> > The issue allows the attacker to "downgrade" the "panic_on_corruption"
> > and "restart_on_corruption" modes to the default one by creating
> > unreadable sectors on the underlying media (e.g., through the Write
> > Uncorrectable command).
>
> Sorry, but I do agree with this.
>
> Panic/Restart should happen on data corruption (or failed Reed-Solomon
> FEC data recovery fail). IO error (and other errors that basically translates
> to this) can mean something else and userspace application must process
> it in error path (think about network failure).
>
> This is how is the dm-verity flag is documented:
>
> restart_on_corruption
>      Restart the system when a corrupted block is discovered. This option is
>      not compatible with ignore_corruption and requires user space support to
>      avoid restart loops.
>
> It says nothing about restart on other error, that can come from crypto, network
> or whatever. You are redefining the policy here.

I thought about this a bit more, and I agree with Milan. I/O errors
can be temporary and applications should be expected to handle them.
IMO we should trigger a restart/panic only if the underlying device
has corrupted data. If there's a use case for restarting on I/O errors
too, there should be a separate flag for that.

Android devices, which I assume are the largest users of this
functionality, are expected to switch dm-verity partitions back to
normal mode after the first restart is triggered, and userspace is
therefore expected to handle I/O errors gracefully anyway.

https://source.android.com/docs/security/features/verifiedboot/boot-flow

Sami