Re: [PATCH] dm-verity: restart or panic on an I/O error

Mikulas Patocka <mpatocka@xxxxxxxxxx> · Thu, 26 Sep 2024 22:44:31 +0200 (CEST)

On Thu, 26 Sep 2024, Sami Tolvanen wrote:

> On Tue, Sep 24, 2024 at 11:35 PM Milan Broz <gmazyland@xxxxxxxxx> wrote:
> >
> > On 9/25/24 8:09 AM, Maxim Suhanov wrote:
> > > Hello.
> > >
> > >> This is a very strange reasoning. I can say that restarting on an IO error
> > >> (that can happen in normal situations) could cause another security issue,
> > >> such as DoS. EIO is not a data integrity error; it can happen even higher
> > >> in the storage stack... and the application should handle it anyway.
> > >
> > > In the default mode of operation, there should be no panic/reboot on
> > > an I/O error.
> > >
> > > The issue and the proposed patch affect the non-default modes:
> > > panic_on_corruption and restart_on_corruption.
> > > Users relying on one of those two modes expect the system to crash if
> > > something goes wrong [1]. So, DoS is expected here. Returning I/O
> > > errors to the reader isn't something that is expected here (see [1]).
> > >
> > > The issue allows the attacker to "downgrade" the "panic_on_corruption"
> > > and "restart_on_corruption" modes to the default one by creating
> > > unreadable sectors on the underlying media (e.g., through the Write
> > > Uncorrectable command).
> >
> > Sorry, but I do agree with this.
> >
> > Panic/Restart should happen on data corruption (or failed Reed-Solomon
> > FEC data recovery fail). IO error (and other errors that basically translates
> > to this) can mean something else and userspace application must process
> > it in error path (think about network failure).
> >
> > This is how is the dm-verity flag is documented:
> >
> > restart_on_corruption
> >      Restart the system when a corrupted block is discovered. This option is
> >      not compatible with ignore_corruption and requires user space support to
> >      avoid restart loops.
> >
> > It says nothing about restart on other error, that can come from crypto, network
> > or whatever. You are redefining the policy here.
> 
> I thought about this a bit more, and I agree with Milan. I/O errors
> can be temporary and applications should be expected to handle them.
> IMO we should trigger a restart/panic only if the underlying device
> has corrupted data. If there's a use case for restarting on I/O errors
> too, there should be a separate flag for that.

See for example openssh, the function read_config_file_depth. There is:

while (getline(&line, &linesize, f) != -1) {
	... process_config_line_depth
}
free(line);
fclose(f)
if (bad_options > 0)
	fatal("%s: terminating, %d bad configuration options",
		filename, bad_options);A
return 1;

So, the function doesn't distinguish between error and eof. If reading the 
config file returns -EIO, the function exits with 1 as if the file was 
empty.

If an attacker can trigger -EIO on some file in /etc/ssh/sshd_config.d/, 
openssh would behave as if the file didn't exist. If the file contains 
some security-related settings, they are lost.

Another example - if the iptables binary can't be executed due to -EIO, 
would the system erroneously boot with active networking without firewall? 
Do the boot scripts really propagate the iptables error so that no network 
devices are activated if iptables failed? I don't know how systemd handles 
this, but the old sysvinit scripts didn't seem to propagate the error.

I understand that some users want to prevent applications from seeing 
-EIO. Testing the whole system for -EIO is hard.

Mikulas