Re: Problem with [PATCH] libmultipath: fix max_sectors_kb on adding path

On 4/12/24 09:21, Martin Wilck wrote:
On Fri, 2024-04-12 at 08:06 +0200, Hannes Reinecke wrote:

We have gone to great pains in the kernel to ensure the queue limits
are sane, and updated correctly. Even for stacking devices.

This is true, but only for the creation of stacked devices (table
activation, as far as device mapper is concerned). Admins are free to
change max_sectors_kb any time; there's no propagation of changed
settings along the device stack, and no sanity checking in the kernel
prevents them from setting values that will cause I/O errors.

I sincerely doubt we need this patch from multipath anymore.
Having to adjust max_sectors_kb really should be reserved for
corner cases where the user has dodgy hardware which doesn't
report correct limits.

Right. We've seen a couple of cases where decreasing max_sectors_kb
from the default value was the only remedy for weird I/O failures. This
happened with remote storage reporting wrong limits, misbehaving
elements in the fabric, and even with virtualized IO stacks.

But even that should rather be handled by blacklisting.
Can't we just make max_sectors_kb read-only in the kernel and
be done with it?

Personally, I think this goes a bit too far. I believe the kernel
should disallow changing (more specifically, decreasing) the
max_sectors_kb sysfs attribute for block devices that are either in use
(bd_openers > 0) or held by other block devices (bd_holder != NULL).
That would eliminate a large portion of bad cases, AFAICS. Admins could
still increase max_sectors_kb at the top of the device stack, but that
would arguably count as shooting oneself in the foot.
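
To make the proposed rule concrete, here is a rough user-space model in
Python. It is only a sketch of the policy described above, not kernel
code: `BlockDev`, `may_set_max_sectors_kb` and the field names are
hypothetical and merely mirror bd_openers and bd_holder. The existing
upper bound on the attribute is not modelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BlockDev:
    """Hypothetical stand-in for the few block device fields the rule needs."""
    max_sectors_kb: int
    openers: int = 0               # models bd_openers
    holder: Optional[str] = None   # models bd_holder (None means not held)


def may_set_max_sectors_kb(dev: BlockDev, new_kb: int) -> bool:
    """Return True if the proposed rule would permit the change."""
    if new_kb >= dev.max_sectors_kb:
        # Increases (and no-ops) stay allowed under this rule.
        return True
    # Decreases are refused while the device is open or held by another device.
    return dev.openers == 0 and dev.holder is None
```

With this rule, a device that is idle and not part of any stack can still
be tuned downwards, which covers the "dodgy hardware" case as long as the
limit is set before the device is put to use.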

Errors in valid configurations are possible, even without changing
max_sectors_kb in sysfs. Consider a multipath map consisting of devices
with different max_sectors (for example mixed iSCSI/tcp and
iSCSI/bnx2i). If only the paths with large max_sectors are initially
detected, and others are added later, the map's max_sectors will be
decreased while in use, and the change will not be propagated to
stacked block layers above multipath: bummer. The only way to avoid
this in general is implementing limit propagation. I assume that the
implementation of block limit propagation in the kernel would be a
major effort with lots of possible race conditions. It's far easier to
have admins simply impose max_sectors_kb on multipath maps in corner
case scenarios like this.
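
The scenario above can be illustrated with a small Python model. This is
a sketch under simplifying assumptions, not how the kernel or multipathd
is implemented: the map's limit is taken to be the minimum over its
paths, the stacked device copies the limit once at creation, and all
class names, device names and numbers are made up for illustration.

```python
class MultipathMap:
    """Toy model: the map's limit is the smallest limit of its paths."""

    def __init__(self):
        self.paths = {}  # path name -> max_sectors_kb of that path

    def add_path(self, name, max_sectors_kb):
        self.paths[name] = max_sectors_kb

    @property
    def max_sectors_kb(self):
        return min(self.paths.values())


class StackedDev:
    """Toy model of a device stacked on top of the map.

    The limit is copied once at creation and never refreshed, which is
    the missing propagation described above.
    """

    def __init__(self, lower):
        self.max_sectors_kb = lower.max_sectors_kb


mp = MultipathMap()
mp.add_path("path-large", 1024)   # only the large-limit path is found at first
top = StackedDev(mp)              # stacked device is created on top of the map
mp.add_path("path-small", 512)    # a path with a smaller limit shows up later

# The upper layer may now issue requests larger than the map accepts.
stale = top.max_sectors_kb > mp.max_sectors_kb
```

In this model `stale` ends up true: the upper device still believes 1024
is acceptable while the map has dropped to 512. Imposing the smaller
value on the map before anything is stacked on top avoids the mismatch.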

Indeed; changing max_sectors_kb while the queue is live should
be preceded by a quiesce, to ensure that we're not changing request
limits while requests are in flight.

And disabling setting of max_sectors_kb for stacked devices is a good
idea, too.

Cheers,

Hannes




