Re: [LSF/MM TOPIC] linux servers as a storage server - what's missing?

Bart Van Assche <bvanassche@xxxxxxx> · Wed, 18 Jan 2012 18:51:37 +0000

On Wed, Jan 18, 2012 at 6:46 PM, Roland Dreier <roland@xxxxxxxxxxxxxxx> wrote:
> > Why would you crash if you have device mapper multipath configured to handle
> > path fail over? We have tons of enterprise customers that use that...
>
> cf http://www.spinics.net/lists/linux-scsi/msg56254.html
>
> Basically hot unplug of an sdX can oops on any recent kernel, no
> matter what dm stuff you have on top.
>
> > On the broader topic of error handling and so on, I do agree that is always
> > an area of concern (how many times to retry, how long time outs need to be,
> > when to panic/reboot or propagate up an error code)
>
> Yes, especially the scsi eh stuff escalating to a host reset when
> a single drive has gone bad -- even if the HBA is happily doing IO
> to other drives, we'll kill access to the whole SAS fabric.

With which SCSI low-level driver does that occur, and what does the call
stack look like? I haven't encountered any such issues while testing
the srp-ha patch set. However, I have to admit that the issues
mentioned in the description of commit 3308511 were discovered while
testing the srp-ha patch set.

Bart.