On Wed, Jul 15 2015 at 8:15am -0400,
Hannes Reinecke <hare@xxxxxxx> wrote:

> On 07/15/2015 02:01 PM, James Bottomley wrote:
> > On Wed, 2015-07-15 at 13:52 +0200, Hannes Reinecke wrote:
> >> On 07/15/2015 01:35 PM, James Bottomley wrote:
> >>> On Wed, 2015-07-15 at 13:23 +0200, Hannes Reinecke wrote:
> >>>> If dm-mpath encounters a reservation conflict it should not
> >>>> fail the path (as communication with the target is not affected)
> >>>> but should rather retry on another path.
> >>>> However, in doing so we might be inducing a ping-pong between
> >>>> paths, with no guarantee of any forward progress.
> >>>> And arguably a reservation conflict is an unexpected error,
> >>>> so we should be passing it upwards to allow the application
> >>>> to take appropriate steps.
> >>>
> >>> If I interpret the code correctly, you've changed the behaviour from the
> >>> current try all paths and fail them, ultimately passing the reservation
> >>> conflict up if all paths fail, to return reservation conflict
> >>> immediately, keeping all paths running. This assumes that the
> >>> reservation isn't path specific because if we encounter a path specific
> >>> reservation, you've altered the behaviour from route around to fail.
> >>>
> >> That is correct.
> >> As mentioned in the patch, the 'correct' solution would be to retry
> >> the offending I/O on another path.
> >> However, the current multipath design doesn't allow us to do that
> >> without failing the path first.
> >> If we were just retrying I/O on another path without failing the
> >> path first (and all paths would return a reservation conflict) we
> >> wouldn't know when we've exhausted all paths.
> >>
> >>> The case I think the original code was for is SAN Volume controllers
> >>> which use path specific SCSI-3 reservations effectively to do traffic
> >>> control and allow favoured paths. Have you verified that nothing we
> >>> encounter in the enterprise uses path specific reservations for
> >>> multipath shaping any more?
> >>>
> >> Ah. That was some input I was looking for.
> >> With that patch I've assumed that persistent reservations are done
> >> primarily from userland / filesystem, where the reservation would
> >> effectively be done on a per-LUN basis.
> >> If it's being used from the storage array internally this is a
> >> different matter.
> >> (Although I'd be very interested how this behaviour would play
> >> together with applications which use persistent reservations
> >> internally; GPFS springs to mind here ...)
> >>
> >> But apparently this specific behaviour wasn't seen that often in the
> >> field; I certainly never got any customer reports about mysteriously
> >> failing paths.
> >
> > Have you already got this patch in SLES, if so, for how long?
> >
> We haven't as of yet; I've come across this behaviour due to another
> issue. And before I were to put this into SLES I thought I should be
> asking those in the know ... persistent reservations _is_ an arcane
> topic, after all.
> I was just referring to the fact that I rarely got customer issues
> with persistent reservations; and those I get tend to be tape-centric.
>
> >> Anyway. I'll see if I can come up with something to restore the
> >> original behaviour.
> >
> > Or a way of verifying that nothing in the current enterprise uses path
> > specific reservations ... we can change the current behaviour as long
> > as nothing notices.
> >
> The only instance I know of is GPFS; someone in our company once
> wrote an HA agent using persistent reservations, but I'm not sure if
> it's deployed anywhere. But that agent is certainly aware of
> multipathing, and doesn't issue per-path reservations.
> (Well, actually it does, but it does it for every path :-)
> I would think the same goes for GPFS.
>
> Incidentally, the SVC docs have a section about persistent
> reservations, but do not mention anything about internal use.
> So if it does it'll be opaque to the user, otherwise I would assume
> it to be mentioned there.

The main consumer of SCSI PR that I'm aware of is fence_scsi. I don't
have specifics on whether the clustering layers that use fence_scsi
(e.g. pacemaker) ever make use of per-path SCSI PR (cc'ing Ryan O'hara
who AFAIK maintains fence_scsi).

Mike