Re: [PATCH] Protect against overflow in dev_loss_tmo

Christof Schmitt wrote:
> On Tue, Mar 09, 2010 at 09:14:37AM -0500, James Smart wrote:
>> I don't ever expect to see large dev_loss_tmo values, but the patch is fine.
>>
>> Acked-by: James Smart <james.smart@xxxxxxxxxx>
>>
>> -- james s
>>
>>
>> Hannes Reinecke wrote:
>>> The rport structure defines dev_loss_tmo as u32, which is
>>> later multiplied with HZ to get the actual timeout value.
>>> This might overflow for large dev_loss_tmo values. So we
>>> should be better using u64 as intermediate variables here
>>> to protect against overflow.
>>>
>>> Signed-off-by: Hannes Reinecke <hare@xxxxxxx>
> [...]
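
For reference, the overflow described above can be reproduced in user space.
This is only a minimal sketch, not the patch itself: HZ is assumed to be
1000 here (the real value depends on the kernel configuration), and the two
helper names are hypothetical.

```c
#include <stdint.h>

/* Assumption: 1000 ticks per second; the kernel's HZ is config dependent. */
#define HZ 1000u

/* Hypothetical helper: the product is computed in 32 bits, so it wraps
 * modulo 2^32 once dev_loss_tmo * HZ no longer fits. */
static uint32_t timeout_ticks_u32(uint32_t dev_loss_tmo)
{
	return dev_loss_tmo * HZ;
}

/* Hypothetical helper: widening one operand to 64 bits before the
 * multiplication keeps the full product. */
static uint64_t timeout_ticks_u64(uint32_t dev_loss_tmo)
{
	return (uint64_t)dev_loss_tmo * HZ;
}
```

With dev_loss_tmo = 5000000 seconds, the 32-bit version yields 705032704
ticks instead of 5000000000, so the timer would fire far earlier than
requested; the 64-bit intermediate keeps the intended value.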
> 
> I guess this is the intended use to prevent the dev_loss_tmo from
> removing the SCSI devices:
> http://git.kernel.org/?p=linux/kernel/git/hare/multipath-tools.git;a=commitdiff;h=b9903e2e8a6cdc5042897719fbae6c9346820bbf;hp=ed1dc6164fe530d146cfe65d4f99e44ec9b54b95
> 
> But does this raise the question again how to run SCSI EH with remote
> port failures?
> 
> The SCSI FC LLDs call fc_block_scsi_eh to wait until the fc_rport
> leaves the state FC_PORTSTATE_BLOCKED. This effectively prevents SCSI
> devices from being taken offline when there is a command timeout and
> the fc_rport is BLOCKED. With the large dev_loss_tmo, the dev_loss_tmo
> never expires and a problem with a single remote port can block the
> host error handler.
> 
I was under the impression that terminate_rport_io() would cancel/terminate
all outstanding I/O, while any new I/O would be blocked due to FC_PORTSTATE_BLOCKED.

A device would only be taken offline if the full error recovery is run,
something we cannot do (reliably) if the path is down, so from that
point of view it is totally reasonable to defer the error recovery here.

However, I'm curious as to how one could get into that state while
the port is blocked.
The only way I can imagine is that an I/O has started before the
port entered FC_PORTSTATE_BLOCKED, and would return (with error)
before terminate_rport_io has been called.
So eh would be delayed due to fc_block_scsi_eh().
But I would have assumed that terminate_rport_io() would terminate
even this failing I/O with DID_TRANSPORT_FAILFAST (or somesuch),
thus avoiding proper eh altogether.

Am I wrong here?

Cheers,

Hannes
-- 
Dr. Hannes Reinecke		      zSeries & Storage
hare@xxxxxxx			      +49 911 74053 688
SUSE LINUX Products GmbH, Maxfeldstr. 5, 90409 Nürnberg
GF: Markus Rex, HRB 16746 (AG Nürnberg)