Re: [PATCH RFC] nvme-fc: FPIN link integrity handling

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]


On 07/03/2024 14:13, Hannes Reinecke wrote:
On 3/7/24 13:01, Sagi Grimberg wrote:

On 07/03/2024 13:29, Hannes Reinecke wrote:
On 3/7/24 11:10, Sagi Grimberg wrote:

On 19/02/2024 10:59, hare@xxxxxxxxxx wrote:
From: Hannes Reinecke <hare@xxxxxxx>

FPIN LI (link integrity) messages are received when the attached
fabric detects hardware errors. In response to these messages the
affected ports should not be used for I/O, and only put back into
service once the ports had been reset as then the hardware might
have been replaced.

Does this mean it cannot service any type of communication over
the wire?

It means that the service is impacted, and communication cannot be guaranteed (CRC errors, packet loss, you name it).
So the link should be taken out of service until it's been (manually)

OK, that's what I assumed.

This patch adds a new controller flag 'NVME_CTRL_TRANSPORT_BLOCKED'
which will be checked during multipath path selection, causing the
path to be skipped.

While this looks sensible to me, it also looks like this introduces a ctrl state outside of ctrl->state... Wouldn't it make sense to move the controller to
NVME_CTRL_DEAD ? or is it not a terminal state?

Actually, I was trying to model it alongside the 'devloss_tmo'/'fast_io_fail' mechanism we have in SCSI. Technically the controller is still present, it's just that we shouldn't
send I/O to it.

Sounds like a dead controller to me.

Sort of, yes.

And I'd rather not disconnect here as we're trying to
do an autoconnect on FC, so manually disconnect would interfere with
that and we probably end in a death spiral doing disconnect/reconnect.

I suggested just transitioning the state to DEAD... Not sure how keep-alives behave though...

Hmm. The state machine has the transition LIVE->DELETING->DEAD,
ie a dead controller is on the way out, with all resources being

A direct transition would pretty much violate that.
If we were going that way I'd prefer to have another state
('IMPACTED' ? 'LIVE_NOIO' ?) with the transitions

We could 'elevate' it to a new controller state, but wasn't sure how big
an appetite there is. And we already have flags like 'stopped' which
seem to fall into the same category.

stopped is different because it is not used to determine if it is capable
for IO (admin or io queues). Hence it is ok to be a flag.


So yeah, we could introduce a new state, but I guess a direct transition
to 'DEAD' is not really a good idea.

How common do you think this state would be? On the one hand, having a generic state that the transport is kept a live but simply refuses to accept I/O sounds like
a generic state, but I can't think of an equivalent in the other transports.

If this is something that is private to FC, perhaps the right way is to add a flag for it that only fc sets, and when a second usage of it appears, we promote it
to a proper controller state. Thoughts?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [SCSI Target Devel]     [Linux SCSI Target Infrastructure]     [Kernel Newbies]     [IDE]     [Security]     [Git]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux RAID]     [Linux ATA RAID]     [Linux IIO]     [Samba]     [Device Mapper]

  Powered by Linux