Hi Martin,

Below are my replies.

> If there was any discussion, I haven't been involved :-)
> I haven't looked into FPIN much so far. I'm rather sceptical about its
> usefulness for dm-multipath. Being a property of FC-2, FPIN works at least
> 2 layers below dm-multipath. dm-multipath is agnostic to protocol and
> transport properties by design. User space multipathd can cross these
> layers and tune dm-multipath based on lower-level properties, but such
> actions have rather large latencies.
>
> As you know, dm-multipath has 3 switches for routing IO via different paths:
>  1) priority groups,
>  2) path status (good / failed),
>  3) path selector algorithm.
> 1) and 2) are controlled by user space, and have high latency.
> The current "marginal" concept in multipathd watches paths for repeated
> failures, and configures the kernel to avoid using paths that are
> considered marginal, using methods 1) and 2). This is a very-high-latency
> algorithm that changes state on the time scale of minutes.
> There is no concept for "delaying" or "pausing" IO on paths on a short
> time scale.
> The only low-latency mechanism is 3). But it's block level; no existing
> selector looks at transport-level properties.
>
> That said, I can quite well imagine a feedback mechanism based on
> throttling or delays applied in the FC drivers. For example, if a remote
> port was throttled by the driver in response to FPIN messages, its
> bandwidth would decrease, and a path selector like "service-time" would
> automatically assign less IO to such paths. This wouldn't need any changes
> in dm-multipath or multipath-tools; it would work entirely on the FC level.

[Muneendra] Agreed.

> Talking about improving the current "marginal" algorithm in multipathd,
> and knowing that it's slow, FPIN might provide additional data that would
> be good to have. Currently, multipathd only has 2 inputs: "good<->bad"
> state transitions based either on kernel I/O errors or path checker
> results, and failure statistics from multipathd's internal "io_err_stat"
> thread, which only reads sector 0. This could obviously be improved, but
> there may actually be lower-hanging fruit than evaluating FPIN
> notifications (for example, I've pondered utilizing the kernel's blktrace
> functionality to detect unusually long IO latencies or bandwidth drops).
>
> Talking about FPIN, is it planned to notify user space about such fabric
> events, and if yes, how?

[Muneendra] Yes. FC drivers, when receiving FC FPIN ELSes, call a SCSI
transport routine with the FPIN payload. The transport pushes this out as an
"event" via netlink. An application bound to the local address used by the
SCSI transport can receive the event and parse it.

Benjamin has added a marginal_path group (multipath marginal pathgroups) in
dm-multipath:
https://patchwork.kernel.org/project/dm-devel/cover/1564763622-31752-1-git-send-email-bmarzins@xxxxxxxxxx/

One of the intentions of Benjamin's patch (support for marginal paths) is to
support the FPIN events we receive from the fabric. On receiving an FPIN-LI,
our intention is to place all the affected paths into the marginal path
group.

Below are the 4 types of descriptors returned in an FPIN (a rough user-space
sketch of receiving and classifying them follows after the list):

• Link Integrity (LI): some error on a link that affected frames; this is
  the main one for a "flaky path".
• Delivery Notification (DN): something explicitly knew about a dropped
  frame and is reporting it. Usually, things like a CRC error mean you can't
  trust the frame header, so that is reported as an LI error. But if you do
  have a valid frame and drop it, for example because of a fabric edge timer
  (don't queue it for more than 250-600 ms), then it becomes a DN. Could be
  a flaky path, but not necessarily.
• Congestion (CN): the fabric is saying it's congested sending to "your"
  port. If a host receives it, the fabric is saying it has more frames for
  the host than the host is pulling in, so the fabric is backing up. What
  should happen is that the load generated by the host is lowered - but this
  is across all targets, and not all targets are necessarily in the mpio
  path list.
• Peer Congestion (PCN): this goes along with CN in that the fabric is now
  telling the other devices in the zone that send traffic to the congested
  port that this port is backing up, so the idea is that these peers send
  less load to the congested port. It shouldn't really tie into mpio. Some
  of the current thinking is that targets could see this and reduce their
  transmission rate to a host down to the link speed of the host.
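To make the netlink delivery and the descriptor types above a bit more
concrete, here is a hypothetical, minimal user-space sketch (explicitly not
the actual multipathd code): it binds a NETLINK_SCSITRANSPORT socket,
receives the FC transport events, and classifies FPIN descriptors by their
tag. The SCSI_TRANSPORT_MSG value, the fc_nl_event layout and the ELS_DTAG_*
values are assumptions meant to mirror the kernel's uapi scsi_netlink.h,
scsi_netlink_fc.h and fc_els.h (FC-LS-5 descriptor tags); please verify them
against the running kernel before building on this.

/* Hypothetical sketch of a user-space FPIN consumer, not the actual multipathd
 * code. Struct layout and constants are assumed to mirror the kernel's uapi
 * scsi_netlink(_fc).h and fc_els.h headers; verify before relying on them. */
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>          /* ntohl() */
#include <sys/socket.h>
#include <linux/netlink.h>      /* NETLINK_SCSITRANSPORT, nlmsghdr, NLMSG_* */

#define SCSI_TRANSPORT_MSG      (NLMSG_MIN_TYPE + 1)    /* assumed value */

/* FPIN descriptor tags per FC-LS-5 (kernel: enum fc_ls_tlv_dtag) */
#define ELS_DTAG_LNK_INTEGRITY  0x00020001      /* LI  */
#define ELS_DTAG_DELIVERY       0x00020002      /* DN  */
#define ELS_DTAG_PEER_CONGEST   0x00020003      /* PCN */
#define ELS_DTAG_CONGESTION     0x00020004      /* CN  */

/* Mirrors struct fc_nl_event from scsi_netlink_fc.h (assumed layout) */
struct fc_nl_event {
        uint8_t  snlh[8];               /* struct scsi_nl_hdr */
        uint64_t seconds;
        uint64_t vendor_id;
        uint16_t host_no;
        uint16_t event_datalen;         /* payload length in bytes */
        uint32_t event_num;
        uint32_t event_code;            /* e.g. the FPIN link event code */
        uint8_t  event_data[];          /* raw FPIN ELS frame for FPIN events */
} __attribute__((aligned(sizeof(uint64_t))));

static void classify_fpin(const uint8_t *els, size_t len)
{
        /* FPIN ELS (big-endian): ELS command word, descriptor list length,
         * then descriptors, each led by a 4-byte tag and 4-byte value length. */
        size_t off = 8;

        while (off + 8 <= len) {
                uint32_t tag, dlen;

                memcpy(&tag, els + off, sizeof(tag));
                memcpy(&dlen, els + off + 4, sizeof(dlen));
                tag = ntohl(tag);
                dlen = ntohl(dlen);
                if (dlen > len || off + 8 + dlen > len)
                        break;          /* truncated descriptor */

                switch (tag) {
                case ELS_DTAG_LNK_INTEGRITY:
                        printf("FPIN-LI: link integrity -> marginal path candidate\n");
                        break;
                case ELS_DTAG_DELIVERY:
                        printf("FPIN-DN: delivery notification (dropped frame)\n");
                        break;
                case ELS_DTAG_CONGESTION:
                        printf("FPIN-CN: fabric congestion toward this port\n");
                        break;
                case ELS_DTAG_PEER_CONGEST:
                        printf("FPIN-PCN: peer port congestion\n");
                        break;
                default:
                        printf("FPIN: unknown descriptor tag 0x%08x\n", (unsigned)tag);
                }
                off += 8 + dlen;        /* advance to the next descriptor */
        }
}

int main(void)
{
        struct sockaddr_nl local = {
                .nl_family = AF_NETLINK,
                .nl_groups = ~0U,       /* join all SCSI transport groups */
        };
        uint8_t buf[8192] __attribute__((aligned(NLMSG_ALIGNTO)));
        int fd, len;

        fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_SCSITRANSPORT);
        if (fd < 0 || bind(fd, (struct sockaddr *)&local, sizeof(local)) < 0)
                return 1;

        while ((len = (int)recv(fd, buf, sizeof(buf), 0)) > 0) {
                for (struct nlmsghdr *nlh = (struct nlmsghdr *)buf;
                     NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
                        if (nlh->nlmsg_type != SCSI_TRANSPORT_MSG)
                                continue;
                        struct fc_nl_event *ev = NLMSG_DATA(nlh);
                        /* real code would also check ev->event_code for the
                         * FPIN event and bound event_datalen by nlmsg_len */
                        classify_fpin(ev->event_data, ev->event_datalen);
                }
        }
        close(fd);
        return 0;
}

A real consumer such as multipathd would of course feed the FPIN-LI
classification into the marginal path group handling, and the congestion
notifications into whatever throttling mechanism we end up with, instead of
just printing them.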
On receiving the congestion notifications, our intention is to gradually
slow down the workload from the host until it stops receiving congestion
notifications. We still need to work out how such a reduction of the
workload can be achieved with the help of dm-multipath.

As Hannes mentioned in his earlier mail, our primary goal is that the admin
should first be _alerted_: FPINs showing up in the message log tell the
admin that his fabric is not performing well.

Regards,
Muneendra.