>> 2. In the block layer add callouts/cmds so that we can abort
>> requests/bios at the LLD level.
>>
>> 3. For rbd, we will implement support for #2. In ceph then we would need
>> to add code to be able to track down commands and kill them if we can or
>> at least figure out what is going on and log a message so we do not have
>> these mysterious hung commands.
>
> We just had a short network disruption, likely simply leaf/spine
> overload, which temporarily hung up RBD<->LIO traffic. ESXi<->LIO
> traffic stayed up. RBD seems to allow for long IO waits, i.e. you
> could wait 30+ seconds for RBD IO to complete, but ESXi goes into a
> death spiral after 5 seconds. So if there were an option on either
> LIO or RBD side to just fail an IO that did not complete within say 4
> seconds, this would take care of the nasty consequences on ESXi side.
>
> Can RBD IO be aborted after a given number of seconds?
>
> ESXi will then retry the IO and if the problem was transient, that IO
> will complete and life goes on.

Thanks to Mike Christie's excellent analysis, a new issue has been
identified, and fixing it should prevent at least some of the
ESXi/LIO/Ceph problems.

A number of these implementations use clustering (e.g. Pacemaker), same
as we do. Upon failover, the logic is to start the target(s), then open
them up to initiators, then start the LUNs. However, ESXi apparently
scans the targets on failover, discovers that they have no LUNs (in the
brief period between target start and LUN start), and does not rescan
the target after that.

So what has to happen is to either not enable the target or to block the
ports on failover until all LUNs have completed their startup. We will
implement this behavior shortly and advise on test results.

Another test I am planning to perform in the lab is to disconnect the
Ceph public network from an LIO node but leave the iSCSI network
connected to ESXi. This should cause timeouts, then a failover to
another node and a rescan.
Ideally, the RBD device will abort IOs in progress so ESXi knows they are
not going to complete and does not wait.

Regards,
Alex
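
The failover ordering described above (start the target with its ports
still blocked, start every LUN, and only then open the ports to
initiators) can be sketched roughly as follows. This is only an
illustration in Python, not the actual cluster configuration: the
function fail_over and the callables start_target, start_lun,
lun_is_ready and open_ports are hypothetical placeholders for whatever
the cluster resource agents really do, and they are not Pacemaker or LIO
APIs.

```python
from typing import Callable, Iterable, List


def fail_over(
    start_target: Callable[[], None],
    start_lun: Callable[[str], None],
    lun_is_ready: Callable[[str], bool],
    open_ports: Callable[[], None],
    luns: Iterable[str],
) -> List[str]:
    """Bring a target up on the takeover node without exposing it empty.

    All callables are hypothetical hooks supplied by the caller.
    Returns the ordered list of steps taken, e.g. for logging.
    """
    steps: List[str] = []
    lun_list = list(luns)

    # 1. Start the target while its ports are still blocked/disabled.
    start_target()
    steps.append("target")

    # 2. Start every LUN before any initiator is able to scan.
    for lun in lun_list:
        start_lun(lun)
        steps.append(f"lun:{lun}")

    # 3. Keep the ports blocked unless all LUNs report ready.
    not_ready = [lun for lun in lun_list if not lun_is_ready(lun)]
    if not_ready:
        raise RuntimeError(f"LUNs not ready, ports stay blocked: {not_ready}")

    # 4. Only now let initiators in, so a scan never sees zero LUNs.
    open_ports()
    steps.append("ports")
    return steps
```

The point of the sketch is only the ordering: the step that makes the
target visible to ESXi comes last, and it is skipped entirely if any LUN
failed to come up.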