> On 6 Apr 2017, at 08:42, Nick Fisk <nick@xxxxxxxxxx> wrote:
>
> I assume Brady is referring to the death spiral LIO gets into with some
> initiators, including VMware, if an IO takes longer than about 10s.

We have occasionally seen this issue with VMware+LIO, almost always when upgrading OSD nodes. I didn't realise it was a known issue! Apart from that, though, we've found LIO generally to be far more performant and stable (especially in our multipathing setup), so we would like to stick with it if possible.

I'm wondering: are there any additional steps we should take to minimise the risk of LIO timeouts during upgrades? At the moment, we set the cluster to "noout", stop the node's services, upgrade the packages, and reboot. For instance, is there a way to drain client connections from a particular node before shutting down its OSDs?

Thanks,
Oliver.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
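[For reference, the upgrade procedure described above can be sketched as a shell sequence. This is a minimal illustration, not the poster's exact commands; the systemd unit name and package manager are assumptions (a Debian/Ubuntu node running systemd-managed Ceph services).]

```shell
# Prevent CRUSH from marking the node's OSDs "out" (and triggering
# rebalancing) while the node is down for the upgrade
ceph osd set noout

# Stop Ceph services on the node being upgraded
# (unit name assumed; older init-based setups differ)
systemctl stop ceph-osd.target

# Upgrade packages and reboot (apt assumed; use your distro's tooling)
apt-get update && apt-get upgrade -y ceph
reboot

# After the node is back up and its OSDs have rejoined the cluster,
# clear the flag so normal out-marking behaviour resumes
ceph osd unset noout
```

Note that "noout" only stops rebalancing; it does nothing to drain in-flight client I/O, which is why LIO initiators can still see long I/O stalls while the node's OSDs restart.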