Re: iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

Maxim Patlasov <mpatlasov@xxxxxxxxxx> · Thu, 15 Mar 2018 12:32:58 -0700

On Thu, Mar 15, 2018 at 12:48 AM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
...

> It looks like there is a bug.
>
> 1. A regression was added when I stopped killing the iscsi connection
> when the lock is taken away from us to handle a failback bug where it
> was causing ping ponging. That combined with #2 will cause the bug.
>
> 2. I did not anticipate the type of sleeps above where they are injected
> any old place in the kernel. For example, if a command had really got
> stuck on the network then the nop timer would fire which forces the
> iscsi thread's recv() to fail and that submitting thread to exit. Or we
> should handle the delay-request-in-tcmu-runner.diff issue ok, because we
> wait for those commands. However, we could just get rescheduled due to
> hitting a preemption point and we might not be rescheduled for longer
> than failover timeout seconds. For this it could just be some buggy code
> that gets run on all the cpus for more than failover timeout seconds
> then recovers, and we would hit the bug in your patch above.
>
> The 2 attached patches fix the issues for me on linux. Note that it only
> works on linux right now and it only works with 2 nodes. It probably
> also works for ESX/windows, but I need to reconfig some timers.
>
> Apply ceph-iscsi-config-explicit-standby.patch to ceph-iscsi-config and
> tcmu-runner-use-explicit.patch to tcmu-runner.

Mike, thank you for the patches, they seem to work. There is one issue, though it is not related to data corruption: if the second path (gateway) is not available and I restart tcmu-runner on the first gateway, all subsequent I/O hangs for a long time, because tcmu-runner is in the UNLOCKED state and the initiator doesn't resend the explicit ALUA activation request for a long while (190s).

Can you please also clarify how explicit ALUA (with these patches applied) is immune to a situation in which stale requests are still sitting in kernel queues at the moment tcmu-runner handles tcmu_explicit_transition() --> tcmu_acquire_dev_lock()? Does it mean that all requests are strictly ordered and the initiator will never send us read/write requests until we complete that explicit ALUA activation request?
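
To make the scenario I am worried about concrete, here is a toy Python model. It is only a sketch of the concern, not of how tcmu-runner or the kernel actually behave: the Gateway class, its queue, and the lock flag are hypothetical stand-ins, and the model assumes nothing flushes or fences the queue when the lock is re-acquired. That assumption is exactly what I am asking about.

```python
from collections import deque
from dataclasses import dataclass


@dataclass
class Write:
    lba: int
    data: str


class Gateway:
    """Hypothetical gateway: a command queue plus an exclusive-lock flag."""

    def __init__(self, name, disk):
        self.name = name
        self.disk = disk        # shared dict standing in for the RBD image
        self.has_lock = False
        self.queue = deque()    # stands in for kernel queues ahead of tcmu-runner

    def submit(self, write):
        self.queue.append(write)

    def acquire_lock(self, other):
        # Taking the exclusive lock takes it away from the other gateway.
        other.has_lock = False
        self.has_lock = True

    def drain(self):
        # Execute queued commands, but only while we hold the lock.
        while self.queue and self.has_lock:
            w = self.queue.popleft()
            self.disk[w.lba] = w.data


def stale_write_scenario():
    disk = {}
    gw1, gw2 = Gateway("gw1", disk), Gateway("gw2", disk)

    gw1.acquire_lock(gw2)
    gw1.submit(Write(0, "old"))   # gets stuck in gw1's queue, never executed

    # The initiator times out, fails over to gw2, retries, then writes new data.
    gw2.acquire_lock(gw1)
    gw2.submit(Write(0, "old"))
    gw2.submit(Write(0, "new"))
    gw2.drain()

    # Failback: explicit activation on gw1 re-takes the lock, and the
    # stale command that was still queued there is executed afterwards.
    gw1.acquire_lock(gw2)
    gw1.drain()
    return disk[0]
```

In this model the block ends up holding the stale data, so the newer write is lost. If requests are strictly ordered behind the activation request, or the queues are known to be empty by then, the model's assumption does not hold and the problem goes away.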

Thanks,
Maxim

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com