Re: iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

On Wed, Mar 14, 2018 at 12:05 PM, Michael Christie <mchristi@xxxxxxxxxx> wrote:
On 03/14/2018 01:27 PM, Michael Christie wrote:
> On 03/14/2018 01:24 PM, Maxim Patlasov wrote:
>> On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdillama@xxxxxxxxxx
>> <mailto:jdillama@xxxxxxxxxx>> wrote:
>>
>>     Maxim, can you provide steps for a reproducer?
>>
>>
>> Yes, but it involves adding two artificial delays: one in tcmu-runner
>> and another in kernel iscsi. If you're willing to take pains of
>
> Send the patches for the changes.
>
>> recompiling kernel and tcmu-runner on one of gateway nodes, I'll help to
>> reproduce.
>>
>> Generally, the idea of reproducer is simple: let's model a situation
>> when two stale requests got stuck in kernel mailbox waiting to be
>> consumed by tcmu-runner, and another one got stuck in iscsi layer --
>> immediately after reading iscsi request from the socket. If we unblock
>> tcmu-runner after newer data went through another gateway, the first
>> stale request will switch tcmu-runner state from LOCKED to UNLOCKED
>> state, then the second stale request will trigger alua_thread to
>> re-acquire the lock, so when the third request comes to tcmu-runner, the
When you send the patches that add your delays, could you also send the
target-side /var/log/tcmu-runner.log with log_level = 4?

For the test above, you should see the second request being sent to
rbd's tcmu_rbd_aio_write function. That command should fail in
rbd_finish_aio_generic and tcmu_rbd_handle_blacklisted_cmd will be
called. We should then be blocking until IO in that iscsi connection is
flushed in tgt_port_grp_recovery_thread_fn. That function will not
return from the enable=0 until the iscsi connection is stopped and the
commands in it have completed.

Other commands you had in flight should eventually hit
tcmur_cmd_handler's tcmu_dev_in_recovery check and be failed there. If
they had already passed that check, they would be sent to
tcmu_rbd_aio_write and should get the blacklisted error as above.
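
The expected handling described above can be sketched as a toy model. This is
only an illustration in Python, not tcmu-runner code: ToyDevice,
toy_rbd_write and toy_cmd_handler are hypothetical names, and the assumption
that the blacklisted handler is what puts the device into recovery is a
simplification of the description above.

```python
from enum import Enum


class Outcome(Enum):
    WRITTEN = "written to rbd"
    BLACKLISTED = "failed with blacklisted error"
    FAILED_IN_RECOVERY = "failed at the in-recovery check"


class ToyDevice:
    """Hypothetical stand-in for a device; not the real tcmu-runner struct."""

    def __init__(self):
        self.blacklisted = False   # this gateway's rbd client was blacklisted
        self.in_recovery = False   # models the tcmu_dev_in_recovery check


def toy_rbd_write(dev):
    # Stands in for tcmu_rbd_aio_write plus its completion callback.
    if dev.blacklisted:
        # Assumption: handling the blacklisted command starts recovery.
        dev.in_recovery = True
        return Outcome.BLACKLISTED
    return Outcome.WRITTEN


def toy_cmd_handler(dev, already_passed_check=False):
    # Commands that have not yet passed the check are failed right here.
    if not already_passed_check and dev.in_recovery:
        return Outcome.FAILED_IN_RECOVERY
    # Commands already past the check still reach rbd and fail there.
    return toy_rbd_write(dev)
```

In this model, the first write after blacklisting fails in toy_rbd_write, and
every later command is rejected either at the recovery check or by rbd itself.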


Mike,

In my scenario the second request is not sent to rbd's tcmu_rbd_aio_write function:

tcmur_cmd_handler -->
  tcmur_alua_implicit_transition -->
    alua_implicit_transition --> // rdev->lock_state == UNLOCKED here
      tcmu_set_sense_data // returns SAM_STAT_CHECK_CONDITION

Hence tcmur_cmd_handler goes to "untrack:". I'll send /var/log/tcmu-runner.log and the delay patches in an hour.
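
To make the call chain above concrete, here is a minimal Python sketch of the
path as I understand it. It is a toy model, not tcmu-runner code: the toy_*
functions, ToyDev and the calls list are hypothetical, and lock_requested is
an assumed stand-in for alua_thread being asked to re-acquire the lock. Only
the SCSI status values (GOOD = 0x00, CHECK CONDITION = 0x02) are standard.

```python
from enum import Enum

SAM_STAT_GOOD = 0x00
SAM_STAT_CHECK_CONDITION = 0x02


class LockState(Enum):
    LOCKED = 1
    UNLOCKED = 2


class ToyDev:
    """Hypothetical device state; only the fields this sketch needs."""

    def __init__(self, lock_state):
        self.lock_state = lock_state
        self.lock_requested = False  # assumed flag: alua_thread asked to re-lock


def toy_implicit_transition(dev):
    # Models alua_implicit_transition seeing lock_state == UNLOCKED.
    if dev.lock_state is LockState.UNLOCKED:
        dev.lock_requested = True
        # Models tcmu_set_sense_data returning CHECK CONDITION.
        return SAM_STAT_CHECK_CONDITION
    return SAM_STAT_GOOD


def toy_cmd_handler(dev, calls):
    status = toy_implicit_transition(dev)
    if status != SAM_STAT_GOOD:
        calls.append("untrack")    # the write never reaches rbd
        return status
    calls.append("rbd_aio_write")  # only reached when the lock is held
    return SAM_STAT_GOOD
```

With the device UNLOCKED, the handler records "untrack" and returns CHECK
CONDITION without ever recording "rbd_aio_write", which is the point of the
scenario: the second stale request is failed before it can hit the
blacklisted path.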

Thanks,
Maxim
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com