Re: iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
Maxim, can you provide steps for a reproducer?

Yes, but it involves adding two artificial delays: one in tcmu-runner and another in the kernel iSCSI code. If you're willing to take the trouble of recompiling the kernel and tcmu-runner on one of the gateway nodes, I'll help you reproduce it.

The idea of the reproducer is simple: model a situation in which two stale requests are stuck in the kernel mailbox waiting to be consumed by tcmu-runner, and a third is stuck in the iSCSI layer, immediately after the iSCSI request has been read from the socket. If we unblock tcmu-runner after newer data has gone through another gateway, the first stale request switches the tcmu-runner state from LOCKED to UNLOCKED, and the second stale request triggers alua_thread to re-acquire the lock. By the time the third request reaches tcmu-runner, the lock has already been re-acquired, so the request goes to the OSD unhindered and overwrites the newer data.
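
The sequence above can be sketched as a toy model in Python. This is not tcmu-runner or kernel code: the class names (Image, Gateway), the method names, and the strictly sequential ordering are all assumptions made for illustration, and blacklisting is deliberately left out. The sketch only shows how the LOCKED/UNLOCKED transitions could let the third stale write through.

```python
from enum import Enum


class LockState(Enum):
    LOCKED = "LOCKED"
    UNLOCKED = "UNLOCKED"


class Image:
    """Hypothetical stand-in for an RBD image: one block plus a lock owner."""

    def __init__(self):
        self.owner = None
        self.data = None

    def write(self, gateway_name, payload):
        # Only the current lock owner may write (blacklisting is NOT modelled).
        if self.owner != gateway_name:
            raise PermissionError(f"{gateway_name} does not hold the lock")
        self.data = payload


class Gateway:
    """Hypothetical model of one gateway's view of the lock state."""

    def __init__(self, name, image):
        self.name = name
        self.image = image
        self.state = LockState.UNLOCKED
        self.reacquire_pending = False

    def acquire_lock(self):
        # Taking the lock silently steals it from any previous owner,
        # whose local state stays LOCKED until it notices.
        self.image.owner = self.name
        self.state = LockState.LOCKED
        self.reacquire_pending = False

    def handle_write(self, payload):
        if self.state is LockState.LOCKED:
            try:
                self.image.write(self.name, payload)
                return "written"
            except PermissionError:
                # First stale request: discover the lock is gone.
                self.state = LockState.UNLOCKED
                return "failed: lock lost"
        # Second stale request: ask the background thread to re-acquire.
        self.reacquire_pending = True
        return "failed: lock not held"

    def run_alua_thread(self):
        # Assumed behaviour: re-acquire the lock if a request asked for it.
        if self.reacquire_pending:
            self.acquire_lock()


def run_scenario():
    image = Image()
    gw_a = Gateway("gw-a", image)
    gw_b = Gateway("gw-b", image)

    gw_a.acquire_lock()       # gw-a is the active path
    gw_b.acquire_lock()       # initiator fails over; gw-b takes the lock
    gw_b.handle_write("new")  # newer data goes through the other gateway

    # The three stale requests on gw-a are now unblocked, one by one.
    results = [gw_a.handle_write("old-1"), gw_a.handle_write("old-2")]
    gw_a.run_alua_thread()
    results.append(gw_a.handle_write("old-3"))
    return results, image.data


if __name__ == "__main__":
    print(run_scenario())
```

In this model the first two stale writes fail and only the third one lands, which matches the ordering described above; whether a real gateway behaves this way depends on the delays being injected at exactly those points.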

 

On Wed, Mar 14, 2018 at 2:06 PM, Maxim Patlasov <mpatlasov@xxxxxxxxxx> wrote:
> On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>>
>> On 03/11/2018 08:54 AM, shadow_lin wrote:
>> > Hi Jason,
>> > How is the old target gateway blacklisted? Is that a feature the target
>> > gateway (which can support active/passive multipath) should provide, or is
>> > it done only by the rbd exclusive lock?
>> > I think the exclusive lock only lets one client write to the rbd image at a
>> > time, but another client can obtain the lock later, once it is
>> > released.
>>
>> For the case where we had the lock and it got taken:
>>
>> If IO was blocked and then unjammed after it had already passed the target
>> level checks, the IO will be failed by the OSD due to the
>> blacklisting. When we get IO errors from ceph indicating we are
>> blacklisted, the tcmu rbd layer will fail the IO, indicating the state
>> change and that the IO can be retried. We will also tell the target
>> layer that rbd does not have the lock anymore and to just stop the iscsi
>> connection while we clean up the blacklisting and running commands, and
>> update our state.
>
>
> Mike, can you please give more details on how you tell the target layer that
> rbd does not have the lock and that it should stop the iscsi connection? Which
> tcmu-runner/kernel-target functions are used for that?
>
> In fact, I performed an experiment with three stale write requests stuck on
> a blacklisted gateway, and one of them managed to overwrite newer data. I
> followed all the instructions from
> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/ and
> http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/, so I'd like to know
> what I'm missing...
>
> Thanks,
> Maxim
>
>>
>>
>



--
Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
