Re: iSCSI Multipath (Load Balancing) vs RBD Exclusive Lock

Michael Christie <mchristi@xxxxxxxxxx> · Wed, 14 Mar 2018 13:27:21 -0500

On 03/14/2018 01:24 PM, Maxim Patlasov wrote:
> On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
> 
>     Maxim, can you provide steps for a reproducer?
> 
> 
> Yes, but it involves adding two artificial delays: one in tcmu-runner
> and another in the kernel iscsi layer. If you're willing to take the trouble of

Send the patches for the changes.

> recompiling the kernel and tcmu-runner on one of the gateway nodes, I'll help
> to reproduce it.
> 
> Generally, the idea of the reproducer is simple: let's model a situation
> where two stale requests are stuck in the kernel mailbox waiting to be
> consumed by tcmu-runner, and another one is stuck in the iscsi layer,
> immediately after the iscsi request was read from the socket. If we unblock
> tcmu-runner after newer data has gone through another gateway, the first
> stale request will switch tcmu-runner from the LOCKED to the UNLOCKED
> state, then the second stale request will trigger alua_thread to
> re-acquire the lock, so by the time the third request reaches tcmu-runner,
> the lock has already been re-acquired and the request goes to the OSD
> smoothly, overwriting the newer data.
>
>     On Wed, Mar 14, 2018 at 2:06 PM, Maxim Patlasov <mpatlasov@xxxxxxxxxx> wrote:
>     > On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>     >>
>     >> On 03/11/2018 08:54 AM, shadow_lin wrote:
>     >> > Hi Jason,
>     >> > How is the old target gateway blacklisted? Is it a feature that the
>     >> > target gateway (which can support active/passive multipath) should
>     >> > provide, or is it done only by the rbd exclusive lock?
>     >> > I think the exclusive lock only lets one client write to rbd at a
>     >> > time, but another client can obtain the lock later when the lock is
>     >> > released.
>     >>
>     >> For the case where we had the lock and it got taken:
>     >>
>     >> If IO was blocked and then unjammed, and it has already passed the
>     >> target level checks, then the IO will be failed by the OSD due to the
>     >> blacklisting. When we get IO errors from ceph indicating we are
>     >> blacklisted, the tcmu rbd layer will fail the IO, indicating the state
>     >> change and that the IO can be retried. We will also tell the target
>     >> layer that rbd does not have the lock anymore and to just stop the
>     >> iscsi connection while we clean up the blacklisting and running
>     >> commands, and update our state.
>     >
>     >
>     > Mike, can you please give more details on how you tell the target layer
>     > that rbd does not have the lock and to stop the iscsi connection? Which
>     > tcmu-runner/kernel-target functions are used for that?
>     >
>     > In fact, I performed an experiment with three stale write requests stuck
>     > on a blacklisted gateway, and one of them managed to overwrite newer
>     > data. I followed all instructions from
>     > http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/
>     > and http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/, so I'm
>     > interested in what I'm missing...
>     >
>     > Thanks,
>     > Maxim
>     >
>     >>
>     >>
>     >
> 
> 
> 
>     --
>     Jason
> 
> 

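For anyone following along, the sequence Maxim describes above can be
modeled with a small toy simulation. This is only a sketch of the ordering
problem, not tcmu-runner code: the Cluster and Gateway classes, the gateway
names, and the assumption that re-acquiring the lock also gives the gateway
a fresh, non-blacklisted client are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class Cluster:
    """Toy stand-in for the RBD image plus its exclusive lock (hypothetical)."""
    lock_owner: Optional[str] = None
    blacklisted: Set[str] = field(default_factory=set)
    data: str = "initial"

    def acquire_lock(self, gw: str) -> None:
        # Assumption: taking the lock blacklists the previous owner, and the
        # new owner comes back with a client that is not blacklisted.
        if self.lock_owner is not None and self.lock_owner != gw:
            self.blacklisted.add(self.lock_owner)
        self.blacklisted.discard(gw)
        self.lock_owner = gw

    def write(self, gw: str, payload: str) -> bool:
        # The OSD rejects writes from blacklisted or non-owning gateways.
        if gw in self.blacklisted or self.lock_owner != gw:
            return False
        self.data = payload
        return True


class Gateway:
    """Toy gateway with the LOCKED/UNLOCKED state from the description."""

    def __init__(self, name: str, cluster: Cluster) -> None:
        self.name = name
        self.cluster = cluster
        self.state = "UNLOCKED"

    def handle(self, payload: str) -> str:
        if self.state == "UNLOCKED":
            # Models alua_thread re-acquiring the lock; the triggering
            # command itself is failed back so the initiator can retry it.
            self.cluster.acquire_lock(self.name)
            self.state = "LOCKED"
            return "retry"
        if self.cluster.write(self.name, payload):
            return "written"
        # Write rejected by the OSD: we were blacklisted, so drop the lock state.
        self.state = "UNLOCKED"
        return "failed"


def run_reproducer():
    cluster = Cluster()
    gw_a, gw_b = Gateway("gw-a", cluster), Gateway("gw-b", cluster)
    gw_a.handle("setup")        # gw-a takes the lock
    # Three writes are now stuck on gw-a; the initiator fails over to gw-b.
    gw_b.handle("trigger")      # gw-b takes the lock, gw-a is blacklisted
    gw_b.handle("newer")        # newer data lands through gw-b
    # gw-a is unblocked and drains its three stale requests in order.
    results = [gw_a.handle(p) for p in ("stale-1", "stale-2", "stale-3")]
    return results, cluster.data


if __name__ == "__main__":
    print(run_reproducer())
```

In this model the first stale request fails and flips gw-a to UNLOCKED, the
second one re-acquires the lock, and the third one is written on top of the
newer data, which matches the order of events in the description.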