On 03/14/2018 01:24 PM, Maxim Patlasov wrote:
> On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdillama@xxxxxxxxxx> wrote:
>
>     Maxim, can you provide steps for a reproducer?
>
> Yes, but it involves adding two artificial delays: one in tcmu-runner
> and another in kernel iscsi. If you're willing to take pains of

Send the patches for the changes.

> recompiling kernel and tcmu-runner on one of gateway nodes, I'll help to
> reproduce.
>
> Generally, the idea of reproducer is simple: let's model a situation
> when two stale requests got stuck in kernel mailbox waiting to be
> consumed by tcmu-runner, and another one got stuck in iscsi layer --
> immediately after reading iscsi request from the socket. If we unblock
> tcmu-runner after newer data went through another gateway, the first
> stale request will switch tcmu-runner state from LOCKED to UNLOCKED
> state, then the second stale request will trigger alua_thread to
> re-acquire the lock, so when the third request comes to tcmu-runner, the
> lock is already reacquired and it goes to OSD smoothly overwriting newer
> data.
>
>     On Wed, Mar 14, 2018 at 2:06 PM, Maxim Patlasov <mpatlasov@xxxxxxxxxx> wrote:
>     > On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie <mchristi@xxxxxxxxxx> wrote:
>     >>
>     >> On 03/11/2018 08:54 AM, shadow_lin wrote:
>     >> > Hi Jason,
>     >> > How the old target gateway is blacklisted? Is it a feature of the
>     >> > target gateway (which can support active/passive multipath) should
>     >> > provide or is it only by rbd exclusive lock?
>     >> > I think exclusive lock only let one client can write to rbd at the
>     >> > same time, but another client can obtain the lock later when the
>     >> > lock is released.
>     >>
>     >> For the case where we had the lock and it got taken:
>     >>
>     >> If IO was blocked, then unjammed and it has already passed the target
>     >> level checks then the IO will be failed by the OSD due to the
>     >> blacklisting. When we get IO errors from ceph indicating we are
>     >> blacklisted the tcmu rbd layer will fail the IO indicating the state
>     >> change and that the IO can be retried. We will also tell the target
>     >> layer rbd does not have the lock anymore and to just stop the iscsi
>     >> connection while we clean up the blacklisting, running commands and
>     >> update our state.
>     >
>     > Mike, can you please give more details on how you tell the target
>     > layer rbd does not have the lock and to stop iscsi connection. Which
>     > tcmu-runner/kernel-target functions are used for that?
>     >
>     > In fact, I performed an experiment with three stale write requests
>     > stuck on blacklisted gateway, and one of them managed to overwrite
>     > newer data. I followed all instructions from
>     > http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/
>     > and http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/, so I'm
>     > interested what I'm missing...
>     >
>     > Thanks,
>     > Maxim
>
>     --
>     Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
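
The three-stale-request sequence described in the quoted reproducer can be modelled with a toy state machine. This is a minimal sketch under stated assumptions, not tcmu-runner or kernel code: the Cluster and Gateway classes, the per-acquisition client ids, and the synchronous lock re-acquisition (standing in for alua_thread, which is asynchronous in reality) are all hypothetical. Only the LOCKED/UNLOCKED states and the order of events are taken from the description.

```python
from enum import Enum


class LockState(Enum):
    LOCKED = "LOCKED"
    UNLOCKED = "UNLOCKED"


class Cluster:
    """Hypothetical stand-in for the OSD side: one image, one exclusive lock."""

    def __init__(self):
        self.data = "old"
        self.owner = None        # client instance currently holding the lock
        self.blacklist = set()   # client instances whose IO is rejected
        self._next_id = 0

    def acquire(self, gateway_name):
        # Assumption: taking the lock blacklists the previous owner instance,
        # and every acquisition uses a fresh client instance id.
        if self.owner is not None:
            self.blacklist.add(self.owner)
        self._next_id += 1
        self.owner = (gateway_name, self._next_id)
        return self.owner

    def write(self, client, payload):
        if client in self.blacklist or client != self.owner:
            return False         # IO failed, e.g. because of blacklisting
        self.data = payload
        return True


class Gateway:
    """Hypothetical stand-in for one iSCSI gateway running tcmu-runner."""

    def __init__(self, name, cluster):
        self.name = name
        self.cluster = cluster
        self.state = LockState.UNLOCKED
        self.client = None

    def acquire_lock(self):
        self.client = self.cluster.acquire(self.name)
        self.state = LockState.LOCKED

    def handle_write(self, payload):
        if self.state is LockState.UNLOCKED:
            # Models the second stale request: it finds the gateway UNLOCKED
            # and triggers lock re-acquisition; the request itself is retried.
            self.acquire_lock()
            return "retry"
        if self.cluster.write(self.client, payload):
            return "written"
        # Models the first stale request: the blacklist error flips the state.
        self.state = LockState.UNLOCKED
        return "retry"


def simulate():
    cluster = Cluster()
    gw_a, gw_b = Gateway("A", cluster), Gateway("B", cluster)
    gw_a.acquire_lock()
    # Three writes are held back on gateway A before its handler sees them.
    stale = ["stale-1", "stale-2", "stale-3"]
    gw_b.acquire_lock()                      # failover: A is blacklisted
    assert gw_b.handle_write("new") == "written"
    # Gateway A is unblocked only after the newer data has been written.
    results = [gw_a.handle_write(p) for p in stale]
    return results, cluster.data
```

In this model simulate() returns (['retry', 'retry', 'written'], 'stale-3'): the first stale write fails on the blacklist and moves gateway A to UNLOCKED, the second triggers re-acquisition, and the third is accepted and replaces "new". The model says nothing about whether the real target layer stops the iscsi connection in time; it only illustrates the ordering in question.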