On 03/14/2018 01:27 PM, Michael Christie wrote:
> On 03/14/2018 01:24 PM, Maxim Patlasov wrote:
>> On Wed, Mar 14, 2018 at 11:13 AM, Jason Dillaman <jdillama@xxxxxxxxxx
>> <mailto:jdillama@xxxxxxxxxx>> wrote:
>>
>>     Maxim, can you provide steps for a reproducer?
>>
>> Yes, but it involves adding two artificial delays: one in tcmu-runner
>> and another in kernel iscsi. If you're willing to take the pains of
>
> Send the patches for the changes.
>
>> recompiling the kernel and tcmu-runner on one of the gateway nodes,
>> I'll help to reproduce it.
>>
>> Generally, the idea of the reproducer is simple: let's model a
>> situation where two stale requests are stuck in the kernel mailbox
>> waiting to be consumed by tcmu-runner, and another one is stuck in the
>> iscsi layer -- immediately after the iscsi request has been read from
>> the socket. If we unblock tcmu-runner after newer data has gone through
>> another gateway, the first stale request will switch the tcmu-runner
>> state from LOCKED to UNLOCKED, then the second stale request will
>> trigger alua_thread to re-acquire the lock, so when the third request
>> reaches tcmu-runner, the

When you send the patches that add your delays, could you also send the
target-side /var/log/tcmu-runner.log with log_level = 4?

For the test above, you should see the second request being sent to rbd's
tcmu_rbd_aio_write function. That command should fail in
rbd_finish_aio_generic, and tcmu_rbd_handle_blacklisted_cmd will be
called. We should then block until the IO on that iscsi connection is
flushed, in tgt_port_grp_recovery_thread_fn. That function will not
return from setting enable=0 until the iscsi connection has been stopped
and the commands in it have completed.

Other commands you had in flight should eventually hit
tcmur_cmd_handler's tcmu_dev_in_recovery check and be failed there; if
they had already passed that check, the command would be sent to
tcmu_rbd_aio_write and should get the blacklisted error as above.

>> lock has already been re-acquired and the request goes to the OSD
>> smoothly, overwriting the newer data.
>>
>> On Wed, Mar 14, 2018 at 2:06 PM, Maxim Patlasov
>> <mpatlasov@xxxxxxxxxx <mailto:mpatlasov@xxxxxxxxxx>> wrote:
>> > On Sun, Mar 11, 2018 at 5:10 PM, Mike Christie
>> > <mchristi@xxxxxxxxxx <mailto:mchristi@xxxxxxxxxx>> wrote:
>> >>
>> >> On 03/11/2018 08:54 AM, shadow_lin wrote:
>> >> > Hi Jason,
>> >> > How is the old target gateway blacklisted? Is it a feature that
>> >> > the target gateway (which can support active/passive multipath)
>> >> > should provide, or is it done only by the rbd exclusive lock?
>> >> > I think the exclusive lock only lets one client write to the rbd
>> >> > at a time, but another client can obtain the lock later, once the
>> >> > lock is released.
>> >>
>> >> For the case where we had the lock and it got taken:
>> >>
>> >> If IO was blocked, then unjammed, and it had already passed the
>> >> target-level checks, then the IO will be failed by the OSD due to
>> >> the blacklisting. When we get IO errors from ceph indicating we are
>> >> blacklisted, the tcmu rbd layer will fail the IO, indicating the
>> >> state change and that the IO can be retried. We will also tell the
>> >> target layer that rbd no longer has the lock, and to just stop the
>> >> iscsi connection while we clean up the blacklisting and running
>> >> commands and update our state.
>> >
>> > Mike, can you please give more details on how you tell the target
>> > layer that rbd does not have the lock and to stop the iscsi
>> > connection? Which tcmu-runner/kernel-target functions are used for
>> > that?
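To make the call ordering in the two answers above easier to follow, here
is a toy C model of the sequence: a write arriving on a blacklisted
gateway fails in the aio completion path, recovery stops the iscsi
connection, and any later command is rejected by the in-recovery check.
The function names are taken from this thread, but every signature, type,
struct, and return code below is invented for illustration and is not the
actual tcmu-runner or kernel code.

    /*
     * Toy model of the error path described above.  NOT the tcmu-runner
     * or kernel source: only the names mirror the ones in this thread.
     */
    #include <stdbool.h>
    #include <stdio.h>

    enum dev_state { STATE_LOCKED, STATE_UNLOCKED, STATE_IN_RECOVERY };

    struct toy_dev {
            enum dev_state state;
            bool blacklisted;        /* another gateway broke our lock */
            bool iscsi_conn_stopped; /* set once enable=0 has completed */
    };

    /*
     * Recovery: in the real runner this runs in its own thread and does
     * not return from setting enable=0 until the iscsi connection has
     * stopped and all of its commands have completed.
     */
    static void tgt_port_grp_recovery_thread_fn(struct toy_dev *dev)
    {
            dev->state = STATE_IN_RECOVERY;
            /* ...write 0 to the port group enable attr, wait for the
             * connection to be flushed... */
            dev->iscsi_conn_stopped = true;
    }

    /* Called when the OSD reports that this client is blacklisted. */
    static void tcmu_rbd_handle_blacklisted_cmd(struct toy_dev *dev)
    {
            printf("cmd failed: blacklisted, starting recovery\n");
            tgt_port_grp_recovery_thread_fn(dev);
    }

    /* Completion callback for an rbd aio write. */
    static void rbd_finish_aio_generic(struct toy_dev *dev, int osd_ret)
    {
            if (osd_ret < 0)
                    tcmu_rbd_handle_blacklisted_cmd(dev);
            else
                    printf("write completed\n");
    }

    /* Submit a write; the OSD rejects writes from a blacklisted client. */
    static void tcmu_rbd_aio_write(struct toy_dev *dev)
    {
            int osd_ret = dev->blacklisted ? -1 /* stand-in error */ : 0;

            rbd_finish_aio_generic(dev, osd_ret);
    }

    /*
     * Top-level handler: anything arriving once recovery has started is
     * failed before reaching rbd (the tcmu_dev_in_recovery check above).
     */
    static void tcmur_cmd_handler(struct toy_dev *dev)
    {
            if (dev->state == STATE_IN_RECOVERY) {
                    printf("cmd failed: device in recovery\n");
                    return;
            }
            tcmu_rbd_aio_write(dev);
    }

    int main(void)
    {
            struct toy_dev dev = { .state = STATE_LOCKED,
                                   .blacklisted = true };

            tcmur_cmd_handler(&dev); /* reaches rbd, fails as blacklisted */
            tcmur_cmd_handler(&dev); /* caught by the in-recovery check */
            return 0;
    }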
>> > In fact, I performed an experiment with three stale write requests
>> > stuck on a blacklisted gateway, and one of them managed to overwrite
>> > newer data. I followed all of the instructions from
>> > http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/
>> > <http://docs.ceph.com/docs/master/rbd/iscsi-target-cli-manual-install/>
>> > and http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/
>> > <http://docs.ceph.com/docs/master/rbd/iscsi-target-cli/>, so I'm
>> > interested in what I'm missing...
>> >
>> > Thanks,
>> > Maxim
>>
>> --
>> Jason

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
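For the debug logging Mike asks for above, a minimal sketch of the
tcmu-runner side, assuming the stock /etc/tcmu/tcmu.conf location (check
the comments in the installed tcmu.conf for the exact meaning of each
level):

    # /etc/tcmu/tcmu.conf
    log_level = 4

After restarting tcmu-runner on the gateway node (or letting it pick up
the change, if the installed version reloads the file), the output Mike
refers to lands in /var/log/tcmu-runner.log.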