On Wed, Feb 16, 2022 at 12:31 AM Mike Christie <michael.christie@xxxxxxxxxx> wrote:
>
> On 2/15/22 9:49 AM, Zhengyuan Liu wrote:
> > Hi, all
> >
> > We have an online server which uses multipath + iscsi to attach storage
> > from a storage server. There are two NICs on the server; each carries
> > about 20 iscsi sessions, and each session includes about 50 iscsi devices
> > (so there are roughly 2 * 20 * 50 = 2000 iscsi block devices on the
> > server in total). The problem is: once a NIC faults, it takes too long
> > (nearly 80s) for multipath to switch over to the healthy NIC link,
> > because it first has to block all iscsi devices over the faulted NIC.
> > The call path is shown below:
> >
> > void iscsi_block_session(struct iscsi_cls_session *session)
> > {
> >         queue_work(iscsi_eh_timer_workq, &session->block_work);
> > }
> >
> > __iscsi_block_session() -> scsi_target_block() -> target_block() ->
> > device_block() -> scsi_internal_device_block() -> scsi_stop_queue() ->
> > blk_mq_quiesce_queue() -> synchronize_rcu()
> >
> > All sessions and all devices are processed sequentially, and we have
> > traced that each synchronize_rcu() call takes about 80ms, so the total
> > cost is about 80s (80ms * 20 * 50). That is longer than the application
> > can tolerate and may interrupt service.
> >
> > So my question is: can we optimize this procedure to reduce the time
> > spent blocking all the iscsi devices? I'm not sure whether it is a good
> > idea to increase the max_active of iscsi_eh_timer_workq to improve
> > concurrency.
>
> We need a patch, so the unblock call waits/cancels/flushes the block call or
> they could be running in parallel.
>
> I'll send a patchset later today so you can test it.

I'm glad to test once you push the patchset.

Thank you, Mike.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel
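
For illustration, a rough sketch of the idea Mike describes, where the unblock path flushes any pending or running block work for the session before queueing its own work, so the two cannot race. The function shape and field names follow the snippet quoted above; this is not the actual patchset, which may well take a different approach:

```c
/* Illustrative sketch only, not the real patch.
 *
 * Mirrors the style of the iscsi_block_session() snippet quoted above:
 * iscsi_eh_timer_workq, block_work and unblock_work are assumed to be the
 * driver-internal workqueue and per-session work items from
 * scsi_transport_iscsi. The idea is to make unblock wait for any queued
 * or in-flight block work before scheduling the unblock work.
 */
#include <linux/workqueue.h>
#include <scsi/scsi_transport_iscsi.h>

void iscsi_unblock_session(struct iscsi_cls_session *session)
{
	/* Wait for a pending/running block_work so block and unblock
	 * cannot run in parallel for the same session. */
	flush_work(&session->block_work);

	queue_work(iscsi_eh_timer_workq, &session->unblock_work);
}
```

This only addresses the ordering between block and unblock for a single session; shrinking the 80s window itself would still require the per-device blocking (and its synchronize_rcu() cost) to be handled concurrently rather than one device at a time.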