Question about iscsi session block

Zhengyuan Liu <liuzhengyuang521@xxxxxxxxx> · Tue, 15 Feb 2022 23:49:05 +0800

Hi, all

We have a production server that uses multipath + iSCSI to attach storage
from a storage server. The server has two NICs; each NIC carries about 20
iSCSI sessions, and each session exposes about 50 iSCSI devices (so there are
about 2 * 20 * 50 = 2000 iSCSI block devices on the server in total). The
problem is that once a NIC fails, it takes too long (nearly 80s) for multipath
to switch to the other, healthy NIC link, because all iSCSI devices behind the
failed NIC must be blocked first. The call path is shown below:

    void iscsi_block_session(struct iscsi_cls_session *session)
    {
        queue_work(iscsi_eh_timer_workq, &session->block_work);
    }

    __iscsi_block_session() -> scsi_target_block() -> target_block() ->
    device_block() -> scsi_internal_device_block() -> scsi_stop_queue() ->
    blk_mq_quiesce_queue() -> synchronize_rcu()

The sessions and their devices are processed sequentially, and we have traced
that each synchronize_rcu() call takes about 80ms, so the total cost is about
80s (80ms * 20 * 50). That is longer than the application can tolerate, and it
may interrupt service.
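
To make the arithmetic explicit, here is a minimal sketch in plain userspace C
(not kernel code). The function name serial_block_time_ms() and its parameters
are made up for illustration; it assumes that every device on the failed NIC
costs one synchronize_rcu() wait and that the waits happen strictly one after
another.

```c
#include <stdint.h>

/*
 * Hypothetical helper, for illustration only: estimate how long it takes
 * to block every device on one NIC when the devices are handled one at a
 * time and each one waits for a full synchronize_rcu() call.
 */
uint64_t serial_block_time_ms(uint64_t sessions,
                              uint64_t devs_per_session,
                              uint64_t rcu_wait_ms)
{
    /* one RCU wait per device, no overlap between devices or sessions */
    return sessions * devs_per_session * rcu_wait_ms;
}
```

With the figures from this mail, serial_block_time_ms(20, 50, 80) gives
80000 ms, i.e. the roughly 80s we observe.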

So my question is: can we optimize this procedure to reduce the time spent
blocking all the iSCSI devices? I'm not sure whether it is a good idea to
increase max_active of the iscsi_eh_timer_workq workqueue to improve concurrency.
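
As a back-of-the-envelope sketch of what more concurrency might buy (plain
userspace C, not kernel code), the hypothetical function below assumes an
idealized model: up to max_active sessions are blocked at the same time, their
synchronize_rcu() waits overlap completely, and the devices within one session
are still handled one by one. Real behavior may well differ; this is only an
estimate under those assumptions.

```c
#include <stdint.h>

/*
 * Hypothetical estimate, for illustration only. Assumes sessions are
 * blocked in "rounds" of at most max_active sessions each, that the
 * sessions in a round run fully in parallel, and that devices inside
 * one session are still blocked sequentially.
 */
uint64_t estimated_block_time_ms(uint64_t sessions,
                                 uint64_t devs_per_session,
                                 uint64_t rcu_wait_ms,
                                 uint64_t max_active)
{
    uint64_t rounds;

    /* guard against division by zero in this sketch */
    if (max_active == 0)
        max_active = 1;

    /* number of rounds needed, rounded up */
    rounds = (sessions + max_active - 1) / max_active;

    /* each round lasts as long as one session's sequential device blocking */
    return rounds * devs_per_session * rcu_wait_ms;
}
```

Under this model, with 20 sessions, 50 devices per session and 80ms per wait,
max_active = 1 gives 80000 ms and max_active = 4 gives 20000 ms.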

Thanks in advance.

--
dm-devel mailing list
dm-devel@xxxxxxxxxx
https://listman.redhat.com/mailman/listinfo/dm-devel