Hi Mike,

Sorry for the delayed reply; I had no environment to test your patchset
below until recently:

https://lore.kernel.org/all/20220226230435.38733-1-michael.christie@xxxxxxxxxx/

After applying the series, the total time dropped from 80s to nearly 10s.
It's a great improvement.

Thanks again.

On Sun, Feb 27, 2022 at 7:00 AM Mike Christie <michael.christie@xxxxxxxxxx> wrote:
>
> On 2/15/22 8:19 PM, michael.christie@xxxxxxxxxx wrote:
> > On 2/15/22 7:28 PM, Zhengyuan Liu wrote:
> >> On Wed, Feb 16, 2022 at 12:31 AM Mike Christie
> >> <michael.christie@xxxxxxxxxx> wrote:
> >>>
> >>> On 2/15/22 9:49 AM, Zhengyuan Liu wrote:
> >>>> Hi, all
> >>>>
> >>>> We have an online server which uses multipath + iscsi to attach storage
> >>>> from Storage Server. There are two NICs on the server and for each it
> >>>> carries about 20 iscsi sessions and for each session it includes about 50
> >>>> iscsi devices (yes, there are totally about 2*20*50=2000 iscsi block devices
> >>>> on the server). The problem is: once a NIC gets faulted, it will take too long
> >>>> (nearly 80s) for multipath to switch to another good NIC link, because it
> >>>> needs to block all iscsi devices over that faulted NIC firstly. The callstack is
> >>>> shown below:
> >>>>
> >>>> void iscsi_block_session(struct iscsi_cls_session *session)
> >>>> {
> >>>>         queue_work(iscsi_eh_timer_workq, &session->block_work);
> >>>> }
> >>>>
> >>>> __iscsi_block_session() -> scsi_target_block() -> target_block() ->
> >>>> device_block() -> scsi_internal_device_block() -> scsi_stop_queue() ->
> >>>> blk_mq_quiesce_queue() -> synchronize_rcu()
> >>>>
> >>>> For all sessions and all devices, it was processed sequentially, and we have
> >>>> traced that for each synchronize_rcu() call it takes about 80ms, so
> >>>> the total cost
> >>>> is about 80s (80ms * 20 * 50). It's so long that the application can't
> >>>> tolerate and
> >>>> may interrupt service.
> >>>>
> >>>> So my question is that can we optimize the procedure to reduce the time cost on
> >>>> blocking all iscsi devices? I'm not sure if it is a good idea to increase the
> >>>> workqueue's max_active of iscsi_eh_timer_workq to improve concurrency.
> >>>
> >>> We need a patch, so the unblock call waits/cancels/flushes the block call or
> >>> they could be running in parallel.
> >>>
> >>> I'll send a patchset later today so you can test it.
> >>
> >> I'm glad to test once you push the patchset.
> >>
> >> Thank you, Mike.
> >
> > I forgot I did this recently :)
> >
> > commit 7ce9fc5ecde0d8bd64c29baee6c5e3ce7074ec9a
> > Author: Mike Christie <michael.christie@xxxxxxxxxx>
> > Date: Tue May 25 13:18:09 2021 -0500
> >
> >     scsi: iscsi: Flush block work before unblock
> >
> >     We set the max_active iSCSI EH works to 1, so all work is going to execute
> >     in order by default. However, userspace can now override this in sysfs. If
> >     max_active > 1, we can end up with the block_work on CPU1 and
> >     iscsi_unblock_session running the unblock_work on CPU2 and the session and
> >     target/device state will end up out of sync with each other.
> >
> >     This adds a flush of the block_work in iscsi_unblock_session.
> >
> >
> > It was merged in 5.14.
>
> Hey, I found one more bug when max_active > 1. While fixing it I decided to just
> fix this so we can do the sessions recoveries in parallel and the user doesn't have
> to worry about setting max_active.
>
> I'll send a patchset and cc you.
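
For anyone following the thread, here is a rough back-of-the-envelope model of
the timing discussed above. It is plain user-space C, not kernel code, and both
helper names are hypothetical. It assumes every device costs one full
synchronize_rcu() wait (about 80ms in our traces), that devices inside a
session are always blocked one after another, and that running sessions in
parallel simply divides the number of serial rounds. It does not model what the
patchset actually changes, so it should not be expected to reproduce the ~10s
measured after applying it.

```c
/* Hypothetical helper (not kernel code): total time in milliseconds to block
 * every device behind one NIC when each device is handled strictly one after
 * another and each one waits for a full grace period. */
static unsigned long serial_block_ms(unsigned long sessions,
                                     unsigned long devs_per_session,
                                     unsigned long grace_ms)
{
        return sessions * devs_per_session * grace_ms;
}

/* Hypothetical helper: same workload if up to `workers` sessions are blocked
 * at the same time (the idea behind raising max_active). Devices within a
 * session are still assumed to be serialized. */
static unsigned long parallel_block_ms(unsigned long sessions,
                                       unsigned long devs_per_session,
                                       unsigned long grace_ms,
                                       unsigned long workers)
{
        unsigned long rounds;

        if (workers == 0)
                workers = 1;    /* treat 0 as "no parallelism" */

        /* Number of batches of sessions, rounded up. */
        rounds = (sessions + workers - 1) / workers;

        return rounds * devs_per_session * grace_ms;
}
```

With the numbers from the original report, serial_block_ms(20, 50, 80) returns
80000, i.e. the 80s we observed per faulted NIC (20 sessions * 50 devices *
80ms).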