Re: blk-mq: improvement CPU hotplug (simplified version) v3

On Mon, May 25, 2020 at 08:32:44AM -0700, Bart Van Assche wrote:
> On 2020-05-24 21:09, Ming Lei wrote:
> > On Sat, May 23, 2020 at 08:19:58AM -0700, Bart Van Assche wrote:
> >> On 2020-05-21 19:39, Ming Lei wrote:
> >>> You may argue that two hw queues may share a single managed interrupt;
> >>> that is possible if the driver plays such a trick. But if the driver
> >>> does play it, it is the driver's responsibility to guarantee that the
> >>> managed irq won't be shut down while either of the two hctxs is still
> >>> active, e.g. by making sure that hctx1->cpumask | hctx2->cpumask is a
> >>> subset of this managed interrupt's affinity mask. That is definitely a
> >>> strange enough case, and this patch isn't supposed to cover it; but the
> >>> patch won't break it either. Also, just out of curiosity: do you have
> >>> such an in-tree case, and are you sure the driver uses a managed
> >>> interrupt?
> >>
> >> I'm concerned about the block drivers that use RDMA (NVMeOF, SRP, iSER,
> >> ...). The functions that accept an interrupt vector argument
> >> (comp_vector), namely ib_alloc_cq() and ib_create_cq(), can be used in
> > 
> > PCI_IRQ_AFFINITY isn't used by RDMA drivers, so RDMA NICs use
> > non-managed irqs.
> 
> Something doesn't add up ...
> 
> On a system with eight CPU cores and a ConnectX-3 RDMA adapter (mlx4
> driver, which uses request_irq()), I ran the following test:
> * Query the affinity of all mlx4 edge interrupts (mlx4-1@0000:01:00.0 ..
> mlx4-16@0000:01:00.0).
> * Offline CPUs 6 and 7.
> * Query CPU affinity again.
> 
> As one can see below, the affinity of the mlx4 interrupts was modified.
> Does this mean that the interrupt core manages more than just the
> interrupts registered with PCI_IRQ_AFFINITY?
> 
> All CPUs online:
> 
> 55:04
> 56:80
> 57:40
> 58:40
> 59:20
> 60:10
> 61:08
> 62:02
> 63:02
> 64:01
> 65:20
> 66:20
> 67:10
> 68:10
> 69:40
> 70:08
> 
> CPUs 6 and 7 offline:
> 
> 55:04
> 56:02
> 57:08
> 58:02
> 59:20
> 60:10
> 61:08
> 62:02
> 63:02
> 64:01
> 65:20
> 66:20
> 67:10
> 68:10
> 69:04
> 70:08
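
First, on the two-hctx case quoted at the top: if a driver really maps
two hctxs onto a single managed interrupt, the invariant it has to keep
could be checked roughly like below. This is just a sketch with a
hypothetical helper name, not code from any in-tree driver:

#include <linux/blk-mq.h>
#include <linux/cpumask.h>
#include <linux/gfp.h>

/*
 * Hypothetical helper: true iff the union of the two hctxs' cpumasks is
 * covered by the managed interrupt's affinity mask, i.e. the interrupt
 * cannot be shut down while either hctx still has an online CPU.
 */
static bool hctx_pair_covered_by_irq(struct blk_mq_hw_ctx *a,
				     struct blk_mq_hw_ctx *b,
				     const struct cpumask *irq_affinity)
{
	cpumask_var_t u;
	bool covered;

	if (!zalloc_cpumask_var(&u, GFP_KERNEL))
		return false;

	cpumask_or(u, a->cpumask, b->cpumask);
	covered = cpumask_subset(u, irq_affinity);
	free_cpumask_var(u);

	return covered;
}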

As for the mlx4 interrupts above: those are non-managed interrupts, and
their affinity is changed during CPU online/offline by the irq migration
code, roughly as sketched below; I believe I have pointed you at that
function before.
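
For reference, a simplified sketch (not the kernel's literal code) of the
distinction that migrate_one_irq() in kernel/irq/cpuhotplug.c draws
between the two kinds of interrupts:

#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/cpumask.h>

static void sketch_migrate_one_irq(struct irq_data *d)
{
	const struct cpumask *affinity = irq_data_get_affinity_mask(d);

	/* Nothing to do while some CPU in the mask remains online. */
	if (cpumask_any_and(affinity, cpu_online_mask) < nr_cpu_ids)
		return;

	if (irqd_affinity_is_managed(d)) {
		/*
		 * Managed: the affinity is fixed for the lifetime of the
		 * vector, so the core shuts the interrupt down and brings
		 * it back only when one of its CPUs comes online again.
		 */
		return;
	}

	/*
	 * Non-managed: the core simply retargets the interrupt to the
	 * surviving online CPUs, which is exactly the affinity change
	 * visible in the mlx4 numbers above.
	 */
	irq_set_affinity(d->irq, cpu_online_mask);
}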

The issue to be addressed exists for managed interrupts only, which are
shut down during CPU offline; that is why we have to make sure there is
no in-flight IO request left before the interrupt goes away. As Keith
mentioned, a managed interrupt's affinity is assigned at creation time
and is never changed afterwards.
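
To show where that creation-time affinity comes from: managed interrupts
are the ones allocated with PCI_IRQ_AFFINITY. A minimal illustration (not
lifted from any particular driver) follows; drivers such as mlx4 that
stick to plain request_irq() get non-managed interrupts instead:

#include <linux/interrupt.h>
#include <linux/pci.h>

static int alloc_managed_queue_irqs(struct pci_dev *pdev,
				    unsigned int nr_queues)
{
	/* Keep one vector (e.g. an admin queue) out of the spreading. */
	struct irq_affinity affd = { .pre_vectors = 1 };

	/*
	 * With PCI_IRQ_AFFINITY the PCI core spreads the vectors over the
	 * CPUs and fixes each vector's affinity for its whole lifetime;
	 * on CPU offline such a vector is shut down rather than migrated.
	 * Returns the number of vectors allocated, or a negative errno.
	 */
	return pci_alloc_irq_vectors_affinity(pdev, 2, nr_queues + 1,
					      PCI_IRQ_MSIX | PCI_IRQ_AFFINITY,
					      &affd);
}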

The suggested approach fixes the issue for managed interrupts, and at the
same time it is harmless for non-managed interrupts.
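
In rough terms the suggested offline handling looks like below. This is
only a sketch of the idea: hctx_has_inflight() is a hypothetical name,
and the real callback is keyed off a CPU-hotplug state plus the
BLK_MQ_S_INACTIVE flag the series introduces:

#include <linux/blk-mq.h>
#include <linux/cpumask.h>
#include <linux/delay.h>
#include <linux/gfp.h>

/* Hypothetical: true while the hctx still has requests in flight. */
static bool hctx_has_inflight(struct blk_mq_hw_ctx *hctx);

static int sketch_blk_mq_hctx_notify_offline(unsigned int cpu,
					     struct blk_mq_hw_ctx *hctx)
{
	cpumask_var_t rest;
	bool last;

	if (!cpumask_test_cpu(cpu, hctx->cpumask))
		return 0;

	if (!zalloc_cpumask_var(&rest, GFP_KERNEL))
		return -ENOMEM;

	/* CPUs of this hctx that stay online after @cpu goes away. */
	cpumask_and(rest, hctx->cpumask, cpu_online_mask);
	cpumask_clear_cpu(cpu, rest);
	last = cpumask_empty(rest);
	free_cpumask_var(rest);

	/* Another CPU keeps the managed interrupt alive: nothing to do. */
	if (!last)
		return 0;

	/* Stop new request allocation on this hctx ... */
	set_bit(BLK_MQ_S_INACTIVE, &hctx->state);

	/*
	 * ... and drain what is already in flight while the managed
	 * interrupt can still deliver completions.
	 */
	while (hctx_has_inflight(hctx))
		msleep(5);

	return 0;
}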


Thanks,
Ming



