Re: [PATCH V5 0/6] blk-mq: improvement CPU hotplug

On 03/02/2020 15:43, Marc Zyngier wrote:
On 2020-02-03 12:56, John Garry wrote:

[...]

Can you trigger it after disabling irqbalance?

No. I tested by killing the irqbalance process, and it ran for 25
minutes without issue.

OK, that's interesting.

Can you find out whether irqbalance tries to move an interrupt to an offlined CPU?
Just putting a trace into gic_set_affinity() should be enough.


Hi Marc,

I should have mentioned this already, but this board is the same D06 which I reported had the CPU0 hotplug issue from broken FW, if you remember:

https://lore.kernel.org/linux-arm-kernel/fd70f499-83b4-2fdd-d043-ea9ab8f2c636@xxxxxxxxxx/

That's why I'm not including CPU0 in my hotplug testing. Other CPUs were fine. And it doesn't look like that issue.

Apart from that, I tried as you suggested, with this change:

+++ b/drivers/irqchip/irq-gic-v3.c
@@ -1143,6 +1143,9 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,
 	int enabled;
 	u64 val;

+ pr_err("%s irq%d mask_val=%*pbl cpu_online_mask=%*pbl force=%d cpumask_any_and=%d\n", __func__, d->irq, cpumask_pr_args(mask_val), cpumask_pr_args(cpu_online_mask), force, cpumask_any_and(cpu_online_mask, mask_val));

 	if (force)
 		cpu = cpumask_first(mask_val);
 	else
@@ -1176,6 +1179,9 @@ static int gic_set_affinity(struct irq_data *d, const struct cpumask *mask_val,

 	irq_data_update_effective_affinity(d, cpumask_of(cpu));

+ pr_err("%s1 irq%d mask_val=%*pbl cpu_online_mask=%*pbl force=%d cpumask_any_and=%d cpu=%d\n", __func__, d->irq, cpumask_pr_args(mask_val), cpumask_pr_args(cpu_online_mask), force, cpumask_any_and(cpu_online_mask, mask_val), cpu);
+
 	return IRQ_SET_MASK_OK_DONE;
 }

And I see this:

polled 0 ms)
[  947.551340] GICv3: gic_set_affinity irq5 mask_val=24-47 cpu_online_mask=0,44-95 force=0 cpumask_any_and=44
[  947.560986] GICv3: gic_set_affinity1 irq5 mask_val=24-47 cpu_online_mask=0,44-95 force=0 cpumask_any_and=44 cpu=44
[  947.571321] GICv3: gic_set_affinity irq8 mask_val=24-47 cpu_online_mask=0,44-95 force=0 cpumask_any_and=44
[  947.580963] GICv3: gic_set_affinity1 irq8 mask_val=24-47 cpu_online_mask=0,44-95 force=0 cpumask_any_and=44 cpu=44
[  947.591581] IRQ 819: no longer affine to CPU43
[  947.596149] CPU43: shutdown
[  947.598945] psci: CPU43 killed (polled 0 ms)
[  947.607029] GICv3: gic_set_affinity irq5 mask_val=29-47 cpu_online_mask=0,44-95 force=0 cpumask_any_and=44
Jobs: 6 (f=6): [R(6)][0.0%][r=0KiB/[  968.614971] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[  968.621062] rcu: 66-...0: (0 ticks this GP) idle=d9a/1/0x4000000000000000 softirq=3654/3654 fqs=2625
[  968.630365] (detected by 69, t=5256 jiffies, g=51577, q=1884)
[  968.636187] Task dump for CPU 66:
[ 968.639490] irqbalance R running task 0 1577 1 0x00000002
[  968.646527] Call trace:
[  968.648970]  __switch_to+0xbc/0x218
[  968.652450]  irq_do_set_affinity+0x30/0xd0
[  968.656534]  irq_set_affinity_locked+0xc8/0xf0
[  968.660965]  __irq_set_affinity+0x4c/0x80
[  968.664963]  write_irq_affinity.isra.7+0x104/0x120
[  968.669741]  irq_affinity_proc_write+0x1c/0x28
[  968.674175]  proc_reg_write+0x78/0xb8
[  968.677827]  __vfs_write+0x18/0x38
[  968.681217]  vfs_write+0xb4/0x1e0
[  968.684519]  ksys_write+0x68/0xf8
[  968.687822]  __arm64_sys_write+0x18/0x20
[  968.691733]  el0_svc_common.constprop.2+0x64/0x160
[  968.696511]  el0_svc_handler+0x20/0x80
[  968.700247]  el0_sync_handler+0xe4/0x188
[  968.704157]  el0_sync+0x140/0x180

and this on a 2nd run:

[  215.468476] CPU42: shutdown
[  215.471272] psci: CPU42 killed (polled 0 ms)
[  215.723714] GICv3: gic_set_affinity irq5 mask_val=24-47 cpu_online_mask=0,44-95 force=0 cpumask_any_and=44
[  215.733360] GICv3: gic_set_affinity1 irq5 mask_val=24-47 cpu_online_mask=0,44-95 force=0 cpumask_any_and=44 cpu=44
[  215.743696] GICv3: gic_set_affinity irq8 mask_val=24-47 cpu_online_mask=0,44-95 force=0 cpumask_any_and=44
[  215.753338] GICv3: gic_set_affinity1 irq8 mask_val=24-47 cpu_online_mask=0,44-95 force=0 cpumask_any_and=44 cpu=44
[  215.763835] IRQ 426: no longer affine to CPU43
[  215.768412] IRQ 819: no longer affine to CPU43
[  215.773023] CPU43: shutdown
Jobs: 6 (f=6): [R(6)][76.9%][r=13[  215.775834] psci: CPU43 killed (polled 0 ms)
[  216.604779] GICv3: gic_set_affinity irq10 mask_val=53 cpu_online_mask=0,44-95 force=0 cpumask_any_and=53
[  217.223461] pcieport 0000:00:08.0: can't change power state from D3cold to D0 (config space inaccessible)
[  237.615383] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:2d:17h:39m:00s]
[  237.621469] rcu: 58-...0: (1 GPs behind) idle=b5e/1/0x4000000000000000 softirq=1908/1908 fqs=2626
[  237.630525] (detected by 44, t=5254 jiffies, g=12137, q=191)
[  237.636260] Task dump for CPU 58:
[ 237.639563] irqbalance R running task 0 1567 1 0x00000002
[  237.646599] Call trace:
[  237.649037]  __switch_to+0xbc/0x218
[  237.652513]  0xffff80001529bd68
[  239.283412] nvme nvme1: controller is down; will reset: CSTS=0xffffffff, PCI_STATUS=0xffff
[  300.635382] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:03d:01h:02m:56s]
[  300.641466] rcu: 58-...0: (1 GPs behind) idle=b5e/1/0x4000000000000000 softirq=1908/1908 fqs=10503
[  300.650589] (detected by 44, t=21010 jiffies, g=12137, q=698)
[  300.656410] Task dump for CPU 58:
[ 300.659712] irqbalance R running task 0 1567 1 0x00000002
[  300.666747] Call trace:

Info about irq5 and irq10 after booting:

john@ubuntu:~$ ls /proc/irq/5
affinity_hint       effective_affinity_list  smp_affinity       spurious
effective_affinity  node                     smp_affinity_list  uart-pl011
john@ubuntu:~$ more /proc/irq/5/smp_affinity_list
24-47
john@ubuntu:~$ ls /proc/irq/10
affinity_hint            ehci_hcd:usb1  ohci_hcd:usb3  smp_affinity_list
effective_affinity       ehci_hcd:usb2  ohci_hcd:usb4  spurious
effective_affinity_list  node           smp_affinity
john@ubuntu:~$ more /proc/irq/10/smp_affinity_list
71
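
As a side note for anyone reading the trace output above: the cpumask_any_and= value in each line appears to be the lowest-numbered CPU present in both mask_val and cpu_online_mask (24-47 against 0,44-95 gives 44; 53 gives 53). Below is a minimal user-space sketch of that intersection, not the kernel implementation. The names parse_cpulist() and first_cpu_in_both(), and the NR_CPUS value of 96, are assumptions made for illustration only.

```c
#include <stdbool.h>
#include <stdlib.h>

#define NR_CPUS 96 /* assumption: the logs above show CPUs 0-95 */

/*
 * Hypothetical helper: parse a cpulist string such as "0,44-95"
 * (the format printed by %*pbl and by smp_affinity_list) into a
 * bool array. Returns 0 on success, -1 on malformed input.
 */
static int parse_cpulist(const char *s, bool mask[NR_CPUS])
{
	for (int i = 0; i < NR_CPUS; i++)
		mask[i] = false;

	while (*s) {
		char *end;
		long lo = strtol(s, &end, 10);
		long hi;

		if (end == s)
			return -1;	/* no digits where a CPU number was expected */
		hi = lo;
		if (*end == '-') {
			const char *p = end + 1;

			hi = strtol(p, &end, 10);
			if (end == p)
				return -1;	/* range with no upper bound */
		}
		if (lo < 0 || hi < lo || hi >= NR_CPUS)
			return -1;
		for (long c = lo; c <= hi; c++)
			mask[c] = true;

		if (*end == ',')
			end++;
		else if (*end != '\0')
			return -1;	/* unexpected separator */
		s = end;
	}
	return 0;
}

/*
 * Hypothetical helper: lowest CPU set in both masks, or NR_CPUS if
 * the intersection is empty.
 */
static int first_cpu_in_both(const bool a[NR_CPUS], const bool b[NR_CPUS])
{
	for (int c = 0; c < NR_CPUS; c++)
		if (a[c] && b[c])
			return c;
	return NR_CPUS;
}
```

With mask_val parsed from "24-47" and the online mask parsed from "0,44-95", first_cpu_in_both() returns 44, which matches the cpumask_any_and=44 seen for irq5 and irq8 in the trace.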

Thanks,
John


