Re: [PATCH for-4.19] IB/hfi1: Invalid NUMA node information can cause a divide by zero

On Wed, Aug 15, 2018 at 10:54:49PM -0700, Dennis Dalessandro wrote:
> From: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx>
> 
> If the system BIOS does not supply NUMA node information to the
> PCI devices, the NUMA node is selected by choosing the current
> node.
> 
> This can lead to the following crash:
> 
> divide error: 0000 SMP
> CPU: 0 PID: 4 Comm: kworker/0:0 Tainted: G          IOE
> ------------   3.10.0-693.21.1.el7.x86_64 #1
> Hardware name: Intel Corporation S2600KP/S2600KP, BIOS
> SE5C610.86B.01.01.0005.101720141054 10/17/2014
> Workqueue: events work_for_cpu_fn
> task: ffff880174480fd0 ti: ffff880174488000 task.ti: ffff880174488000
> RIP: 0010: [<ffffffffc020ac69>] hfi1_dev_affinity_init+0x129/0x6a0 [hfi1]
> RSP: 0018:ffff88017448bbf8  EFLAGS: 00010246
> RAX: 0000000000000011 RBX: ffff88107ffba6c0 RCX: ffff88085c22e130
> RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff880824ad0000
> RBP: ffff88017448bc48 R08: 0000000000000011 R09: 0000000000000002
> R10: ffff8808582b6ca0 R11: 0000000000003151 R12: ffff8808582b6ca0
> R13: ffff8808582b6518 R14: ffff8808582b6010 R15: 0000000000000012
> FS:  0000000000000000(0000) GS:ffff88085ec00000(0000) knlGS:0000000000000000
> CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> CR2: 00007efc707404f0 CR3: 0000000001a02000 CR4: 00000000001607f0
> DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> Call Trace:
>  hfi1_init_dd+0x14b3/0x27a0 [hfi1]
>  ? pcie_capability_write_word+0x46/0x70
>  ? hfi1_pcie_init+0xc0/0x200 [hfi1]
>  do_init_one+0x153/0x4c0 [hfi1]
>  ? sched_clock_cpu+0x85/0xc0
>  init_one+0x1b5/0x260 [hfi1]
>  local_pci_probe+0x4a/0xb0
>  work_for_cpu_fn+0x1a/0x30
>  process_one_work+0x17f/0x440
>  worker_thread+0x278/0x3c0
>  ? manage_workers.isra.24+0x2a0/0x2a0
>  kthread+0xd1/0xe0
>  ? insert_kthread_work+0x40/0x40
>  ret_from_fork+0x77/0xb0
>  ? insert_kthread_work+0x40/0x40
> 
> If the BIOS is not supplying NUMA information:
>   - set the default table count to 1 for all possible nodes
>   - select node 0 (instead of the current NUMA node) to get consistent
>     performance
>   - generate an error indicating that the BIOS should be upgraded
> 
> Reviewed-by: Gary Leshner <gary.s.leshner@xxxxxxxxx>
> Reviewed-by: Mike Marciniszyn <mike.marciniszyn@xxxxxxxxx>
> Signed-off-by: Michael J. Ruhl <michael.j.ruhl@xxxxxxxxx>
> Signed-off-by: Dennis Dalessandro <dennis.dalessandro@xxxxxxxxx>
> ---
>  drivers/infiniband/hw/hfi1/affinity.c |   24 +++++++++++++++++++++---
>  1 files changed, 21 insertions(+), 3 deletions(-)
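
For readers following the thread, the fallback listed in the commit message
above can be sketched roughly as below. This is a user-space model under
stated assumptions, not the code from affinity.c: MAX_NODES, NO_NODE,
devs_per_node, set_default_counts(), select_node(), cores_per_device() and
the warning text are all hypothetical names chosen for illustration.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_NODES 4     /* hypothetical stand-in for the number of possible nodes */
#define NO_NODE   (-1)  /* hypothetical marker: the BIOS supplied no node */

/* Hypothetical per-node table counts; a zero entry used as a divisor would trap. */
int devs_per_node[MAX_NODES];

/* Set the default count to 1 for every possible node so no divisor is zero. */
void set_default_counts(void)
{
	for (int i = 0; i < MAX_NODES; i++)
		devs_per_node[i] = 1;
}

/* Use the BIOS-supplied node if there is one; otherwise warn and use node 0. */
int select_node(int bios_node)
{
	if (bios_node == NO_NODE) {
		/* Illustrative message only, not the driver's actual text. */
		fprintf(stderr, "no NUMA node from BIOS, using node 0; consider a BIOS upgrade\n");
		return 0;
	}
	return bios_node;
}

/* Split the cores on a node among its devices; relies on a nonzero count. */
int cores_per_device(int cores_on_node, int node)
{
	assert(devs_per_node[node] > 0);
	return cores_on_node / devs_per_node[node];
}
```

Calling set_default_counts() before any division means the fallback path
(node 0) always has a nonzero divisor, and picking a fixed node rather than
whichever node the probing CPU is on keeps the result the same across boots.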

Applied to for-rc

Thanks,
Jason


