RE: [PATCH for-rc 1/5] IB/hfi1: Fix WQ_MEM_RECLAIM warning

> > [  708.997199] workqueue: WQ_MEM_RECLAIM ipoib_wq:ipoib_cm_tx_reap [ib_ipoib] is flushing !WQ_MEM_RECLAIM hfi0_0:_hfi1_do_send [hfi1]
> > [  708.997209] WARNING: CPU: 7 PID: 1403 at kernel/workqueue.c:2486 check_flush_dependency+0xb1/0x100
> > [  709.227743] Call Trace:
> > [  709.230852]  __flush_work.isra.29+0x8c/0x1a0
> > [  709.235779]  ? __switch_to_asm+0x40/0x70
> > [  709.240335]  __cancel_work_timer+0x103/0x190
> > [  709.245253]  ? schedule+0x32/0x80
> > [  709.249216]  iowait_cancel_work+0x15/0x30 [hfi1]
> > [  709.254475]  rvt_reset_qp+0x1f8/0x3e0 [rdmavt]
> > [  709.259554]  rvt_destroy_qp+0x65/0x1f0 [rdmavt]
> > [  709.264703]  ? _cond_resched+0x15/0x30
> > [  709.269081]  ib_destroy_qp+0xe9/0x230 [ib_core]
> > [  709.274223]  ipoib_cm_tx_reap+0x21c/0x560 [ib_ipoib]
> > [  709.279799]  process_one_work+0x171/0x370
> > [  709.284425]  worker_thread+0x49/0x3f0
> > [  709.288695]  kthread+0xf8/0x130
> > [  709.292450]  ? max_active_store+0x80/0x80
> > [  709.297050]  ? kthread_bind+0x10/0x10
> > [  709.301293]  ret_from_fork+0x35/0x40
> > [  709.305441] ---[ end trace f0e973737146499b ]---
> >
> > Since QP destruction frees memory, hfi1_wq should have the
> > WQ_MEM_RECLAIM.
> 
> This seems like the same problem as the nvme patches.. Nobody seems to
> know what the rules are for using WQ_MEM_RECLAIM.
> 
> AFAIK it has nothing to do with freeing memory though, that is a new one..
> 
> Are you sure cm_tx_reap shouldn't lose its reclaim flag?
> 

Changing the ipoib workqueue would certainly fix THIS issue.

Here are the uses of WQ_MEM_RECLAIM in drivers/infiniband:
C symbol: WQ_MEM_RECLAIM

  File          Function                        Line
0 cma.c         cma_init                         4687 cma_wq = alloc_ordered_workqueue("rdma_cm", WQ_MEM_RECLAIM);
1 device.c      ib_core_init                     1862 WQ_HIGHPRI | WQ_MEM_RECLAIM | WQ_SYSFS, 0);
2 device.c      ib_core_init                     1870 WQ_UNBOUND | WQ_HIGHPRI | WQ_MEM_RECLAIM |
3 mad.c         ib_mad_port_open                 3214 port_priv->wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
4 multicast.c   mcast_init                        886 mcast_wq = alloc_ordered_workqueue("ib_mcast", WQ_MEM_RECLAIM);
5 sa_query.c    ib_sa_init                       2453 ib_nl_wq = alloc_ordered_workqueue("ib_nl_sa_wq", WQ_MEM_RECLAIM);
6 ucma.c        ucma_open                        1733 WQ_MEM_RECLAIM);
7 iwch_cm.c     iwch_cm_init                     2247 workq = alloc_ordered_workqueue("iw_cxgb3", WQ_MEM_RECLAIM);
8 cm.c          c4iw_cm_init                     4435 workq = alloc_ordered_workqueue("iw_cxgb4", WQ_MEM_RECLAIM);
9 chip.c        init_cntrs                      12698 WQ_MEM_RECLAIM, dd->unit);
a init.c        create_workqueues                 808 WQ_MEM_RECLAIM,
b init.c        create_workqueues                 822 WQ_SYSFS | WQ_MEM_RECLAIM | WQ_UNBOUND,
c opfn.c        opfn_init                         309 WQ_MEM_RECLAIM,
d i40iw_cm.c    i40iw_setup_cm_core              3258 WQ_MEM_RECLAIM);
e i40iw_cm.c    i40iw_setup_cm_core              3261 WQ_MEM_RECLAIM);
f i40iw_main.c  i40iw_open                       1695 iwdev->virtchnl_wq = alloc_ordered_workqueue("iwvch", WQ_MEM_RECLAIM);
g i40iw_main.c  i40iw_open                       1708 iwdev->param_wq = alloc_ordered_workqueue("l2params", WQ_MEM_RECLAIM);
h alias_GUID.c  mlx4_ib_init_alias_guid_service   883 alloc_ordered_workqueue(alias_wq_name, WQ_MEM_RECLAIM);
i mad.c         mlx4_ib_alloc_demux_ctx          2186 ctx->wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
j mad.c         mlx4_ib_alloc_demux_ctx          2194 ctx->ud_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
k main.c        mlx4_ib_init                     3351 wq = alloc_ordered_workqueue("mlx4_ib", WQ_MEM_RECLAIM);
l mcg.c         mlx4_ib_mcg_port_init            1048 ctx->mcg_wq = alloc_ordered_workqueue(name, WQ_MEM_RECLAIM);
m mcg.c         mlx4_ib_mcg_init                 1247 clean_wq = alloc_ordered_workqueue("mlx4_ib_mcg", WQ_MEM_RECLAIM);
n mr.c          mlx5_mr_cache_init                647 cache->wq = alloc_ordered_workqueue("mkey_cache", WQ_MEM_RECLAIM);
o odp.c         mlx5_ib_create_pf_eq             1545 WQ_HIGHPRI | WQ_UNBOUND | WQ_MEM_RECLAIM,
p mthca_catas.c mthca_catas_init                  188 catas_wq = alloc_ordered_workqueue("mthca_catas", WQ_MEM_RECLAIM);
q qib_init.c    qib_create_workqueues             590 WQ_MEM_RECLAIM);
r pvrdma_main.c pvrdma_init                      1168 event_wq = alloc_ordered_workqueue("pvrdma_event_wq", WQ_MEM_RECLAIM);
s ipoib_main.c  ipoib_dev_init                   1756 priv->wq = alloc_ordered_workqueue("ipoib_wq", WQ_MEM_RECLAIM);

The last entry is the ipoib wq that conflicts with our non-WQ_MEM_RECLAIM workqueue.  This seems excessive and pretty gratuitous.
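
To make the conflict concrete, here is a minimal userspace sketch of the rule behind the warning, as I read it from the log message: a flush warns when the flushing work runs on a WQ_MEM_RECLAIM workqueue and the flushed work's workqueue lacks the flag. The struct, the flag value, and model_flush_warns() are made up for illustration. This is not the kernel's check_flush_dependency(), which has additional conditions.

```c
#include <stdbool.h>

/* Hypothetical flag value for this model; not the kernel's definition. */
#define MODEL_WQ_MEM_RECLAIM (1u << 0)

/* Simplified stand-in for a workqueue: just a name and its flags. */
struct model_wq {
	const char *name;
	unsigned int flags;
};

/*
 * Returns true when work running on @flusher and flushing work on @target
 * would trip the dependency warning: the flusher is marked as reclaim-safe,
 * but it waits on a workqueue that is not.
 */
static bool model_flush_warns(const struct model_wq *flusher,
			      const struct model_wq *target)
{
	/* A reclaim-safe target never triggers the warning. */
	if (target->flags & MODEL_WQ_MEM_RECLAIM)
		return false;

	/* Target is not reclaim-safe: warn only if the flusher is. */
	return (flusher->flags & MODEL_WQ_MEM_RECLAIM) != 0;
}
```

In this model, ipoib_wq (flag set) flushing hfi0_0 (flag clear) returns true. Either adding the flag to the hfi1 workqueue or dropping it from ipoib_wq makes it return false, which is why both changes silence THIS warning without settling which one is right.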

Tejun, what does "mem reclaim" really mean and when should it be used?

I was assuming that since we are freeing QP kernel memory held by user-mode programs that could be OOM-killed, we need the flag.

Mike


