Hi Nikolay,

What was the scenario? (Do you have a way to reproduce that?)
Which IB card are you using?

Thanks,
Erez

On Mon, Jun 6, 2016 at 12:09 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
> Hello,
>
> [Tejun, I have cc'ed you since I'm not familiar with the internals of
> the workqueue code and cannot comment on whether the mcast_task was
> queued or not, and whether there were any tasks in the workqueue,
> based on the structures I've presented below. Could you please
> comment on that?]
>
> I've been running an infiniband network on 4.4.10, and recently, due
> to the infiniband failing (I presume), I got the following in my logs:
>
> [128340.462147] ib0: transmit timeout: latency 45014865 msecs
> [128340.462151] ib0: queue stopped 1, tx_head 711, tx_tail 583
> [128341.461122] ib0: transmit timeout: latency 45015864 msecs
> [128341.461126] ib0: queue stopped 1, tx_head 711, tx_tail 583
> [128342.461053] ib0: transmit timeout: latency 45016864 msecs
> [128342.461056] ib0: queue stopped 1, tx_head 711, tx_tail 583
>
> Various commands then hung because rtnl_lock was held by an ip
> command trying to shut down the infiniband interface:
>
> PID: 30572 TASK: ffff881bb6bf6e00 CPU: 15 COMMAND: "ip"
>  #0 [ffff88027f50b390] __schedule at ffffffff81631421
>  #1 [ffff88027f50b430] schedule at ffffffff81631cc7
>  #2 [ffff88027f50b450] schedule_timeout at ffffffff81634c89
>  #3 [ffff88027f50b4e0] wait_for_completion at ffffffff81632d73
>  #4 [ffff88027f50b550] flush_workqueue at ffffffff8106c761
>  #5 [ffff88027f50b620] ipoib_mcast_stop_thread at ffffffffa02e0bf2 [ib_ipoib]
>  #6 [ffff88027f50b650] ipoib_ib_dev_down at ffffffffa02de7d6 [ib_ipoib]
>  #7 [ffff88027f50b670] ipoib_stop at ffffffffa02dc208 [ib_ipoib]
>  #8 [ffff88027f50b6a0] __dev_close_many at ffffffff81562692
>  #9 [ffff88027f50b6c0] __dev_close at ffffffff81562716
> #10 [ffff88027f50b6f0] __dev_change_flags at ffffffff815630fc
> #11 [ffff88027f50b730] dev_change_flags at ffffffff81563207
> #12 [ffff88027f50b760] do_setlink at ffffffff81576f5f
> #13 [ffff88027f50b860] rtnl_newlink at ffffffff81578afb
> #14 [ffff88027f50bb10] rtnetlink_rcv_msg at ffffffff81577ae5
> #15 [ffff88027f50bb90] netlink_rcv_skb at ffffffff8159a373
> #16 [ffff88027f50bbc0] rtnetlink_rcv at ffffffff81577bb5
> #17 [ffff88027f50bbe0] netlink_unicast at ffffffff81599e98
> #18 [ffff88027f50bc40] netlink_sendmsg at ffffffff8159aabe
> #19 [ffff88027f50bd00] sock_sendmsg at ffffffff81548ce9
> #20 [ffff88027f50bd20] ___sys_sendmsg at ffffffff8154ae78
> #21 [ffff88027f50beb0] __sys_sendmsg at ffffffff8154b059
> #22 [ffff88027f50bf40] sys_sendmsg at ffffffff8154b0a9
> #23 [ffff88027f50bf50] entry_SYSCALL_64_fastpath at ffffffff81635c57
>
> So clearly ipoib_mcast_stop_thread has hung trying to stop the
> multicast thread.
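[For context on frames #3 and #4 above, here is a simplified
paraphrase, a sketch rather than the literal kernel source, of what
flush_workqueue() does in this era of the workqueue code: it links an
on-stack flusher carrying a struct completion into the workqueue and
sleeps until the workers that retire the old flush colour complete it.

/* Simplified paraphrase of the 4.4-era flush path in
 * kernel/workqueue.c; most bookkeeping omitted. */
struct wq_flusher {
	struct list_head	list;		/* on wq->flusher_queue */
	int			flush_color;	/* colour being waited for */
	struct completion	done;		/* fired by the last busy pwq */
};

void flush_workqueue_sketch(struct workqueue_struct *wq)
{
	struct wq_flusher this_flusher = {
		.list = LIST_HEAD_INIT(this_flusher.list),
		.flush_color = -1,
		.done = COMPLETION_INITIALIZER_ONSTACK(this_flusher.done),
	};

	/* ... advance wq->work_color, become wq->first_flusher (or
	 * queue behind it), and count still-busy pool_workqueues
	 * into wq->nr_pwqs_to_flush ... */

	/* Sleeps forever if a queued work item never finishes: */
	wait_for_completion(&this_flusher.done);
}

This matches the dump below: first_flusher points back into the ip
task's stack, and nr_pwqs_to_flush is still 1, so the completion was
never fired.]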
> Here is the state of the ipoib_wq:
>
> struct workqueue_struct {
>   pwqs = {
>     next = 0xffff883ff1196770,
>     prev = 0xffff883ff1196770
>   },
>   list = {
>     next = 0xffff883fef5bea10,
>     prev = 0xffff883fef201010
>   },
>   mutex = {
>     count = {
>       counter = 1
>     },
>     wait_lock = {
>       {
>         rlock = {
>           raw_lock = {
>             val = {
>               counter = 0
>             }
>           }
>         }
>       }
>     },
>     wait_list = {
>       next = 0xffff883fef200c28,
>       prev = 0xffff883fef200c28
>     },
>     owner = 0x0,
>     osq = {
>       tail = {
>         counter = 0
>       }
>     }
>   },
>   work_color = 1,
>   flush_color = 0,
>   nr_pwqs_to_flush = {
>     counter = 1
>   },
>   first_flusher = 0xffff88027f50b568,
>   flusher_queue = {
>     next = 0xffff883fef200c60,
>     prev = 0xffff883fef200c60
>   },
>   flusher_overflow = {
>     next = 0xffff883fef200c70,
>     prev = 0xffff883fef200c70
>   },
>   maydays = {
>     next = 0xffff883fef200c80,
>     prev = 0xffff883fef200c80
>   },
>   rescuer = 0xffff883feb073680,
>   nr_drainers = 0,
>   saved_max_active = 1,
>   unbound_attrs = 0xffff883fef2b5c20,
>   dfl_pwq = 0xffff883ff1196700,
>   wq_dev = 0x0,
>   name = "ipoib_wq\000\000\000\000\000\000\000\000\000\000\000\000\000\000\000",
>   rcu = {
>     next = 0x0,
>     func = 0x0
>   },
>   flags = 131082,
>   cpu_pwqs = 0x0,
>
> Also the state of the mcast_task member:
>
> crash> struct ipoib_dev_priv.mcast_task ffff883fed6ec700
> mcast_task = {
>   work = {
>     data = {
>       counter = 3072
>     },
>     entry = {
>       next = 0xffff883fed6ec888,
>       prev = 0xffff883fed6ec888
>     },
>     func = 0xffffffffa02e1220 <ipoib_mcast_join_task>
>   },
>   timer = {
>     entry = {
>       next = 0x0,
>       pprev = 0x0
>     },
>     expires = 0,
>     function = 0xffffffff8106dd20 <delayed_work_timer_fn>,
>     data = 18446612406880618624,
>     flags = 2097164,
>     slack = -1
>   },
>   wq = 0x0,
>   cpu = 0
> }
>
> So flush_workqueue is essentially blocked in
> wait_for_completion(&this_flusher.done), which apparently never
> returns; note that first_flusher (0xffff88027f50b568) points into the
> flush_workqueue frame of the ip task's stack above, i.e. at its
> on-stack this_flusher. I'm assuming this is due to the general
> unavailability of the infiniband network. However, I think it's wrong
> for the whole server to be rendered unresponsive, due to rtnl_lock
> being held, just because the infiniband link is down. Do you think it
> is possible to rework this code to prevent it from hanging when the
> workqueue cannot be flushed? Furthermore, do you think it's feasible
> to put code in ipoib_mcast_join_task to detect such situations so it
> does not re-arm itself, and then use cancel_delayed_work_sync?
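[For illustration, a minimal sketch of the shutdown pattern the last
question describes, using made-up names (struct mcast_ctx,
MCAST_STOPPING, mcast_join_task) rather than the actual ipoib
structures and flags: set a "stopping" bit before cancelling, and have
the work function test it before re-arming, so that
cancel_delayed_work_sync() cannot race with a re-queue.

#include <linux/workqueue.h>
#include <linux/bitops.h>

#define MCAST_STOPPING	0	/* made-up flag bit */

struct mcast_ctx {
	unsigned long		flags;
	struct delayed_work	task;
};

static void mcast_join_task(struct work_struct *work)
{
	struct mcast_ctx *ctx =
		container_of(to_delayed_work(work), struct mcast_ctx, task);

	/* ... attempt the multicast join; back off on failure ... */

	/* Re-arm only if nobody has asked us to stop. */
	if (!test_bit(MCAST_STOPPING, &ctx->flags))
		queue_delayed_work(system_wq, &ctx->task, HZ);
}

static void mcast_stop_thread(struct mcast_ctx *ctx)
{
	/* Forbid re-arming first, then wait for any in-flight run.
	 * Unlike flush_workqueue(), this waits only for this one work
	 * item, not for everything else on the queue. */
	set_bit(MCAST_STOPPING, &ctx->flags);
	cancel_delayed_work_sync(&ctx->task);
}

One caveat: cancel_delayed_work_sync() still blocks if an in-flight
invocation itself never returns, so for this to help, the join work
must not wait indefinitely on the fabric.]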