[sending again since the mailing list didn't get my latest reply]

On Fri, Mar 25, 2016 at 8:52 PM, Or Gerlitz <gerlitz.or@xxxxxxxxx> wrote:
> On Fri, Mar 25, 2016 at 2:35 PM, Nikolay Borisov <kernel@xxxxxxxx> wrote:
> [..]
>> And having kernel.hung_task_panic sysctl set to 1 caused a lot of
>> machines to reboot. In any case I don't think it's normal to have hung
>> tasks when your network is out. This happens due to the
>> wait_for_completion(&cm_id_priv->comp); never returning in cm_destroy_id
>> function. I saw there is one place where the cm_id refcount is
>> decremented via normal atomic_dec and not cm_deref_id under
>> cm_req_handle's rejected label. I dunno if this is correct or not, but
>> there definitely seems to be some refcounting problem.
>
> You didn't specify your kernel version, please do so. Also, do you
> have some known point in time (== kernel version) where it worked vs
> the current situation?

So this is happening on the latest stable 4.4 kernel - 4.4.6.
Unfortunately I can't say whether this is a regression or a new bug.
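
To illustrate the kind of refcounting problem suspected in the quoted report, here is a minimal userspace sketch of the general pattern: a helper that drops a reference and signals a completion when the count reaches zero, next to a plain decrement that never signals. This is not the code in the kernel's CM implementation. All names (model_id, model_deref, model_plain_dec, model_destroy_would_return) are hypothetical, a boolean flag stands in for the completion, and the thread does not establish that the atomic_dec under the rejected label is the actual cause.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical stand-in for a refcounted object that a destroyer waits on. */
struct model_id {
    atomic_int refcount;
    bool completed; /* models "the completion has been signaled" */
};

static void model_init(struct model_id *id, int refs)
{
    assert(refs > 0);
    atomic_init(&id->refcount, refs);
    id->completed = false;
}

/* Drop one reference; signal the waiter if this was the last one. */
static void model_deref(struct model_id *id)
{
    /* atomic_fetch_sub returns the previous value, so 1 means we hit zero. */
    if (atomic_fetch_sub(&id->refcount, 1) == 1)
        id->completed = true;
}

/* Drop one reference without checking for zero; nobody is signaled. */
static void model_plain_dec(struct model_id *id)
{
    atomic_fetch_sub(&id->refcount, 1);
}

/*
 * Start with two references and drop both. Returns whether a waiter
 * blocked on the completion would have been woken.
 */
static bool model_destroy_would_return(bool last_drop_is_plain)
{
    struct model_id id;

    model_init(&id, 2);
    model_deref(&id); /* first reference, count goes 2 -> 1 */
    if (last_drop_is_plain)
        model_plain_dec(&id); /* count reaches 0, no signal */
    else
        model_deref(&id);     /* count reaches 0, signal */
    return id.completed;
}
```

In this model, model_destroy_would_return(false) is true and model_destroy_would_return(true) is false: if the final reference happens to be dropped by the plain decrement, the count reaches zero but the waiter is never woken, which would look like a hung task. A plain decrement is harmless as long as some other holder is guaranteed to drop the last reference through the checked helper afterwards.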