On 11/8/19 3:38 PM, Himanshu Madhani wrote:
Hi Bart,
On Nov 7, 2019, at 12:30 PM, Bart Van Assche <bvanassche@xxxxxxx> wrote:
On 11/7/19 9:58 AM, Bart Van Assche wrote:
Does your answer mean that this hang has not yet been root-caused fully
and hence that it is possible this patch is only a workaround but not a
fix of the root cause?
Answering my own question: I think that a qpair refcount leak is a severe problem and not something that should be ignored. How about changing the while loop into something like the following:
if (atomic_read(&qpair->ref_count))
msleep(500);
WARN_ON_ONCE(atomic_read(&qpair->ref_count));
Thanks,
Bart.
Since we had seen this hang in a specific cluster environment where a refcount leak was observed, I would like to add this patch as-is and will evaluate your suggestion to see whether adding WARN_ON_ONCE makes any difference. If we find that adding WARN_ON_ONCE does help, I will post a patch with a Fixes tag during the rc window.
Let me know if you disagree.
Hi Himanshu,
Please do not suppress reports of kernel bugs. Instead, make sure that
some report is emitted that indicates that something went wrong and
needs further attention.
Thanks,
Bart.