On 3/1/2017 4:51 PM, Christoph Hellwig wrote:
> On Wed, Mar 01, 2017 at 04:30:26PM +0200, Noa Osherovich wrote:
>> Analysis:
>> Since ib_comp_wq isn't single threaded, two works can run in parallel for the same CQ,
>> executing __ib_process_cq.
> They shouldn't. Each CQ has a single work_struct, and any given work_struct
> should only be executing at once:
>
> "Note that the flag ``WQ_NON_REENTRANT`` no longer exists as all
> workqueues are now non-reentrant - any work item is guaranteed to be
> executed by at most one worker system-wide at any given time."
>
>> Since this function isn't thread safe and the wc array is shared, it causes a data corruption
>> which eventually crashes in the MAD layer due to a double list_del of the same element.
> This should not be the case. What kernel version are you testing and does
> it contain any patches touching core kernel code?

Thanks Christoph for the quick response.
Currently we see this only in old kernels.
I'll investigate this more and update.