On Mon, Jan 22, 2018 at 10:40:53AM +0800, jianchao.wang wrote:
> Hi Eric
>
> On 01/22/2018 12:43 AM, Eric Dumazet wrote:
> > On Sun, 2018-01-21 at 18:24 +0200, Tariq Toukan wrote:
> >>
> >> On 21/01/2018 11:31 AM, Tariq Toukan wrote:
> >>>
> >>>
> >>> On 19/01/2018 5:49 PM, Eric Dumazet wrote:
> >>>> On Fri, 2018-01-19 at 23:16 +0800, jianchao.wang wrote:
> >>>>> Hi Tariq
> >>>>>
> >>>>> Very sad that the crash was reproduced again after applying the patch.
> >>
> >> Memory barriers vary for different archs, can you please share more
> >> details regarding arch and repro steps?
> >
> > Yeah, mlx4 NICs in Google fleet receive trillions of packets per
> > second, and we never noticed an issue.
> >
> > Although we are using a slightly different driver, using order-0 pages
> > and fast pages recycling.
> >
> >
> The driver we use will set the page reference count to (size of pages)/stride, the
> pages will be freed by the networking stack when the reference becomes zero, and the order-3
> pages may be allocated again soon; this gives the NIC device a chance to corrupt pages which have
> been allocated by others, such as slab.

But it looks like the wmb() is placed when stuffing new rx descriptors
into the device - how can it prevent corruption of pages where
ownership was transferred from the device to the host? That sounds more
like a rmb() is missing someplace to me...

(Granted, the missing wmb() is a bug, but it may not be fully solving
this issue??)

Jason