On Wed, Dec 06, 2017 at 02:51:41PM +0530, George Cherian wrote: > Hi Michael, > > > On 12/06/2017 12:59 AM, Michael S. Tsirkin wrote: > > Users of ptr_ring expect that it's safe to give the > > data structure a pointer and have it be available > > to consumers, but that actually requires an smb_wmb > > or a stronger barrier. > This is not the exact situation we are seeing. Could you test the patch pls? > Let me try to explain the situation > > Affected on ARM64 platform. > 1) tun_net_xmit calls skb_array_produce, which pushes the skb to the > ptr_ring, this push is protected by a producer_lock. > > 2)Prior to this call the tun_net_xmit calls skb_orphan which calls the > skb->destructor and sets skb->destructor and skb->sk as NULL. > > 2.a) These 2 writes are getting reordered > > 3) At the same time in the receive side (tun_ring_recv), which gets executed > in another core calls skb_array_consume which pulls the skb from ptr ring, > this pull is protected by a consumer lock. > > 4) eventually calling the skb->destructor (sock_wfree) with stale values. > > Also note that this issue is reproducible in a long run and doesn't happen > immediately after the launch of multiple VM's (infact the particular test > cases launches 56 VM's which does iperf back and forth) > > > > > In absence of such barriers and on architectures that reorder writes, > > consumer might read an un=initialized value from an skb pointer stored > > in the skb array. This was observed causing crashes. > > > > To fix, add memory barriers. The barrier we use is a wmb, the > > assumption being that producers do not need to read the value so we do > > not need to order these reads. > It is not the case that producer is reading the value, but the consumer > reading stale value. So we need to have a strict rmb in place . > > > > > Reported-by: George Cherian <george.cherian@xxxxxxxxxx> > > Suggested-by: Jason Wang <jasowang@xxxxxxxxxx> > > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx> > > --- > > > > George, could you pls report whether this patch fixes > > the issue for you? > > > > This seems to be needed in stable as well. > > > > > > > > > > include/linux/ptr_ring.h | 9 +++++++++ > > 1 file changed, 9 insertions(+) > > > > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h > > index 37b4bb2..6866df4 100644 > > --- a/include/linux/ptr_ring.h > > +++ b/include/linux/ptr_ring.h > > @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r) > > /* Note: callers invoking this in a loop must use a compiler barrier, > > * for example cpu_relax(). Callers must hold producer_lock. > > + * Callers are responsible for making sure pointer that is being queued > > + * points to a valid data. > > */ > > static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr) > > { > > if (unlikely(!r->size) || r->queue[r->producer]) > > return -ENOSPC; > > + /* Make sure the pointer we are storing points to a valid data. */ > > + /* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */ > > + smp_wmb(); > > + > > r->queue[r->producer++] = ptr; > > if (unlikely(r->producer >= r->size)) > > r->producer = 0; > > @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r) > > if (ptr) > > __ptr_ring_discard_one(r); > > + /* Make sure anyone accessing data through the pointer is up to date. */ > > + /* Pairs with smp_wmb in __ptr_ring_produce. */ > > + smp_read_barrier_depends(); > > return ptr; > > } > > _______________________________________________ Virtualization mailing list Virtualization@xxxxxxxxxxxxxxxxxxxxxxxxxx https://lists.linuxfoundation.org/mailman/listinfo/virtualization