Re: [PATCH] ptr_ring: add barriers

"Michael S. Tsirkin" <mst@xxxxxxxxxx> · Wed, 6 Dec 2017 14:46:54 +0200

On Wed, Dec 06, 2017 at 02:51:41PM +0530, George Cherian wrote:
> Hi Michael,
> 
> 
> On 12/06/2017 12:59 AM, Michael S. Tsirkin wrote:
> > Users of ptr_ring expect that it's safe to give the
> > data structure a pointer and have it be available
> > to consumers, but that actually requires an smp_wmb
> > or a stronger barrier.
> This is not the exact situation we are seeing.

Could you test the patch, please?

> Let me try to explain the situation.
> 
> The affected platform is ARM64.
> 1) tun_net_xmit calls skb_array_produce, which pushes the skb to the
> ptr_ring; this push is protected by the producer_lock.
> 
> 2) Prior to this call, tun_net_xmit calls skb_orphan, which calls
> skb->destructor and then sets skb->destructor and skb->sk to NULL.
> 
> 2.a) These 2 writes are getting reordered
> 
> 3) At the same time, the receive side (tun_ring_recv), running on
> another core, calls skb_array_consume, which pulls the skb from the
> ptr_ring; this pull is protected by the consumer lock.
> 
> 4) The consumer eventually calls skb->destructor (sock_wfree) with stale values.
> 
> Also note that this issue reproduces only after a long run and doesn't happen
> immediately after launching multiple VMs (in fact, this particular test
> case launches 56 VMs, which run iperf back and forth).
> 
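
To make steps 1) to 4) concrete, here is a minimal single-threaded model of
the orphan step. Everything in it (fake_skb, fake_sock, fake_wfree,
fake_skb_orphan) is a hypothetical stand-in invented for illustration, not
the kernel's types or functions. It only shows which two stores the consumer
must observe before it touches the skb; it does not reproduce the reordering
itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for struct sock / struct sk_buff. */
struct fake_sock {
	int wmem_released;	/* counts destructor invocations */
};

struct fake_skb {
	void (*destructor)(struct fake_skb *skb);
	struct fake_sock *sk;
};

/* Plays the role of sock_wfree: it dereferences skb->sk. */
static void fake_wfree(struct fake_skb *skb)
{
	/* A stale destructor paired with a cleared sk would trip here. */
	assert(skb->sk != NULL);
	skb->sk->wmem_released++;
}

/* Model of step 2: run the destructor once, then clear both fields. */
static void fake_skb_orphan(struct fake_skb *skb)
{
	if (skb->destructor) {
		skb->destructor(skb);
		skb->destructor = NULL;	/* store A */
		skb->sk = NULL;		/* store B */
	}
}
```

If another CPU dequeues the skb pointer before stores A and B are visible to
it, it can still see the old destructor and run it a second time, which
matches the symptom in step 4).
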
> > 
> > In the absence of such barriers, and on architectures that reorder writes,
> > the consumer might read an uninitialized value from an skb pointer stored
> > in the skb array.  This was observed causing crashes.
> > 
> > To fix, add memory barriers.  The barrier we use is a wmb, the
> > assumption being that producers do not need to read the value so we do
> > not need to order these reads.
> It is not that the producer is reading the value; rather, the consumer is
> reading a stale value. So we need a strict rmb in place.
> 
> > 
> > Reported-by: George Cherian <george.cherian@xxxxxxxxxx>
> > Suggested-by: Jason Wang <jasowang@xxxxxxxxxx>
> > Signed-off-by: Michael S. Tsirkin <mst@xxxxxxxxxx>
> > ---
> > 
> > George, could you pls report whether this patch fixes
> > the issue for you?
> > 
> > This seems to be needed in stable as well.
> > 
> > 
> > 
> > 
> >   include/linux/ptr_ring.h | 9 +++++++++
> >   1 file changed, 9 insertions(+)
> > 
> > diff --git a/include/linux/ptr_ring.h b/include/linux/ptr_ring.h
> > index 37b4bb2..6866df4 100644
> > --- a/include/linux/ptr_ring.h
> > +++ b/include/linux/ptr_ring.h
> > @@ -101,12 +101,18 @@ static inline bool ptr_ring_full_bh(struct ptr_ring *r)
> >   /* Note: callers invoking this in a loop must use a compiler barrier,
> >    * for example cpu_relax(). Callers must hold producer_lock.
> > + * Callers are responsible for making sure pointer that is being queued
> > + * points to a valid data.
> >    */
> >   static inline int __ptr_ring_produce(struct ptr_ring *r, void *ptr)
> >   {
> >   	if (unlikely(!r->size) || r->queue[r->producer])
> >   		return -ENOSPC;
> > +	/* Make sure the pointer we are storing points to a valid data. */
> > +	/* Pairs with smp_read_barrier_depends in __ptr_ring_consume. */
> > +	smp_wmb();
> > +
> >   	r->queue[r->producer++] = ptr;
> >   	if (unlikely(r->producer >= r->size))
> >   		r->producer = 0;
> > @@ -275,6 +281,9 @@ static inline void *__ptr_ring_consume(struct ptr_ring *r)
> >   	if (ptr)
> >   		__ptr_ring_discard_one(r);
> > +	/* Make sure anyone accessing data through the pointer is up to date. */
> > +	/* Pairs with smp_wmb in __ptr_ring_produce. */
> > +	smp_read_barrier_depends();
> >   	return ptr;
> >   }
> > 
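
For anyone who wants to experiment with the pairing outside the kernel, a
rough userspace analogue using C11 atomics is sketched below. It is not the
ptr_ring code: struct ring, ring_produce, ring_consume, struct item and
RING_SIZE are hypothetical names, locking is omitted (one producer and one
consumer are assumed), and the consumer uses an acquire fence, which is
stronger than the dependency ordering smp_read_barrier_depends provides.

```c
#include <assert.h>
#include <errno.h>
#include <stdatomic.h>
#include <stddef.h>

#define RING_SIZE 4		/* hypothetical fixed size for this sketch */

struct item {			/* stand-in for an skb */
	int payload;
};

struct ring {
	_Atomic(void *) queue[RING_SIZE];	/* NULL means "slot empty" */
	size_t producer;	/* touched only by the producer */
	size_t consumer;	/* touched only by the consumer */
};

/* Returns 0 on success, -ENOSPC if the next slot is still occupied. */
static int ring_produce(struct ring *r, void *ptr)
{
	assert(ptr != NULL);	/* NULL is reserved as the empty marker */
	if (atomic_load_explicit(&r->queue[r->producer], memory_order_relaxed))
		return -ENOSPC;

	/* Analogue of smp_wmb(): writes to *ptr are ordered before the
	 * store that publishes the pointer. */
	atomic_thread_fence(memory_order_release);
	atomic_store_explicit(&r->queue[r->producer], ptr, memory_order_relaxed);
	if (++r->producer >= RING_SIZE)
		r->producer = 0;
	return 0;
}

/* Returns the oldest queued pointer, or NULL if the ring is empty. */
static void *ring_consume(struct ring *r)
{
	void *ptr = atomic_load_explicit(&r->queue[r->consumer],
					 memory_order_relaxed);
	if (!ptr)
		return NULL;

	/* Free the slot for reuse by the producer. */
	atomic_store_explicit(&r->queue[r->consumer], NULL, memory_order_relaxed);
	if (++r->consumer >= RING_SIZE)
		r->consumer = 0;

	/* Pairs with the release fence in ring_produce(): reads through
	 * ptr after this point see the producer's earlier writes. */
	atomic_thread_fence(memory_order_acquire);
	return ptr;
}
```

The producer fills in the item first and only then calls ring_produce(); the
consumer dereferences the pointer only after ring_consume() returns it. The
release fence sits before the publishing store for the same reason smp_wmb()
precedes the queue[] assignment in the patch.
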