Re: [PATCH] dmaengine: idxd: Change wmb() to smp_wmb() when copying completion record to user space

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On Tue, Jan 30, 2024 at 05:58:24PM +0000, Mark Rutland wrote:
> This patch might be ok (it looks reasonable as an optimization), but I think
> the description of wmb() and smp_wmb() is incorrect. I also think that you're

Agreed. A wmb() -> smp_wmb() change can only be an optimization rather
than a fix.

> missing an rmb()/smp_rmb()eor equivalent on the reader side.
> 
> On Mon, Jan 29, 2024 at 06:58:06PM -0800, Fenghua Yu wrote:
> > wmb() is used to ensure status in the completion record is written
> > after the rest of the completion record, making it visible to the user.
> > However, on SMP systems, this may not guarantee visibility across
> > different CPUs.
> > 
> > Considering this scenario that event log handler is running on CPU1 while
> > user app is polling completion record (cr) status on CPU2:
> > 
> > 	CPU1				CPU2
> > event log handler			user app
> > 
> > 					1. cr = 0 (status = 0)
> > 2. copy X to user cr except "status"
> > 3. wmb()
> > 4. copy Y to user cr "status"
> > 					5. poll status value Y
> > 				 	6. read rest cr which is still 0.
> > 					   cr handling fails
> > 					7. cr value X visible now
> > 
> > Although wmb() ensure value Y is written and visible after X is written
> > on CPU1, the order is not guaranteed on CPU2. So user app may see status
> > value Y while cr value X is still not visible yet on CPU2. This will
> > cause reading 0 from the rest of cr and cr handling fails.
> 
> The wmb() on CPU1 ensures the order of the reads, but you need an rmb() on CPU2
> between reading the 'status' and 'rest' parts; otherwise CPU2 (or the
> compiler!) is permitted to hoist the read of 'rest' early, before reading from
> 'status', and hence you can end up with a sequence that is effectively:
> 
> 	CPU1				CPU2
>   event log handler			user app
> 					
>   					1. cr = 0 (status = 0)
>   				 	6a. read rest cr which is still 0.
>   2. copy X to user cr except "status"
>   3. wmb()
>   4. copy Y to user cr "status"
>   					5. poll status value Y
>   					6b. cr handling fails
>   					7. cr value X visible now
> 
> Since this is all to regular cacheable memory, it's *sufficient* to use
> smp_wmb() and smp_rmb(), but that's an optimization rather than an ordering
> fix.
> 
> Note that on x86_64, TSO means that the stores are in-order (and so smp_wmb()
> is just a compiler barrier), and IIUC loads are not reordered w.r.t. other
> loads (and so smp_rmb() is also just a compiler barrier).
> 
> > Changing wmb() to smp_wmb() ensures Y is written after X on both CPU1
> > and CPU2. This guarantees that user app can consume cr in right order.

A barrier can only provide ordering for memory accesses on the same CPU,
so this doesn't make any sense.

> 
> This implies that smp_wmb() is *stronger* than wmb(), whereas smp_wmb() is
> actually *weaker* (e.g. on x86_64 wmb() is an sfence, whereas smp_wmb() is a
> barrier()).
> 
> Thanks,
> Mark.
> 
> > 
> > Fixes: b022f59725f0 ("dmaengine: idxd: add idxd_copy_cr() to copy user completion record during page fault handling")
> > Suggested-by: Nikhil Rao <nikhil.rao@xxxxxxxxx>
> > Tested-by: Tony Zhu <tony.zhu@xxxxxxxxx>

Since it has a "Fixes" tag and a "Tested-by" tag, I'd assume there has
been a test w/ and w/o this patch showing it can resolve a real issue
*constantly*? If so, I think x86 might be broken somewhere.

[Cc x86 maintainers]

Regards,
Boqun

> > Signed-off-by: Fenghua Yu <fenghua.yu@xxxxxxxxx>
> > ---
> >  drivers/dma/idxd/cdev.c | 5 +++--
> >  1 file changed, 3 insertions(+), 2 deletions(-)
> > 
> > diff --git a/drivers/dma/idxd/cdev.c b/drivers/dma/idxd/cdev.c
> > index 77f8885cf407..9b7388a23cbe 100644
> > --- a/drivers/dma/idxd/cdev.c
> > +++ b/drivers/dma/idxd/cdev.c
> > @@ -681,9 +681,10 @@ int idxd_copy_cr(struct idxd_wq *wq, ioasid_t pasid, unsigned long addr,
> >  		 * Ensure that the completion record's status field is written
> >  		 * after the rest of the completion record has been written.
> >  		 * This ensures that the user receives the correct completion
> > -		 * record information once polling for a non-zero status.
> > +		 * record information on any CPU once polling for a non-zero
> > +		 * status.
> >  		 */
> > -		wmb();
> > +		smp_wmb();
> >  		status = *(u8 *)cr;
> >  		if (put_user(status, (u8 __user *)addr))
> >  			left += status_size;
> > -- 
> > 2.37.1
> > 
> > 




[Index of Archives]     [Linux Kernel]     [Linux ARM (vger)]     [Linux ARM MSM]     [Linux Omap]     [Linux Arm]     [Linux Tegra]     [Fedora ARM]     [Linux for Samsung SOC]     [eCos]     [Linux PCI]     [Linux Fastboot]     [Gcc Help]     [Git]     [DCCP]     [IETF Announce]     [Security]     [Linux MIPS]     [Yosemite Campsites]

  Powered by Linux