Re: [PATCH] net: mlx5: Use writeX() to ring doorbell and remove reduntant wmb()

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 




> On 3 Jan 2020, at 18:31, Liran Alon <liran.alon@xxxxxxxxxx> wrote:
> 
> 
> 
>> On 3 Jan 2020, at 15:37, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>> 
>> On Fri, Jan 03, 2020 at 12:21:06AM +0200, Liran Alon wrote:
>> 
>>>> AFAIK WC is largely unspecified by the memory model. Is wmb() even
>>>> formally specified to interact with WC?
>>> 
>>> As I said, I haven’t seen such semantics defined in kernel
>>> documentation such as memory-barriers.txt.  However, in practice, it
>>> does flush WC buffers. At least for x86 and ARM which I’m familiar
>>> enough with.  I think it’s reasonable to assume that wmb() should
>>> flush WC buffers while dma_wmb()/smp_wmb() doesn’t necessarily have
>>> to do this.
>> 
>> It is because WC is rarely used and laregly undefined for the kernel
>> :(
> 
> Yep.
> 
>> 
>>>>>>> Therefore, change mlx5_write64() to use writeX() and remove wmb() from
>>>>>>> it's callers.
>>>>>> 
>>>>>> Yes, wmb(); writel(); is always redundant
>>>>> 
>>>>> Well, unfortunately not…
>>>>> See: https://urldefense.proofpoint.com/v2/url?u=https-3A__marc.info_-3Fl-3Dlinux-2Dnetdev-26m-3D157798859215697-26w-3D2&d=DwIDaQ&c=RoP1YumCXCgaWHvlZYR8PZh8Bv7qIrMUB65eapI_JnE&r=Jk6Q8nNzkQ6LJ6g42qARkg6ryIDGQr-yKXPNGZbpTx0&m=Ox1lCS1KAGBvJrf24kiFQrranIaNi_zeo05sqCUEf7Y&s=Mz6MJzUQ862DGjgGnj3neX4ZpjI88nOI9KpZhNF9TqQ&e=
>>>>> (See my suggestion to add flush_wc_writeX())
>>>> 
>>>> Well, the last time wmb & writel came up Linus was pretty clear that
>>>> writel is supposed to remain in program order and have the barriers
>>>> needed to do that.
>>> 
>>> Right. But that doesn’t take into account that WC writes are
>>> considered completed when they are still posted in CPU WC buffers.
>> 
>>> The semantics as I understand of writeX() is that it guarantees all
>>> prior writes have been completed.  It means that all prior stores
>>> have executed and that store-buffer is flushed. But it doesn’t mean
>>> that WC buffers is flushed as-well.
>> 
>> The semantic for writel is that prior program order stores will be
>> observable by DMA from the device receiving the writel. This is
>> required for UC and NC stores today. WC is undefined, I think.
>> 
>> This is why ARM has the additional barrier in writel.
> 
> Yep.
> 
>> 
>> It would logically make sense if WC followed the same rule, however,
>> adding a barrier to writel to make WC ordered would not be popular, so
>> I think we are left with using special accessors for WC and placing
>> the barrier there..
> 
> Right.
> 
>> 
>>>> IMHO you should start there before going around and adding/removing wmbs
>>>> related to WC. Update membory-barriers.txt and related with the model
>>>> ordering for WC access and get agreement.
>>> 
>>> I disagree here. It’s more important to fix a real bug (e.g. Not
>>> flushing WC buffers on x86 AMD).  Then, we can later formalise this
>>> and refactor code as necessary. Which will also optimise it as-well.
>>> Bug fix can be merged before we finish all these discussions and get
>>> agreement.
>> 
>> Is it a real bug that people actually hit? It wasn't clear from the
>> commit message. If so, sure, it should be fixed and the commit message
>> clarified. (but I'd put the wmb near the WC writes..)
> 
> I found this bug during code review. I’m not aware if AWS saw this bug happening in production.
> But according to AMD SDM and Optimization Guide SDM, this is a bug.
> 
> I think it doesn’t happen in practice because the write of the Tx descriptor + 128 first bytes of packet
> Effectively fills the relevant WC buffers and when a WC buffer is fully written to, the CPU *should*
> (Not *must*) flush the WC buffer to memory.

Actually after re-reading AMD Optimization Guide SDM, I see it is guaranteed that:
“Write-combining is closed if all 64 bytes of the write buffer are valid”.
And this is indeed always the case for AWS ENA LLQ. Because as can be seen at
ena_com_config_llq_info(), desc_list_entry_size is either 128, 192 or 256. i.e. Always
a multiple of 64 bytes. So this explains why this wasn’t an issue in production.

-Liran






[Index of Archives]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Photo]     [Yosemite News]     [Yosemite Photos]     [Linux Kernel]     [Linux SCSI]     [XFree86]

  Powered by Linux