On Mon, Mar 12, 2018 at 6:15 PM, Rohit Zambre <rzambre@xxxxxxx> wrote:
> On Mon, Mar 12, 2018 at 2:37 PM, Jason Gunthorpe <jgg@xxxxxxxx> wrote:
>> On Sun, Mar 11, 2018 at 08:07:43PM -0500, Rohit Zambre wrote:
>>
>>> the different bfregs. However, the Mellanox PRM states that doorbells
>>> to the same UAR page must be serialized.
>>
>> This seems like a nonsense statement to me. doorbell rings are
>> indivisible 64 bit writes, there is no concept of serialization of
>> those writes to a PCI-E device.
>
> In the "Sharing UARs" section, the PRM states "No other DoorBell can
> be rung (or even start ringing) in the midst of an on-going write of a
> DoorBell over a given UAR page... the access to a UAR must be
> synchronized unless an atomic write of 64 bits in a single bus
> operation is guaranteed."

Since this statement is about the 64-bit DoorBell writes, I took a
look at where the DoorBell is written. mlx5_bf_copy calls the
COPY_64B_NT macro which, on x86, writes 64 bytes into the blue flame
buffer using double-quadword (128-bit) move instructions. The first
64 bits of the blue flame buffer are the doorbell register, so the
first movntdq should write the doorbell register. However, according
to the Intel SDM, the atomicity of a double-quadword write is not
guaranteed. Is it safe to assume that the 128-bit move is implemented
as two 64-bit moves rather than four 32-bit moves?

#if defined(__x86_64__)
#define COPY_64B_NT(dst, src) \
        __asm__ __volatile__ ( \
        " movdqa   (%1),%%xmm0\n" \
        " movdqa 16(%1),%%xmm1\n" \
        " movdqa 32(%1),%%xmm2\n" \
        " movdqa 48(%1),%%xmm3\n" \
        " movntdq %%xmm0,   (%0)\n" \
        " movntdq %%xmm1, 16(%0)\n" \
        " movntdq %%xmm2, 32(%0)\n" \
        " movntdq %%xmm3, 48(%0)\n" \
        : : "r" (dst), "r" (src) : "memory"); \
        dst += 8; \
        src += 8

Thanks,
Rohit
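
To make the question concrete, here is a minimal sketch of the alternative it implies: copying the same 64 bytes as eight 64-bit stores, so that the first store covers exactly the doorbell quadword. The helper name copy_64b_qword_stores is hypothetical and is not part of rdma-core. The sketch assumes a 64-bit target, an 8-byte-aligned destination, and a compiler that emits each volatile uint64_t store as one 64-bit move. It illustrates only the store width. It does not show that such a copy is a valid or equally fast replacement for the non-temporal 128-bit path, nor that the store reaches the device as a single bus operation, which also depends on how the UAR page is mapped.

```c
#include <stdint.h>
#include <string.h>

/*
 * Hypothetical helper (not in rdma-core): copy 64 bytes as eight
 * separate 64-bit stores, lowest address first, so the doorbell
 * quadword at offset 0 is written by exactly one store.
 *
 * Assumptions: dst is 8-byte aligned, the target is 64-bit, and the
 * compiler emits one 64-bit move per volatile uint64_t store.
 */
static void copy_64b_qword_stores(volatile uint64_t *dst, const void *src)
{
    uint64_t q[8];
    int i;

    /* Stage the source so src needs no particular alignment. */
    memcpy(q, src, sizeof(q));

    for (i = 0; i < 8; i++)
        dst[i] = q[i];  /* i == 0 is the doorbell quadword */
}
```

In a test, dst can be an ordinary uint64_t[8] array standing in for the blue flame buffer; on real hardware it would be the mapped BF register address.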