Re: RDMA power failure write atomicity

Anuj Kalia wrote on 03/11/2016 05:14 PM:
> There are several factors that make this problem hard. For many modern
> servers, DMA data is written to last level cache via DDIO, i.e., it
> will not go to the NVDIMM unless the remote CPU flushes the cache /
> cache lines. On servers where data is written to DRAM (or to an NVDIMM
> attached to memory bus), the data can (probably) still be buffered by
> the CPU's memory controller.

It sounds to me as if it should then have the regular CPU write atomicity properties,
i.e., on 64-bit Intel, 8 bytes with 8-byte alignment.

> I am not sure how much control RDMA NICs have over these factors.
> AFAIK, there is no PCIe command to flush either cache lines or memory
> controller buffers, so flushing to DRAM is beyond what RDMA NICs
> can currently accomplish.

Yes, but flushing is outside the scope of my question, which is only about what patterns
of data you can eventually see after a power failure, with or without flushing.

If there are no minimal power failure atomicity guarantees at all, that would
effectively rule out any write-in-place into NVRAM/PMEM, because you could end up with
a mix of old and new, hence corrupted, data. You would never be able to switch a pointer
atomically, so even the classical "write to a new location, flush, then switch the pointer
to the new data" approach would no longer work. As a result, the value of what is proposed
in draft-talpey-rdma-commit-00.txt (which is a very good proposal) would be significantly
lower because, unless I'm missing something, the only remaining use case for RDMA
writes bypassing the remote CPU that would withstand a power failure is log replication
with each entry protected by a checksum, so that on recovery after a power failure you can
find the last, corrupted record. However, record compaction would still have to be done via
the remote CPU (no bypassing), because only the CPU can switch pointers in NVRAM/PMEM
atomically with respect to power failure.
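
To make the log replication case concrete, here is a minimal sketch in Python of
recovery over a checksummed log. The record layout (4-byte length, 4-byte CRC32, then the
payload) and the names HEADER, append_record and recover are assumptions made up for this
illustration; nothing here comes from the draft or from any RDMA API.

```python
import struct
import zlib

# Hypothetical record layout: little-endian payload length, then CRC32 of the payload.
HEADER = struct.Struct("<II")


def append_record(log, payload):
    """Append one checksummed record to the in-memory log (a bytearray)."""
    log += HEADER.pack(len(payload), zlib.crc32(payload)) + payload


def recover(log):
    """Return the records that precede the first torn or corrupted one."""
    records = []
    offset = 0
    while offset + HEADER.size <= len(log):
        length, crc = HEADER.unpack_from(log, offset)
        if length == 0:
            break  # treat a zero length as unwritten space, not as a record
        start = offset + HEADER.size
        payload = bytes(log[start:start + length])
        if len(payload) != length or zlib.crc32(payload) != crc:
            break  # truncated or partially written record: stop here
        records.append(payload)
        offset = start + length
    return records
```

Recovery relies only on the checksum, so it needs no atomicity guarantee from the
transport: a record that was only partially written simply fails verification and
everything after it is discarded.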

So it seems to me that some minimum, such as 8 bytes, must be defined. I wonder whether
it has already been defined; it looks like it has not.
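
As a rough illustration of why the minimum matters, the sketch below enumerates the
states a write could leave behind under an assumed failure model: each aligned chunk of
`unit` bytes either lands completely or not at all, independently of the others. The
model and the function name possible_outcomes are assumptions for illustration only, not
the documented behavior of any NIC or CPU.

```python
from itertools import product


def possible_outcomes(old, new, offset, unit):
    """Enumerate post-failure contents of a region being overwritten.

    Assumed model: the write is split at addresses that are multiples of
    `unit`, and each resulting chunk independently either lands or does not.
    `offset` is the address of the first byte of the region.
    """
    assert len(old) == len(new)
    # Split [0, len) at the aligned unit boundaries of the target address range.
    chunks = []
    pos = 0
    while pos < len(new):
        next_boundary = ((offset + pos) // unit + 1) * unit - offset
        end = min(len(new), next_boundary)
        chunks.append((pos, end))
        pos = end
    outcomes = set()
    for landed in product((False, True), repeat=len(chunks)):
        state = bytearray(old)
        for (start, end), done in zip(chunks, landed):
            if done:
                state[start:end] = new[start:end]
        outcomes.add(bytes(state))
    return outcomes
```

Under this model, an aligned 8-byte pointer with an 8-byte unit can only be seen as all
old or all new, while the same pointer at a 4-byte offset, or a 10-byte write, can be
seen in mixed states.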

Thanks,
Vlad

> --Anuj (rdma_guy)
> 
> On Fri, Mar 11, 2016 at 7:26 PM, Vladislav Bolkhovitin <vst@xxxxxxxx> wrote:
>> I'm aware of this proposal. Unfortunately, it is quite orthogonal to my question,
>> because it is about how to ensure persistence of RDMA writes. The atomicity it
>> mentions, like RDMA atomicity in general, is atomicity with respect to parallel
>> commands acting on the same locations. However, I'm asking about power failure
>> atomicity, which is something different.
>>
>> For instance, suppose you are doing an RDMA WRITE of 10 bytes of data. If a power
>> failure happens while this operation is in progress, what data will end up at the
>> target location? All 10 bytes new? All 10 bytes old? Or a mix of 5 bytes new and
>> 5 bytes old? By power failure atomicity I mean a guarantee that the data is either
>> old or new, never a mix of old and new data.
>>
>> Thanks,
>> Vlad
>>
>> Asgeir Eiriksson wrote on 03/10/2016 05:33 PM:
>>> Vladislav,
>>>
>>> This is an area of active R&D
>>>
>>> You might be interested in the following (at ietf.org):
>>>
>>> Title    : RDMA Durable Write Commit
>>> Authors  : Tom Talpey
>>>            Jim Pinkerton
>>> Filename : draft-talpey-rdma-commit-00.txt
>>> Pages    : 24
>>> Date     : 2016-02-19
>>>
>>> Regards,
>>>
>>> Asgeir
>>>
>>>
>>>> On Mar 10, 2016, at 3:45 PM, Vladislav Bolkhovitin <vst@xxxxxxxx> wrote:
>>>>
>>>> Hello,
>>>>
>>>> I'm currently considering using NVDIMM behind RDMA and wonder what the RDMA power
>>>> failure write atomicity is. I mean, what are the minimal size and alignment guaranteed
>>>> to be written atomically in the face of a power failure (or some similar failure),
>>>> i.e., either written in full or not written at all?
>>>>
>>>> For memory writes on Intel it is 8 bytes with 8-byte alignment. Is there anything like
>>>> this for RDMA? Or do different vendors/implementations have such different expectations
>>>> and promises that you cannot assume anything >1 byte?
>>>>
>>>> I can't find such info anywhere.
>>>>
>>>> Thanks,
>>>> Vlad
>>
>> --
>> To unsubscribe from this list: send the line "unsubscribe linux-rdma" in
>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
