Re: Write atomicity guarantees

On Thu, Apr 24, 2014 at 11:03 AM, Chris Mason <clm@xxxxxx> wrote:
> On 04/24/2014 01:39 PM, Matthew Wilcox wrote:
>>
>>
>> NVMe allows the drive to tell the host what atomicity guarantees it
>> provides for a write command.  At the moment, I don't think Linux has
>> a way for the driver to pass that information up to the filesystem.
>>
>> The value that is most interesting to report is Atomic Write Unit Power
>> Fail ("if you send a write no larger than this, the drive guarantees to
>> write all of it or none of it"), minimum value 1 sector. [1]
>>
>> There's a proposal before the NVMe workgroup to add a boundary size/offset
>> to modify AWUPF ("except if you cross this boundary, then AWUPF is not
>> guaranteed").  Think RAID stripe crossing.
>>
>> So, three questions.  Is there somewhere already to pass boundary
>> information up to the filesystem?  Can filesystems make use of a larger
>> atomic write unit than a single sector?  And, if the device is internally
>> a RAID device, is knowing the boundary size/offset useful?
>>
>>
>> [1] There is also Atomic Write Unit Normal ("if you send two writes,
>> neither of which is larger than this, subsequent reads will get either
>> one or the other, not a mixture of both"), which I don't think we care
>> about because the page cache prevents us from sending two writes which
>> overlap with each other.
>
>
> I think we really need the atomics to be vectored.  Send N writes which as a
> unit are not larger than X, but which may span anywhere on the device.  An array
> with writeback cache, or a log structured squirrel in the FTL should be able
> to provide this pretty easily?
>
> The immediate use case is mysql (16K writes) on a fragmented filesystem.
> The FS needs to be able to collect a single atomic write made up of N 4K
> sectors.

How big does N need to be before it starts to be generally useful?
Here it seems we're talking on the order of tens of writes, but for
the upper bound Dave said that N could be in the hundreds of thousands
[1].
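The two rules discussed above can be made concrete in a short C sketch. The first is AWUPF plus the proposed boundary: a single write is all-or-nothing only if it is no larger than AWUPF and does not cross a boundary. The second is the vectored variant: N scattered writes whose combined size is no larger than X. This is only an illustration under stated assumptions. The struct, field, and function names are hypothetical and are not an existing kernel or NVMe interface. All sizes are assumed to be already converted to logical blocks. Boundaries are assumed to sit at boundary_off + k * boundary_size.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical summary of a device's power-fail atomicity limits,
 * all in logical blocks. boundary_size == 0 means "no boundary reported". */
struct atomic_limits {
    uint64_t awupf;          /* largest write guaranteed all-or-nothing */
    uint64_t boundary_size;  /* proposed: spacing of atomicity boundaries */
    uint64_t boundary_off;   /* proposed: offset of the first boundary */
};

/* True if [lba, lba + nlb) has a boundary strictly inside it.
 * Starting or ending exactly on a boundary does not count as crossing. */
static bool crosses_boundary(const struct atomic_limits *lim,
                             uint64_t lba, uint64_t nlb)
{
    if (lim->boundary_size == 0 || nlb == 0)
        return false;
    uint64_t sz = lim->boundary_size;
    /* Distance from the most recent boundary at or before lba, in [0, sz). */
    uint64_t r = (lba % sz + sz - lim->boundary_off % sz) % sz;
    uint64_t next = lba + (sz - r);   /* first boundary strictly after lba */
    return next < lba + nlb;
}

/* Single write: atomic only if it fits in AWUPF and crosses no boundary. */
static bool write_is_atomic(const struct atomic_limits *lim,
                            uint64_t lba, uint64_t nlb)
{
    return nlb >= 1 && nlb <= lim->awupf && !crosses_boundary(lim, lba, nlb);
}

/* One segment of a hypothetical vectored atomic write. */
struct seg {
    uint64_t lba;
    uint64_t nlb;
};

/* Vectored variant: segments may be anywhere on the device, but their
 * combined length must not exceed limit (the "X" in the proposal). */
static bool vector_fits(const struct seg *segs, size_t n, uint64_t limit)
{
    uint64_t total = 0;   /* invariant: total <= limit */
    for (size_t i = 0; i < n; i++) {
        if (segs[i].nlb > limit - total)   /* overflow-safe comparison */
            return false;
        total += segs[i].nlb;
    }
    return true;
}
```

As an example with made-up numbers, take 4K blocks, AWUPF = 8 blocks and boundaries every 32 blocks from LBA 0. An 8-block write at LBA 0 qualifies. The same write at LBA 28 does not, because it crosses LBA 32. A 16K MySQL page split into four scattered 4K sectors passes vector_fits() with a limit of 4 blocks.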

--
Dan

[1]: http://marc.info/?l=linux-fsdevel&m=139262740324307&w=2