Re: dm-writecache issue

On 9/18/18 9:42 AM, Mikulas Patocka wrote:
> 
> 
> On Tue, 18 Sep 2018, Eric Sandeen wrote:
> 
>> On 9/18/18 9:29 AM, Mikulas Patocka wrote:
>>
>>> On Tue, 18 Sep 2018, Eric Sandeen wrote:
>>
>> ...
>>
>>>> See also 
>>>> https://www.intel.com/content/www/us/en/support/articles/000006392/memory-and-storage.html
>>>>
>>>> -Eric
>>>
>>> And does it really support native 512-byte writes? Or does it emulate 
>>> 512-byte writes by doing read-modify-write? That needs to be benchmarked, 
>>> the paper doesn't say that.
>>
>> Interesting from a manual tuning perspective, but not from a default
>> behavior perspective.
>>
>> I'm just pointing out that Intel does seem to give the user a choice about
>> the /advertised/ geometry for some of their SSDs.
>>
>>> Memory is expensive and reducing SSD sector size increases memory 
>>> requirement on the SSD. I doubt that any SSD vendor would want to use 
>>> 8-times more memory just to support 512-byte sectors natively.
>>
>> Marketing decisions aside, we just can't safely ignore what the device
>> tells us about these IO sizes.
> 
> No one is forcing you to use 512-byte writes. You can use 4k writes on a 
> device that advertises 512-byte sectors.

Of course.  But not if you require those 4k writes to be /atomic/.
 
> ext4 uses 4k block size by default (and lets the user lower it if the user 
> is tight on disk space and doesn't care about performance).

I think you may be conflating sector size with filesystem block size.

ext4 makes no distinction between the two.

XFS has both sector size (metadata atomic IO unit) and filesystem block size
(file data allocation unit) as configurable mkfs-time options. The sector size
can be smaller than or equal to the filesystem block size.

mkfs.xfs defaults to 4k filesystem blocks and device-physical-sector-sized
sectors, i.e. the largest atomic IO the device advertises, because XFS
metadata journaling relies on this IO atomicity.  We allocate file data in
4k chunks, and do atomic metadata IO in device-sector-sized chunks.
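
As a rough illustration of "what the device advertises", here is a minimal
Python sketch that reads the logical and physical sector sizes Linux exposes
under /sys/block/<dev>/queue/. The function names and the sysfs_root parameter
are hypothetical, chosen only for this example; this is not how mkfs.xfs itself
probes the device, and the second helper checks only the one constraint stated
above (sector size no larger than filesystem block size).

```python
import os


def read_advertised_sizes(dev, sysfs_root="/sys/block"):
    """Return (logical, physical) sector sizes in bytes advertised for `dev`.

    `sysfs_root` is a hypothetical parameter so the function can be pointed
    at a test directory instead of the real sysfs tree.
    """
    queue = os.path.join(sysfs_root, dev, "queue")
    sizes = []
    for name in ("logical_block_size", "physical_block_size"):
        with open(os.path.join(queue, name)) as f:
            sizes.append(int(f.read().strip()))
    return tuple(sizes)


def sector_size_allowed(sector_size, fs_block_size):
    """Check only the constraint described above: the sector size may be
    smaller than, or equal to, the filesystem block size."""
    return sector_size <= fs_block_size
```

For example, read_advertised_sizes("sda") on a real system returns whatever
that disk reports; a 512/512 disk with 4k filesystem blocks passes
sector_size_allowed(512, 4096).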

ext4 doesn't do this, it's true, but I can't help believing that ext4
occasionally gets harmed by this choice, because it's absolutely possible that
a 4k metadata write gets only partly persisted if power fails on a 512/512
disk (512-byte logical and physical sectors), for example.  In practice it
seems to generally work out ok, but it goes beyond what the device says it
can guarantee.
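
To make the partial-persistence case concrete, here is a small Python model of
a torn write. It is a sketch under a simplifying assumption that is mine, not
the device's: atomic units are persisted strictly in order, so a power failure
leaves a prefix of new data followed by old data. Real devices may persist
sectors in any order. The function names are hypothetical.

```python
def torn_write(old, new, atomic_unit, units_persisted):
    """Return the on-disk contents if power fails after `units_persisted`
    atomic units of `new` have replaced the corresponding part of `old`.

    Assumption (simplification): units are persisted in order, front to back.
    """
    if len(old) != len(new):
        raise ValueError("old and new must be the same length")
    if len(new) % atomic_unit != 0:
        raise ValueError("write must be a multiple of the atomic unit")
    total_units = len(new) // atomic_unit
    if not 0 <= units_persisted <= total_units:
        raise ValueError("units_persisted out of range")
    cut = units_persisted * atomic_unit
    return new[:cut] + old[cut:]


def possible_outcomes(old, new, atomic_unit):
    """All contents a reader might find after a crash, under the same
    in-order assumption as torn_write()."""
    total_units = len(new) // atomic_unit
    return {torn_write(old, new, atomic_unit, k) for k in range(total_units + 1)}


old_block = b"\x00" * 4096
new_block = b"\xff" * 4096

# With a 4096-byte atomic unit the block is either entirely old or entirely new.
atomic_4k = possible_outcomes(old_block, new_block, 4096)

# With a 512-byte atomic unit there are also mixed old/new states.
atomic_512 = possible_outcomes(old_block, new_block, 512)
```

Under this model the 4096-byte case has only the two clean outcomes, while the
512-byte case adds intermediate states that are neither the old nor the new
metadata block, which is what a journal relying on 4k atomicity would not
expect to see.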

-Eric


