Re: [Question] why not flush device cache at _vg_commit_raw

> On Jan 22, 2024, at 23:26, Ilia Zykov <mail@xxxxxxxxxxx> wrote:
> 
> On 22.01.2024 17:52, Zdenek Kabelac wrote:
>> Dne 22. 01. 24 v 14:46 Anthony Iliopoulos napsal(a):
>>> On Mon, Jan 22, 2024 at 01:48:41PM +0100, Zdenek Kabelac wrote:
>>>> Dne 22. 01. 24 v 12:22 Su Yue napsal(a):
>>>>> Hi lvm folks,
>>>>>     Recently we received a report about a device cache issue after vgchange --deltag.
>>>>> What confuses me is that lvm never calls fsync on block devices, even at the end of the commit phase.
>>>>> 
>>>>> IIRC, it's common for userspace tools to call fsync or use O_SYNC/O_DSYNC when writing
>>>>> critical data. Yes, lvm2 opens devices with O_DIRECT if they support it, but O_DIRECT doesn't
>>>>> guarantee that data is persistent on storage when write returns. The data can still be in the device cache.
>>>>> If a power failure happens at that moment, critical data such as VG metadata could be lost.
>>>>> 
>>>>> Is there any particular reason not to flush data cache at VG commit time?
>>>>> 
>>>> 
>>>> Hi
>>>> 
>>>> It seems the call to the 'dev_flush()' function somehow got lost during the
>>>> conversion to async aio usage - I'll investigate.
>>>> 
>>>> On the other hand, the chance of losing any data this way would be
>>>> very specific to some oddly behaving device.
>>> 
>>> There's no guarantee that data will be persisted to storage without
>>> explicitly flushing the device data cache. Those are usually volatile
>>> write-back caches, so the data aren't really protected against power
>>> loss without fsyncing the blockdev.
>> At technical level modern storage devices 'should' have enough energy held internally to be able to flush out all the caches in emergency cases to the persistent storage. So unless we deal with some 'virtual' storage that may fake various responses to IO handling - this should not be causing major troubles.
>> However, it's clearly a problem that was introduced when the code was shifted towards the use of libaio.
>> Zdenek
> 
> Moreover, there is a very old post about fsync() lying:
> https://brad.livejournal.com/2116715.html
> I don't know, maybe that post is also a lie :) Or maybe devices have become more truthful since then.
> But many devices report that "Write cache" is enabled:
> 
> hdparm -I /dev/sda | grep 'Write cache'
>             * Write cache
> 
> And in many cases fsync() flushes data only as far as the write cache.
> But this can be a persistent (SSD, flash) cache. Or, as Zdenek wrote,
> "devices 'should' have enough energy held internally to be able to flush out all the caches in emergency cases".
> 
> However, in some cases devices may lose data on power failure when a large amount of dirty data is in the cache, especially ordinary, non-enterprise HDDs. IMHO.
> 
Yes... The write cache mechanism varies between manufacturers and products.
Some implementations can even lie about cache flush/FUA in 2024.
For serious enterprise use, devices should be strictly tested before being used in production.
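
As a rough illustration of checking what the kernel believes about a device's cache, here is a minimal sketch. It assumes a Linux system that exposes the queue/write_cache attribute in sysfs, which reads "write back" or "write through". The function names and the sysfs_root parameter are made up for this example. The value only reflects what the device advertises, so it cannot catch a device that lies.

```python
from pathlib import Path


def write_cache_mode(disk, sysfs_root="/sys/block"):
    """Return the kernel's view of the disk's cache mode, or None if unknown.

    `disk` is a name such as "sda". `sysfs_root` is a hypothetical parameter
    added only so the function can be exercised against a fake directory tree.
    """
    attr = Path(sysfs_root) / disk / "queue" / "write_cache"
    try:
        return attr.read_text().strip()
    except OSError:
        # Attribute missing (older kernel, wrong name) or not readable.
        return None


def needs_explicit_flush(disk, sysfs_root="/sys/block"):
    """Be conservative: anything other than "write through" needs a flush."""
    return write_cache_mode(disk, sysfs_root) != "write through"
```

For example, needs_explicit_flush("sda") would be True for a disk reporting "write back", and also True when the attribute cannot be read at all.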

The point is that filesystems and lvm should trust the underlying devices' write barriers/flushing and make
a best effort to preserve data integrity.
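
To make the "write, then flush" pattern concrete, here is a minimal sketch. It is not lvm2's code: write_durably is a hypothetical helper, and it uses plain buffered I/O rather than O_DIRECT to keep the example short (O_DIRECT would additionally need aligned buffers). On Linux, fsync() on a block device file descriptor writes back dirty pages and asks the device to flush its cache. Whether the device honours that request is exactly the trust issue discussed above.

```python
import os


def write_durably(path, data, offset=0):
    """Write `data` at `offset`, then request a flush to stable storage.

    Returns the number of bytes written. Raises OSError if the write or
    the flush fails, so the caller never treats an unflushed write as
    committed.
    """
    fd = os.open(path, os.O_WRONLY)
    try:
        done = 0
        # pwrite() may write fewer bytes than requested, so loop until done.
        while done < len(data):
            done += os.pwrite(fd, data[done:], offset + done)
        # A returned write only means the kernel (or device cache) has the
        # data; fsync() is the explicit request to make it persistent.
        os.fsync(fd)
    finally:
        os.close(fd)
    return done
```

A caller committing metadata would treat the commit as done only after write_durably() returns without raising.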

--
Su
> ----





