Re: [Question] why not flush device cache at _vg_commit_raw

On 23. 01. 24 at 17:42, Demi Marie Obenour wrote:
On Mon, Jan 22, 2024 at 03:52:57PM +0100, Zdenek Kabelac wrote:
On 22. 01. 24 at 14:46, Anthony Iliopoulos wrote:
On Mon, Jan 22, 2024 at 01:48:41PM +0100, Zdenek Kabelac wrote:
On 22. 01. 24 at 12:22, Su Yue wrote:
Hi lvm folks,
     Recently we received a report about a device cache issue after 'vgchange --deltag'.
What confuses me is that lvm never calls fsync on block devices, even at the end of the commit phase.

IIRC, it is common for userspace tools to use fsync/O_SYNC/O_DSYNC when writing critical data.
Yes, lvm2 opens devices with O_DIRECT if they support it, but O_DIRECT doesn't guarantee that
the data is persistent on storage when write() returns. The data can still sit in the device cache,
and if a power failure happens in that window, critical data such as the VG metadata could be lost.

Is there any particular reason not to flush the device cache at VG commit time?
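
For illustration, here is a minimal, purely hypothetical sketch of the pattern I mean (the device
path, offset and sizes are made up; only the write-then-fsync sequence matters):

/* Hypothetical illustration, not lvm2 code: an O_DIRECT write can return
   while the data still sits in the drive's volatile write-back cache.
   Only an explicit fsync()/fdatasync() on the block device fd makes the
   kernel send a cache-flush command to the device. */
#define _GNU_SOURCE
#include <fcntl.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

int main(void)
{
	void *buf;
	int fd = open("/dev/sdX", O_RDWR | O_DIRECT);	/* placeholder device */

	if (fd < 0) { perror("open"); return 1; }

	/* O_DIRECT needs an aligned buffer and length */
	if (posix_memalign(&buf, 4096, 4096)) { close(fd); return 1; }
	memset(buf, 0, 4096);

	if (pwrite(fd, buf, 4096, 0) != 4096)
		perror("pwrite");	/* "metadata" written with O_DIRECT */

	/* the write may only have reached the device cache;
	   fsync() asks the device to flush it to stable media */
	if (fsync(fd))
		perror("fsync");

	free(buf);
	close(fd);
	return 0;
}

As far as I understand, opening with O_SYNC/O_DSYNC would imply this per write instead, at some
throughput cost.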


Hi

It seems the call to the 'dev_flush()' function somehow got lost during the conversion
to async aio usage - I'll investigate.

On the other hand, the chance of losing any data this way would really be
limited to some very oddly behaving device.

There's no guarantee that data will be persisted to storage without
explicitly flushing the device data cache. Those are usually volatile
write-back caches, so the data aren't really protected against power
loss without fsyncing the blockdev.

At the technical level, modern storage devices 'should' hold enough energy internally
to flush all their caches out to persistent storage in an emergency. So unless we are
dealing with some 'virtual' storage that fakes various responses to IO handling, this
should not cause major trouble.

This is only true for enterprise storage with power loss protection.
The vast majority of Qubes OS users use LVM with consumer storage, which
does not have power loss protection.  If this is unsafe, then Qubes OS
should switch to a different storage pool that flushes drive caches as
needed.

From the lvm2 perspective - the metadata is written first, then there is usually a full flush of all I/O and a suspend of the actual device (if any device is already active on that disk) - so even if no direct flush were initiated by lvm2 itself, one is going to happen whenever we update existing LVs.

There is also usually a stream of cache-flushing operations whenever e.g. a thin-pool is synchronizing its metadata, or any application running on the device is synchronizing its data.

So while lvm2 uses O_DIRECT for its writes, there is likely a tiny window of opportunity where the user could 'crash' the device with loss of its caches. If this happens, lvm2 still has its 'history' & archive, so in the worst case it should see an older version of the metadata usable for recovery.

All that said, in all these years we have not yet seen a single reported issue caused by such a mysterious crash event, and the potential 'risk of failure' could likely only arise when a user is creating some new empty LV - so there shouldn't be a risk of losing any real data (unless I'm missing something).

So while we figure out how to add a proper fsync call for device writes - since it is apparently still needed even with direct I/O usage - it's IMHO not a reason to stop using lvm2 ;)
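
Roughly speaking, the missing piece has the shape of the call below. This is only a sketch,
not the actual dev_flush()/aio code paths in lvm2, and it assumes the commit path ends up
with a plain open fd on the metadata area's device:

/* sketch only - hypothetical helper, not lvm2's real device layer */
#include <errno.h>
#include <unistd.h>

static int flush_metadata_dev(int fd)
{
	/* fdatasync() on a block device fd issues a cache flush to the drive */
	if (fdatasync(fd) < 0)
		return -errno;
	return 0;
}

Something along these lines, called right after the commit write completes, would close the
window discussed above.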

Regards

Zdenek




