Hi,
On 2024/06/07 00:17, Zdenek Kabelac wrote:
On 07. 06. 24 at 0:14, Zdenek Kabelac wrote:
On 05. 06. 24 at 10:59, Jaco Kroon wrote:
Hi,
On 2024/06/04 18:07, Zdenek Kabelac wrote:
On 04. 06. 24 at 13:52, Jaco Kroon wrote:
Last but not least - disk scheduling policies also have an impact -
e.g. to ensure better fairness - at the price of lower throughput...
We normally use mq-deadline; in this setup I notice it has been changed
to "none". That change came out of a discussion with Bart van Assche,
and the plan was to revert it. Happy to revert this, to be honest.
https://lore.kernel.org/all/07d8b189-9379-560b-3291-3feb66d98e5c@xxxxxxx/
relates.
Hi
So I guess we can tell the story like this -
When you create a 'snapshot' of a thin volume, this enforces a full
flush (& fsfreeze) of the thin volume - so any dirty pages need to be
written to the thin pool before the snapshot can be taken (and the thin
pool should not run out of space). This CAN potentially hold up your
system for a long time (depending on the performance of your storage)
and may cause various lock-up states if you are using this 'snapshotted'
volume for anything else - as the volume is suspended, further
operations on the device are blocked, eventually leading to a full
circular system deadlock (catch-22). This is hard to analyze without
the whole picture of the system.
We may eventually look at whether we can somehow minimize the time spent
holding the VG lock and suspending with flush & fsfreeze - but that is a
possible future enhancement; for now, flushing the disk upfront to
minimize the amount of dirty data is the way to go.
I forgot to mention that the 'simplest' way is just to run 'sync' before
running the 'lvcreate -s ...' command...
Thanks. I think all in all everything mentioned here makes a lot of
sense, and (in my opinion at least) explains the symptoms we've been seeing.
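For what it's worth, I take the 'run sync first' suggestion to amount to
roughly the following wrapper - my own rough, untested sketch, with
made-up VG/LV/snapshot names:

  /* sync_then_snapshot.c - sketch of "run 'sync' before 'lvcreate -s'".
   * Flush all dirty pages system-wide first, so that the suspend/fsfreeze
   * done by lvcreate has (almost) nothing left to write out.
   * The VG/LV/snapshot names below are placeholders.
   */
  #include <stdio.h>
  #include <unistd.h>

  int main(void)
  {
      sync();  /* on Linux, sync() waits for the writeback to complete */

      execlp("lvcreate", "lvcreate", "-s", "-n", "snap0", "vg0/thinvol",
             (char *)NULL);

      perror("execlp lvcreate");  /* only reached if execlp() failed */
      return 1;
  }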
Overall the system does "feel" more responsive with the lower dirty
buffer limits, and most likely it helps with data persistence (as has
been mentioned) in case of system crashes and/or loss of power.
Tasks during peak usage also seem to run faster on average; I suspect
this is because of the use-case for this host:
1. Data is seldom overwritten (this was touched on). Pretty much
everything is WORM-type access (Write-Once, Read-Many).
2. Caches are mostly needed to keep read traffic from consuming I/O
capacity that is needed for writing.
3. It's thus beneficial to get writes out of the way as soon as
possible, rather than blocking at a later stage while a large backlog of
writes is forced out by a flush() or sync() or lvcreate (snapshot).
Is a 500MB dirty limit needlessly low? Probably. But given the above I
think it is acceptable. Rather keep the disks writing *now* in order to
free up *future* capacity.
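To make the 500MB explicit: I'm assuming it is applied as a hard cap on
dirty page-cache bytes, i.e. the equivalent of vm.dirty_bytes, roughly
as in this sketch (untested; the exact knob and value on this host may
differ):

  /* set_dirty_limit.c - sketch: cap the dirty page cache at ~500 MiB by
   * writing to /proc/sys/vm/dirty_bytes (same effect as
   * "sysctl -w vm.dirty_bytes=524288000"). Assumes dirty_bytes rather
   * than dirty_ratio is the knob in use; needs root.
   */
  #include <stdio.h>

  int main(void)
  {
      FILE *f = fopen("/proc/sys/vm/dirty_bytes", "w");
      if (!f) {
          perror("/proc/sys/vm/dirty_bytes");
          return 1;
      }
      fprintf(f, "%lld\n", 500LL * 1024 * 1024);  /* 500 MiB */
      return fclose(f) ? 1 : 0;
  }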
I'm guessing your "simple way" is workable for the generic case as
well. Towards that end, would a relatively simple change to the lvm2
tools perhaps be to add a syncfs() call to lvcreate *just prior* to
freezing?
The hard part is probably to figure out if the LV is mounted somewhere,
and if it is, to open() that path in order to have a file-descriptor to
pass to syncfs()? Obviously if the LV isn't mounted none of this is a
concern and we can just proceed.
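Something like the following is what I have in mind - just a quick,
untested sketch on my side, with the mount point simply passed in as an
argument rather than discovered by lvcreate (which is the hard part
mentioned above):

  /* syncfs_mnt.c - sketch of the "syncfs() just prior to freezing" idea:
   * flush only the filesystem backing the origin LV rather than
   * sync()ing everything. The mount point is taken from argv[1] here;
   * finding it from the LV is what lvcreate would have to solve.
   */
  #define _GNU_SOURCE
  #include <fcntl.h>
  #include <stdio.h>
  #include <unistd.h>

  int main(int argc, char **argv)
  {
      if (argc != 2) {
          fprintf(stderr, "usage: %s <mount-point>\n", argv[0]);
          return 2;
      }

      /* any fd on the filesystem works; the mount point itself is easiest */
      int fd = open(argv[1], O_RDONLY | O_DIRECTORY);
      if (fd < 0) {
          perror("open");
          return 1;
      }

      if (syncfs(fd) < 0) {  /* flush dirty data/metadata of this fs only */
          perror("syncfs");
          close(fd);
          return 1;
      }

      close(fd);
      return 0;
  }

(Presumably lvcreate could resolve the mount point by matching the LV's
device against /proc/self/mountinfo, but that is exactly the fiddly
part.)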
What would be more interesting is if cluster-lvm is in play and the
origin LV is active/open on an alternative node? But that's well beyond
the scope of our requirements (for now).
Kind regards,
Jaco