Re: lvm2 deadlock

Hi,

On 2024/06/07 00:17, Zdenek Kabelac wrote:
On 07. 06. 24 at 0:14, Zdenek Kabelac wrote:
On 05. 06. 24 at 10:59, Jaco Kroon wrote:
Hi,

On 2024/06/04 18:07, Zdenek Kabelac wrote:
On 04. 06. 24 at 13:52, Jaco Kroon wrote:
Last but not least - disk scheduling policies also have an impact, e.g. ensuring better fairness at the price of lower throughput...
We normally use mq-deadline; in this setup I notice it has been changed to "none". That was done following a discussion with Bart van Assche, and the plan was to revert it - happy to do so, to be honest. https://lore.kernel.org/all/07d8b189-9379-560b-3291-3feb66d98e5c@xxxxxxx/ relates.

Hi

So I guess we can tell the story like this -

When you create a 'snapshot' of a thin volume, this enforces a full flush (and fsfreeze) of the thin volume - so any dirty pages need to be written to the thin pool before the snapshot can be taken (and the thin pool must not run out of space). This CAN potentially hold up your system for a long time (depending on the performance of your storage), and may cause various lock-up states if you are using this 'snapshotted' volume for anything else - while suspended, the volume blocks further operations on the device, eventually leading to a full circular system deadlock (catch 22). This is hard to analyze without the whole picture of the system.
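To illustrate, while 'lvcreate -s' is busy the origin's device-mapper device reports as suspended, and any I/O issued against it blocks until the flush completes and the device is resumed - roughly observable like this (VG/LV names are just examples):

  # check the device-mapper state of the origin while the snapshot is being taken
  dmsetup info vg0-data | grep '^State'
  # State:              SUSPENDED   <- I/O to /dev/vg0/data blocks here
  # State:              ACTIVE      <- normal again once the snapshot exists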

We may eventually look at whether we can somehow minimize the time spent holding the VG lock and suspending with flush & fsfreeze - but that is a possible future enhancement; for now, flush the disks upfront to minimize the amount of dirty data.

I forgot to mention that the simplest way is just to run 'sync' before running the 'lvcreate -s ...' command...
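For example, something along these lines (VG/LV names are purely illustrative):

  sync                               # flush all dirty pages system-wide first
  lvcreate -s -n data_snap vg0/data  # then take the thin snapshot while little is left to flush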

Thanks.  I think all in all everything mentioned here makes a lot of sense, and (in my opinion at least) explains the symptoms we've been seeing.

Overall the system does "feel" more responsive with the lower dirty buffer limits, and most likely this also helps with data persistence (as has been mentioned) in the case of a system crash and/or loss of power.

Tasks during peak usage also seem to run faster on average; I suspect this is because of the use case for this host:

1.  Data is seldom overwritten (this was touched on).  Pretty much everything is WORM-type access (Write-Once, Read-Many).
2.  Caches are mostly needed to prevent read bandwidth from consuming capacity needed for writing.
3.  It's thus beneficial to get writes out of the way as soon as possible, rather than having to block at a later stage to get many writes done for a flush(), sync() or lvcreate (snapshot).

Is 500MB needlessly low?  Probably.  But given the above I think this is acceptable.  Rather keep the disk writing *now* in order to free up *future* capacity.
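For the record, assuming the limit is applied via the vm.dirty_bytes sysctls (the exact knob isn't shown above, and the values below are only examples, not a recommendation):

  sysctl -w vm.dirty_bytes=524288000              # ~500MB hard ceiling on dirty pages
  sysctl -w vm.dirty_background_bytes=134217728   # start background writeback well before that (example value)

Note that setting vm.dirty_bytes takes precedence over vm.dirty_ratio (the ratio is then ignored).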

I'm guessing your "simple way" is workable for the generic case as well. Towards that end, would a relatively simple change to the lvm2 tools not perhaps be to add a syncfs() call to lvcreate *just prior* to freezing? The hard part is probably figuring out whether the LV is mounted somewhere, and if it is, open()ing that path in order to have a file descriptor to pass to syncfs(). Obviously if the LV isn't mounted none of this is a concern and we can just proceed.
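As a stop-gap that could even be scripted around lvcreate today, findmnt can answer the "is it mounted" question, and coreutils' 'sync -f' calls syncfs(2) on the filesystem containing the given path - a rough sketch, with device and snapshot names purely illustrative:

  MNT=$(findmnt -n -o TARGET /dev/vg0/data | head -n 1)   # empty if the LV isn't mounted
  if [ -n "$MNT" ]; then
      sync -f "$MNT"                                      # syncfs(2) on just that filesystem
  fi
  lvcreate -s -n data_snap vg0/data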

What would be more interesting is the case where cluster-lvm is in play and the origin LV is active/open on another node - but that's well beyond the scope of our requirements (for now).

Kind regards,
Jaco




