Re: lvm2 deadlock

My experience is that heavy disk I/O / batch disk I/O systems work better
with these values being smallish, i.e. both even under 10MB or so.

About all that having the numbers larger has done is trick I/O benchmarks
that don't force a sync at the end, and/or make large saves appear to
happen faster.

There is also the freeze/pause of roughly outstanding_writes_MB / io_rate_MB
seconds; smaller values shorten the freeze.
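
As a rough worked example (the 50 MB/s flush rate below is just an assumed
figure for illustration, not something measured):

    500 MB outstanding / 50 MB/s flush rate  ~= 10 second stall
      5 MB outstanding / 50 MB/s flush rate  ~= 0.1 second stall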

I don't see a use case for having large values.   It seems to have no
real upside and several downsides.  Get the buffer size small enough
and you will still get pauses to clear the writes, but the pauses will
be short enough not to be a problem.


On Tue, Jun 4, 2024 at 6:52 AM Jaco Kroon <jaco@xxxxxxxxx> wrote:
>
> Hi,
>
> On 2024/06/04 12:48, Roger Heflin wrote:
>
> > Use the *_bytes values.  If they are non-zero then they are used and
> > that allows setting even below 1% (quite large on anything with a lot
> > of ram).
> >
> > I have been using this for quite a while:
> > vm.dirty_background_bytes = 3000000
> > vm.dirty_bytes = 5000000
>
>
> crowsnest [13:32:48] ~ # sysctl vm.dirty_background_bytes=3000000
> vm.dirty_background_bytes = 3000000
> crowsnest [13:32:59] ~ # sysctl vm.dirty_bytes=500000000
> vm.dirty_bytes = 500000000
>
> And persisted via /etc/sysctl.conf
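>
> For reference, the persisted lines in /etc/sysctl.conf would then look
> something like this (using the values from the commands above):
>
>     vm.dirty_background_bytes = 3000000
>     vm.dirty_bytes = 500000000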
>
> Thank you.  It must be noted this host doesn't do much else other than disk
> IO, so I'm hoping the 500MB value will be OK; this is just so IO won't
> block tasks that are CPU-heavy at the time.
>
> The purpose of the 256GB RAM was so that we could have ~250GB worth of disk
> cache (obviously we don't want all of that to be dirty; OS and "used" memory
> used to be below 4GB, now it's generally around 8-12GB, and currently a bit
> lower since it's a "quiet" time, just busy running some background
> compression).  As per iostat:
>
> avg-cpu:  %user   %nice %system %iowait %steal   %idle
>             7.73   18.43   18.96   37.86    0.00   17.01
>
> Device             tps    MB_read/s    MB_wrtn/s    MB_dscd/s    MB_read    MB_wrtn    MB_dscd
> md2             392.13        10.00         5.11         0.00    4244888    2167644          0
> md3            2270.12        43.88        56.82         0.00   18626309   24120982          0
> md4            1406.06        30.47        16.83         0.00   12934654    7143330          0
>
> That's a total of 35805851 MB (34.1TB) read and 33431956 MB (31.9TB) written
> in just under 5 days.
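>
> (Roughly: 35805851 / 1024^2 ~= 34.1 and 33431956 / 1024^2 ~= 31.9, taking
> 1 TB here as 1024*1024 MB.)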
>
> What I am noticing immediately is that the "free" value as per "free -m"
> is definitely much higher, which to me is indicative that we're not
> caching as aggressively as can be done.  Will monitor this for the time
> being:
>
> crowsnest [13:50:09] ~ # free -m
>                total        used        free      shared  buff/cache   available
> Mem:          257661        6911      105313           7      145436      248246
> Swap:              0           0           0
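>
> To keep an eye on how much of that cache is actually dirty or being written
> back, the standard counters in /proc/meminfo can be watched (a generic
> suggestion, nothing specific to this box):
>
>     watch -n1 'grep -E "^(Dirty|Writeback):" /proc/meminfo'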
>
> The Total DISK WRITE and Current DISK Write values in iotop seem to
> have a tighter correlation now (no longer seeing a constant Total DISK
> WRITE with spikes in Current; it seems to be more even now).
>
> Kind regards,
> Jaco




