Hi,
On 2024/06/04 15:30, Roger Heflin wrote:
My experience is that heavy disk io / batch disk io systems work better
with these values being smallish, i.e. both even under 10MB or so. About
all that having the numbers larger has done is trick io benchmarks that
don't force a sync at the end, and/or make large saves appear to happen
faster.
There is also the freeze/pause of roughly (outstanding writes in MB) /
(io rate in MB/s) seconds; smaller values shorten the freeze.
I don't see a use case for having large values; it seems to have no
real upside and several downsides. Get the buffer size small enough
and you will still get pauses to clear the writes, but the pauses will
be short enough not to be a problem.
Thanks, this is extremely insightful. So with the original values there
could be "up to" ~50GB outstanding for write. Let's assume that's all
destined for one disk (extremely unlikely) at 100MB/s (which is
optimistic if it's random access): that would take upwards of 500 seconds,
which is a hellishly long time in our world.
With the value of 500MB I've set now, I think a sync should almost never
exceed 10s or so, even if everything is targeted at a single drive. I
think we're OK with that on this specific host.
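
A rough sanity check, using assumed per-drive rates (the 50MB/s figure is
my own pessimistic assumption, not a measurement from this host):

    500 MB / 100 MB/s =  5 s   (optimistic, mostly sequential)
    500 MB /  50 MB/s = 10 s   (more pessimistic, random-ish access)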
Kind regards,
Jaco
On Tue, Jun 4, 2024 at 6:52 AM Jaco Kroon <jaco@xxxxxxxxx> wrote:
Hi,
On 2024/06/04 12:48, Roger Heflin wrote:
Use the *_bytes values. If they are non-zero then they are used, and
that allows setting the limit even below 1% (which is quite large on
anything with a lot of RAM).
I have been using this for quite a while:
vm.dirty_background_bytes = 3000000
vm.dirty_bytes = 5000000
crowsnest [13:32:48] ~ # sysctl vm.dirty_background_bytes=3000000
vm.dirty_background_bytes = 3000000
crowsnest [13:32:59] ~ # sysctl vm.dirty_bytes=500000000
vm.dirty_bytes = 500000000
And persisted via /etc/sysctl.conf
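
For reference, persisting those settings would look something like the
following in /etc/sysctl.conf (the 500MB dirty limit mirrors the command
above and is a host-specific choice, not a general recommendation):

    # keep the amount of dirty page cache small so flushes stay short
    vm.dirty_background_bytes = 3000000
    vm.dirty_bytes = 500000000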
Thank you. It must be noted that this host doesn't do much other than
disk IO, so I'm hoping the 500MB value will be OK; this is mostly so that
IO won't block tasks that happen to be CPU-heavy at the time.
The purpose of the 256GB of RAM was so that we could have ~250GB worth of
disk cache (obviously we don't want all of that to be dirty; OS and "used"
memory used to be below 4GB and is now generally around 8-12GB, currently
a bit lower since we're in a "quiet" period, just busy running some
background compression). As per iostat:
avg-cpu:  %user   %nice %system %iowait  %steal   %idle
           7.73   18.43   18.96   37.86    0.00   17.01

Device             tps    MB_read/s    MB_wrtn/s    MB_dscd/s      MB_read      MB_wrtn    MB_dscd
md2             392.13        10.00         5.11         0.00      4244888      2167644          0
md3            2270.12        43.88        56.82         0.00     18626309     24120982          0
md4            1406.06        30.47        16.83         0.00     12934654      7143330          0
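
For context, these read like cumulative since-boot figures from iostat in
megabyte mode; the exact invocation is my assumption, but something along
the lines of:

    iostat -m md2 md3 md4

should produce per-device totals like the above (the MB_dscd column needs
a reasonably recent sysstat).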
That's a total of 35805851 MB (34.1TB) read and 33431956 MB (31.9TB)
written in just under 5 days.
What I am noticing immediately is that the "free" value as per "free -m"
is definitely much higher, which to me indicates that we're not caching
as aggressively as we could. Will monitor this for the time being:
crowsnest [13:50:09] ~ # free -m
               total        used        free      shared  buff/cache   available
Mem:          257661        6911      105313           7      145436      248246
Swap:              0           0           0
The Total DISK WRITE and Current DISK Write values in iotop seem to
have a tighter correlation now (no longer seeing a constant Total DISK
WRITE with spikes in Current; it seems more even now).
Kind regards,
Jaco