I'm seeing a similar problem on a small cluster just upgraded from
Pacific 16.2.9 to Quincy 17.2.1 (non-cephadm). The cluster was only very
lightly loaded during and after the upgrade. The affected OSDs are all
BlueStore, HDDs sharing an NVMe DB/WAL device, and all created on Pacific (I
think).
The upgrade was uneventful except that after the OSDs on the first node
were restarted, some rebalancing started. I don't know why, but I set
norebalance for the remainder of the upgrade.
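For reference, that was just the standard cluster flag, set and cleared with:
# ceph osd set norebalance
# ceph osd unset norebalance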
When the entire upgrade was complete I unset norebalance. A number of
things then became apparent:
* Startup appeared normal, with health returning to OK in the expected
time. Apart from the points below, everything else looks normal.
* A few OSDs are consuming around 95% CPU, almost all of which is User
mode. System, iowait and interrupt are all minimal.
* Rebalancing is occurring, but only at about 1 object/second.
* Disk IO rates are minimal.
* We originally had osd_compact_on_start set to true, but then
disabled it, and restarted the CPU-bound OSDs. They all restarted
ok, and then resumed their high CPU load. Rebooting an OSD node
didn't change anything either.
* There are currently 3 PGs that are remapping (all cephfs_data). The
UP OSDs of each of those PGs (5, 7 & 8) are the ones consuming the CPU.
* Setting nobackfill causes the CPU consumption to drop to normal idle
levels; unsetting it returns to the high levels.
* We have not changed any other settings during the upgrade. All of
the OSDs are using the mClock high_client_ops profile (confirmed with the
commands sketched below).
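In case it helps, this is roughly how I'm confirming the scheduler and
profile on the busy OSDs (osd.5 here is just one example):
# ceph tell osd.5 config get osd_op_queue
# ceph tell osd.5 config get osd_mclock_profile
Both report mclock_scheduler and high_client_ops. Presumably switching to
high_recovery_ops would be a matter of
`ceph config set osd osd_mclock_profile high_recovery_ops`, but I haven't
tried that yet.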
On Pacific, the rebalancing was often up to 100 objects per second with
fairly minimal CPU load.
This CPU load and backfill rate are not sustainable, so I have put the
upgrade of our larger production cluster on hold.
Any ideas, please?!
Thanks, Chris
# ceph pg dump pgs_brief|grep remap
dumped pgs_brief
8.38  active+remapped+backfilling  [7,0,4]  7  [7,0,3]  7
8.5f  active+remapped+backfilling  [8,3,2]  8  [8,3,0]  8
8.63  active+remapped+backfilling  [0,6,5]  0  [5,6,2]  5
# ceph -s
  cluster:
    id:     9208361c-5b68-41ed-8155-cc246a3fe538
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum pve1,pve2,pve3 (age 46m)
    mgr: pve1(active, since 2h), standbys: pve2, pve3
    mds: 1/1 daemons up, 2 standby
    osd: 18 osds: 18 up (since 45m), 18 in (since 2w); 3 remapped pgs
    rgw: 3 daemons active (3 hosts, 1 zones)

  data:
    volumes: 1/1 healthy
    pools:   14 pools, 617 pgs
    objects: 2.49M objects, 5.9 TiB
    usage:   18 TiB used, 25 TiB / 43 TiB avail
    pgs:     16826/7484589 objects misplaced (0.225%)
             614 active+clean
             3   active+remapped+backfilling

  io:
    client:   341 B/s rd, 256 KiB/s wr, 0 op/s rd, 20 op/s wr
    recovery: 4.8 MiB/s, 1 objects/s
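
If the mclock-related values themselves would help, I can pull them with
something like (osd.5 again just as an example):
# ceph config dump | grep mclock
# ceph config show osd.5 | grep mclock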
On 06/07/2022 16:19, Jimmy Spets wrote:
> Do you mean load average as reported by `top` or `uptime`?
yes

> That figure can be misleading on multi-core systems. What CPU are you
> using?
It's a 4c/4t low end CPU

/Jimmy
On Wed, Jul 6, 2022 at 4:52 PM Anthony D'Atri <anthony.datri@xxxxxxxxx>
wrote:
Do you mean load average as reported by `top` or `uptime`?
That figure can be misleading on multi-core systems. What CPU are you
using?
For context, when I ran systems with 32C/64T and 24x SATA SSD, the load
average could easily hit 40-60 without anything being wrong.
What CPU percentages in user, system, idle, iowait do you see?
On Jul 6, 2022, at 5:32 AM, Jimmy Spets <jimmy@xxxxxxxxx> wrote:
Hi all
I have a 10-node cluster with fairly modest hardware (6 HDDs and 1 shared
NVMe for DB on each node) that I use for archival.
After upgrading to Quincy I noticed that the load average on my servers is
very high during recovery or rebalance.
Changing the OSD recovery priority does not work, I assume because of
the change to mClock.
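(By "recovery priority" I mean the classic settings, something along the
lines of:
# ceph config set osd osd_recovery_op_priority 1
# ceph config set osd osd_max_backfills 1
which no longer seem to have any effect with the mClock scheduler active.)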
Is the high load average the expected behaviour?
Should I adjust some limits so that the scheduler does not overwhelm the
server?
/Jimmy
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx