Hi folks,

we upgraded one of our clusters from Pacific to Quincy. Everything worked fine, but cephadm complains that one OSD was not upgraded:

[WRN] UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.15 on host osd-dmz-k5-1 failed.
    Upgrade daemon: osd.15: cephadm exited with an error code: 1, stderr: Redeploy daemon osd.15 ...
Failed to trim old cgroups /sys/fs/cgroup/system.slice/system-ceph\x2df852c3fc\x2d05a0\x2d11e8\x2dbae7\x2d77689751e5e7.slice/ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service
Non-zero exit code 1 from systemctl start ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15
systemctl: stderr Job for ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service failed because the control process exited with error code.
systemctl: stderr See "systemctl status ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" and "journalctl -xeu ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" for details.
Traceback (most recent call last):
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 9679, in <module>
    main()
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 9667, in main
    r = ctx.func(ctx)
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 2168, in _default_image
    return func(ctx)
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 5992, in command_deploy
    deploy_daemon(ctx, ctx.fsid, daemon_type, daemon_id, c, uid, gid,
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 3301, in deploy_daemon
    deploy_daemon_units(ctx, fsid, uid, gid, daemon_type, daemon_id,
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 3558, in deploy_daemon_units
    call_throws(ctx, ['systemctl', 'start', unit_name])
  File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 1806, in call_throws
    raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
RuntimeError: Failed command: systemctl start ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15: Job for ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service failed because the control process exited with error code.
See "systemctl status ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" and "journalctl -xeu ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" for details.

The OSD in question seems to be running fine:

systemctl status ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service
● ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service - Ceph osd.15 for f852c3fc-05a0-11e8-bae7-77689751e5e7
     Loaded: loaded (/etc/systemd/system/ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@.service; enabled; vendor preset: enabled)
     Active: active (running) since Sat 2024-11-16 10:02:27 CET; 1 week 2 days ago
   Main PID: 24583 (conmon)
      Tasks: 67 (limit: 76281)
     Memory: 6.0G
        CPU: 9h 57min 20.017s
     CGroup: /system.slice/system-ceph\x2df852c3fc\x2d05a0\x2d11e8\x2dbae7\x2d77689751e5e7.slice/ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service
             ├─libpod-payload-3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e
             │ ├─24586 /dev/init -- /usr/bin/ceph-osd -n osd.15 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false
             │ └─24588 /usr/bin/ceph-osd -n osd.15 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false
             └─supervisor
               └─24583 /usr/bin/conmon --api-version 1 -c 3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e -u 3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e/userdata -p /run/containers/storage/over>

Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.904662) [db/memtable_list.cc:628] [default] Level-0 commit table #794120: memtable #1 done
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.904710) EVENT_LOG_v1 {"time_micros": 1732530194904694, "job": 1660, "event": "flush_finished", "output_compression": "NoCompression", "lsm_state": [2, 1, 8, 44, 0, 0, 0], "immutable_memtables": 0}
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.904789) [db/db_impl/db_impl_compaction_flush.cc:233] [default] Level summary: files[2 1 8 44 0 0 0] max score 0.78
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [db/db_impl/db_impl_files.cc:415] [JOB 1660] Try to delete WAL files size 255924988, prev total WAL file size 256244157, number of live WAL files 2.
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [file/delete_scheduler.cc:69] Deleted file db/794117.log immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.905401) [db/db_impl/db_impl_compaction_flush.cc:2818] Compaction nothing to do
Nov 25 11:32:34 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [db/db_impl/db_impl.cc:901] ------- DUMPING STATS -------
Nov 25 11:32:34 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [db/db_impl/db_impl.cc:903]

** DB Stats **
Uptime(secs): 783001.8 total, 600.0 interval
Cumulative writes: 24M writes, 97M keys, 24M commit groups, 1.0 writes per commit group, ingest: 119.22 GB, 0.16 MB/s
Cumulative WAL: 24M writes, 11M syncs, 2.03 writes per sync, written: 119.22 GB, 0.16 MB/s
Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
Interval writes: 17K writes, 61K keys, 17K commit groups, 1.0 writes per commit group, ingest: 95.37 MB, 0.16 MB/s
Interval WAL: 17K writes, 8473 syncs, 2.01 writes per sync, written: 0.09 MB, 0.16 MB/s
Interval stall: 00:00:0.000 H:M:S, 0.0 percent

** Compaction Stats [default] **
Level   Files       Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec)  KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0        2/0    9.86 MB   0.5      0.0    0.0      0.0       1.9      1.9       0.0   1.0      0.0     29.6     66.03             64.32       505    0.131      0       0
L1        1/0   66.88 MB   0.7      4.6    1.9      2.7       3.3      0.7       0.0   1.8     71.1     52.1     65.56             60.72       126    0.520   100M   4156K
L2        8/0  450.76 MB   0.8      7.1    0.7      6.4       6.8      0.4       0.0  10.3     53.7     51.6    135.52            118.93        16    8.470   190M   1298K
L3       44/0    2.65 GB   0.1      0.7    0.3      0.4       0.4     -0.0       0.0   1.3     79.5     43.3      8.89              7.81         4    2.223    28M     17M
Sum      55/0    3.17 GB   0.0     12.3    2.9      9.5      12.4      3.0       0.0   6.5     45.8     46.2    276.01            251.78       651    0.424   318M     22M
Int       0/0    0.00 KB   0.0      0.0    0.0      0.0       0.0      0.0       0.0   1.0      0.0     44.0      0.12              0.12         1    0.124      0       0

** Compaction Stats [default] **
Priority   Files       Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec)  KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Low          0/0    0.00 KB   0.0     12.3    2.9      9.5      10.5      1.0       0.0   0.0     60.2     51.4    209.98            187.46       146    1.438   318M     22M
High         0/0    0.00 KB   0.0      0.0    0.0      0.0       1.9      1.9       0.0   0.0      0.0     29.5     66.00             64.32       504    0.131      0       0
User         0/0    0.00 KB   0.0      0.0    0.0      0.0       0.0      0.0       0.0   0.0      0.0    102.0      0.02              0.00         1    0.025      0       0
Uptime(secs): 783001.9 total, 600.0 interval
Flush(GB): cumulative 1.906, interval 0.005
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 12.45 GB write, 0.02 MB/s write, 12.35 GB read, 0.02 MB/s read, 276.0 seconds
Interval compaction: 0.01 GB write, 0.01 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.1 seconds
Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count

** File Read Latency Histogram By Level [default] **

** Compaction Stats [default] **
Level   Files       Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec)  KeyIn KeyDrop
----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
L0        2/0    9.86 MB   0.5      0.0    0.0      0.0       1.9      1.9       0.0   1.0      0.0     29.6     66.03             64.32       505    0.131      0       0
L1        1/0   66.88 MB   0.7      4.6    1.9      2.7       3.3      0.7       0.0   1.8     71.1     52.1     65.56             60.72       126    0.520   100M   4156K
L2        8/0  450.76 MB   0.8      7.1    0.7      6.4       6.8      0.4       0.0  10.3     53.7     51.6    135.52            118.93        16    8.470   190M   1298K
L3       44/0    2.65 GB   0.1      0.7    0.3      0.4       0.4     -0.0       0.0   1.3     79.5     43.3      8.89              7.81         4    2.223    28M     17M
Sum      55/0    3.17 GB   0.0     12.3    2.9      9.5      12.4      3.0       0.0   6.5     45.8     46.2    276.01            251.78       651    0.424   318M     22M
Int       0/0    0.00 KB   0.0      0.0    0.0      0.0       0.0      0.0       0.0   0.0      0.0      0.0      0.00              0.00         0    0.000      0       0

** Compaction Stats [default] **
Priority   Files       Size Score Read(GB) Rn(GB) Rnp1(GB) Write(GB) Wnew(GB) Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt) Avg(sec)  KeyIn KeyDrop
-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Low          0/0    0.00 KB   0.0     12.3    2.9      9.5      10.5      1.0       0.0   0.0     60.2     51.4    209.98            187.46       146    1.438   318M     22M
High         0/0    0.00 KB   0.0      0.0    0.0      0.0       1.9      1.9       0.0   0.0      0.0     29.5     66.00             64.32       504    0.131      0       0
User         0/0    0.00 KB   0.0      0.0    0.0      0.0       0.0      0.0       0.0   0.0      0.0    102.0      0.02              0.00         1    0.025      0       0
Uptime(secs): 783001.9 total, 0.0 interval
Flush(GB): cumulative 1.906, interval 0.000
AddFile(GB): cumulative 0.000, interval 0.000
AddFile(Total Files): cumulative 0, interval 0
AddFile(L0 Files): cumulative 0, interval 0
AddFile(Keys): cumulative 0, interval 0
Cumulative compaction: 12.45 GB write, 0.02 MB/s write, 12.35 GB read, 0.02 MB/s read, 276.0 seconds

How do I fix this? We tried redeploying the OSD, but without success.

Best regards,
Felix
Forschungszentrum Jülich GmbH