Hi Felix,

This could be a systemd bug in versions prior to v247. The cephadm code [1] points to this tracker [2]. Can you check whether the information in PR [3] gets you out of this trouble? I have sketched a quick check below the quoted traceback.

Regards,
Frédéric

[1] https://raw.githubusercontent.com/ceph/ceph/pacific/src/cephadm/cephadm
[2] https://tracker.ceph.com/issues/50998
[3] https://github.com/ceph/ceph/pull/41829

----- On 25 Nov 24, at 11:53, Felix Stolte f.stolte@xxxxxxxxxxxxx wrote:

> Hi folks,
>
> we upgraded one of our clusters from Pacific to Quincy. Everything worked fine, but cephadm complains about one OSD not being upgraded:
>
> [WRN] UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.15 on host osd-dmz-k5-1 failed.
> Upgrade daemon: osd.15: cephadm exited with an error code: 1, stderr: Redeploy daemon osd.15 ...
> Failed to trim old cgroups /sys/fs/cgroup/system.slice/system-ceph\x2df852c3fc\x2d05a0\x2d11e8\x2dbae7\x2d77689751e5e7.slice/ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service
> Non-zero exit code 1 from systemctl start ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15
> systemctl: stderr Job for ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service failed because the control process exited with error code.
> systemctl: stderr See "systemctl status ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" and "journalctl -xeu ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" for details.
> Traceback (most recent call last):
>   File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 9679, in <module>
>     main()
>   File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 9667, in main
>     r = ctx.func(ctx)
>   File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 2168, in _default_image
>     return func(ctx)
>   File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 5992, in command_deploy
>     deploy_daemon(ctx, ctx.fsid, daemon_type, daemon_id, c, uid, gid,
>   File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 3301, in deploy_daemon
>     deploy_daemon_units(ctx, fsid, uid, gid, daemon_type, daemon_id,
>   File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 3558, in deploy_daemon_units
>     call_throws(ctx, ['systemctl', 'start', unit_name])
>   File "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b", line 1806, in call_throws
>     raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
> RuntimeError: Failed command: systemctl start ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15: Job for ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service failed because the control process exited with error code.
> See "systemctl status ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" and "journalctl -xeu ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" for details.
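
The "Failed to trim old cgroups" / systemctl failure quoted above looks like what the tracker describes. Here is a rough sketch of what I would check, assuming the host is managed by cephadm and that the change from [3] is what your hosts are missing (osd.15 and osd-dmz-k5-1 are taken from your log, adjust as needed):

  # On osd-dmz-k5-1: anything older than systemd v247 would be affected
  # by the bug mentioned above.
  systemctl --version | head -n1

  # From a node with an admin keyring, once the host runs systemd >= 247
  # or a cephadm build that contains the change from [3], ask the
  # orchestrator to redeploy the daemon again and re-check the upgrade:
  ceph orch daemon redeploy osd.15
  ceph orch upgrade status
  ceph health detail

If the redeploy still fails with the same trim error after that, the systemd version output from the host would be useful to see.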
>
> The osd in question seems to be running fine:
>
> systemctl status ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service
> ● ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service - Ceph osd.15 for f852c3fc-05a0-11e8-bae7-77689751e5e7
>      Loaded: loaded (/etc/systemd/system/ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@.service; enabled; vendor preset: enabled)
>      Active: active (running) since Sat 2024-11-16 10:02:27 CET; 1 week 2 days ago
>    Main PID: 24583 (conmon)
>       Tasks: 67 (limit: 76281)
>      Memory: 6.0G
>         CPU: 9h 57min 20.017s
>      CGroup: /system.slice/system-ceph\x2df852c3fc\x2d05a0\x2d11e8\x2dbae7\x2d77689751e5e7.slice/ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service
>              ├─libpod-payload-3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e
>              │ ├─24586 /dev/init -- /usr/bin/ceph-osd -n osd.15 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false
>              │ └─24588 /usr/bin/ceph-osd -n osd.15 -f --setuser ceph --setgroup ceph --default-log-to-file=false --default-log-to-journald=true --default-log-to-stderr=false
>              └─supervisor
>                └─24583 /usr/bin/conmon --api-version 1 -c 3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e -u 3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e -r /usr/bin/crun -b /var/lib/containers/storage/overlay-containers/3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e/userdata -p /run/containers/storage/over>
>
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.904662) [db/memtable_list.cc:628] [default] Level-0 commit table #794120: memtable #1 done
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.904710) EVENT_LOG_v1 {"time_micros": 1732530194904694, "job": 1660, "event": "flush_finished", "output_compression": "NoCompression", "lsm_state": [2, 1, 8, 44, 0, 0, 0], "immutable_memtables": 0}
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.904789) [db/db_impl/db_impl_compaction_flush.cc:233] [default] Level summary: files[2 1 8 44 0 0 0] max score 0.78
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [db/db_impl/db_impl_files.cc:415] [JOB 1660] Try to delete WAL files size 255924988, prev total WAL file size 256244157, number of live WAL files 2.
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [file/delete_scheduler.cc:69] Deleted file db/794117.log immediately, rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time 2024/11/25-10:23:14.905401) [db/db_impl/db_impl_compaction_flush.cc:2818] Compaction nothing to do
> Nov 25 11:32:34 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [db/db_impl/db_impl.cc:901] ------- DUMPING STATS -------
> Nov 25 11:32:34 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: [db/db_impl/db_impl.cc:903]
> ** DB Stats **
> Uptime(secs): 783001.8 total, 600.0 interval
> Cumulative writes: 24M writes, 97M keys, 24M commit groups, 1.0 writes per commit group, ingest: 119.22 GB, 0.16 MB/s
> Cumulative WAL: 24M writes, 11M syncs, 2.03 writes per sync, written: 119.22 GB, 0.16 MB/s
> Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
> Interval writes: 17K writes, 61K keys, 17K commit groups, 1.0 writes per commit group, ingest: 95.37 MB, 0.16 MB/s
> Interval WAL: 17K writes, 8473 syncs, 2.01 writes per sync, written: 0.09 MB, 0.16 MB/s
> Interval stall: 00:00:0.000 H:M:S, 0.0 percent
>
> ** Compaction Stats [default] **
> Level  Files  Size       Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> L0     2/0    9.86 MB    0.5    0.0       0.0     0.0       1.9        1.9       0.0        1.0    0.0       29.6      66.03      64.32              505        0.131     0      0
> L1     1/0    66.88 MB   0.7    4.6       1.9     2.7       3.3        0.7       0.0        1.8    71.1      52.1      65.56      60.72              126        0.520     100M   4156K
> L2     8/0    450.76 MB  0.8    7.1       0.7     6.4       6.8        0.4       0.0        10.3   53.7      51.6      135.52     118.93             16         8.470     190M   1298K
> L3     44/0   2.65 GB    0.1    0.7       0.3     0.4       0.4        -0.0      0.0        1.3    79.5      43.3      8.89       7.81               4          2.223     28M    17M
> Sum    55/0   3.17 GB    0.0    12.3      2.9     9.5       12.4       3.0       0.0        6.5    45.8      46.2      276.01     251.78             651        0.424     318M   22M
> Int    0/0    0.00 KB    0.0    0.0       0.0     0.0       0.0        0.0       0.0        1.0    0.0       44.0      0.12       0.12               1          0.124     0      0
>
> ** Compaction Stats [default] **
> Priority  Files  Size     Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Low       0/0    0.00 KB  0.0    12.3      2.9     9.5       10.5       1.0       0.0        0.0    60.2      51.4      209.98     187.46             146        1.438     318M   22M
> High      0/0    0.00 KB  0.0    0.0       0.0     0.0       1.9        1.9       0.0        0.0    0.0       29.5      66.00      64.32              504        0.131     0      0
> User      0/0    0.00 KB  0.0    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       102.0     0.02       0.00               1          0.025     0      0
> Uptime(secs): 783001.9 total, 600.0 interval
> Flush(GB): cumulative 1.906, interval 0.005
> AddFile(GB): cumulative 0.000, interval 0.000
> AddFile(Total Files): cumulative 0, interval 0
> AddFile(L0 Files): cumulative 0, interval 0
> AddFile(Keys): cumulative 0, interval 0
> Cumulative compaction: 12.45 GB write, 0.02 MB/s write, 12.35 GB read, 0.02 MB/s read, 276.0 seconds
> Interval compaction: 0.01 GB write, 0.01 MB/s write, 0.00 GB read, 0.00 MB/s read, 0.1 seconds
> Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0 level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0 memtable_compaction, 0 memtable_slowdown, interval 0 total count
>
> ** File Read Latency Histogram By Level [default] **
>
> ** Compaction Stats [default] **
> Level  Files  Size       Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
> ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> L0     2/0    9.86 MB    0.5    0.0       0.0     0.0       1.9        1.9       0.0        1.0    0.0       29.6      66.03      64.32              505        0.131     0      0
> L1     1/0    66.88 MB   0.7    4.6       1.9     2.7       3.3        0.7       0.0        1.8    71.1      52.1      65.56      60.72              126        0.520     100M   4156K
> L2     8/0    450.76 MB  0.8    7.1       0.7     6.4       6.8        0.4       0.0        10.3   53.7      51.6      135.52     118.93             16         8.470     190M   1298K
> L3     44/0   2.65 GB    0.1    0.7       0.3     0.4       0.4        -0.0      0.0        1.3    79.5      43.3      8.89       7.81               4          2.223     28M    17M
> Sum    55/0   3.17 GB    0.0    12.3      2.9     9.5       12.4       3.0       0.0        6.5    45.8      46.2      276.01     251.78             651        0.424     318M   22M
> Int    0/0    0.00 KB    0.0    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       0.0       0.00       0.00               0          0.000     0      0
>
> ** Compaction Stats [default] **
> Priority  Files  Size     Score  Read(GB)  Rn(GB)  Rnp1(GB)  Write(GB)  Wnew(GB)  Moved(GB)  W-Amp  Rd(MB/s)  Wr(MB/s)  Comp(sec)  CompMergeCPU(sec)  Comp(cnt)  Avg(sec)  KeyIn  KeyDrop
> -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
> Low       0/0    0.00 KB  0.0    12.3      2.9     9.5       10.5       1.0       0.0        0.0    60.2      51.4      209.98     187.46             146        1.438     318M   22M
> High      0/0    0.00 KB  0.0    0.0       0.0     0.0       1.9        1.9       0.0        0.0    0.0       29.5      66.00      64.32              504        0.131     0      0
> User      0/0    0.00 KB  0.0    0.0       0.0     0.0       0.0        0.0       0.0        0.0    0.0       102.0     0.02       0.00               1          0.025     0      0
> Uptime(secs): 783001.9 total, 0.0 interval
> Flush(GB): cumulative 1.906, interval 0.000
> AddFile(GB): cumulative 0.000, interval 0.000
> AddFile(Total Files): cumulative 0, interval 0
> AddFile(L0 Files): cumulative 0, interval 0
> AddFile(Keys): cumulative 0, interval 0
> Cumulative compaction: 12.45 GB write, 0.02 MB/s write, 12.35 GB read, 0.02 MB/s read, 276.0 seconds
>
>
> How do I fix this? We tried redeploying the OSD, but without success.
>
> Best regards
> Felix
>
>
> ------------------------------------------------------------------------------------------------
> ------------------------------------------------------------------------------------------------
> Forschungszentrum Jülich GmbH
> 52425 Jülich
> Sitz der Gesellschaft: Jülich
> Eingetragen im Handelsregister des Amtsgerichts Düren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Stefan Müller
> Geschäftsführung: Prof. Dr. Astrid Lambrecht (Vorsitzende),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr. Ir. Pieter Jansens
> ------------------------------------------------------------------------------------------------
> ------------------------------------------------------------------------------------------------
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx