Re: UPGRADE_REDEPLOY_DAEMON: Upgrading daemon failed


 



Hi Felix.

This could be a systemd bug affecting versions prior to v247. The cephadm code [1] points to this tracker issue [2].

Could you check whether the information in PR [3] helps you resolve this?
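
To see quickly whether the host runs a systemd older than v247, something like the sketch below might help. It only parses the first line of `systemctl --version`; the helper names and the FIXED_IN constant are my own, taken from the "prior to v247" remark and not from cephadm itself, so treat it as a rough check and not a diagnosis.

```python
import re
import subprocess
from typing import Optional

# Assumption: threshold taken from the "prior to v247" remark in this thread.
FIXED_IN = 247


def parse_systemd_version(output: str) -> Optional[int]:
    """Return the major version from the first line of `systemctl --version`."""
    lines = output.splitlines()
    if not lines:
        return None
    match = re.match(r"systemd\s+(\d+)", lines[0])
    return int(match.group(1)) if match else None


def may_be_affected(version: int) -> bool:
    """True if the version is older than the assumed threshold."""
    return version < FIXED_IN


if __name__ == "__main__":
    try:
        out = subprocess.run(
            ["systemctl", "--version"],
            capture_output=True, text=True, check=True,
        ).stdout
    except (FileNotFoundError, subprocess.CalledProcessError) as exc:
        print(f"could not query systemd: {exc}")
    else:
        version = parse_systemd_version(out)
        if version is None:
            print("could not parse the systemd version")
        elif may_be_affected(version):
            print(f"systemd {version} is older than v{FIXED_IN}, the bug may apply")
        else:
            print(f"systemd {version} is v{FIXED_IN} or newer")
```

Run it on osd-dmz-k5-1 itself, since that is the host where the redeploy fails.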

Regards,
Frédéric.

[1] https://raw.githubusercontent.com/ceph/ceph/pacific/src/cephadm/cephadm
[2] https://tracker.ceph.com/issues/50998
[3] https://github.com/ceph/ceph/pull/41829

----- On 25 Nov 24, at 11:53, Felix Stolte f.stolte@xxxxxxxxxxxxx wrote:

> Hi folks,
> 
> We upgraded one of our clusters from Pacific to Quincy. Everything worked
> fine, but cephadm complains that one OSD was not upgraded:
> 
> [WRN] UPGRADE_REDEPLOY_DAEMON: Upgrading daemon osd.15 on host osd-dmz-k5-1
> failed.
>    Upgrade daemon: osd.15: cephadm exited with an error code: 1, stderr: Redeploy
>    daemon osd.15 ...
> Failed to trim old cgroups
> /sys/fs/cgroup/system.slice/system-ceph\x2df852c3fc\x2d05a0\x2d11e8\x2dbae7\x2d77689751e5e7.slice/ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service
> Non-zero exit code 1 from systemctl start
> ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15
> systemctl: stderr Job for
> ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service failed because the
> control process exited with error code.
> systemctl: stderr See "systemctl status
> ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" and "journalctl -xeu
> ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service" for details.
> Traceback (most recent call last):
>  File
>  "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b",
>  line 9679, in <module>
>    main()
>  File
>  "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b",
>  line 9667, in main
>    r = ctx.func(ctx)
>  File
>  "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b",
>  line 2168, in _default_image
>    return func(ctx)
>  File
>  "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b",
>  line 5992, in command_deploy
>    deploy_daemon(ctx, ctx.fsid, daemon_type, daemon_id, c, uid, gid,
>  File
>  "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b",
>  line 3301, in deploy_daemon
>    deploy_daemon_units(ctx, fsid, uid, gid, daemon_type, daemon_id,
>  File
>  "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b",
>  line 3558, in deploy_daemon_units
>    call_throws(ctx, ['systemctl', 'start', unit_name])
>  File
>  "/var/lib/ceph/f852c3fc-05a0-11e8-bae7-77689751e5e7/cephadm.8b92cafd937eb89681ee011f9e70f85937fd09c4bd61ed4a59981d275a1f255b",
>  line 1806, in call_throws
>    raise RuntimeError(f'Failed command: {" ".join(command)}: {s}')
> RuntimeError: Failed command: systemctl start
> ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15: Job for
> ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service failed because the
> control process exited with error code.
> See "systemctl status ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service"
> and "journalctl -xeu ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service"
> for details.
> 
> The osd in question seems to be running fine:
> 
> systemctl status ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service
> ● ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service - Ceph osd.15 for
> f852c3fc-05a0-11e8-bae7-77689751e5e7
>     Loaded: loaded
>     (/etc/systemd/system/ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@.service;
>     enabled; vendor preset: enabled)
>     Active: active (running) since Sat 2024-11-16 10:02:27 CET; 1 week 2 days ago
>   Main PID: 24583 (conmon)
>      Tasks: 67 (limit: 76281)
>     Memory: 6.0G
>        CPU: 9h 57min 20.017s
>     CGroup:
>     /system.slice/system-ceph\x2df852c3fc\x2d05a0\x2d11e8\x2dbae7\x2d77689751e5e7.slice/ceph-f852c3fc-05a0-11e8-bae7-77689751e5e7@osd.15.service
>             ├─libpod-payload-3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e
>             │ ├─24586 /dev/init -- /usr/bin/ceph-osd -n osd.15 -f --setuser ceph --setgroup
>             ceph --default-log-to-file=false --default-log-to-journald=true
>             --default-log-to-stderr=false
>             │ └─24588 /usr/bin/ceph-osd -n osd.15 -f --setuser ceph --setgroup ceph
>             --default-log-to-file=false --default-log-to-journald=true
>             --default-log-to-stderr=false
>             └─supervisor
>               └─24583 /usr/bin/conmon --api-version 1 -c
>               3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e -u
>               3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e -r
>               /usr/bin/crun -b
>               /var/lib/containers/storage/overlay-containers/3e6ba1f01ad8ca4c20c08a9984bdd983f43b9a15a0ec1b452b4d17c9f5ef519e/userdata
>               -p /run/containers/storage/over>
> 
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time
> 2024/11/25-10:23:14.904662) [db/memtable_list.cc:628] [default] Level-0 commit
> table #794120: memtable #1 done
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time
> 2024/11/25-10:23:14.904710) EVENT_LOG_v1 {"time_micros": 1732530194904694,
> "job": 1660, "event": "flush_finished", "output_compression": "NoCompression",
> "lsm_state": [2, 1, 8, 44, 0, 0, 0], "immutable_memtables": 0}
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time
> 2024/11/25-10:23:14.904789) [db/db_impl/db_impl_compaction_flush.cc:233]
> [default] Level summary: files[2 1 8 44 0 0 0] max score 0.78
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb:
> [db/db_impl/db_impl_files.cc:415] [JOB 1660] Try to delete WAL files size
> 255924988, prev total WAL file size 256244157, number of live WAL files 2.
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb:
> [file/delete_scheduler.cc:69] Deleted file db/794117.log immediately,
> rate_bytes_per_sec 0, total_trash_size 0 max_trash_db_ratio 0.250000
> Nov 25 11:23:14 osd-dmz-k5-1 ceph-osd[24588]: rocksdb: (Original Log Time
> 2024/11/25-10:23:14.905401) [db/db_impl/db_impl_compaction_flush.cc:2818]
> Compaction nothing to do
> Nov 25 11:32:34 osd-dmz-k5-1 ceph-osd[24588]: rocksdb:
> [db/db_impl/db_impl.cc:901] ------- DUMPING STATS -------
> Nov 25 11:32:34 osd-dmz-k5-1 ceph-osd[24588]: rocksdb:
> [db/db_impl/db_impl.cc:903]
>                                              ** DB Stats **
>                                              Uptime(secs): 783001.8 total, 600.0 interval
>                                              Cumulative writes: 24M writes, 97M keys, 24M commit groups, 1.0 writes per
>                                              commit group, ingest: 119.22 GB, 0.16 MB/s
>                                              Cumulative WAL: 24M writes, 11M syncs, 2.03 writes per sync, written: 119.22 GB,
>                                              0.16 MB/s
>                                              Cumulative stall: 00:00:0.000 H:M:S, 0.0 percent
>                                              Interval writes: 17K writes, 61K keys, 17K commit groups, 1.0 writes per commit
>                                              group, ingest: 95.37 MB, 0.16 MB/s
>                                              Interval WAL: 17K writes, 8473 syncs, 2.01 writes per sync, written: 0.09 MB,
>                                              0.16 MB/s
>                                              Interval stall: 00:00:0.000 H:M:S, 0.0 percent
> 
>                                              ** Compaction Stats [default] **
>                                              Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB)
>                                              Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt)
>                                              Avg(sec) KeyIn KeyDrop
>                                              ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                                                L0      2/0    9.86 MB   0.5      0.0     0.0      0.0       1.9      1.9
>                                                0.0   1.0      0.0     29.6     66.03             64.32       505
>                                                0.131       0      0
>                                                L1      1/0   66.88 MB   0.7      4.6     1.9      2.7       3.3      0.7
>                                                0.0   1.8     71.1     52.1     65.56             60.72       126
>                                                0.520    100M  4156K
>                                                L2      8/0   450.76 MB   0.8      7.1     0.7      6.4       6.8      0.4
>                                                0.0  10.3     53.7     51.6    135.52            118.93        16
>                                                8.470    190M  1298K
>                                                L3     44/0    2.65 GB   0.1      0.7     0.3      0.4       0.4     -0.0
>                                                0.0   1.3     79.5     43.3      8.89              7.81         4
>                                                2.223     28M    17M
>                                               Sum     55/0    3.17 GB   0.0     12.3     2.9      9.5      12.4      3.0
>                                               0.0   6.5     45.8     46.2    276.01            251.78       651
>                                               0.424    318M    22M
>                                               Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0
>                                               0.0   1.0      0.0     44.0      0.12              0.12         1
>                                               0.124       0      0
> 
>                                              ** Compaction Stats [default] **
>                                              Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB)
>                                              Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt)
>                                              Avg(sec) KeyIn KeyDrop
>                                              -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                                               Low      0/0    0.00 KB   0.0     12.3     2.9      9.5      10.5      1.0
>                                               0.0   0.0     60.2     51.4    209.98            187.46       146
>                                               1.438    318M    22M
>                                              High      0/0    0.00 KB   0.0      0.0     0.0      0.0       1.9      1.9
>                                              0.0   0.0      0.0     29.5     66.00             64.32       504
>                                              0.131       0      0
>                                              User      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0
>                                              0.0   0.0      0.0    102.0      0.02              0.00         1
>                                              0.025       0      0
>                                              Uptime(secs): 783001.9 total, 600.0 interval
>                                              Flush(GB): cumulative 1.906, interval 0.005
>                                              AddFile(GB): cumulative 0.000, interval 0.000
>                                              AddFile(Total Files): cumulative 0, interval 0
>                                              AddFile(L0 Files): cumulative 0, interval 0
>                                              AddFile(Keys): cumulative 0, interval 0
>                                              Cumulative compaction: 12.45 GB write, 0.02 MB/s write, 12.35 GB read, 0.02 MB/s
>                                              read, 276.0 seconds
>                                              Interval compaction: 0.01 GB write, 0.01 MB/s write, 0.00 GB read, 0.00 MB/s
>                                              read, 0.1 seconds
>                                              Stalls(count): 0 level0_slowdown, 0 level0_slowdown_with_compaction, 0
>                                              level0_numfiles, 0 level0_numfiles_with_compaction, 0 stop for
>                                              pending_compaction_bytes, 0 slowdown for pending_compaction_bytes, 0
>                                              memtable_compaction, 0 memtable_slowdown, interval 0 total count
> 
>                                              ** File Read Latency Histogram By Level [default] **
> 
>                                              ** Compaction Stats [default] **
>                                              Level    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB)
>                                              Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt)
>                                              Avg(sec) KeyIn KeyDrop
>                                              ----------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                                                L0      2/0    9.86 MB   0.5      0.0     0.0      0.0       1.9      1.9
>                                                0.0   1.0      0.0     29.6     66.03             64.32       505
>                                                0.131       0      0
>                                                L1      1/0   66.88 MB   0.7      4.6     1.9      2.7       3.3      0.7
>                                                0.0   1.8     71.1     52.1     65.56             60.72       126
>                                                0.520    100M  4156K
>                                                L2      8/0   450.76 MB   0.8      7.1     0.7      6.4       6.8      0.4
>                                                0.0  10.3     53.7     51.6    135.52            118.93        16
>                                                8.470    190M  1298K
>                                                L3     44/0    2.65 GB   0.1      0.7     0.3      0.4       0.4     -0.0
>                                                0.0   1.3     79.5     43.3      8.89              7.81         4
>                                                2.223     28M    17M
>                                               Sum     55/0    3.17 GB   0.0     12.3     2.9      9.5      12.4      3.0
>                                               0.0   6.5     45.8     46.2    276.01            251.78       651
>                                               0.424    318M    22M
>                                               Int      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0
>                                               0.0   0.0      0.0      0.0      0.00              0.00         0
>                                               0.000       0      0
> 
>                                              ** Compaction Stats [default] **
>                                              Priority    Files   Size     Score Read(GB)  Rn(GB) Rnp1(GB) Write(GB) Wnew(GB)
>                                              Moved(GB) W-Amp Rd(MB/s) Wr(MB/s) Comp(sec) CompMergeCPU(sec) Comp(cnt)
>                                              Avg(sec) KeyIn KeyDrop
>                                              -------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
>                                               Low      0/0    0.00 KB   0.0     12.3     2.9      9.5      10.5      1.0
>                                               0.0   0.0     60.2     51.4    209.98            187.46       146
>                                               1.438    318M    22M
>                                              High      0/0    0.00 KB   0.0      0.0     0.0      0.0       1.9      1.9
>                                              0.0   0.0      0.0     29.5     66.00             64.32       504
>                                              0.131       0      0
>                                              User      0/0    0.00 KB   0.0      0.0     0.0      0.0       0.0      0.0
>                                              0.0   0.0      0.0    102.0      0.02              0.00         1
>                                              0.025       0      0
>                                              Uptime(secs): 783001.9 total, 0.0 interval
>                                              Flush(GB): cumulative 1.906, interval 0.000
>                                              AddFile(GB): cumulative 0.000, interval 0.000
>                                              AddFile(Total Files): cumulative 0, interval 0
>                                              AddFile(L0 Files): cumulative 0, interval 0
>                                              AddFile(Keys): cumulative 0, interval 0
>                                              Cumulative compaction: 12.45 GB write, 0.02 MB/s write, 12.35 GB read, 0.02 MB/s
>                                              read, 276.0 seconds
> 
> 
> How do I fix this? We tried redeploying the OSD, but without success.
> 
> Best regards
> Felix
> 
> 
> ------------------------------------------------------------------------------------------------
> ------------------------------------------------------------------------------------------------
> Forschungszentrum Jülich GmbH
> 52425 Jülich
> Sitz der Gesellschaft: Jülich
> Eingetragen im Handelsregister des Amtsgerichts Düren Nr. HR B 3498
> Vorsitzender des Aufsichtsrats: MinDir Stefan Müller
> Geschäftsführung: Prof. Dr. Astrid Lambrecht (Vorsitzende),
> Karsten Beneke (stellv. Vorsitzender), Prof. Dr. Ir. Pieter Jansens
> ------------------------------------------------------------------------------------------------
> ------------------------------------------------------------------------------------------------
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



