Re: Ceph - Error ERANGE: (34) Numerical result out of range

So this is a new host (you didn't provide the osd tree)? In that case I would compare the ceph.conf files between a working host and this failing one, and paste them here (mask sensitive data). The connection to the MONs looks successful though, and "ceph-volume lvm create" worked as well. You could try to avoid a crush update on OSD start:

[osd]
osd crush update on start = false
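
If your cluster is recent enough to have the centralized config database (Mimic or later - an assumption on my part, you haven't mentioned the release), the same option could also be set at runtime instead of editing ceph.conf, roughly:

ceph config set osd osd_crush_update_on_start false

The OSD only evaluates this when it starts, so it still needs a restart afterwards.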

Or you could try to assign the crush location manually:

[osd.301]
osd crush location = "root=ssd"

Try one option at a time to see which one works (if at all).
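
Either way, after restarting the OSD you can check where it ended up in the crush tree, for example (osd id 301 taken from your log below, replace <hostname> with the host bucket you expect):

systemctl restart ceph-osd@301
ceph osd find 301                      # prints the OSD's current crush location
ceph osd tree | grep -A 5 <hostname>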


Quoting Pardhiv Karri <meher4india@xxxxxxxxx>:

Hi Eugen,

Thank you for the reply. For some reason I'm not getting individual replies,
only the digest. Below is the ceph -s output (hostnames renamed) and the
command I am using to create a BlueStore OSD. It should create the OSD under
a host bucket with its hostname and the OSD should come up, but instead the
host is not created and I'm left with just a rogue OSD which is down.

[root@hbmon1 ~]# ceph -s
  cluster:
    id:     f1579737-d2c9-49ab-a6fa-8ca952488120
    health: HEALTH_WARN
            116896/167701779 objects misplaced (0.070%)

  services:
    mon: 3 daemons, quorum hbmon1,hbmon2,hbmon3
    mgr: hbmon2(active), standbys: hbmon1, hbmon3
    osd: 721 osds: 717 up, 716 in; 60 remapped pgs
    rgw: 1 daemon active

  data:
    pools:   13 pools, 32384 pgs
    objects: 55.90M objects, 324TiB
    usage:   973TiB used, 331TiB / 1.27PiB avail
    pgs:     116896/167701779 objects misplaced (0.070%)
             32294 active+clean
             59    active+remapped+backfill_wait
             27    active+clean+scrubbing+deep
             3     active+clean+scrubbing
             1     active+remapped+backfilling

  io:
    client:   237MiB/s rd, 635MiB/s wr, 10.66kop/s rd, 6.98kop/s wr
    recovery: 12.9MiB/s, 1objects/s

 [root@hbmon1 ~]#


Command used to create the OSD: "ceph-volume lvm create --data /dev/sda"



Debug log output of the OSD creation command:

 [root@dra1361 ~]# ceph-volume lvm create --data /dev/sda
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: /usr/bin/ceph --cluster ceph --name client.bootstrap-osd
--keyring /var/lib/ceph/bootstrap-osd/ceph.keyring -i - osd new
21e9a327-ada5-4734-ab5d-7be333d4f3cf
Running command: vgcreate --force --yes
ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef /dev/sda
 stdout: Physical volume "/dev/sda" successfully created.
 stdout: Volume group "ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef"
successfully created
Running command: lvcreate --yes -l 100%FREE -n
osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef
 stdout: Logical volume "osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf"
created.
Running command: /usr/bin/ceph-authtool --gen-print-key
Running command: mount -t tmpfs tmpfs /var/lib/ceph/osd/ceph-301
--> Absolute path not found for executable: restorecon
--> Ensure $PATH environment variable contains common executable locations
Running command: chown -h ceph:ceph
/dev/ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef/osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
Running command: chown -R ceph:ceph /dev/dm-0
Running command: ln -s
/dev/ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef/osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
/var/lib/ceph/osd/ceph-301/block
Running command: ceph --cluster ceph --name client.bootstrap-osd --keyring
/var/lib/ceph/bootstrap-osd/ceph.keyring mon getmap -o
/var/lib/ceph/osd/ceph-301/activate.monmap
 stderr: 2023-10-27 19:48:57.789631 7ff36a340700  2 Event(0x7ff3640e2950
nevent=5000 time_id=1).set_owner idx=0 owner=140683435575040
2023-10-27 19:48:57.789713 7ff369b3f700  2 Event(0x7ff36410f670 nevent=5000
time_id=1).set_owner idx=1 owner=140683427182336
2023-10-27 19:48:57.789771 7ff36933e700  2 Event(0x7ff36413c4e0 nevent=5000
time_id=1).set_owner idx=2 owner=140683418789632
 stderr: 2023-10-27 19:48:57.790044 7ff36c135700  1  Processor -- start
2023-10-27 19:48:57.790100 7ff36c135700  1 -- - start start
2023-10-27 19:48:57.790352 7ff36c135700  1 -- - --> 10.51.228.32:6789/0 --
auth(proto 0 38 bytes epoch 0) v1 -- 0x7ff364175e70 con 0
2023-10-27 19:48:57.790368 7ff36c135700  1 -- - --> 10.51.228.33:6789/0 --
auth(proto 0 38 bytes epoch 0) v1 -- 0x7ff3641762b0 con 0
 stderr: 2023-10-27 19:48:57.791313 7ff369b3f700  1 --
10.51.228.213:0/2678799534 learned_addr learned my addr
10.51.228.213:0/2678799534
 stderr: 2023-10-27 19:48:57.791740 7ff36933e700  2 --
10.51.228.213:0/2678799534 >> 10.51.228.32:6789/0 conn(0x7ff36417f4e0 :-1
s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got
newly_acked_seq 0 vs out_seq 0
2023-10-27 19:48:57.791763 7ff369b3f700  2 -- 10.51.228.213:0/2678799534 >>
10.51.228.33:6789/0 conn(0x7ff36417be80 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
 stderr: 2023-10-27 19:48:57.792414 7ff353fff700  1 --
10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 1 ==== mon_map
magic: 0 v1 ==== 442+0+0 (171445244 0 0) 0x7ff360001690 con 0x7ff36417be80
2023-10-27 19:48:57.792544 7ff353fff700  1 -- 10.51.228.213:0/2678799534
<== mon.1 10.51.228.33:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1
==== 33+0+0 (2209822748 0 0) 0x7ff3641762b0 con 0x7ff36417be80
2023-10-27 19:48:57.792686 7ff353fff700  1 -- 10.51.228.213:0/2678799534
--> 10.51.228.33:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 --
0x7ff34c001880 con 0
2023-10-27 19:48:57.792722 7ff353fff700  1 -- 10.51.228.213:0/2678799534
<== mon.0 10.51.228.32:6789/0 1 ==== mon_map magic: 0 v1 ==== 442+0+0
(171445244 0 0) 0x7ff354001710 con 0x7ff36417f4e0
 stderr: 2023-10-27 19:48:57.792776 7ff353fff700  1 --
10.51.228.213:0/2678799534 <== mon.0 10.51.228.32:6789/0 2 ====
auth_reply(proto 2 0 (0) Success) v1 ==== 33+0+0 (782387931 0 0)
0x7ff354001c10 con 0x7ff36417f4e0
2023-10-27 19:48:57.792832 7ff353fff700  1 -- 10.51.228.213:0/2678799534
--> 10.51.228.32:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 --
0x7ff34c0035b0 con 0
 stderr: 2023-10-27 19:48:57.793541 7ff353fff700  1 --
10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 3 ====
auth_reply(proto 2 0 (0) Success) v1 ==== 222+0+0 (713751175 0 0)
0x7ff3600022d0 con 0x7ff36417be80
2023-10-27 19:48:57.793740 7ff353fff700  1 -- 10.51.228.213:0/2678799534
--> 10.51.228.33:6789/0 -- auth(proto 2 181 bytes epoch 0) v1 --
0x7ff34c0024d0 con 0
2023-10-27 19:48:57.793774 7ff353fff700  1 -- 10.51.228.213:0/2678799534
<== mon.0 10.51.228.32:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1
==== 222+0+0 (1047601458 0 0) 0x7ff3540022d0 con 0x7ff36417f4e0
 stderr: 2023-10-27 19:48:57.793868 7ff353fff700  1 --
10.51.228.213:0/2678799534 --> 10.51.228.32:6789/0 -- auth(proto 2 181
bytes epoch 0) v1 -- 0x7ff34c005f10 con 0
 stderr: 2023-10-27 19:48:57.794682 7ff353fff700  1 --
10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 4 ====
auth_reply(proto 2 0 (0) Success) v1 ==== 612+0+0 (1210392221 0 0)
0x7ff360002cd0 con 0x7ff36417be80
 stderr: 2023-10-27 19:48:57.794875 7ff353fff700  1 --
10.51.228.213:0/2678799534 >> 10.51.228.32:6789/0 conn(0x7ff36417f4e0 :-1
s=STATE_OPEN pgs=316443717 cs=1 l=1).mark_down
2023-10-27 19:48:57.794897 7ff353fff700  2 -- 10.51.228.213:0/2678799534 >>
10.51.228.32:6789/0 conn(0x7ff36417f4e0 :-1 s=STATE_OPEN pgs=316443717 cs=1
l=1)._stop
2023-10-27 19:48:57.794955 7ff353fff700  1 -- 10.51.228.213:0/2678799534
--> 10.51.228.33:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x7ff364176570
con 0
2023-10-27 19:48:57.795071 7ff36c135700  1 -- 10.51.228.213:0/2678799534
--> 10.51.228.33:6789/0 -- mon_subscribe({mgrmap=0+}) v2 -- 0x7ff364176b70
con 0
 stderr: 2023-10-27 19:48:57.795186 7ff36c135700  1 --
10.51.228.213:0/2678799534 --> 10.51.228.33:6789/0 --
mon_subscribe({osdmap=0}) v2 -- 0x7ff3641843c0 con 0
 stderr: 2023-10-27 19:48:57.795919 7ff353fff700  1 --
10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 5 ==== mon_map
magic: 0 v1 ==== 442+0+0 (171445244 0 0) 0x7ff3600032f0 con 0x7ff36417be80
2023-10-27 19:48:57.796020 7ff353fff700  1 -- 10.51.228.213:0/2678799534
<== mon.1 10.51.228.33:6789/0 6 ==== mgrmap(e 255) v1 ==== 580+0+0
(3748818868 0 0) 0x7ff3600037e0 con 0x7ff36417be80
 stderr: 2023-10-27 19:48:57.797732 7ff353fff700  1 --
10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 7 ====
osd_map(1497788..1497788 src has 1496148..1497788) v3 ==== 383089+0+0
(4062048124 0 0) 0x7ff34c0024d0 con 0x7ff36417be80
 stderr: 2023-10-27 19:48:57.797968 7ff36933e700  2 --
10.51.228.213:0/2678799534 >> 10.51.228.33:6800/5258 conn(0x7ff34c00ebb0
:-1 s=STATE_CONNECTING_WAIT_ACK_SEQ pgs=0 cs=0 l=1)._process_connection got
newly_acked_seq 0 vs out_seq 0
 stderr: 2023-10-27 19:48:57.804679 7ff36c135700  1 --
10.51.228.213:0/2678799534 --> 10.51.228.33:6789/0 --
mon_command({"prefix": "get_command_descriptions"} v 0) v1 --
0x7ff364099090 con 0
 stderr: 2023-10-27 19:48:57.807820 7ff353fff700  1 --
10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 8 ====
mon_command_ack([{"prefix": "get_command_descriptions"}]=0  v0) v1 ====
72+0+66166 (1092875540 0 105479317) 0x7ff360072010 con 0x7ff36417be80
 stderr: 2023-10-27 19:48:57.894128 7ff36c135700  1 --
10.51.228.213:0/2678799534 --> 10.51.228.33:6789/0 --
mon_command({"prefix": "mon getmap"} v 0) v1 -- 0x7ff3640d9070 con 0
 stderr: 2023-10-27 19:48:57.894988 7ff353fff700  1 --
10.51.228.213:0/2678799534 <== mon.1 10.51.228.33:6789/0 9 ====
mon_command_ack([{"prefix": "mon getmap"}]=0 got monmap epoch 4 v4) v1 ====
76+0+438 (3852220838 0 1414311087) 0x7ff360062000 con 0x7ff36417be80
 stderr: got monmap epoch 4
 stderr: 2023-10-27 19:48:57.899563 7ff36c135700  1 --
10.51.228.213:0/2678799534 >> 10.51.228.33:6800/5258 conn(0x7ff34c00ebb0
:-1 s=STATE_OPEN pgs=915966 cs=1 l=1).mark_down
2023-10-27 19:48:57.899603 7ff36c135700  2 -- 10.51.228.213:0/2678799534 >>
10.51.228.33:6800/5258 conn(0x7ff34c00ebb0 :-1 s=STATE_OPEN pgs=915966 cs=1
l=1)._stop
2023-10-27 19:48:57.899637 7ff36c135700  1 -- 10.51.228.213:0/2678799534 >>
10.51.228.33:6789/0 conn(0x7ff36417be80 :-1 s=STATE_OPEN pgs=494490687 cs=1
l=1).mark_down
2023-10-27 19:48:57.899644 7ff36c135700  2 -- 10.51.228.213:0/2678799534 >>
10.51.228.33:6789/0 conn(0x7ff36417be80 :-1 s=STATE_OPEN pgs=494490687 cs=1
l=1)._stop
 stderr: 2023-10-27 19:48:57.900080 7ff36c135700  1 --
10.51.228.213:0/2678799534 shutdown_connections
 stderr: 2023-10-27 19:48:57.900797 7ff36c135700  1 --
10.51.228.213:0/2678799534 shutdown_connections
 stderr: 2023-10-27 19:48:57.901041 7ff36c135700  1 --
10.51.228.213:0/2678799534 wait complete.
2023-10-27 19:48:57.901079 7ff36c135700  1 -- 10.51.228.213:0/2678799534 >>
10.51.228.213:0/2678799534 conn(0x7ff3641698a0 :-1 s=STATE_NONE pgs=0 cs=0
l=0).mark_down
2023-10-27 19:48:57.901090 7ff36c135700  2 -- 10.51.228.213:0/2678799534 >>
10.51.228.213:0/2678799534 conn(0x7ff3641698a0 :-1 s=STATE_NONE pgs=0 cs=0
l=0)._stop
Running command: ceph-authtool /var/lib/ceph/osd/ceph-301/keyring
--create-keyring --name osd.301 --add-key
AQAnFDxlSUVdERAAvv6P4q/MGml/tPx9Kka77w==
 stdout: creating /var/lib/ceph/osd/ceph-301/keyring
added entity osd.301 auth auth(auid = 18446744073709551615
key=AQAnFDxlSUVdERAAvv6P4q/MGml/tPx9Kka77w== with 0 caps)
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-301/keyring
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-301/
Running command: /usr/bin/ceph-osd --cluster ceph --osd-objectstore
bluestore --mkfs -i 301 --monmap /var/lib/ceph/osd/ceph-301/activate.monmap
--keyfile - --osd-data /var/lib/ceph/osd/ceph-301/ --osd-uuid
21e9a327-ada5-4734-ab5d-7be333d4f3cf --setuser ceph --setgroup ceph
--> ceph-volume lvm prepare successful for: /dev/sda
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-301
Running command: ceph-bluestore-tool --cluster=ceph prime-osd-dir --dev
/dev/ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef/osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
--path /var/lib/ceph/osd/ceph-301
Running command: ln -snf
/dev/ceph-81236ab2-f6e0-4cc3-9815-95c8dd16c6ef/osd-block-21e9a327-ada5-4734-ab5d-7be333d4f3cf
/var/lib/ceph/osd/ceph-301/block
Running command: chown -h ceph:ceph /var/lib/ceph/osd/ceph-301/block
Running command: chown -R ceph:ceph /dev/dm-0
Running command: chown -R ceph:ceph /var/lib/ceph/osd/ceph-301
Running command: systemctl enable
ceph-volume@lvm-301-21e9a327-ada5-4734-ab5d-7be333d4f3cf
 stderr: Created symlink
/etc/systemd/system/multi-user.target.wants/ceph-volume@lvm-301-21e9a327-ada5-4734-ab5d-7be333d4f3cf.service
→ /lib/systemd/system/ceph-volume@.service.
Running command: systemctl enable --runtime ceph-osd@301
Running command: systemctl start ceph-osd@301
--> ceph-volume lvm activate successful for osd ID: 301
--> ceph-volume lvm create successful for: /dev/sda
 [root@dra1361 ~]#


Log file of OSD 301 (IP addresses modified for security reasons):


2023-10-27 20:02:25.597254 7fbe3eee5e40  0 osd.301 0 crush map has features
288232575208783872, adjusting msgr requires for osds
2023-10-27 20:02:25.597293 7fbe3eee5e40  0 osd.301 0 load_pgs
2023-10-27 20:02:25.597301 7fbe3eee5e40  0 osd.301 0 load_pgs opened 0 pgs
2023-10-27 20:02:25.597303 7fbe3eee5e40  2 osd.301 0 superblock: I am
osd.301
2023-10-27 20:02:25.597304 7fbe3eee5e40  0 osd.301 0 using weightedpriority
op queue with priority op cut off at 64.
2023-10-27 20:02:25.597390 7fbe3eee5e40  1  Processor -- start
2023-10-27 20:02:25.597594 7fbe3eee5e40  1  Processor -- start
2023-10-27 20:02:25.597933 7fbe3eee5e40  1  Processor -- start
2023-10-27 20:02:25.598011 7fbe3eee5e40  1  Processor -- start
2023-10-27 20:02:25.598086 7fbe3eee5e40  1  Processor -- start
2023-10-27 20:02:25.598224 7fbe3eee5e40  1  Processor -- start
2023-10-27 20:02:25.598467 7fbe3eee5e40  1  Processor -- start
2023-10-27 20:02:25.598833 7fbe3eee5e40 -1 osd.301 0 log_to_monitors
{default=true}
2023-10-27 20:02:25.599904 7fbe3eee5e40  1 -- 10.10.21.213:6800/20559 -->
10.10.21.32:6789/0 -- auth(proto 0 28 bytes epoch 0) v1 -- 0x55cf49fcd180
con 0
2023-10-27 20:02:25.599922 7fbe3eee5e40  1 -- 10.10.21.213:6800/20559 -->
10.10.21.33:6789/0 -- auth(proto 0 28 bytes epoch 0) v1 -- 0x55cf49fcd400
con 0
2023-10-27 20:02:25.601479 7fbe3de99700  2 -- 10.10.21.213:6800/20559 >>
10.10.21.33:6789/0 conn(0x55cf4a291800 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
2023-10-27 20:02:25.601712 7fbe3d698700  2 -- 10.10.21.213:6800/20559 >>
10.10.21.32:6789/0 conn(0x55cf4a293000 :-1 s=STATE_CONNECTING_WAIT_ACK_SEQ
pgs=0 cs=0 l=1)._process_connection got newly_acked_seq 0 vs out_seq 0
2023-10-27 20:02:25.602181 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.1 10.10.21.33:6789/0 1 ==== mon_map magic: 0 v1 ==== 442+0+0 (171445244
0 0) 0x55cf4a29efc0 con 0x55cf4a291800
2023-10-27 20:02:25.602305 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.1 10.10.21.33:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
33+0+0 (1293857214 0 0) 0x55cf49fcd400 con 0x55cf4a291800
2023-10-27 20:02:25.602475 7fbe3508b700  1 -- 10.10.21.213:6800/20559 -->
10.10.21.33:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x55cf49fcd900
con 0
2023-10-27 20:02:25.602507 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.0 10.10.21.32:6789/0 1 ==== mon_map magic: 0 v1 ==== 442+0+0 (171445244
0 0) 0x55cf4a29efc0 con 0x55cf4a293000
2023-10-27 20:02:25.602557 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.0 10.10.21.32:6789/0 2 ==== auth_reply(proto 2 0 (0) Success) v1 ====
33+0+0 (2698054748 0 0) 0x55cf49fcd180 con 0x55cf4a293000
2023-10-27 20:02:25.602627 7fbe3508b700  1 -- 10.10.21.213:6800/20559 -->
10.10.21.32:6789/0 -- auth(proto 2 32 bytes epoch 0) v1 -- 0x55cf49fcd400
con 0
2023-10-27 20:02:25.603288 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.1 10.10.21.33:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
206+0+0 (3831096342 0 0) 0x55cf49fcd900 con 0x55cf4a291800
2023-10-27 20:02:25.603488 7fbe3508b700  1 -- 10.10.21.213:6800/20559 -->
10.10.21.33:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x55cf49fcd180
con 0
2023-10-27 20:02:25.603707 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.0 10.10.21.32:6789/0 3 ==== auth_reply(proto 2 0 (0) Success) v1 ====
206+0+0 (1800961675 0 0) 0x55cf49fcd400 con 0x55cf4a293000
2023-10-27 20:02:25.603889 7fbe3508b700  1 -- 10.10.21.213:6800/20559 -->
10.10.21.32:6789/0 -- auth(proto 2 165 bytes epoch 0) v1 -- 0x55cf49fcd900
con 0
2023-10-27 20:02:25.604366 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.1 10.10.21.33:6789/0 4 ==== auth_reply(proto 2 0 (0) Success) v1 ====
596+0+0 (1171153923 0 0) 0x55cf49fcd180 con 0x55cf4a291800
2023-10-27 20:02:25.604560 7fbe3508b700  1 -- 10.10.21.213:6800/20559 >>
10.10.21.32:6789/0 conn(0x55cf4a293000 :-1 s=STATE_OPEN pgs=316446787 cs=1
l=1).mark_down
2023-10-27 20:02:25.604572 7fbe3508b700  2 -- 10.10.21.213:6800/20559 >>
10.10.21.32:6789/0 conn(0x55cf4a293000 :-1 s=STATE_OPEN pgs=316446787 cs=1
l=1)._stop
2023-10-27 20:02:25.604622 7fbe3508b700  1 monclient: found mon.or1dra1301
2023-10-27 20:02:25.604663 7fbe3508b700  1 -- 10.10.21.213:6800/20559 -->
10.10.21.33:6789/0 -- mon_subscribe({monmap=0+}) v2 -- 0x55cf49dc9680 con 0
2023-10-27 20:02:25.604712 7fbe3508b700  1 -- 10.10.21.213:6800/20559 -->
10.10.21.33:6789/0 -- auth(proto 2 2 bytes epoch 0) v1 -- 0x55cf49fcd400
con 0
2023-10-27 20:02:25.604792 7fbe3eee5e40  5 monclient: authenticate success,
global_id 1528606696
2023-10-27 20:02:25.605306 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.1 10.10.21.33:6789/0 5 ==== mon_map magic: 0 v1 ==== 442+0+0 (171445244
0 0) 0x55cf4a29f200 con 0x55cf4a291800
2023-10-27 20:02:25.605398 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.1 10.10.21.33:6789/0 6 ==== auth_reply(proto 2 0 (0) Success) v1 ====
194+0+0 (2618733096 0 0) 0x55cf49fcd400 con 0x55cf4a291800
2023-10-27 20:02:25.605807 7fbe3eee5e40  1 -- 10.10.21.213:6800/20559 -->
10.10.21.33:6789/0 -- mon_command({"prefix": "osd crush set-device-class",
"class": "ssd", "ids": ["301"]} v 0) v1 -- 0x55cf49dc98c0 con 0
2023-10-27 20:02:25.608362 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.1 10.10.21.33:6789/0 7 ==== mon_command_ack([{"prefix": "osd crush
set-device-class", "class": "ssd", "ids": ["301"]}]=0 osd.301 already set
to class ssdset-device-class item id 301 name 'osd.301' device_class 'ssd':
no change v1497811) v1 ==== 211+0+0 (3030640430 0 0) 0x55cf49dc98c0 con
0x55cf4a291800
2023-10-27 20:02:25.608668 7fbe3eee5e40  1 -- 10.10.21.213:6800/20559 -->
10.10.21.33:6789/0 -- mon_command({"prefix": "osd crush create-or-move",
"id": 301, "weight":3.4931, "args": ["host=or1dra1361", "root=default"]} v
0) v1 -- 0x55cf49dc9b00 con 0
2023-10-27 20:02:25.611784 7fbe3508b700  1 -- 10.10.21.213:6800/20559 <==
mon.1 10.10.21.33:6789/0 8 ==== mon_command_ack([{"prefix": "osd crush
create-or-move", "id": 301, "weight":3.4931, "args": ["host=or1dra1361",
"root=default"]}]=-34 (34) Numerical result out of range v1497811) v1 ====
179+0+0 (1380436622 0 0) 0x55cf49dc9b00 con 0x55cf4a291800
2023-10-27 20:02:25.612011 7fbe3eee5e40 -1 osd.301 0
mon_cmd_maybe_osd_create fail: '(34) Numerical result out of range': (34)
Numerical result out of range
2023-10-27 20:02:25.612070 7fbe3eee5e40 -1 osd.301 0 init unable to
update_crush_location: (34) Numerical result out of range
 [root@dra1361 /var/log/ceph]#



Thanks,
Pardhiv Karri






On Fri, Oct 27, 2023 at 7:00 AM <ceph-users-request@xxxxxxx> wrote:

Send ceph-users mailing list submissions to
        ceph-users@xxxxxxx

To subscribe or unsubscribe via email, send a message with subject or
body 'help' to
        ceph-users-request@xxxxxxx

You can reach the person managing the list at
        ceph-users-owner@xxxxxxx

When replying, please edit your Subject line so it is more specific
than "Re: Contents of ceph-users digest..."

Today's Topics:

   1. Re: Ceph - Error ERANGE: (34) Numerical result out of range
      (Eugen Block)
   2. Re: [EXTERNAL] [Pacific] ceph orch device ls do not returns any HDD
      (Patrick Begou)
   3. Re: [ext] CephFS pool not releasing space after data deletion
      (Kuhring, Mathias)
   4. Re: "cephadm version" in reef returns "AttributeError:
'CephadmContext' object has no attribute 'fsid'"
      (John Mulligan)


----------------------------------------------------------------------

Date: Fri, 27 Oct 2023 11:56:38 +0000
From: Eugen Block <eblock@xxxxxx>
Subject:  Re: Ceph - Error ERANGE: (34) Numerical result
        out of range
To: ceph-users@xxxxxxx
Message-ID:
        <20231027115638.Horde.48HDZ8Azsv-ho_0pQNe0p-s@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset=utf-8; format=flowed; DelSp=Yes

Hi,

please provide more information about your cluster, like 'ceph -s',
'ceph osd tree' and the exact procedure you used to create the OSDs.
 From your last post it seems like the OSD creation failed and this
might be just a consequence of that? Do you have the logs from the OSD
creation as well? Not just the logs from the failing OSD start.

Thanks,
Eugen

Quoting Pardhiv Karri <meher4india@xxxxxxxxx>:

> Hi,
> Trying to move a node/host under a new SSD root and getting the below error.
> Has anyone seen it and knows the fix? The pg_num and pgp_num are the same for
> all pools, so that is not the issue.
>
>  [root@hbmon1 ~]# ceph osd crush move hbssdhost1 root=ssd
> Error ERANGE: (34) Numerical result out of range
>  [root@hbmon1 ~]#
>
> Thanks,
> Pardhiv
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx



------------------------------

Date: Fri, 27 Oct 2023 15:35:37 +0200
From: Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>
Subject:  Re: [EXTERNAL] [Pacific] ceph orch device ls do
        not returns any HDD
To: ceph-users@xxxxxxx
Message-ID:
        <1fe5ff89-5b96-20e6-988d-ab2dd514ca2f@xxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset=UTF-8; format=flowed

Hi all,

First of all, I apologize if I've not done things correctly, but here are
some test results.

1) I've compiled the main branch in a fresh podman container (Alma Linux
8) and installed it. Successful!
2) I have copied the /etc/ceph directory of the host (a member of the
Ceph cluster on Pacific 16.2.14) into this container (good or bad idea?)
3) "ceph-volume inventory" works but with some error messages:

[root@74285dcfa91f etc]# ceph-volume inventory
  stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
/sys expected.
  stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
/sys expected.
  stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
/sys expected.
  stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
/sys expected.
  stderr: Unknown device, --name=, --path=, or absolute path in /dev/ or
/sys expected.

Device Path               Size         Device nodes    rotates  available  Model name
/dev/sdc                  232.83 GB    sdc             True     True       SAMSUNG HE253GJ
/dev/sda                  232.83 GB    sda             True     False      SAMSUNG HE253GJ
/dev/sdb                  465.76 GB    sdb             True     False      WDC WD5003ABYX-1
4) ceph version shows:
[root@74285dcfa91f etc]# ceph -v
ceph version 18.0.0-6846-g2706ecac4a9
(2706ecac4a90447420904e42d6e0445134dff2be) reef (dev)


5) lsblk works (container launched with "--privileged" flag)
[root@74285dcfa91f etc]# lsblk
NAME   MAJ:MIN RM   SIZE RO TYPE MOUNTPOINT
sda      8:0    1 232.9G  0 disk
|-sda1   8:1        3.9G  0 part
|-sda2   8:2    1   3.9G  0 part [SWAP]
`-sda3   8:3    1   225G  0 part
sdb      8:16   1 465.8G  0 disk
sdc      8:32   1 232.9G  0 disk

But some commands do not work (my setup or ceph?)

[root@74285dcfa91f etc]# ceph orch device zap
mostha1.legi.grenoble-inp.fr /dev/sdc --force
Error EINVAL: Device path '/dev/sdc' not found on host
'mostha1.legi.grenoble-inp.fr'
[root@74285dcfa91f etc]#

[root@74285dcfa91f etc]# ceph orch device ls
[root@74285dcfa91f etc]#

Patrick


On 24/10/2023 at 22:43, Zack Cerza wrote:
> That's correct - it's the removable flag that's causing the disks to
> be excluded.
>
> I actually just merged this PR last week:
> https://github.com/ceph/ceph/pull/49954
>
> One of the changes it made was to enable removable (but not USB)
> devices, as there are vendors that report hot-swappable drives as
> removable. Patrick, it looks like this may resolve your issue as well.
>
>
> On Tue, Oct 24, 2023 at 5:57 AM Eugen Block <eblock@xxxxxx> wrote:
>> Hi,
>>
>>> May be because they are hot-swappable hard drives.
>> yes, that's my assumption as well.
>>
>>
>> Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:
>>
>>> Hi Eugen,
>>>
>>> Yes Eugen, all the devices /dev/sd[abc] have the removable flag set
>>> to 1. May be because they are hot-swappable hard drives.
>>>
>>> I have contacted the commit author Zack Cerza and he asked me for
>>> some additional tests too this morning. I add him in copy to this
>>> mail.
>>>
>>> Patrick
>>>
>>> On 24/10/2023 at 12:57, Eugen Block wrote:
>>>> Hi,
>>>>
>>>> just to confirm, could you check that the disk which is *not*
>>>> discovered by 16.2.11 has a "removable" flag?
>>>>
>>>> cat /sys/block/sdX/removable
>>>>
>>>> I could reproduce it as well on a test machine with a USB thumb
>>>> drive (live distro) which is excluded in 16.2.11 but is shown in
>>>> 16.2.10. Although I'm not a developer I tried to understand what
>>>> changes were made in
>>>>
https://github.com/ceph/ceph/pull/46375/files#diff-330f9319b0fe352dff0486f66d3c4d6a6a3d48efd900b2ceb86551cfd88dc4c4R771
and there's this
>>>> line:
>>>>
>>>>> if get_file_contents(os.path.join(_sys_block_path, dev,
>>>>> 'removable')) == "1":
>>>>>     continue
>>>> The thumb drive is removable, of course, apparently that is filtered
here.
>>>>
>>>> Regards,
>>>> Eugen
>>>>
>>>> Quoting Patrick Begou <Patrick.Begou@xxxxxxxxxxxxxxxxxxxxxx>:
>>>>
>>>>> On 23/10/2023 at 03:04, 544463199@xxxxxx wrote:
>>>>>> I think you can try to roll back this part of the python code and
>>>>>> wait for your good news :)
>>>>>
>>>>> Not so easy 😕
>>>>>
>>>>>
>>>>> [root@e9865d9a7f41 ceph]# git revert
>>>>> 4fc6bc394dffaf3ad375ff29cbb0a3eb9e4dbefc
>>>>> Auto-merging src/ceph-volume/ceph_volume/tests/util/test_device.py
>>>>> CONFLICT (content): Merge conflict in
>>>>> src/ceph-volume/ceph_volume/tests/util/test_device.py
>>>>> Auto-merging src/ceph-volume/ceph_volume/util/device.py
>>>>> CONFLICT (content): Merge conflict in
>>>>> src/ceph-volume/ceph_volume/util/device.py
>>>>> Auto-merging src/ceph-volume/ceph_volume/util/disk.py
>>>>> CONFLICT (content): Merge conflict in
>>>>> src/ceph-volume/ceph_volume/util/disk.py
>>>>> error: could not revert 4fc6bc394df... ceph-volume: Optionally
>>>>> consume loop devices
>>>>>
>>>>> Patrick
>>>>> _______________________________________________
>>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>>>
>>>> _______________________________________________
>>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>> _______________________________________________
>>> ceph-users mailing list -- ceph-users@xxxxxxx
>>> To unsubscribe send an email to ceph-users-leave@xxxxxxx
>>
>>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx


------------------------------

Date: Fri, 27 Oct 2023 13:52:03 +0000
From: "Kuhring, Mathias" <mathias.kuhring@xxxxxxxxxxxxxx>
Subject:  Re: [ext] CephFS pool not releasing space after
        data deletion
To: "ceph-users@xxxxxxx" <ceph-users@xxxxxxx>, "frans@xxxxxx"
        <frans@xxxxxx>
Message-ID: <a5bb6a0a-aab5-402d-8ee3-68eccabb7b6b@xxxxxxxxxxxxxx>
Content-Type: text/plain; charset="utf-8"

Dear ceph users,

We are wondering if this might be the same issue as this bug:
https://tracker.ceph.com/issues/52581

Except that we seem to have snapshots dangling on the old pool,
while the bug report has snapshots dangling on the new pool.
But maybe it's both?

I mean, once the global root layout was changed to a new pool,
the new pool became in charge of snapshotting at least the new data, right?
What about data which is overwritten? Is there a conflict of
responsibility?

We do have similar listings of snaps with "ceph osd pool ls detail", I
think:

0|0[root@osd-1 ~]# ceph osd pool ls detail | grep -B 1 removed_snaps_queue
pool 1 'cephfs_data' replicated size 3 min_size 2 crush_rule 1
object_hash rjenkins pg_num 115 pgp_num 107 pg_num_target 32
pgp_num_target 32 autoscale_mode on last_change 803558 lfor
0/803250/803248 flags hashpspool,selfmanaged_snaps stripe_width 0
expected_num_objects 1 application cephfs
         removed_snaps_queue

[3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
--
pool 3 'hdd_ec' erasure profile hdd_ec size 3 min_size 2 crush_rule 3
object_hash rjenkins pg_num 2048 pgp_num 2048 autoscale_mode off
last_change 803558 lfor 0/87229/87229 flags
hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 8192 application
cephfs
         removed_snaps_queue

[3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]
--
pool 20 'hdd_ec_8_2_pool' erasure profile hdd_ec_8_2_profile size 10
min_size 9 crush_rule 5 object_hash rjenkins pg_num 8192 pgp_num 8192
autoscale_mode off last_change 803558 lfor 0/0/681917 flags
hashpspool,ec_overwrites,selfmanaged_snaps stripe_width 32768
application cephfs
         removed_snaps_queue

[3541~1,36e4~1,379f~2,3862~1,3876~1,387d~1,388b~1,389a~1,38a6~1,38bc~1,3993~1,3999~1,39a0~1,39a7~1,39ae~1,39b5~3,39be~1,39c5~1,39cc~1]


Here, pool hdd_ec_8_2_pool is the one we recently assigned to the root
layout.
Pool hdd_ec is the one which was assigned before and which won't release
space (at least where I know of).

Is this removed_snaps_queue the same as removed_snaps in the bug issue
(i.e. the label was renamed)?
And is it normal that all queues list the same info or should this be
different per pool?
Might this be related to pools now having shared responsibility over some
snaps due to layout changes?

And for the big question:
How can I actually trigger/speedup the removal of those snaps?
I find the removed_snaps/removed_snaps_queue mentioned a few times in
the user list.
But never with some conclusive answer how to deal with them.
And the only mentions in the docs are just change logs.

I also looked into and started cephfs stray scrubbing:

https://docs.ceph.com/en/latest/cephfs/scrub/#evaluate-strays-using-recursive-scrub
But according to the status output, no scrubbing is actually active.

I would appreciate any further ideas. Thanks a lot.

Best Wishes,
Mathias

On 10/23/2023 12:42 PM, Kuhring, Mathias wrote:
> Dear Ceph users,
>
> Our CephFS is not releasing/freeing up space after deleting hundreds of
> terabytes of data.
> By now, this drives us in a "nearfull" osd/pool situation and thus
> throttles IO.
>
> We are on ceph version 17.2.6 (d7ff0d10654d2280e08f1ab989c7cdf3064446a5)
> quincy (stable).
>
> Recently, we moved a bunch of data to a new pool with better EC.
> This was done by adding a new EC pool to the FS.
> Then assigning the FS root to the new EC pool via the directory layout
xattr
> (so all new data is written to the new pool).
> And finally copying old data to new folders.
>
> I swapped the data as follows to retain the old directory structures.
> I also made snapshots for validation purposes.
>
> So basically:
> cp -r mymount/mydata/ mymount/new/ # this creates copy on new pool
> mkdir mymount/mydata/.snap/tovalidate
> mkdir mymount/new/mydata/.snap/tovalidate
> mv mymount/mydata/ mymount/old/
> mv mymount/new/mydata mymount/
>
> I could see the increase of data in the new pool as expected (ceph df).
> I compared the snapshots with hashdeep to make sure the new data is
alright.
>
> Then I went ahead deleting the old data, basically:
> rmdir mymount/old/mydata/.snap/* # this also included a bunch of other
> older snapshots
> rm -r mymount/old/mydata
>
> At first we had a bunch of PGs with snaptrim/snaptrim_wait.
> But they are done for quite some time now.
> And now, already two weeks later the size of the old pool still hasn't
> really decreased.
> I'm still waiting for around 500 TB to be released (and much more is
> planned).
>
> I honestly have no clue, where to go from here.
>   From my point of view (i.e. the CephFS mount), the data is gone.
> I also never hard/soft-linked it anywhere.
>
> This doesn't seem to be a regular issue.
> At least I couldn't find anything related or resolved in the docs or
> user list, yet.
> If anybody has an idea how to resolve this, I would highly appreciate it.
>
> Best Wishes,
> Mathias
>
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx

--
Mathias Kuhring

Dr. rer. nat.
Bioinformatician
HPC & Core Unit Bioinformatics
Berlin Institute of Health at Charité (BIH)

E-Mail: mathias.kuhring@xxxxxxxxxxxxxx
Mobile: +49 172 3475576


------------------------------

Date: Fri, 27 Oct 2023 09:57:31 -0400
From: John Mulligan <phlogistonjohn@xxxxxxxxxxxxx>
Subject:  Re: "cephadm version" in reef returns
        "AttributeError: 'CephadmContext' object has no attribute 'fsid'"
To: ceph-users@xxxxxxx
Message-ID:
        <
7411686.LvFx2qVVIh@xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx>
Content-Type: text/plain; charset="us-ascii"

On Friday, October 27, 2023 2:40:17 AM EDT Eugen Block wrote:
> Are the issues you refer to the same as before? I don't think this
> version issue is the root cause, I do see it as well in my test
> cluster(s) but the rest works properly except for the tag issue I
> already reported which you can easily fix by setting the config value
> for the default image
> (
https://lists.ceph.io/hyperkitty/list/ceph-users@xxxxxxx/thread/LASBJCSPFGD
> YAWPVE2YLV2ZLF3HC5SLS/#LASBJCSPFGDYAWPVE2YLV2ZLF3HC5SLS). Or are there
new
> issues you encountered?


I concur. That `cephadm version` failure is/was a known issue but should
not
be the cause of any other issues.  On the main branch `cephadm version` no
longer fails this way - rather, it reports the version of a cephadm build
and
no longer inspects a container image.  We can look into backporting this
before the next reef release.

The issue related to the container image tag that Eugen filed has also
been
fixed on reef. Thanks for filing that.

Martin you may want to retry things after the next reef release.
Unfortunately, I don't know when that is planned but I think it's soonish.

>
> Quoting Martin Conway <martin.conway@xxxxxxxxxx>:
> > I just had another look through the issues tracker and found this
> > bug already listed.
> > https://tracker.ceph.com/issues/59428
> >
> > I need to go back to the other issues I am having and figure out if
> > they are related or something different.
> >
> >
> > Hi
> >
> > I wrote before about issues I was having with cephadm in 18.2.0.
> > Sorry, I didn't see the helpful replies because my mail service
> > binned the responses.
> >
> > I still can't get the reef version of cephadm to work properly.
> >
> > I had updated the system rpm to reef (ceph repo) and also upgraded
> > the containerised ceph daemons to reef before my first email.
> >
> > Both the system package cephadm and the one found at
> > /var/lib/ceph/${fsid}/cephadm.* return the same error when running
> > "cephadm version"
> >
> > Traceback (most recent call last):
> >   File
> >
> >
"./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
> > e", line 9468, in <module>
> >
> >     main()
> >
> >   File
> >
> >
"./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
> > e", line 9456, in main
> >
> >     r = ctx.func(ctx)
> >
> >   File
> >
> >
"./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
> > e", line 2108, in _infer_image
> >
> >     ctx.image = infer_local_ceph_image(ctx, ctx.container_engine.path)
> >
> >   File
> >
> >
"./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
> > e", line 2191, in infer_local_ceph_image
> >
> >     container_info = get_container_info(ctx, daemon, daemon_name is not
> >     None)
> >
> >   File
> >
> >
"./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
> > e", line 2154, in get_container_info
> >
> >     matching_daemons = [d for d in daemons if daemon_name_or_type(d)
> >
> > == daemon_filter and d['fsid'] == ctx.fsid]
> >
> >   File
> >
> >
"./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
> > e", line 2154, in <listcomp>
> >
> >     matching_daemons = [d for d in daemons if daemon_name_or_type(d)
> >
> > == daemon_filter and d['fsid'] == ctx.fsid]
> >
> >   File
> >
> >
"./cephadm.059bfc99f5cf36ed881f2494b104711faf4cbf5fc86a9594423cc105cafd9b4
> > e", line 217, in __getattr__
> >
> >     return super().__getattribute__(name)
> >
> > AttributeError: 'CephadmContext' object has no attribute 'fsid'
> >
> > _______________________________________________
> > ceph-users mailing list -- ceph-users@xxxxxxx
> > To unsubscribe send an email to ceph-users-leave@xxxxxxx
>
> _______________________________________________
> ceph-users mailing list -- ceph-users@xxxxxxx
> To unsubscribe send an email to ceph-users-leave@xxxxxxx




------------------------------

Subject: Digest Footer

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


------------------------------

End of ceph-users Digest, Vol 112, Issue 119
********************************************



--
*Pardhiv Karri*
"Rise and Rise again until LAMBS become LIONS"



_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx



