I've checked the XFS filesystem to see if it has any effect on the OSD. No luck...
The crash can be seen here:
But I will paste a full OSD log.
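In case it matters, the kind of XFS check I mean is a read-only pass along these lines (the device path is only an example and must be adjusted to the actual OSD data partition):

# stop the OSD and unmount its data partition before checking
systemctl stop ceph-osd@2
umount /var/lib/ceph/osd/ceph-2
# -n = no-modify mode: report problems without touching the disk
xfs_repair -n /dev/sdb1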
On Thu, Dec 30, 2021 at 8:22 PM Gonzalo Aguilar Delgado <gaguilar.delgado@xxxxxxxxx> wrote:
Hi,

I was asking on the IRC channels, and after several tries we found no solution, so I will try to post all the information here.

I have a cluster running ceph version 16.2.7 (dd0603118f56ab514f133c8d2e3adfc983942503) pacific (stable). It was manually installed from very early versions and manually upgraded on each release. It was working nicely until a power cut.

On reboot, half of the OSDs were unable to boot. They eat all the memory on the host and never connect to the mon. Each OSD starts by replaying its journal as usual, and once the logging_to_monitors message appears it starts eating memory until it is OOM-killed. I added a very big swap to see if it could get past that point, and it does: it no longer dies from OOM, but it eventually crashes with a heartbeat error. I attach some logs...

What we tried is to shut down everything (mon, mgr, mds, etc.), start only the mon and mgr, and check that they connect and start correctly. That seems to work. Then I start one problematic OSD, OSD.2, with full logging enabled (full log here: http://pastie.org/p/1L0nidHL0JkA3BH1RDzLMy). It runs but doesn't connect to the mon.

We checked the network, jumbo frames and everything else that could affect it. I ran the objectstore tool to fix PGs, and everything seems right. OSD.2 runs on an XFS disk with filestore... yes, I know I should migrate, but I haven't.

The curious thing is that I have another OSD on the same host and it works: OSD.4, which is bluestore. I don't know what to make of that, since OSD.2 is simply not connecting even though the underlying filesystem works.

More strange things:

root@red-compute:/home/gaguilar# ceph osd tree
ID CLASS WEIGHT TYPE NAME STATUS REWEIGHT PRI-AFF
-1 8.74597 root default
-2 3.81898 host blue-compute
0 hdd 1.00000 osd.0 down 0 1.00000
2 hdd 1.00000 osd.2 down 0 1.00000
4 hdd 1.81898 osd.4 up 0 1.00000
-5 2.36800 host cadet-compute
1 hdd 0.03000 osd.1 down 0.90002 1.00000
5 hdd 0.03999 osd.5 down 0.90002 1.00000
7 hdd 0.03000 osd.7 down 0.90002 1.00000
11 hdd 0.45000 osd.11 up 1.00000 1.00000
13 hdd 1.81799 osd.13 down 1.00000 1.00000
-6 0.45000 host cobalt-compute
12 hdd 0.45000 osd.12 up 1.00000 1.00000
-3 2.10899 host red-compute
6 hdd 0.90900 osd.6 down 0 1.00000
9 hdd 0.90900 osd.9 down 1.00000 1.00000
10 hdd 0.29099 osd.10 down 1.00000 1.00000

The tree reports 3 OSDs up: 4, 11 and 12. But only osd.4 is actually up; it is the one on blue-compute that I run tests on, since it works. The other two are down, and their status doesn't get updated. Why? OSD.4 tries to ping them because they appear to be up, but it fails.

My ceph.conf doesn't have anything weird:

[global]
fsid = 9028f4da-0d77-462b-be9b-dbdf7fa57771
#mon_initial_members = blue-compute, red-compute, cadet-compute
mon_initial_members = red-compute, cadet-compute
mon_host = [v2:172.16.0.100:3300,v1:172.16.0.100:6789], 172.16.99.10, 172.16.0.119
#mon_host = 172.16.0.119, 172.16.0.100, 172.16.99.10
#mgr_host = 172.16.0.119
#mon_host = 172.16.0.100, 172.16.99.10
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
filestore_xattr_use_omap = true
osd_pool_default_pg_num = 128
osd_pool_default_pgp_num = 128
osd_pool_default_size = 2 # Keep 2 copies of each object.
osd_pool_default_min_size = 1 # Allow writing one copy in a degraded state.
public_network = 172.16.0.0/16
#osd_recovery_max_active = 9
#osd_max_backfills = 3
#osd_recovery_op_priority = 3
mon_data_avail_warn = 10
[mon]
caps_mon = "allow *"
[osd]
#bluestore cache autotune=1
osd max write size = 512
osd client message size cap = 1024
osd op threads = 1
#osd mount options xfs = "rw,noexec,nodev,noatime,nodiratime,nobarrier"
#filestore xattr use omap = true # Default false; store xattrs in the object map (needed on ext4, optional on XFS/btrfs)
#filestore min sync interval = 10 # Default 0.1; minimum interval (seconds) between journal-to-data-disk syncs
#filestore max sync interval = 15 # Default 5; maximum interval (seconds) between journal-to-data-disk syncs
#filestore queue max ops = 2500 # Default 500; maximum number of operations queued for the data disk
#filestore queue max bytes = 1048576000 # Default 100 MB; maximum number of bytes queued for the data disk
#filestore queue committing max ops = 50000 # Default 500; maximum number of operations the data disk can commit at once
#filestore queue committing max bytes = 10485760000 # Default 100 MB; maximum number of bytes the data disk can commit at once
#filestore split multiple = 8 # Default 2; multiplier controlling how many files a subdirectory may hold before it is split
#filestore merge threshold = 40 # Default 10; minimum number of files in a subdirectory before it is merged back
#filestore fd cache size = 1024 # Default 128; object file handle cache size
filestore op threads = 1 # Default 2; concurrent filesystem operation threads
[osd]
osd max pg log entries = 50
osd min pg log entries = 50
osd_pg_log_dups_tracked = 50

I also have other hosts with some disks up and others that aren't. The whole status of the cluster is garbage. I'm truly desperate.

This is what I tried:
- Starting the OSDs manually
- Running the objectstore tool, fix and repair (rough commands sketched after this list)
- Applying the journal
- Removing SELinux
- Telnet to the mon from the OSD host
- Starting different OSDs... some work
- Adding a new mgr
- Downgrading the kernel
- Upgrading the kernel
- Trying different pacific point releases
- Trying to run two different OSDs on the same machine to see if they communicate: osd.4 and osd.2 (osd.2 doesn't work)
- Running the OSD as root
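For reference, the objectstore tool and journal steps were roughly of this shape (paths assume the default /var/lib/ceph layout for osd.2 with a filestore journal; adjust to your setup):

# stop the daemon first so the store is not in use
systemctl stop ceph-osd@2

# flush/apply the filestore journal to the data disk
ceph-osd -i 2 --flush-journal

# list the PGs on the OSD and check/repair lost objects
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
    --journal-path /var/lib/ceph/osd/ceph-2/journal --op list-pgs
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-2 \
    --journal-path /var/lib/ceph/osd/ceph-2/journal --op fix-lost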
Nothing works.

I see a lot of:

2021-12-30T19:17:12.022+0000 7f114647b640 20 osd.2 1492194 tick last_purged_snaps_scrub 2021-12-30T11:58:25.837843+0000 next 2021-12-31T11:58:25.837843+0000
2021-12-30T19:17:12.030+0000 7f1121b89640 10 --2- [v2:172.16.0.119:6808/11420,v1:172.16.0.119:6809/11420] >> [v2:172.16.0.100:3300/0,v1:172.16.0.100:6789/0] conn(0x5594e3af6400 0x5594e3d57400 secure :-1 s=READY pgs=12286 cs=0 l=1 rev1=1 rx=0x5595a2aa88d0 tx=0x5594ecdcb380).send_keepalive
2021-12-30T19:17:12.030+0000 7f114a6e3640 10 -- [v2:172.16.0.119:6808/11420,v1:172.16.0.119:6809/11420] >> [v2:172.16.0.100:3300/0,v1:172.16.0.100:6789/0] conn(0x5594e3af6400 msgr2=0x5594e3d57400 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1).handle_write
2021-12-30T19:17:12.030+0000 7f114a6e3640 10 --2- [v2:172.16.0.119:6808/11420,v1:172.16.0.119:6809/11420] >> [v2:172.16.0.100:3300/0,v1:172.16.0.100:6789/0] conn(0x5594e3af6400 0x5594e3d57400 secure :-1 s=READY pgs=12286 cs=0 l=1 rev1=1 rx=0x5595a2aa88d0 tx=0x5594ecdcb380).write_event
2021-12-30T19:17:12.030+0000 7f114a6e3640 10 --2- [v2:172.16.0.119:6808/11420,v1:172.16.0.119:6809/11420] >> [v2:172.16.0.100:3300/0,v1:172.16.0.100:6789/0] conn(0x5594e3af6400 0x5594e3d57400 secure :-1 s=READY pgs=12286 cs=0 l=1 rev1=1 rx=0x5595a2aa88d0 tx=0x5594ecdcb380).write_event appending keepalive
2021-12-30T19:17:12.030+0000 7f114a6e3640 10 -- [v2:172.16.0.119:6808/11420,v1:172.16.0.119:6809/11420] >> [v2:172.16.0.100:3300/0,v1:172.16.0.100:6789/0] conn(0x5594e3af6400 msgr2=0x5594e3d57400 secure :-1 s=STATE_CONNECTION_ESTABLISHED l=1)._try_send sent bytes 96 remaining bytes 0
2021-12-30T19:17:12.034+0000 7f114a6e3640 10 --2- [v2:172.16.0.119:6808/11420,v1:172.16.0.119:6809/11420] >> [v2:172.16.0.100:3300/0,v1:172.16.0.100:6789/0] conn(0x5594e3af6400 0x5594e3d57400 secure :-1 s=READY pgs=12286 cs=0 l=1 rev1=1 rx=0x5595a2aa88d0 tx=0x5594ecdcb380).handle_read_frame_dispatch tag=19

So it seems the OSD is connected; why doesn't it get marked up? At the end of the log I can see a heartbeat crash.

So what can I try to recover the cluster? I have run out of ideas...
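If it helps with the diagnosis, these are the kinds of checks that can still be run against the daemon's admin socket while it is eating memory (osd.2 and the default socket path are assumed; adjust as needed):

# the OSD's own view of its state and osdmap epochs
ceph daemon osd.2 status

# per-mempool memory usage, to see where the RAM is going
# (the osd_pglog pool is the interesting one here)
ceph daemon osd.2 dump_mempools

# heartbeat round-trip times the OSD measures to its peers
ceph daemon osd.2 dump_osd_network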
--
Never underestimate the power of stupid people in large groups...