I wonder if anyone could offer any insight on the issue below,
regarding the CentOS 7.6 kernel cephfs client connecting to a
Luminous cluster. I have since tried a much newer 4.19.13 kernel,
which did not show the same issue (but unfortunately for various
reasons unrelated to ceph, we can't go to such a new kernel).
Am I reading this right: the monitor decides this kernel client is old and has to re-encode the maps into some older format for it, that takes too long, and the kernel gives up (or perhaps hits some other communication protocol error)? Each of these mon sessions seems to last only about half a second; the client then reconnects to another mon, gets the same result, and so on. Is there any way around this?
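In case it helps, the feature bits the mons report for this client can be checked with something like the following (a rough sketch; cephmon00 is just one of our mons, and the commands are run on that mon host):

  ceph features                                            # per-release summary of connected client/daemon feature bits
  ceph daemon mon.cephmon00 sessions | grep 10.128.36.18   # the MonSession for this client, including its feature mask

In the mon debug log quoted below, this session shows up with features 0x27018fb86aa42ada, i.e. jewel.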
Andras
On 12/26/18 7:55 PM, Andras Pataki wrote:
We've been using ceph-fuse with a pretty good stability record (against the Luminous 12.2.8 back end). Unfortunately ceph-fuse has extremely poor small-file performance (understandably), so we've been testing the kernel client. The latest RedHat kernel, 3.10.0-957.1.3.el7.x86_64, seems to work pretty well as long as the cluster is in a completely clean state. However, it seems that as soon as something is happening to the cluster, the kernel client crashes pretty badly.
Today's example: I reweighted some OSDs to balance the disk usage a bit (set nobackfill, reweight the OSDs, check the new hypothetical space usage, then unset nobackfill; roughly the sequence sketched below). As soon as the reweighting started, the kernel client went into an infinite loop, unsuccessfully trying to connect to the mons.
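For reference, the reweighting itself was something like the following (a sketch; the OSD id and weight are placeholders, not the exact values used):

  ceph osd set nobackfill
  ceph osd reweight 123 0.95     # repeated for each OSD being adjusted
  ceph osd df                    # sanity-check utilization before letting backfill run
  ceph osd unset nobackfill

The client's kernel log from the moment the reweighting started: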
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:28:53 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:58 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
... etc ...
It seemingly never recovers. The cluster itself is healthy, and all other clients are successfully doing I/O:
[root@cephmon00 ceph]# ceph -s
  cluster:
    id:     d7b33135-0940-4e48-8aa6-1d2026597c2f
    health: HEALTH_WARN
            noout flag(s) set
            1 backfillfull osd(s)
            4 pool(s) backfillfull
            239119058/12419244975 objects misplaced (1.925%)

  services:
    mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
    mgr: cephmon00(active)
    mds: cephfs-1/1/1 up {0=cephmds00=up:active}, 1 up:standby
    osd: 3534 osds: 3534 up, 3534 in; 5040 remapped pgs
         flags noout

  data:
    pools:   5 pools, 50688 pgs
    objects: 2.51G objects, 7.77PiB
    usage:   19.1PiB used, 5.90PiB / 25.0PiB avail
    pgs:     239119058/12419244975 objects misplaced (1.925%)
             45639 active+clean
             2914  active+remapped+backfilling
             2126  active+remapped+backfill_wait
             9     active+clean+scrubbing+deep

  io:
    client:   10.3MiB/s rd, 2.23GiB/s wr, 32op/s rd, 3.77kop/s wr
    recovery: 114GiB/s, 33.12kobjects/s
The client machine in question is otherwise healthy as well (not out of memory, etc.). I've checked the OSD blacklist, and this client isn't on it; there is nothing suspicious in the mon logs regarding this client either. I then set one of the mons to log level 20 to capture what happens when the client connects.
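Roughly what I ran for those checks (a sketch; cephmon00 is just the mon I picked, and the commands were run on that host):

  ceph osd blacklist ls                               # nothing listed for 10.128.36.18
  ceph daemon mon.cephmon00 config set debug_mon 20   # bump that mon's log level

The relevant section of that mon's log when this client tries to connect: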
2018-12-26 19:51:46.203264 7fffeb406700 10 mon.cephmon00@0(leader) e40 ms_verify_authorizer 10.128.36.18:0/3882984371 client protocol 0
2018-12-26 19:51:46.206645 7fffee40c700 10 mon.cephmon00@0(leader) e40 _ms_dispatch new session 0x5555722bb980 MonSession(client.36398604 10.128.36.18:0/3882984371 is open , features 0x27018fb86aa42ada (jewel)) features 0x27018fb86aa42ada
2018-12-26 19:51:46.206651 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps
2018-12-26 19:51:46.206657 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 preprocess_query auth(proto 0 34 bytes epoch 0) from client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.206660 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 prep_auth() blob_size=34
2018-12-26 19:51:46.206692 7fffee40c700 2 mon.cephmon00@0(leader) e40 send_reply 0x555562fee5a0 0x55556f398b40 auth_reply(proto 2 0 (0) Success) v1
2018-12-26 19:51:46.206867 7fffee40c700 20 mon.cephmon00@0(leader) e40 _ms_dispatch existing session 0x5555722bb980 for client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.206870 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps
2018-12-26 19:51:46.206877 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 preprocess_query auth(proto 2 128 bytes epoch 0) from client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.206882 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 prep_auth() blob_size=128
2018-12-26 19:51:46.206979 7fffee40c700 2 mon.cephmon00@0(leader) e40 send_reply 0x555562fee5a0 0x55571f9c4d80 auth_reply(proto 2 0 (0) Success) v1
2018-12-26 19:51:46.207178 7fffee40c700 20 mon.cephmon00@0(leader) e40 _ms_dispatch existing session 0x5555722bb980 for client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.207182 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps allow *
2018-12-26 19:51:46.207184 7fffee40c700 20 is_capable service=mon command= read on cap allow *
2018-12-26 19:51:46.207185 7fffee40c700 20 allow so far , doing grant allow *
2018-12-26 19:51:46.207186 7fffee40c700 20 allow all
2018-12-26 19:51:46.207188 7fffee40c700 10 mon.cephmon00@0(leader) e40 handle_subscribe mon_subscribe({mdsmap=72103+,monmap=41+,osdmap=1207891}) v2
2018-12-26 19:51:46.207194 7fffee40c700 10 mon.cephmon00@0(leader) e40 handle_subscribe: MDS sub 'mdsmap'
2018-12-26 19:51:46.207195 7fffee40c700 20 is_capable service=mds command= read on cap allow *
2018-12-26 19:51:46.207197 7fffee40c700 20 allow so far , doing grant allow *
2018-12-26 19:51:46.207198 7fffee40c700 20 allow all
2018-12-26 19:51:46.207200 7fffee40c700 20 mon.cephmon00@0(leader).mds e72102 check_sub: mdsmap
2018-12-26 19:51:46.207204 7fffee40c700 10 mon.cephmon00@0(leader).monmap v40 check_sub monmap next 41 have 40
2018-12-26 19:51:46.207207 7fffee40c700 20 is_capable service=osd command= read on cap allow *
2018-12-26 19:51:46.207208 7fffee40c700 20 allow so far , doing grant allow *
2018-12-26 19:51:46.207209 7fffee40c700 20 allow all
2018-12-26 19:51:46.207211 7fffee40c700 10 mon.cephmon00@0(leader).osd e1208017 check_osdmap_sub 0x55592f513d00 next 1207891 (onetime)
2018-12-26 19:51:46.207213 7fffee40c700 5 mon.cephmon00@0(leader).osd e1208017 send_incremental [1207891..1208017] to client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.207217 7fffee40c700 10 mon.cephmon00@0(leader).osd e1208017 build_incremental [1207891..1207930] with features 27018fb86aa42ada
2018-12-26 19:51:46.220019 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 reencode_incremental_map 1207930 with features 504412504116439552
2018-12-26 19:51:46.230217 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 build_incremental inc 1207930 1146701 bytes
2018-12-26 19:51:46.230349 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 reencode_incremental_map 1207929 with features 504412504116439552
2018-12-26 19:51:46.232523 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 build_incremental inc 1207929 175613 bytes
... a lot more of reencode_incremental stuff ...
2018-12-26 19:51:46.745394 7fffee40c700 10 mon.cephmon00@0(leader) e40 ms_handle_reset 0x5555637cf800 10.128.36.18:0/3882984371
2018-12-26 19:51:46.745395 7ffff0c11700 10 mon.cephmon00@0(leader).log v79246823 encode_full log v 79246823
2018-12-26 19:51:46.745469 7ffff0c11700 10 mon.cephmon00@0(leader).log v79246823 encode_pending v79246824
2018-12-26 19:51:46.745763 7fffee40c700 10 mon.cephmon00@0(leader) e40 reset/close on session client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.745769 7fffee40c700 10 mon.cephmon00@0(leader) e40 remove_session 0x5555722bb980 client.36398604 10.128.36.18:0/3882984371 features 0x27018fb86aa42ada
Any pointers to what to do here?
Andras