An update on our cephfs kernel client troubles. After some heavier
testing with a newer kernel (4.19.13), it seems that it also gets into
a bad state in which it can't connect to the monitors (all back-end
processes are on 12.2.8):
Jan 15 08:49:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Jan 15 08:49:01 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Jan 15 08:49:01 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Jan 15 08:49:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Jan 15 08:49:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Jan 15 08:49:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Jan 15 08:49:04 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Jan 15 08:49:04 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Jan 15 08:49:04 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Jan 15 08:49:06 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Jan 15 08:49:07 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Jan 15 08:49:07 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
... repeating forever ...
# uname -r
4.19.13
On the mon node (10.128.150.10), with the log level at 20, I see that
it is building/re-encoding a lot of maps (10.128.36.18 is the client
in question):
2019-01-15 08:49:01.355017 7fffee40c700 10 mon.cephmon00@0(leader) e40 _ms_dispatch new session 0x555562dc6c00 MonSession(client.36800361 10.128.36.18:0/2911716500 is open , features 0x27018fb86aa42ada (jewel)) features 0x27018fb86aa42ada
2019-01-15 08:49:01.355021 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps
2019-01-15 08:49:01.355026 7fffee40c700 10 mon.cephmon00@0(leader).auth v58457 preprocess_query auth(proto 0 34 bytes epoch 0) from client.36800361 10.128.36.18:0/2911716500
<snip>
2019-01-15 08:49:01.355817 7fffee40c700 10 mon.cephmon00@0(leader).osd e1254390 check_osdmap_sub 0x555565373340 next 1254102 (onetime)
2019-01-15 08:49:01.355819 7fffee40c700 5 mon.cephmon00@0(leader).osd e1254390 send_incremental [1254102..1254390] to client.36800361 10.128.36.18:0/2911716500
2019-01-15 08:49:01.355821 7fffee40c700 10 mon.cephmon00@0(leader).osd e1254390 build_incremental [1254102..1254141] with features 27018fb86aa42ada
2019-01-15 08:49:01.364859 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254141 with features 504412504116439552
2019-01-15 08:49:01.372131 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254141 1237271 bytes
2019-01-15 08:49:01.372180 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254140 with features 504412504116439552
2019-01-15 08:49:01.372187 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254140 260 bytes
2019-01-15 08:49:01.380981 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254139 with features 504412504116439552
2019-01-15 08:49:01.387983 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254139 1237351 bytes
2019-01-15 08:49:01.388043 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254138 with features 504412504116439552
2019-01-15 08:49:01.388049 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254138 232 bytes
2019-01-15 08:49:01.396781 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254137 with features 504412504116439552
<snip> ... a lot more of similar messages
2019-01-15 08:49:04.210936 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 reencode_incremental_map 1254382 with features 504412504116439552
2019-01-15 08:49:04.211032 7fffee40c700 20 mon.cephmon00@0(leader).osd e1254390 build_incremental inc 1254382 232 bytes
2019-01-15 08:49:04.211066 7fffee40c700 10 mon.cephmon00@0(leader) e40 ms_handle_reset 0x55556450f800 10.128.36.18:0/2911716500
2019-01-15 08:49:04.211070 7fffee40c700 10 mon.cephmon00@0(leader) e40 reset/close on session client.36800361 10.128.36.18:0/2911716500
2019-01-15 08:49:04.211073 7fffee40c700 10 mon.cephmon00@0(leader) e40 remove_session 0x555562dc6c00 client.36800361 10.128.36.18:0/2911716500 features 0x27018fb86aa42ada
It looks like the client disconnects, either because it waits too long
or because of some protocol error. Any hints on why so many maps need
to be re-encoded (to the jewel format), or on how to improve this
behavior, would be much appreciated. We would really like to use the
kernel client instead of ceph-fuse, but this seems to be a stumbling block.
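For scale, a rough back-of-envelope from the numbers in the log above
(only an estimate, since the per-increment sizes vary a lot): the mon
re-encodes every increment from 1254102 to 1254390, and the large ones
are around 1.2 MB, so even if only half of them are that size the mon
is encoding on the order of 170 MiB for a session that, on the mon
side, lasts only about three seconds before the reset:
incs=$(( 1254390 - 1254102 + 1 ))                  # increments the mon must re-encode: 289
echo "$(( incs / 2 * 1237271 / 1024 / 1024 )) MiB" # ~169 MiB if half are ~1.2 MB each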
Thanks,
Andras
On 1/3/19 6:49 AM, Andras Pataki wrote:
I wonder if anyone could offer any insight on the issue below,
regarding the CentOS 7.6 kernel cephfs client connecting to a
Luminous cluster. I have since tried a much newer 4.19.13 kernel,
which did not show the same issue (but unfortunately for various
reasons unrelated to ceph, we can't go to such a new kernel).
Am I reading this right: the monitor thinks this kernel is old and
needs to prepare maps in some older format for it, and that takes so
long that the kernel just gives up (or perhaps there is some other
communication protocol error)? Each of these mon sessions seems to
last only about half a second; then the client reconnects to another
mon, gets the same result, and so on. Is there any way around this?
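If the feature/re-encoding angle is right, the mon side should at
least confirm what it thinks the client supports. Something along
these lines should show it (a sketch only; cephmon00 here is just the
mon from our logs, and the second command has to be run on the mon
host itself, via its admin socket):
ceph features                       # summarizes connected clients/daemons by release and feature bits
ceph daemon mon.cephmon00 sessions  # lists open mon sessions with their feature bits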
Andras
On 12/26/18 7:55 PM, Andras Pataki wrote:
We've been using ceph-fuse with a pretty good stability record
(against the Luminous 12.2.8 back end). Unfortunately, ceph-fuse has
extremely poor small-file performance (understandably), so we've been
testing the kernel client. The latest Red Hat kernel,
3.10.0-957.1.3.el7.x86_64, seems to work pretty well as long as the
cluster is in a completely clean state. However, as soon as something
happens to the cluster, the kernel client fails pretty badly.
Today's example: I reweighted some OSDs to balance the disk usage a
bit (set nobackfill, reweight the OSDs, check the new hypothetical
space usage, then unset nobackfill).
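Roughly the following sequence of commands, with a made-up OSD id and
weight just to illustrate (not the actual values used):
ceph osd set nobackfill       # pause backfill so no data moves yet
ceph osd reweight 1234 0.95   # example only: lower the override weight of an overfull OSD
ceph osd df tree              # check weights/usage; ceph -s shows the resulting misplaced %
ceph osd unset nobackfill     # let backfill proceed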
As soon as the reweighting procedure started, the kernel client went
into an infinite loop, unsuccessfully trying to connect to the mons:
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:28:53 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:28:53 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:58 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:58 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:28:59 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:28:59 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:00 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
Dec 26 19:29:00 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session established
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 io error
Dec 26 19:29:01 mon5 kernel: libceph: mon0 10.128.150.10:6789 session lost, hunting for new mon
Dec 26 19:29:01 mon5 kernel: libceph: mon2 10.128.150.12:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon2 10.128.150.12:6789 session lost, hunting for new mon
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session established
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 io error
Dec 26 19:29:02 mon5 kernel: libceph: mon1 10.128.150.11:6789 session lost, hunting for new mon
... etc ...
seemingly never recovering. The cluster is healthy, and all other
clients are successfully doing I/O:
[root@cephmon00 ceph]# ceph -s
  cluster:
    id:     d7b33135-0940-4e48-8aa6-1d2026597c2f
    health: HEALTH_WARN
            noout flag(s) set
            1 backfillfull osd(s)
            4 pool(s) backfillfull
            239119058/12419244975 objects misplaced (1.925%)

  services:
    mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
    mgr: cephmon00(active)
    mds: cephfs-1/1/1 up {0=cephmds00=up:active}, 1 up:standby
    osd: 3534 osds: 3534 up, 3534 in; 5040 remapped pgs
         flags noout

  data:
    pools:   5 pools, 50688 pgs
    objects: 2.51G objects, 7.77PiB
    usage:   19.1PiB used, 5.90PiB / 25.0PiB avail
    pgs:     239119058/12419244975 objects misplaced (1.925%)
             45639 active+clean
             2914  active+remapped+backfilling
             2126  active+remapped+backfill_wait
             9     active+clean+scrubbing+deep

  io:
    client:   10.3MiB/s rd, 2.23GiB/s wr, 32op/s rd, 3.77kop/s wr
    recovery: 114GiB/s, 33.12kobjects/s
The client machine in question is otherwise healthy as well (not out
of memory, etc.). I've checked the OSD blacklist: nothing there. There
is nothing suspicious in the mon logs regarding this client either.
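For reference, those checks amounted to roughly the following (a
sketch, not the exact commands; cephmon00 is the mon whose log level I
then raised):
ceph osd blacklist ls                                   # shows no entry for 10.128.36.18
ceph tell mon.cephmon00 injectargs '--debug_mon 20/20'  # raise mon verbosity for the capture below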
With one of the mons set to log level 20, here is the relevant section
from when this client tries to connect:
2018-12-26 19:51:46.203264 7fffeb406700 10 mon.cephmon00@0(leader) e40 ms_verify_authorizer 10.128.36.18:0/3882984371 client protocol 0
2018-12-26 19:51:46.206645 7fffee40c700 10 mon.cephmon00@0(leader) e40 _ms_dispatch new session 0x5555722bb980 MonSession(client.36398604 10.128.36.18:0/3882984371 is open , features 0x27018fb86aa42ada (jewel)) features 0x27018fb86aa42ada
2018-12-26 19:51:46.206651 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps
2018-12-26 19:51:46.206657 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 preprocess_query auth(proto 0 34 bytes epoch 0) from client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.206660 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 prep_auth() blob_size=34
2018-12-26 19:51:46.206692 7fffee40c700 2 mon.cephmon00@0(leader) e40 send_reply 0x555562fee5a0 0x55556f398b40 auth_reply(proto 2 0 (0) Success) v1
2018-12-26 19:51:46.206867 7fffee40c700 20 mon.cephmon00@0(leader) e40 _ms_dispatch existing session 0x5555722bb980 for client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.206870 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps
2018-12-26 19:51:46.206877 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 preprocess_query auth(proto 2 128 bytes epoch 0) from client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.206882 7fffee40c700 10 mon.cephmon00@0(leader).auth v57602 prep_auth() blob_size=128
2018-12-26 19:51:46.206979 7fffee40c700 2 mon.cephmon00@0(leader) e40 send_reply 0x555562fee5a0 0x55571f9c4d80 auth_reply(proto 2 0 (0) Success) v1
2018-12-26 19:51:46.207178 7fffee40c700 20 mon.cephmon00@0(leader) e40 _ms_dispatch existing session 0x5555722bb980 for client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.207182 7fffee40c700 20 mon.cephmon00@0(leader) e40 caps allow *
2018-12-26 19:51:46.207184 7fffee40c700 20 is_capable service=mon command= read on cap allow *
2018-12-26 19:51:46.207185 7fffee40c700 20 allow so far , doing grant allow *
2018-12-26 19:51:46.207186 7fffee40c700 20 allow all
2018-12-26 19:51:46.207188 7fffee40c700 10 mon.cephmon00@0(leader) e40 handle_subscribe mon_subscribe({mdsmap=72103+,monmap=41+,osdmap=1207891}) v2
2018-12-26 19:51:46.207194 7fffee40c700 10 mon.cephmon00@0(leader) e40 handle_subscribe: MDS sub 'mdsmap'
2018-12-26 19:51:46.207195 7fffee40c700 20 is_capable service=mds command= read on cap allow *
2018-12-26 19:51:46.207197 7fffee40c700 20 allow so far , doing grant allow *
2018-12-26 19:51:46.207198 7fffee40c700 20 allow all
2018-12-26 19:51:46.207200 7fffee40c700 20 mon.cephmon00@0(leader).mds e72102 check_sub: mdsmap
2018-12-26 19:51:46.207204 7fffee40c700 10 mon.cephmon00@0(leader).monmap v40 check_sub monmap next 41 have 40
2018-12-26 19:51:46.207207 7fffee40c700 20 is_capable service=osd command= read on cap allow *
2018-12-26 19:51:46.207208 7fffee40c700 20 allow so far , doing grant allow *
2018-12-26 19:51:46.207209 7fffee40c700 20 allow all
2018-12-26 19:51:46.207211 7fffee40c700 10 mon.cephmon00@0(leader).osd e1208017 check_osdmap_sub 0x55592f513d00 next 1207891 (onetime)
2018-12-26 19:51:46.207213 7fffee40c700 5 mon.cephmon00@0(leader).osd e1208017 send_incremental [1207891..1208017] to client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.207217 7fffee40c700 10 mon.cephmon00@0(leader).osd e1208017 build_incremental [1207891..1207930] with features 27018fb86aa42ada
2018-12-26 19:51:46.220019 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 reencode_incremental_map 1207930 with features 504412504116439552
2018-12-26 19:51:46.230217 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 build_incremental inc 1207930 1146701 bytes
2018-12-26 19:51:46.230349 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 reencode_incremental_map 1207929 with features 504412504116439552
2018-12-26 19:51:46.232523 7fffee40c700 20 mon.cephmon00@0(leader).osd e1208017 build_incremental inc 1207929 175613 bytes
... a lot more of reencode_incremental stuff ...
2018-12-26 19:51:46.745394 7fffee40c700 10 mon.cephmon00@0(leader) e40 ms_handle_reset 0x5555637cf800 10.128.36.18:0/3882984371
2018-12-26 19:51:46.745395 7ffff0c11700 10 mon.cephmon00@0(leader).log v79246823 encode_full log v 79246823
2018-12-26 19:51:46.745469 7ffff0c11700 10 mon.cephmon00@0(leader).log v79246823 encode_pending v79246824
2018-12-26 19:51:46.745763 7fffee40c700 10 mon.cephmon00@0(leader) e40 reset/close on session client.36398604 10.128.36.18:0/3882984371
2018-12-26 19:51:46.745769 7fffee40c700 10 mon.cephmon00@0(leader) e40 remove_session 0x5555722bb980 client.36398604 10.128.36.18:0/3882984371 features 0x27018fb86aa42ada
Any pointers on what to do here?
Andras