Follow-up on the tell hanging: iterating over all osds and trying to
raise the max-backfills gives hanging ceph tell processes like this
(a sketch of the loop follows at the end of this mail):

root     1007846 15.3  1.2 918388 50972 pts/5 Sl 00:03 0:48 /usr/bin/python3 /usr/bin/ceph tell osd.4 injectargs --osd-max-backfill
root     1007890  0.4  0.9 850664 37596 pts/5 Sl 00:03 0:01 /usr/bin/python3 /usr/bin/ceph tell osd.7 injectargs --osd-max-backfill
root     1007930  0.3  0.9 842472 37484 pts/5 Sl 00:03 0:01 /usr/bin/python3 /usr/bin/ceph tell osd.11 injectargs --osd-max-backfil
root     1007987  0.3  0.9 850668 37540 pts/5 Sl 00:03 0:01 /usr/bin/python3 /usr/bin/ceph tell osd.18 injectargs --osd-max-backfil
root     1008054  0.4  0.9 850664 37600 pts/5 Sl 00:03 0:01 /usr/bin/python3 /usr/bin/ceph tell osd.29 injectargs --osd-max-backfil
root     1008147 14.7  1.2 910192 50648 pts/5 Sl 00:03 0:42 /usr/bin/python3 /usr/bin/ceph tell osd.33 injectargs --osd-max-backfil
root     1008205  0.3  0.9 842468 37524 pts/5 Sl 00:03 0:01 /usr/bin/python3 /usr/bin/ceph tell osd.45 injectargs --osd-max-backfil
root     1008246  0.3  0.9 850664 37828 pts/5 Sl 00:04 0:01 /usr/bin/python3 /usr/bin/ceph tell osd.48 injectargs --osd-max-backfil
...

Additionally, many of the tell processes get into an infinite loop and
print this error over and over again:

2020-09-23 00:09:48.766 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c8055680 0x7f07c8053740 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2020-09-23 00:09:48.774 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c804f590 0x7f07c80505c0 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2020-09-23 00:09:48.786 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c8055680 0x7f07c8053740 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2020-09-23 00:09:48.790 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c804f590 0x7f07c80505c0 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZER
2020-09-23 00:09:48.798 7f07e5f99700  0 --1- [2a0a:e5c0:2:1:20d:b9ff:fe48:3bd4]:0/2338294673 >> v1:[2a0a:e5c0:2:1:21b:21ff:febc:5060]:6858/12824 conn(0x7f07c8055680 0x7f07c8053740 :-1 s=CONNECTING_SEND_CONNECT_MSG pgs=0 cs=0 l=1).handle_connect_reply_2 connect got BADAUTHORIZ

Nico Schottelius <nico.schottelius@xxxxxxxxxxx> writes:

> So the same problem happens with pgs which are in "unknown" state,
>
> [19:31:08] black2.place6:~# ceph pg 2.5b2 query | tee query_2.5b2
>
> hangs until the pg actually becomes active again. I assume that this
> should not be the case, should it?
>
>
> Nico Schottelius <nico.schottelius@xxxxxxxxxxx> writes:
>
>> Update to the update: currently debugging why pgs are stuck in the
>> peering state:
>>
>> [18:57:49] black2.place6:~# ceph pg dump all | grep 2.7d1
>> dumped all
>> 2.7d1 16666 0 0 0 0 69698617344 0 0 3002 3002 peering 2020-09-22 18:49:28.587859 80407'8126117 80915:35142541 [22,84] 22 [22,84] 22 80407'8126117 2020-09-22 17:23:11.860334 79594'8122364 2020-09-21 13:27:16.376009 0
>>
>> The problem is that
>>
>> ceph pg 2.7d1 query
>>
>> hangs and does not output information. Does anyone know what could be
>> the cause for this?
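For reference, the iteration over all osds was essentially of the
following form. This is a minimal sketch, not the exact script that was
run; the "timeout 30" wrapper and the backfill value of 4 are
assumptions added here as a possible workaround so that a single
unresponsive OSD cannot leave a ceph tell process hanging forever:

    #!/bin/sh
    # Raise osd_max_backfills on every OSD in the cluster.
    # Sketch only: "timeout 30" and the value 4 are assumptions; the
    # original run called ceph tell directly and hung on some OSDs.
    for osd in $(ceph osd ls); do
        timeout 30 ceph tell "osd.${osd}" injectargs '--osd-max-backfills=4' \
            || echo "osd.${osd}: tell timed out or failed" >&2
    done

The same kind of wrapper could in principle be put in front of the
hanging pg queries from the quoted mails, e.g. "timeout 30 ceph pg
2.7d1 query", so that they fail fast instead of blocking until the pg
becomes active again.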
--
Modern, affordable, Swiss Virtual Machines. Visit www.datacenterlight.ch