Re: Nautilus 14.2.19 mon 100% CPU

On Fri, Apr 9, 2021 at 8:39 PM Robert LeBlanc <robert@xxxxxxxxxxxxx> wrote:
>
> On Fri, Apr 9, 2021 at 11:49 AM Dan van der Ster <dan@xxxxxxxxxxxxxx> wrote:
> >
> > Thanks. I didn't see anything ultra obvious to me.
> >
> > But I did notice the nearfull warnings so I wonder if this cluster is
> > churning through osdmaps? Did you see a large increase in inbound or
> > outbound network traffic on this mon following the upgrade?
> > Totally speculating here, but maybe there is an issue where you have
> > some old clients, which can't decode an incremental osdmap from a
> > nautilus mon, so the single mon is busy serving up these maps to the
> > clients.
> >
> > Does the mon load decrease if you stop the osdmap churn?, e.g. by
> > setting norebalance if that is indeed ongoing.
> >
> > Could you also share debug_ms = 1 for a minute of busy cpu mon?
>
> Here are the new logs with the debug_ms=1 for a bit.
> https://owncloud.leblancnet.us/owncloud/index.php/s/1hvtJo3s2oLPpWn

One strange thing in these logs: a single hammer client is asking for
more than 800,000 incremental osdmaps, seemingly every 30s:

    client.131831153 at 172.16.212.55 is asking for incrementals from
    1170448..1987355 (see [1])
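
If it helps to check for other clients doing the same thing, here is a
minimal sketch that scans a mon log for send_incremental lines and reports,
per client, how many requests it made and the largest epoch range it was
sent. It assumes each log entry sits on a single line, as in the mon log
file itself (not wrapped as in the excerpt in [1]); the function name
summarize_incrementals is just illustrative.

```python
import re
from collections import defaultdict

# Matches mon log fragments such as:
#   send_incremental [1170448..1987355] to client.131831153
SEND_INC = re.compile(r"send_incremental \[(\d+)\.\.(\d+)\] to (\S+)")


def summarize_incrementals(lines):
    """Return {client: (request_count, largest_epoch_span)}."""
    stats = defaultdict(lambda: [0, 0])
    for line in lines:
        match = SEND_INC.search(line)
        if match is None:
            continue
        first, last = int(match.group(1)), int(match.group(2))
        client = match.group(3)
        entry = stats[client]
        entry[0] += 1                               # one more request seen
        entry[1] = max(entry[1], last - first + 1)  # inclusive epoch count
    return {client: tuple(values) for client, values in stats.items()}


# Two entries taken from the excerpt in [1], joined onto single lines.
sample = [
    "2021-04-09 13:12:37.032 7f50de246700  5 mon.sun-storemon01@0(leader).osd "
    "e1987341 send_incremental [1170448..1987341] to client.131831153",
    "2021-04-09 17:07:27.238 7f9fc83e4700  5 mon.sun-storemon01@0(leader).osd "
    "e1987355 send_incremental [1170448..1987355] to client.131831153",
]

for client, (count, span) in sorted(summarize_incrementals(sample).items()):
    print(f"{client}: {count} requests, largest range {span} epochs")
```

To run it on a real log, pass the open file object to
summarize_incrementals() in place of the sample list. Any client whose
largest range is in the hundreds of thousands is worth a closer look.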

Can you try to evict/kill/block that client and see if your mon load drops?

-- dan

[1]

   -43> 2021-04-09 13:12:37.032 7f50de246700  5 mon.sun-storemon01@0(leader).osd e1987341 send_incremental [1170448..1987341] to client.131831153
2021-04-09 17:07:27.238 7f9fc83e4700 10 mon.sun-storemon01@0(leader) e45 handle_subscribe mon_subscribe({mdsmap=3914079+,monmap=0+,osdmap=1170448})
2021-04-09 17:07:27.238 7f9fc83e4700 10 mon.sun-storemon01@0(leader).osd e1987355 check_osdmap_sub 0x55e2e2133de0 next 1170448 (onetime)
2021-04-09 17:07:27.238 7f9fc83e4700  5 mon.sun-storemon01@0(leader).osd e1987355 send_incremental [1170448..1987355] to client.131831153
2021-04-09 17:07:50.910 7f9fc83e4700  5 mon.sun-storemon01@0(leader) e45 dispatch_op client.131831153 v1:172.16.212.55:0/527701465 is not authenticated, dropping mon_subscribe({mdsmap=3914079+,monmap=0+,osdmap=1170448})
2021-04-09 18:14:47.295 7f9fc83e4700  1 -- [v2:10.65.7.203:3300/0,v1:10.65.7.203:6789/0] <== client.131831153 v1:172.16.212.55:0/527701465 3 ==== mon_subscribe({mdsmap=3914127+,monmap=0+,osdmap=1170448}) ==== 85+0+0 (unknown 1413914345 0 0) 0x55e2dbc52c00 con 0x55e2e1cf5680
2021-04-09 18:15:17.006 7f9fc83e4700  1 -- [v2:10.65.7.203:3300/0,v1:10.65.7.203:6789/0] <== client.131831153 v1:172.16.212.55:0/527701465 2 ==== mon_subscribe({mdsmap=3914127+,monmap=0+,osdmap=1170448}) ==== 85+0+0 (unknown 1413914345 0 0) 0x55e2da565200 con 0x55e2df00a880
2021-04-09 18:15:17.278 7f9fc83e4700  1 -- [v2:10.65.7.203:3300/0,v1:10.65.7.203:6789/0] <== client.131831153 v1:172.16.212.55:0/527701465 3 ==== mon_subscribe({mdsmap=3914127+,monmap=0+,osdmap=1170448}) ==== 85+0+0 (unknown 1413914345 0 0) 0x55e2de443000 con 0x55e2ee3d8400