Re: Ceph 12.2.4 MGR spams syslog with "mon failed to return metadata for mds"

Hi John,

Running "ceph mds metadata mds1" produced "Error ENOENT:". Querying the mds metadata of mds2 and mds3 worked as expected. It seems that only the active MDS cannot be queried by the Ceph MGR.

I also stated incorrectly that the Ceph MGR was spamming the syslog; it is actually the ceph-mgr log itself. Sorry for the confusion.


# ceph -s
  cluster:
    id:     b63f4ca1-f5e1-4ac1-a6fc-5ab70c65864a
    health: HEALTH_OK
 
  services:
    mon: 3 daemons, quorum mon1,mon2,mon3
    mgr: mon1(active), standbys: mon2, mon3
    mds: cephfs-1/1/1 up  {0=mds1=up:active}, 2 up:standby
    osd: 14 osds: 14 up, 14 in
    rgw: 3 daemons active
 
  data:
    pools:   10 pools, 248 pgs
    objects: 583k objects, 2265 GB
    usage:   6816 GB used, 6223 GB / 13039 GB avail
    pgs:     247 active+clean
             1   active+clean+scrubbing+deep
 
  io:
    client:   115 kB/s rd, 759 kB/s wr, 22 op/s rd, 24 op/s wr

# ceph mds metadata mds1
Error ENOENT:
 
# ceph mds metadata mds2
{
    "arch": "x86_64",
    "ceph_version": "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)",
    "cpu": "Intel(R) Xeon(R) CPU E3-1230 V2 @ 3.30GHz",
    "distro": "ubuntu",
    "distro_description": "Ubuntu 16.04.4 LTS",
    "distro_version": "16.04",
    "hostname": "mds2",
    "kernel_description": "#1 SMP PVE 4.13.13-40 (Fri, 16 Feb 2018 09:51:20 +0100)",
    "kernel_version": "4.13.13-6-pve",
    "mem_swap_kb": "2048000",
    "mem_total_kb": "2048000",
    "os": "Linux"
}

# ceph mds metadata mds3
{
    "arch": "x86_64",
    "ceph_version": "ceph version 12.2.4 (52085d5249a80c5f5121a76d6288429f35e4e77b) luminous (stable)",
    "cpu": "Intel(R) Xeon(R) CPU E31240 @ 3.30GHz",
    "distro": "ubuntu",
    "distro_description": "Ubuntu 16.04.4 LTS",
    "distro_version": "16.04",
    "hostname": "mds3",
    "kernel_description": "#1 SMP PVE 4.13.16-47 (Mon, 9 Apr 2018 09:58:12 +0200)",
    "kernel_version": "4.13.16-2-pve",
    "mem_swap_kb": "4096000",
    "mem_total_kb": "2048000",
    "os": "Linux"
}

Kind regards,

Charles Alva
Sent from Gmail Mobile

On Tue, Apr 24, 2018 at 4:29 PM, John Spray <jspray@xxxxxxxxxx> wrote:
On Fri, Apr 20, 2018 at 11:29 AM, Charles Alva <charlesalva@xxxxxxxxx> wrote:
> Marc,
>
> Thanks.
>
> The mgr log spam occurs even without the dashboard module enabled. I never
> checked the ceph mgr log before because the ceph cluster is always healthy.
> Based on the ceph mgr logs in syslog, the spam occurred long before and
> after I enabled the dashboard module.
>
>> # ceph -s
>>   cluster:
>>     id:     xxx
>>     health: HEALTH_OK
>>
>>   services:
>>     mon: 3 daemons, quorum mon1,mon2,mon3
>>     mgr: mon1(active), standbys: mon2, mon3
>>     mds: cephfs-1/1/1 up  {0=mds1=up:active}, 2 up:standby
>>     osd: 14 osds: 14 up, 14 in
>>     rgw: 3 daemons active
>>
>>   data:
>>     pools:   10 pools, 248 pgs
>>     objects: 546k objects, 2119 GB
>>     usage:   6377 GB used, 6661 GB / 13039 GB avail
>>     pgs:     248 active+clean
>>
>>   io:
>>     client:   25233 B/s rd, 1409 kB/s wr, 6 op/s rd, 59 op/s wr
>
>
>
> My ceph mgr log is spammed with the following entries every second. This
> happens on 2 separate Ceph 12.2.4 clusters.

(I assume that the mon, mgr and mds are all 12.2.4)

The "failed to return metadata" part is kind of mysterious.  Do you
also get an error if you try to do "ceph mds metadata mds1" by hand?
(that's what the mgr is trying to do).
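
(As a cross-check, "ceph mds metadata" with no daemon name should, if I
remember right, dump whatever metadata the mon holds for every MDS at
once, which would show exactly which daemon is missing:)

# ceph mds metadata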

If the metadata works when using the CLI by hand, you may have an
issue with the mgr's auth caps; check that its mon caps are set to
"allow profile mgr".

The "unhandled message" part is from a path where the mgr code is
ignoring messages from services that don't have any metadata (I think
this is actually a bug, as we should be considering these messages as
handled even if we're ignoring them).

John

>> # less +F /var/log/ceph/ceph-mgr.mon1.log
>>
>>  ...
>>
>> 2018-04-20 06:21:18.782861 7fca238ff700  1 mgr send_beacon active
>> 2018-04-20 06:21:19.050671 7fca14809700  0 ms_deliver_dispatch: unhandled
>> message 0x55bf897d1c00 mgrreport(mds.mds1 +24-0 packed 214) v5 from mds.0
>> 10.100.100.114:6800/4132681434
>> 2018-04-20 06:21:19.051047 7fca25102700  1 mgr finish mon failed to return
>> metadata for mds.mds1: (2) No such file or directory
>> 2018-04-20 06:21:20.050889 7fca14809700  0 ms_deliver_dispatch: unhandled
>> message 0x55bf897eac00 mgrreport(mds.mds1 +24-0 packed 214) v5 from mds.0
>> 10.100.100.114:6800/4132681434
>> 2018-04-20 06:21:20.051351 7fca25102700  1 mgr finish mon failed to return
>> metadata for mds.mds1: (2) No such file or directory
>> 2018-04-20 06:21:20.784455 7fca238ff700  1 mgr send_beacon active
>> 2018-04-20 06:21:21.050968 7fca14809700  0 ms_deliver_dispatch: unhandled
>> message 0x55bf897d0d00 mgrreport(mds.mds1 +24-0 packed 214) v5 from mds.0
>> 10.100.100.114:6800/4132681434
>> 2018-04-20 06:21:21.051441 7fca25102700  1 mgr finish mon failed to return
>> metadata for mds.mds1: (2) No such file or directory
>> 2018-04-20 06:21:22.051254 7fca14809700  0 ms_deliver_dispatch: unhandled
>> message 0x55bf897ec100 mgrreport(mds.mds1 +24-0 packed 214) v5 from mds.0
>> 10.100.100.114:6800/4132681434
>> 2018-04-20 06:21:22.051704 7fca25102700  1 mgr finish mon failed to return
>> metadata for mds.mds1: (2) No such file or directory
>> 2018-04-20 06:21:22.786656 7fca238ff700  1 mgr send_beacon active
>> 2018-04-20 06:21:23.051235 7fca14809700  0 ms_deliver_dispatch: unhandled
>> message 0x55bf897d0400 mgrreport(mds.mds1 +24-0 packed 214) v5 from mds.0
>> 10.100.100.114:6800/4132681434
>> 2018-04-20 06:21:23.051712 7fca25102700  1 mgr finish mon failed to return
>> metadata for mds.mds1: (2) No such file or directory
>> 2018-04-20 06:21:24.051353 7fca14809700  0 ms_deliver_dispatch: unhandled
>> message 0x55bf897e6000 mgrreport(mds.mds1 +24-0 packed 214) v5 from mds.0
>> 10.100.100.114:6800/4132681434
>> 2018-04-20 06:21:24.051971 7fca25102700  1 mgr finish mon failed to return
>> metadata for mds.mds1: (2) No such file or directory
>> 2018-04-20 06:21:24.788228 7fca238ff700  1 mgr send_beacon active
>> 2018-04-20 06:21:25.051642 7fca14809700  0 ms_deliver_dispatch: unhandled
>> message 0x55bf897d1900 mgrreport(mds.mds1 +24-0 packed 214) v5 from mds.0
>> 10.100.100.114:6800/4132681434
>> 2018-04-20 06:21:25.052182 7fca25102700  1 mgr finish mon failed to return
>> metadata for mds.mds1: (2) No such file or directory
>> 2018-04-20 06:21:26.051641 7fca14809700  0 ms_deliver_dispatch: unhandled
>> message 0x55bf89835600 mgrreport(mds.mds1 +24-0 packed 214) v5 from mds.0
>> 10.100.100.114:6800/4132681434
>> 2018-04-20 06:21:26.052169 7fca25102700  1 mgr finish mon failed to return
>> metadata for mds.mds1: (2) No such file or directory
>> ...
>
>
> Kind regards,
>
> Charles Alva
> Sent from Gmail Mobile
>
> On Fri, Apr 20, 2018 at 10:57 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
> wrote:
>>
>>
>> Hi Charles,
>>
>> I am more or less responding to your syslog issue. I don't have the
>> experience with cephfs to give you reliable advice, so let's wait for
>> the experts to reply. But I guess you have to give a little more
>> background info, like:
>>
>> Did this happen on a running cluster that you didn't apply any changes
>> to? Also, your dashboard issue looks unrelated to "1 mgr finish mon
>> failed to return metadata for mds.mds1".
>>
>>
>> -----Original Message-----
>> From: Charles Alva [mailto:charlesalva@xxxxxxxxx]
>> Sent: vrijdag 20 april 2018 10:33
>> To: Marc Roos
>> Cc: ceph-users
>> Subject: Re: Ceph 12.2.4 MGR spams syslog with "mon failed
>> to return metadata for mds"
>>
>> Hi Marc,
>>
>> I'm using CephFS and the mgr could not get the metadata of the mds. I
>> enabled the dashboard module, and every time I visit the ceph filesystem
>> page, I get an internal error 500.
>>
>> Kind regards,
>>
>> Charles Alva
>> Sent from Gmail Mobile
>>
>>
>> On Fri, Apr 20, 2018 at 9:24 AM, Marc Roos <M.Roos@xxxxxxxxxxxxxxxxx>
>> wrote:
>>
>>
>>
>>         Remote syslog server, and buffering writes to the log?
>>
>>
>>         Actually this is another argument to fix logging to syslog a bit,
>>         because the default syslog is also set to throttle and group the
>>         messages, like:
>>
>>         Mar 9 17:59:35 db1 influxd: last message repeated 132 times
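>>
>>         (That grouping is rsyslog's RepeatedMsgReduction feature; a
>>         minimal sketch of enabling it, assuming a stock rsyslog setup,
>>         would be:)
>>
>>         # /etc/rsyslog.conf
>>         $RepeatedMsgReduction on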
>>
>>
>>
>>         https://www.mail-archive.com/ceph-users@xxxxxxxxxxxxxx/msg45025.html
>>
>>
>>         -----Original Message-----
>>         From: Charles Alva [mailto:charlesalva@xxxxxxxxx]
>>         Sent: vrijdag 20 april 2018 8:08
>>         To: ceph-users@xxxxxxxxxxxxxx
>>         Subject: Ceph 12.2.4 MGR spams syslog with "mon failed to
>>         return metadata for mds"
>>
>>         Hi All,
>>
>>         Just noticed on 2 Ceph Luminous 12.2.4 clusters that Ceph mgr
>>         spams the syslog with lots of "mon failed to return metadata for
>>         mds" every second.
>>
>>         ```
>>         2018-04-20 06:06:03.951412 7fca238ff700  1 mgr send_beacon active
>>         2018-04-20 06:06:04.934477 7fca14809700  0 ms_deliver_dispatch: unhandled message 0x55bf897f0a00 mgrreport(mds.mds1 +24-0 packed 214) v5 from mds.0 10.100.100.114:6800/4132681434
>>         2018-04-20 06:06:04.934937 7fca25102700  1 mgr finish mon failed to return metadata for mds.mds1: (2) No such file or directory
>>         ```
>>
>>         How can I fix this issue, or disable this logging completely to
>>         reduce disk IO and increase SSD life span?
>>
>>
>>
>>         Kind regards,
>>
>>         Charles Alva
>>         Sent from Gmail Mobile
>>

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
