Hi Paul,

We had a similar experience with Red Hat Ceph, and it turned out to be the
mgr progress module. I think there has been some work to fix this, though
the change I thought would affect you seems to already be in 14.2.11:
https://github.com/ceph/ceph/pull/36076

If you are on 14.2.15, you can try turning the progress module off
altogether to see if it makes a difference. From the Nautilus release
notes (https://docs.ceph.com/en/latest/releases/nautilus/):

    MGR: progress module can now be turned on/off, using the commands:
    ceph progress on and ceph progress off
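Concretely, something like this is what I'd try (the progress on/off
commands are straight from the release notes above; "ceph mgr fail" is
just the usual way to bounce the active mgr, and ceph-mon-01 is the
active mgr from your status output below):

    # turn progress reporting off, then watch whether the hanging
    # commands (autoscale-status, balancer, fs status) come back
    ceph progress off

    # if nothing changes, fail over to a standby mgr for a clean slate
    ceph mgr fail ceph-mon-01

    # re-enable progress later if it turns out not to be the culprit
    ceph progress on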
Rafael

On Wed, 25 Nov 2020 at 06:04, Paul Mezzanini <pfmeec@xxxxxxx> wrote:

> Ever since we jumped from 14.2.9 to .12 (and beyond) a lot of the ceph
> commands just hang. The mgr daemon also just stops responding to our
> Prometheus scrapes occasionally. A daemon restart and it wakes back up.
> I have nothing pointing to these being related but it feels that way.
>
> I also tried to get device health monitoring with SMART up and running
> around that upgrade time. It never seemed to be able to pull in and
> report on the health across the drives. I did see the osd process firing
> off smartctl on occasion though, so it was trying to do something.
> Again, I have nothing pointing to this being related but it feels like
> it may be.
>
> Some commands that currently hang:
> ceph osd pool autoscale-status
> ceph balancer *
> ceph iostat (oddly, this spit out a line of all 0 stats once and then hung)
> ceph fs status
> toggling ceph device monitoring on or off, and a lot of the device
> health stuff too
>
> Mgr logs on disk show flavors of this:
> 2020-11-24 13:05:07.883 7f19e2c40700 0 log_channel(audit) log [DBG] : from='mon.0 -' entity='mon.' cmd=[{,",p,r,e,f,i,x,",:, ,",o,s,d, ,p,e,r,f,",,, ,",f,o,r,m,a,t,",:, ,",j,s,o,n,",}]: dispatch
> 2020-11-24 13:05:07.895 7f19e2c40700 0 log_channel(audit) log [DBG] : from='mon.0 -' entity='mon.' cmd=[{,",p,r,e,f,i,x,",:, ,",o,s,d, ,p,o,o,l, ,s,t,a,t,s,",,, ,",f,o,r,m,a,t,",:, ,",j,s,o,n,",}]: dispatch
> 2020-11-24 13:05:08.567 7f19e1c3e700 0 log_channel(cluster) log [DBG] : pgmap v587: 17149 pgs: 1 active+remapped+backfill_wait, 2 active+clean+scrubbing, 55 active+clean+scrubbing+deep, 9 active+remapped+backfilling, 17082 active+clean; 2.1 PiB data, 3.5 PiB used, 2.9 PiB / 6.4 PiB avail; 108 MiB/s rd, 53 MiB/s wr, 1.20k op/s; 7525420/9900121381 objects misplaced (0.076%); 99 MiB/s, 40 objects/s recovering
>
> ceph status:
>   cluster:
>     id:     971a5242-f00d-421e-9bf4-5a716fcc843a
>     health: HEALTH_WARN
>             1 nearfull osd(s)
>             1 pool(s) nearfull
>
>   services:
>     mon: 3 daemons, quorum ceph-mon-01,ceph-mon-03,ceph-mon-02 (age 4h)
>     mgr: ceph-mon-01(active, since 97s), standbys: ceph-mon-03, ceph-mon-02
>     mds: cephfs:1 {0=ceph-mds-02=up:active} 3 up:standby
>     osd: 843 osds: 843 up (since 13d), 843 in (since 2w); 10 remapped pgs
>     rgw: 1 daemon active (ceph-rgw-01)
>
>   task status:
>     scrub status:
>       mds.ceph-mds-02: idle
>
>   data:
>     pools:   16 pools, 17149 pgs
>     objects: 1.61G objects, 2.1 PiB
>     usage:   3.5 PiB used, 2.9 PiB / 6.4 PiB avail
>     pgs:     6482000/9900825469 objects misplaced (0.065%)
>              17080 active+clean
>              54    active+clean+scrubbing+deep
>              9     active+remapped+backfilling
>              5     active+clean+scrubbing
>              1     active+remapped+backfill_wait
>
>   io:
>     client:   877 MiB/s rd, 1.8 GiB/s wr, 1.91k op/s rd, 3.33k op/s wr
>     recovery: 136 MiB/s, 55 objects/s
>
> ceph config dump:
> WHO            MASK  LEVEL     OPTION                                           VALUE                                              RO
> global               advanced  cluster_network                                  192.168.42.0/24                                    *
> global               advanced  mon_max_pg_per_osd                               400
> global               advanced  mon_pg_warn_max_object_skew                      -1.000000
> global               dev       mon_warn_on_pool_pg_num_not_power_of_two         false
> global               advanced  osd_max_backfills                                2
> global               advanced  osd_max_scrubs                                   4
> global               advanced  osd_scrub_during_recovery                        false
> global               advanced  public_network                                   1xx.xx.171.0/24 10.16.171.0/24                     *
> mon                  advanced  mon_allow_pool_delete                            true
> mgr                  advanced  mgr/balancer/mode                                none
> mgr                  advanced  mgr/devicehealth/enable_monitoring               false
> osd                  advanced  bluestore_compression_mode                       passive
> osd                  advanced  osd_deep_scrub_large_omap_object_key_threshold   2000000
> osd                  advanced  osd_op_queue_cut_off                             high                                               *
> osd                  advanced  osd_scrub_load_threshold                         5.000000
> mds                  advanced  mds_beacon_grace                                 300.000000
> mds                  basic     mds_cache_memory_limit                           16384000000
> mds                  advanced  mds_log_max_segments                             256
> client               advanced  rbd_default_features                             5
> client.libvirt       advanced  admin_socket                                     /var/run/ceph/$cluster-$type.$id.$pid.$cctid.asok  *
> client.libvirt       basic     log_file                                         /var/log/ceph/qemu-guest-$pid.log                  *
>
> /etc/ceph/ceph.conf is the stub file with the fsid and the mons listed.
>
> Yes, I have a drive that just started to tickle the full warn limit.
> That's what pulled me back into the "I should fix this" mode. I'm
> manually adjusting the weight on that one for the time being along with
> slowly lowering pg_num on an oversized pool. The cluster still has this
> issue when in HEALTH_OK.
>
> I'm free to do a lot of debugging and poking around even though this is
> our production cluster. The only service I refuse to play around with is
> the MDS. That one bites back. Does anyone have more ideas on where to
> look to try and figure out what's going on?
>
> --
> Paul Mezzanini
> Sr Systems Administrator / Engineer, Research Computing
> Information & Technology Services
> Finance & Administration
> Rochester Institute of Technology
> o:(585) 475-3245 | pfmeec@xxxxxxx
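PS: on the device health / SMART side, your config dump has
mgr/devicehealth/enable_monitoring set to false, which (if I am reading it
right) means the devicehealth module is currently disabled anyway. If you
want to rule that module in or out while you test, the toggles you
mentioned should be something like this (assuming the standard Nautilus
device commands; "ceph device ls" just lists what the mgr can see):

    # see which drives the mgr is tracking
    ceph device ls

    # keep SMART scraping off while chasing the hangs...
    ceph device monitoring off

    # ...and turn it back on once the mgr is responsive again
    ceph device monitoring on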
--
Rafael Lopez
Devops Systems Engineer
Monash University eResearch Centre
E: rafael.lopez@xxxxxxxxxx

_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx