Hi,

The first thing that comes to mind looking at the I/O section is an increased metadata load causing bottlenecks on the MDSs. You do have 2 active ranks, which could avert that problem, but it is also possible that certain files/dirs receive an unreasonably high amount of metadata I/O, which could lead to an uneven distribution of workload among the active MDSs and make one of them hang at times.

If the MDS is hanging, it seems yours is a metadata-intensive environment, and reducing PGs might not be a good idea here. You could also share the MDS logs so we can see exactly what is going on and whether anything needs attention. I have put a couple of example command sketches below the quoted message.

--
*Dhairya Parmar*
Associate Software Engineer, CephFS
IBM, Inc.

On Fri, Feb 23, 2024 at 8:27 PM <lokitingyi@xxxxxxxxx> wrote:

> Hi,
>
> I have a CephFS cluster:
>
> ```
> > ceph -s
>   cluster:
>     id:     e78987f2-ef1c-11ed-897d-cf8c255417f0
>     health: HEALTH_WARN
>             85 pgs not deep-scrubbed in time
>             85 pgs not scrubbed in time
>
>   services:
>     mon: 5 daemons, quorum datastone05,datastone06,datastone07,datastone10,datastone09 (age 2w)
>     mgr: datastone05.iitngk(active, since 2w), standbys: datastone06.wjppdy
>     mds: 2/2 daemons up, 1 hot standby
>     osd: 22 osds: 22 up (since 3d), 22 in (since 4w); 8 remapped pgs
>
>   data:
>     volumes: 1/1 healthy
>     pools:   4 pools, 115 pgs
>     objects: 49.08M objects, 16 TiB
>     usage:   35 TiB used, 2.0 PiB / 2.1 PiB avail
>     pgs:     3807933/98160678 objects misplaced (3.879%)
>              107 active+clean
>              8   active+remapped+backfilling
>
>   io:
>     client:   224 MiB/s rd, 79 MiB/s wr, 844 op/s rd, 33 op/s wr
>     recovery: 8.8 MiB/s, 24 objects/s
> ```
>
> The pool and PG status:
>
> ```
> > ceph osd pool autoscale-status
> POOL                SIZE  TARGET SIZE  RATE  RAW CAPACITY   RATIO  TARGET RATIO  EFFECTIVE RATIO  BIAS  PG_NUM  NEW PG_NUM  AUTOSCALE  BULK
> cephfs.myfs.meta  28802M                2.0         2119T  0.0000                                  4.0      16              on         False
> cephfs.myfs.data  16743G                2.0         2119T  0.0154                                  1.0      32              on         False
> rbd                   19                2.0         2119T  0.0000                                  1.0      32              on         False
> .mgr               3840k                2.0         2119T  0.0000                                  1.0       1              on         False
> ```
>
> The pool detail:
>
> ```
> > ceph osd pool ls detail
> pool 1 'cephfs.myfs.meta' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 16 pgp_num 16 autoscale_mode on last_change 3639 lfor 0/3639/3637 flags hashpspool stripe_width 0 pg_autoscale_bias 4 pg_num_min 16 recovery_priority 5 application cephfs
> pool 2 'cephfs.myfs.data' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 66 pgp_num 58 pg_num_target 32 pgp_num_target 32 autoscale_mode on last_change 5670 lfor 0/5661/5659 flags hashpspool,selfmanaged_snaps stripe_width 0 application cephfs
> pool 3 'rbd' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 32 pgp_num 32 autoscale_mode on last_change 486 lfor 0/486/478 flags hashpspool,selfmanaged_snaps stripe_width 0 application rbd
> pool 4 '.mgr' replicated size 2 min_size 1 crush_rule 0 object_hash rjenkins pg_num 1 pgp_num 1 autoscale_mode on last_change 39 flags hashpspool stripe_width 0 pg_num_max 32 pg_num_min 1 application mgr
> ```
>
> When PG numbers reduce, the MDS server has a chance of hanging.
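To see whether the metadata load really is skewed between the two active ranks, and to capture something useful before sharing logs, this is roughly what I would run. It is only a sketch: the daemon names are placeholders, so substitute the ones shown by `ceph fs status` on your cluster.

```
# Per-rank activity (requests/s, cached dentries/inodes) for both active MDS ranks
ceph fs status

# On the node hosting the busy or hanging MDS, via its admin socket
# (<mds-daemon-name> is a placeholder for your actual daemon name):
ceph daemon mds.<mds-daemon-name> dump_ops_in_flight   # requests currently stuck in the MDS
ceph daemon mds.<mds-daemon-name> get subtrees         # how the directory tree is split across ranks
ceph daemon mds.<mds-daemon-name> perf dump            # per-daemon metadata counters

# Temporarily raise MDS log verbosity before reproducing the hang, then revert
ceph config set mds debug_mds 10
ceph config set mds debug_ms 1
```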
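On the PG side: your `ceph osd pool ls detail` output shows cephfs.myfs.data being merged down right now (pg_num 66, pgp_num 58, target 32), which matches the 8 backfilling PGs in `ceph -s`. If the hangs line up with those merges, here is a rough sketch of how you could hold the data pool's PG count steady while investigating. The pool name is taken from your output; the pg_num_min value of 64 is only an example, not a recommendation.

```
# Check where the merge currently stands
ceph osd pool get cephfs.myfs.data pg_num
ceph osd pool get cephfs.myfs.data pgp_num

# Either stop the autoscaler from touching this pool while you investigate...
ceph osd pool set cephfs.myfs.data pg_autoscale_mode off

# ...or leave it on but set a floor so it cannot merge below the current count
ceph osd pool set cephfs.myfs.data pg_num_min 64
```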