Nautilus: PGs stuck remapped+backfilling

Hi all,

I have a strange issue with backfilling and I'm not sure what the cause is.

It's an (upgraded) Nautilus cluster with an SSD cache tier for OpenStack and the CephFS metadata pool residing on the same SSDs; there were three SSDs in total. Today I added two new NVMe SSDs (osd.15, osd.16) so that I can shut off an old server that has only one SSD OSD left (osd.20). Setting the crush weight of osd.20 to 0 (and adjusting the weights of the remaining SSDs for an even distribution) leaves 3 PGs stuck in active+remapped+backfilling. I don't understand why these remaining PGs don't finish backfilling; the crush rule is quite simple (all SSD pools are replicated with size 3). The stuck PGs all belong to the cephfs-metadata pool. Since there are 4 weighted SSDs for 3 replicas, the backfill should still be able to finish, right?
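
For reference, this is roughly what I ran to drain osd.20 and even out the remaining SSDs (a sketch from memory; the exact weight values may differ slightly):

ceph01:~ # ceph osd crush reweight osd.20 0
ceph01:~ # ceph osd crush reweight osd.9 0.45409
ceph01:~ # ceph osd crush reweight osd.10 0.45409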

Can anyone share their thoughts on why these 3 PGs can't finish backfilling? If more information about the cluster is required, please let me know.

Regards,
Eugen


ceph01:~ # ceph osd pool ls detail | grep meta
pool 36 'cephfs-metadata' replicated size 3 min_size 2 crush_rule 1 object_hash rjenkins pg_num 16 pgp_num 16 last_change 283362 flags hashpspool,nodelete,nodeep-scrub stripe_width 0 application cephfs


ceph01:~ # ceph pg dump | grep remapp
dumped all
36.b 28306 0 0 28910 0  8388608 101408323 219497 3078 3078 active+remapped+backfilling 2019-10-10 13:36:27.427527 284595'98565869  284595:254216941 [15,16,9] 15 [20,9,10] 20 284427'98489406  2019-10-10 00:16:02.682911 284089'98003598  2019-10-06 16:03:27.558267 0
36.d 28087 0 0 25327 0 26375382 106722204 231020 3041 3041 active+remapped+backfilling 2019-10-10 13:36:27.404739 284595'97933905  284595:252878816 [16,15,9] 16 [20,9,10] 20 284427'97887652  2019-10-10 04:13:29.371905 284259'97502135  2019-10-07 20:06:43.304593 0
36.4 28060 0 0 28406 0  8389242 104059103 225188 3061 3061 active+remapped+backfilling 2019-10-10 13:36:27.440390 284595'105299618 284595:312976619 [16,9,15] 16 [20,9,10] 20 284427'105218591 2019-10-10 00:18:07.924006 284089'104696098 2019-10-06 16:20:17.123149 0
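
If it helps, I can also post the full query output for one of the stuck PGs, e.g.:

ceph01:~ # ceph pg 36.b query
ceph01:~ # ceph pg dump_stuck unclean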


rule ssd_ruleset {
        id 1
        type replicated
        min_size 1
        max_size 10
        step take default class ssd
        step chooseleaf firstn 0 type host
        step emit
}
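
In case it's useful, the rule can also be tested offline with crushtool, something like this (/tmp/crushmap is just a scratch path):

ceph01:~ # ceph osd getcrushmap -o /tmp/crushmap
ceph01:~ # crushtool -i /tmp/crushmap --test --rule 1 --num-rep 3 --show-bad-mappings

With four hosts carrying weighted SSDs, I'd expect this to report no bad mappings for 3 replicas.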

This is the relevant part of the osd tree:

ceph01:~ #  ceph osd tree
ID  CLASS WEIGHT   TYPE NAME             STATUS REWEIGHT PRI-AFF
 -1       34.21628 root default
-31       11.25406     host ceph01
 25   hdd  3.59999         osd.25            up  1.00000 1.00000
 26   hdd  3.59999         osd.26            up  1.00000 1.00000
 27   hdd  3.59999         osd.27            up  1.00000 1.00000
 15   ssd  0.45409         osd.15            up  1.00000 1.00000
-34       11.25406     host ceph02
  0   hdd  3.59999         osd.0             up  1.00000 1.00000
 28   hdd  3.59999         osd.28            up  1.00000 1.00000
 29   hdd  3.59999         osd.29            up  1.00000 1.00000
 16   ssd  0.45409         osd.16            up  1.00000 1.00000
-37       10.79999     host ceph03
 31   hdd  3.59999         osd.31            up  1.00000 1.00000
 32   hdd  3.59999         osd.32            up  1.00000 1.00000
 33   hdd  3.59999         osd.33            up  1.00000 1.00000
-24        0.45409     host san01-ssd
 10   ssd  0.45409         osd.10            up  1.00000 1.00000
-23        0.45409     host san02-ssd
  9   ssd  0.45409         osd.9             up  1.00000 1.00000
-22              0     host san03-ssd
 20   ssd        0         osd.20            up  1.00000 1.00000


Don't be confused by the '-ssd' suffix; we're using crush location hooks.
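
Since the rule selects by device class, the class-specific shadow hierarchy might also be relevant; it can be checked with:

ceph01:~ # ceph osd crush tree --show-shadow
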
This is the current PG distribution on the SSDs:

ceph01:~ # ceph osd df | grep -E "^15 |^16 |^ 9|^10 |^20 "
15 ssd 0.45409 1.00000 465 GiB  34 GiB  32 GiB 1.2 GiB 857 MiB 431 GiB 7.29 0.22 27 up
16 ssd 0.45409 1.00000 465 GiB  37 GiB  34 GiB 1.5 GiB 964 MiB 428 GiB 7.87 0.23 31 up
10 ssd 0.45409 1.00000 745 GiB  27 GiB  25 GiB 1.7 GiB 950 MiB 718 GiB 3.65 0.11 29 up
 9 ssd 0.45409 1.00000 745 GiB  34 GiB  32 GiB 1.3 GiB 902 MiB 711 GiB 4.60 0.14 30 up
20 ssd 0       1.00000 894 GiB 8.2 GiB 4.3 GiB 1.5 GiB 2.4 GiB 886 GiB 0.91 0.03  3 up
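
I can also provide backfill-related settings and the full-ratio state if that's relevant, e.g.:

ceph01:~ # ceph health detail
ceph01:~ # ceph config show osd.20 | grep backfill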


Current ceph status:

ceph01:~ #  ceph -s
  cluster:
    id:     655cb05a-435a-41ba-83d9-8549f7c36167
    health: HEALTH_OK

  services:
    mon: 3 daemons, quorum ceph01,ceph02,ceph03 (age 2d)
    mgr: ceph03(active, since 8d), standbys: ceph01, ceph02
    mds: cephfs:1 {0=mds01=up:active} 1 up:standby-replay 1 up:standby
    osd: 26 osds: 26 up (since 66m), 26 in (since 66m); 3 remapped pgs

  data:
    pools:   8 pools, 264 pgs
    objects: 4.96M objects, 5.0 TiB
    usage:   16 TiB used, 31 TiB / 47 TiB avail
    pgs:     115745/14865558 objects misplaced (0.779%)
             261 active+clean
             3   active+remapped+backfilling

  io:
    client:   903 KiB/s rd, 8.8 MiB/s wr, 85 op/s rd, 266 op/s wr
    recovery: 0 B/s, 61 keys/s, 12 objects/s
    cache:    4.2 MiB/s flush, 15 MiB/s evict, 0 op/s promote
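
In case the PGs are just backfilling slowly (CephFS metadata is omap-heavy, so recovery is mostly keys/s, as seen above), I assume backfill concurrency could be raised temporarily at runtime, something like:

ceph01:~ # ceph tell 'osd.*' injectargs '--osd_max_backfills 4 --osd_recovery_max_active 8'

But I'd first like to understand why these 3 PGs are stuck at all.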



