On Fri, Sep 20, 2019 at 1:31 PM Thomas Schneider <74cmonty@xxxxxxxxx> wrote:
>
> Hi,
>
> I cannot get rid of
>     pgs unknown
> because there were 3 disks that couldn't be started.
> Therefore I destroyed the relevant OSDs and re-created them for those
> disks.

And you had it configured to run with replica 3? Well, I guess the down
PGs were located on the three disks that you wiped.

Do you still have the disks? Use ceph-objectstore-tool to export the
affected PGs manually and inject them into another OSD.
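A rough sketch of that export/import flow; the OSD ids and the PG id are
placeholders, it assumes an intact copy of the old OSD's data directory is
still available, and both OSDs must be stopped while ceph-objectstore-tool
runs:

  # identify the down/unknown PGs first
  ceph health detail

  # on the node that still has the old disk, with that OSD stopped:
  systemctl stop ceph-osd@<old_id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<old_id> \
      --pgid <pg_id> --op export --file /root/<pg_id>.export

  # import into another (stopped) OSD that should hold the PG, then restart it
  systemctl stop ceph-osd@<target_id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<target_id> \
      --op import --file /root/<pg_id>.export
  systemctl start ceph-osd@<target_id>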
Paul

> Then I added the 3 OSDs to the crushmap.
>
> Regards
> Thomas
>
> On 20.09.2019 at 08:19, Ashley Merrick wrote:
> > You need to fix this first:
> >
> >     pgs: 0.056% pgs unknown
> >          0.553% pgs not active
> >
> > The backfilling will cause slow I/O, but having PGs unknown and not
> > active will cause the I/O blocking you're seeing when the VMs boot.
> >
> > It seems you have 4 OSDs down; if you get them back online, you
> > should be able to get all the PGs online.
> >
> >
> > ---- On Fri, 20 Sep 2019 14:14:01 +0800 Thomas <74cmonty@xxxxxxxxx>
> > wrote ----
> >
> > Hi,
> >
> > here I describe 1 of the 2 major issues I'm currently facing in my
> > 8-node Ceph cluster (2x MDS, 6x OSD).
> >
> > The issue is that I cannot start any virtual machine (KVM) or
> > container (LXC); the boot process just hangs after a few seconds.
> > All these KVMs and LXCs have in common that their virtual disks
> > reside in the same pool: hdd
> >
> > This pool hdd is relatively small compared to the largest pool,
> > hdb_backup:
> > root@ld3955:~# rados df
> > POOL_NAME              USED   OBJECTS  CLONES     COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED     RD_OPS        RD     WR_OPS       WR  USED COMPR  UNDER COMPR
> > backup                  0 B         0       0          0                   0        0         0          0       0 B          0      0 B         0 B          0 B
> > hdb_backup          589 TiB  51262212       0  153786636                   0        0    124895   12266095   4.3 TiB  247132863  463 TiB         0 B          0 B
> > hdd                 3.2 TiB    281884    6568     845652                   0        0      1658  275277357    16 TiB  208213922   10 TiB         0 B          0 B
> > pve_cephfs_data     955 GiB     91832       0     275496                   0        0      3038       2103  1021 MiB     102170  318 GiB         0 B          0 B
> > pve_cephfs_metadata 486 MiB        62       0        186                   0        0         7        860   1.4 GiB      12393  166 MiB         0 B          0 B
> >
> > total_objects    51635990
> > total_used       597 TiB
> > total_avail      522 TiB
> > total_space      1.1 PiB
> >
> > This is the current health status of the Ceph cluster:
> >   cluster:
> >     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
> >     health: HEALTH_ERR
> >             1 filesystem is degraded
> >             1 MDSs report slow metadata IOs
> >             1 backfillfull osd(s)
> >             87 nearfull osd(s)
> >             1 pool(s) backfillfull
> >             Reduced data availability: 54 pgs inactive, 47 pgs peering, 1 pg stale
> >             Degraded data redundancy: 129598/154907946 objects degraded (0.084%), 33 pgs degraded, 33 pgs undersized
> >             Degraded data redundancy (low space): 322 pgs backfill_toofull
> >             1 subtrees have overcommitted pool target_size_bytes
> >             1 subtrees have overcommitted pool target_size_ratio
> >             1 pools have too many placement groups
> >             21 slow requests are blocked > 32 sec
> >
> >   services:
> >     mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 14h)
> >     mgr: ld5507(active, since 16h), standbys: ld5506, ld5505
> >     mds: pve_cephfs:1/1 {0=ld3955=up:replay} 1 up:standby
> >     osd: 360 osds: 356 up, 356 in; 382 remapped pgs
> >
> >   data:
> >     pools:   5 pools, 8868 pgs
> >     objects: 51.64M objects, 197 TiB
> >     usage:   597 TiB used, 522 TiB / 1.1 PiB avail
> >     pgs:     0.056% pgs unknown
> >              0.553% pgs not active
> >              129598/154907946 objects degraded (0.084%)
> >              2211119/154907946 objects misplaced (1.427%)
> >              8458 active+clean
> >              298  active+remapped+backfill_toofull
> >              29   remapped+peering
> >              24   active+undersized+degraded+remapped+backfill_toofull
> >              22   active+remapped+backfill_wait
> >              17   peering
> >              5    unknown
> >              5    active+recovery_wait+undersized+degraded+remapped
> >              3    active+undersized+degraded+remapped+backfill_wait
> >              2    activating+remapped
> >              1    active+clean+remapped
> >              1    stale+peering
> >              1    active+remapped+backfilling
> >              1    active+recovering+undersized+remapped
> >              1    active+recovery_wait+degraded
> >
> >   io:
> >     client:   9.2 KiB/s wr, 0 op/s rd, 1 op/s wr
> >
> > I believe the cluster is busy rebalancing pool hdb_backup; I set the
> > balancer mode to upmap recently, after the 589 TiB of data had been
> > written.
> > root@ld3955:~# ceph balancer status
> > {
> >     "active": true,
> >     "plans": [],
> >     "mode": "upmap"
> > }
> >
> > In order to resolve the issue with pool hdd I started some
> > investigation. The first step was to install the drivers provided by
> > Mellanox for the NIC. Then I configured some kernel parameters
> > recommended by Mellanox
> > <https://community.mellanox.com/s/article/linux-sysctl-tuning>.
> >
> > However, this didn't fix the issue. In my opinion I must get rid of
> > all the "slow requests are blocked" warnings.
> >
> > When I check the output of ceph health detail, every OSD listed under
> > REQUEST_SLOW belongs to pool hdd. This means none of the disks
> > belonging to pool hdb_backup shows comparable behaviour.
> >
> > Then I checked the running processes on the different OSD nodes; I
> > use the tool "glances" for this. There I can see individual ceph-osd
> > processes that have been running for hours and consuming a lot of
> > CPU, e.g.
> > 66.8 0.2 2.13G 1.17G 1192756 ceph 17h8:33  58 0 S 41M 2K /usr/bin/ceph-osd -f --cluster ceph --id 37 --setuser ceph --setgroup ceph
> > 34.2 0.2 4.31G 1.20G  971267 ceph 15h38:46 58 0 S 14M 3K /usr/bin/ceph-osd -f --cluster ceph --id 73 --setuser ceph --setgroup ceph
> >
> > Similar processes are running on 4 OSD nodes. All of them have in
> > common that the relevant OSD belongs to pool hdd.
> >
> > Furthermore, glances gives me this alert:
> > CRITICAL on CPU_IOWAIT (Min:1.9 Mean:2.3 Max:2.6): ceph-osd, ceph-osd, ceph-osd
> >
> > What can / should I do now?
> > Kill the long-running processes?
> > Stop the relevant OSDs?
> >
> > Please advise.
> >
> > THX
> > Thomas
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
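On the closing questions in the quoted message (kill the processes, stop
the OSDs?): before restarting anything it is usually worth checking what
the slow OSDs are actually blocked on. A possible starting point, using
osd.37 from the glances output above (the "ceph daemon" commands have to
be run on the node hosting that OSD):

  # which OSDs are currently reporting slow requests
  ceph health detail | grep -A 3 REQUEST_SLOW

  # map a suspect OSD to its host and CRUSH location
  ceph osd find 37

  # on that host: inspect in-flight and recently completed ops
  ceph daemon osd.37 dump_ops_in_flight
  ceph daemon osd.37 dump_historic_ops

If an OSD really has to be bounced, "systemctl restart ceph-osd@37" is
preferable to killing the process outright.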