On Fri, Sep 20, 2019 at 1:31 PM Thomas Schneider <74cmonty@xxxxxxxxx> wrote:
>
> Hi,
>
> I cannot get rid of
>     pgs unknown
> because there were 3 disks that couldn't be started.
> Therefore I destroyed the relevant OSDs and re-created them for those
> disks.

And you had it configured to run with replica 3? Well, I guess the down
PGs were located on the three disks that you wiped.

Do you still have the disks? Use ceph-objectstore-tool to export the
affected PGs manually and inject them into another OSD.
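A rough sketch of that export/import flow; the OSD ids and the PG id are
placeholders, it assumes an intact copy of the old OSD's data directory is
still available, and both OSDs must be stopped while ceph-objectstore-tool
runs:

  # identify the down/unknown PGs first
  ceph health detail

  # on the node that still has the old disk, with that OSD stopped:
  systemctl stop ceph-osd@<old_id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<old_id> \
      --pgid <pg_id> --op export --file /root/<pg_id>.export

  # import into another (stopped) OSD that should hold the PG, then restart it
  systemctl stop ceph-osd@<target_id>
  ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<target_id> \
      --op import --file /root/<pg_id>.export
  systemctl start ceph-osd@<target_id>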
Paul

> Then I added the 3 OSDs to the crushmap.
>
> Regards
> Thomas
>
> On 20.09.2019 at 08:19, Ashley Merrick wrote:
> > You need to fix this first:
> >
> >     pgs: 0.056% pgs unknown
> >          0.553% pgs not active
> >
> > The backfilling will cause slow I/O, but having PGs unknown and not
> > active will cause the I/O blocking you're seeing when the VMs boot.
> >
> > It seems you have 4 OSDs down; if you get them back online, you
> > should be able to get all the PGs online.
> >
> >
> > ---- On Fri, 20 Sep 2019 14:14:01 +0800 Thomas <74cmonty@xxxxxxxxx>
> > wrote ----
> >
> > Hi,
> >
> > here I describe 1 of the 2 major issues I'm currently facing in my
> > 8-node Ceph cluster (2x MDS, 6x OSD).
> >
> > The issue is that I cannot start any virtual machine (KVM) or
> > container (LXC); the boot process just hangs after a few seconds.
> > All these KVMs and LXCs have in common that their virtual disks
> > reside in the same pool: hdd
> >
> > This pool hdd is relatively small compared to the largest pool,
> > hdb_backup:
> > root@ld3955:~# rados df
> > POOL_NAME              USED   OBJECTS  CLONES     COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED     RD_OPS        RD     WR_OPS       WR  USED COMPR  UNDER COMPR
> > backup                  0 B         0       0          0                   0        0         0          0       0 B          0      0 B         0 B          0 B
> > hdb_backup          589 TiB  51262212       0  153786636                   0        0    124895   12266095   4.3 TiB  247132863  463 TiB         0 B          0 B
> > hdd                 3.2 TiB    281884    6568     845652                   0        0      1658  275277357    16 TiB  208213922   10 TiB         0 B          0 B
> > pve_cephfs_data     955 GiB     91832       0     275496                   0        0      3038       2103  1021 MiB     102170  318 GiB         0 B          0 B
> > pve_cephfs_metadata 486 MiB        62       0        186                   0        0         7        860   1.4 GiB      12393  166 MiB         0 B          0 B
> >
> > total_objects    51635990
> > total_used       597 TiB
> > total_avail      522 TiB
> > total_space      1.1 PiB
> >
> > This is the current health status of the Ceph cluster:
> >   cluster:
> >     id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
> >     health: HEALTH_ERR
> >             1 filesystem is degraded
> >             1 MDSs report slow metadata IOs
> >             1 backfillfull osd(s)
> >             87 nearfull osd(s)
> >             1 pool(s) backfillfull
> >             Reduced data availability: 54 pgs inactive, 47 pgs peering, 1 pg stale
> >             Degraded data redundancy: 129598/154907946 objects degraded (0.084%), 33 pgs degraded, 33 pgs undersized
> >             Degraded data redundancy (low space): 322 pgs backfill_toofull
> >             1 subtrees have overcommitted pool target_size_bytes
> >             1 subtrees have overcommitted pool target_size_ratio
> >             1 pools have too many placement groups
> >             21 slow requests are blocked > 32 sec
> >
> >   services:
> >     mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 14h)
> >     mgr: ld5507(active, since 16h), standbys: ld5506, ld5505
> >     mds: pve_cephfs:1/1 {0=ld3955=up:replay} 1 up:standby
> >     osd: 360 osds: 356 up, 356 in; 382 remapped pgs
> >
> >   data:
> >     pools:   5 pools, 8868 pgs
> >     objects: 51.64M objects, 197 TiB
> >     usage:   597 TiB used, 522 TiB / 1.1 PiB avail
> >     pgs:     0.056% pgs unknown
> >              0.553% pgs not active
> >              129598/154907946 objects degraded (0.084%)
> >              2211119/154907946 objects misplaced (1.427%)
> >              8458 active+clean
> >              298  active+remapped+backfill_toofull
> >              29   remapped+peering
> >              24   active+undersized+degraded+remapped+backfill_toofull
> >              22   active+remapped+backfill_wait
> >              17   peering
> >              5    unknown
> >              5    active+recovery_wait+undersized+degraded+remapped
> >              3    active+undersized+degraded+remapped+backfill_wait
> >              2    activating+remapped
> >              1    active+clean+remapped
> >              1    stale+peering
> >              1    active+remapped+backfilling
> >              1    active+recovering+undersized+remapped
> >              1    active+recovery_wait+degraded
> >
> >   io:
> >     client:   9.2 KiB/s wr, 0 op/s rd, 1 op/s wr
> >
> > I believe the cluster is busy rebalancing pool hdb_backup; I set the
> > balancer mode to upmap recently, after the 589 TiB of data had been
> > written.
> > root@ld3955:~# ceph balancer status
> > {
> >     "active": true,
> >     "plans": [],
> >     "mode": "upmap"
> > }
> >
> > In order to resolve the issue with pool hdd I started some
> > investigation. The first step was to install the drivers provided by
> > Mellanox for the NIC. Then I configured some kernel parameters
> > recommended by Mellanox
> > <https://community.mellanox.com/s/article/linux-sysctl-tuning>.
> >
> > However, this didn't fix the issue. In my opinion I must get rid of
> > all the "slow requests are blocked" warnings.
> >
> > When I check the output of ceph health detail, every OSD listed under
> > REQUEST_SLOW belongs to pool hdd. This means none of the disks
> > belonging to pool hdb_backup shows comparable behaviour.
> >
> > Then I checked the running processes on the different OSD nodes; I
> > use the tool "glances" for this. There I can see individual ceph-osd
> > processes that have been running for hours and consuming a lot of
> > CPU, e.g.
> > 66.8 0.2 2.13G 1.17G 1192756 ceph 17h8:33  58 0 S 41M 2K /usr/bin/ceph-osd -f --cluster ceph --id 37 --setuser ceph --setgroup ceph
> > 34.2 0.2 4.31G 1.20G  971267 ceph 15h38:46 58 0 S 14M 3K /usr/bin/ceph-osd -f --cluster ceph --id 73 --setuser ceph --setgroup ceph
> >
> > Similar processes are running on 4 OSD nodes. All of them have in
> > common that the relevant OSD belongs to pool hdd.
> >
> > Furthermore, glances gives me this alert:
> > CRITICAL on CPU_IOWAIT (Min:1.9 Mean:2.3 Max:2.6): ceph-osd, ceph-osd, ceph-osd
> >
> > What can / should I do now?
> > Kill the long-running processes?
> > Stop the relevant OSDs?
> >
> > Please advise.
> >
> > THX
> > Thomas
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx
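On the closing questions in the quoted message (kill the processes, stop
the OSDs?): before restarting anything it is usually worth checking what
the slow OSDs are actually blocked on. A possible starting point, using
osd.37 from the glances output above (the "ceph daemon" commands have to
be run on the node hosting that OSD):

  # which OSDs are currently reporting slow requests
  ceph health detail | grep -A 3 REQUEST_SLOW

  # map a suspect OSD to its host and CRUSH location
  ceph osd find 37

  # on that host: inspect in-flight and recently completed ops
  ceph daemon osd.37 dump_ops_in_flight
  ceph daemon osd.37 dump_historic_ops

If an OSD really has to be bounced, "systemctl restart ceph-osd@37" is
preferable to killing the process outright.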