Hello,
I have re-created the OSDs using these disks.
Can I still export the affected PGs manually?
Regards
Thomas
On 20.09.19 at 21:15, Paul Emmerich wrote:
On Fri, Sep 20, 2019 at 1:31 PM Thomas Schneider <74cmonty@xxxxxxxxx> wrote:
Hi,
I cannot get rid of
pgs unknown
because there were 3 disks that couldn't be started.
Therefore I destroyed the relevant OSDs and re-created them for these
disks.
And you had it configured to run with replica 3? Well, I guess the
down PGs were located on these three disks that you wiped.
Do you still have the disks? Use ceph-objectstore-tool to export the
affected PGs manually and inject them into another OSD.
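Roughly like this (a sketch only; the OSD ids, the PG id and the file path are placeholders, and both the source and the target OSD have to be stopped while ceph-objectstore-tool runs):

# stop the OSD that still holds a copy of the PG
systemctl stop ceph-osd@<id>

# list the PGs present on that OSD
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --op list-pgs

# export one affected PG to a file
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<id> --pgid <pgid> --op export --file /tmp/<pgid>.export

# import it into another (stopped) OSD, then start that OSD again
ceph-objectstore-tool --data-path /var/lib/ceph/osd/ceph-<target-id> --op import --file /tmp/<pgid>.export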
Paul
Then I added the 3 OSDs to the crush map.
Regards
Thomas
On 20.09.2019 at 08:19, Ashley Merrick wrote:
You need to fix this first.
pgs: 0.056% pgs unknown
0.553% pgs not active
The backfilling will cause slow I/O, but having PGs unknown and not
active will cause I/O blocking, which is what you're seeing with the VMs booting.
It seems you have 4 OSDs down; if you get them back online you should be
able to get all the PGs online.
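A quick way to see what is down or stuck (a sketch; the state filter for "ceph osd tree" and "ceph pg dump_stuck" are available on recent releases):

# show only OSDs that are currently down
ceph osd tree down

# list PGs stuck inactive or stale
ceph pg dump_stuck inactive stale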
---- On Fri, 20 Sep 2019 14:14:01 +0800 Thomas <74cmonty@xxxxxxxxx> wrote ----
Hi,
here I describe one of the two major issues I'm currently facing in my
8-node Ceph cluster (2x MDS, 6x OSD).
The issue is that I cannot start any virtual machine (KVM) or container
(LXC); the boot process just hangs after a few seconds.
All these KVMs and LXCs have in common that their virtual disks reside
in the same pool: hdd
This pool hdd is relatively small compared to the largest pool:
hdb_backup
root@ld3955:~# rados df
POOL_NAME               USED     OBJECTS   CLONES     COPIES  MISSING_ON_PRIMARY  UNFOUND  DEGRADED     RD_OPS        RD     WR_OPS       WR  USED COMPR  UNDER COMPR
backup                   0 B            0       0          0                   0        0         0          0       0 B          0      0 B         0 B          0 B
hdb_backup           589 TiB     51262212       0  153786636                   0        0    124895   12266095   4.3 TiB  247132863  463 TiB         0 B          0 B
hdd                  3.2 TiB       281884    6568     845652                   0        0      1658  275277357    16 TiB  208213922   10 TiB         0 B          0 B
pve_cephfs_data      955 GiB        91832       0     275496                   0        0      3038       2103  1021 MiB     102170  318 GiB         0 B          0 B
pve_cephfs_metadata  486 MiB           62       0        186                   0        0         7        860   1.4 GiB      12393  166 MiB         0 B          0 B

total_objects    51635990
total_used       597 TiB
total_avail      522 TiB
total_space      1.1 PiB
This is the current health status of the ceph cluster:
  cluster:
    id:     6b1b5117-6e08-4843-93d6-2da3cf8a6bae
    health: HEALTH_ERR
            1 filesystem is degraded
            1 MDSs report slow metadata IOs
            1 backfillfull osd(s)
            87 nearfull osd(s)
            1 pool(s) backfillfull
            Reduced data availability: 54 pgs inactive, 47 pgs peering, 1 pg stale
            Degraded data redundancy: 129598/154907946 objects degraded (0.084%), 33 pgs degraded, 33 pgs undersized
            Degraded data redundancy (low space): 322 pgs backfill_toofull
            1 subtrees have overcommitted pool target_size_bytes
            1 subtrees have overcommitted pool target_size_ratio
            1 pools have too many placement groups
            21 slow requests are blocked > 32 sec

  services:
    mon: 3 daemons, quorum ld5505,ld5506,ld5507 (age 14h)
    mgr: ld5507(active, since 16h), standbys: ld5506, ld5505
    mds: pve_cephfs:1/1 {0=ld3955=up:replay} 1 up:standby
    osd: 360 osds: 356 up, 356 in; 382 remapped pgs

  data:
    pools:   5 pools, 8868 pgs
    objects: 51.64M objects, 197 TiB
    usage:   597 TiB used, 522 TiB / 1.1 PiB avail
    pgs:     0.056% pgs unknown
             0.553% pgs not active
             129598/154907946 objects degraded (0.084%)
             2211119/154907946 objects misplaced (1.427%)
             8458 active+clean
             298  active+remapped+backfill_toofull
             29   remapped+peering
             24   active+undersized+degraded+remapped+backfill_toofull
             22   active+remapped+backfill_wait
             17   peering
             5    unknown
             5    active+recovery_wait+undersized+degraded+remapped
             3    active+undersized+degraded+remapped+backfill_wait
             2    activating+remapped
             1    active+clean+remapped
             1    stale+peering
             1    active+remapped+backfilling
             1    active+recovering+undersized+remapped
             1    active+recovery_wait+degraded

  io:
    client: 9.2 KiB/s wr, 0 op/s rd, 1 op/s wr
I believe the cluster is busy with rebalancing pool hdb_backup.
I set the balancer mode to upmap recently, after the 589 TiB of data was
written.
root@ld3955:~# ceph balancer status
{
"active": true,
"plans": [],
"mode": "upmap"
}
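To check whether the I/O on the cluster really is recovery/backfill on hdb_backup rather than client traffic, per-pool rates can be shown like this (a sketch; pool names as above):

# per-pool client and recovery I/O rates
ceph osd pool stats hdb_backup
ceph osd pool stats hdd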
In order to resolve the issue with pool hdd I started some
investigation.
The first step was to install the NIC drivers provided by Mellanox.
Then I configured some kernel parameters recommended by Mellanox
<https://community.mellanox.com/s/article/linux-sysctl-tuning>.
However, this didn't fix the issue.
In my opinion I must get rid of all "slow requests are blocked" warnings.
When I check the output of ceph health detail, every OSD listed under
REQUEST_SLOW belongs to pool hdd.
This means none of the disks backing pool hdb_backup shows comparable
behaviour.
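Something along these lines shows the mapping from slow-request OSDs to pools (a sketch; osd.37 is one of the busy OSDs from the process list further below, and the numeric prefix of a PG id is the pool id):

# OSDs currently reporting slow requests
ceph health detail | grep -A 3 REQUEST_SLOW

# host and crush location of one affected OSD
ceph osd find 37

# PGs (and therefore pools) served by that OSD
ceph pg ls-by-osd 37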
Then I checked the running processes on the different OSD nodes; I use
the tool "glances" for this.
Here I can see individual processes that have been running for hours and
consuming a lot of CPU, e.g.:
66.8  0.2  2.13G  1.17G  1192756  ceph  17h8:33   58  0  S  41M  2K  /usr/bin/ceph-osd -f --cluster ceph --id 37 --setuser ceph --setgroup ceph
34.2  0.2  4.31G  1.20G   971267  ceph  15h38:46  58  0  S  14M  3K  /usr/bin/ceph-osd -f --cluster ceph --id 73 --setuser ceph --setgroup ceph
Similar processes are running on 4 OSD nodes.
All processes have in common that the relevant OSD belongs to pool
hdd.
Furthermore, glances gives me this alert:
CRITICAL on CPU_IOWAIT (Min:1.9 Mean:2.3 Max:2.6): ceph-osd, ceph-osd, ceph-osd
What can / should I do now?
Kill the long-running processes?
Stop the relevant OSDs?
Please advise.
THX
Thomas
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx