Hi all!
Last week we ran into a terrible situation after adding 4 new nodes
to one of our clusters.
Trying to reduce PG movement, we set the noin flag.
Then we deployed the 4 new nodes, which added about 30% more OSDs, all with reweight=0.
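For reference, the rollout was roughly the following (a sketch of the generic
commands - the actual deployment is done by our own tooling, and the device
path below is just a placeholder):

  # keep newly booted OSDs from being marked "in" automatically
  ceph osd set noin

  # on each new host the OSDs were created as usual, e.g.
  ceph-volume lvm create --data /dev/sdX

  # the new OSDs then show up as "up" with REWEIGHT 0 in
  ceph osd tree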
After that, a huge number of PGs - about 20% - got stuck in the peering or
activating state.
Please see the ceph -s output below.
The number of peering and activating PGs was decreasing very slowly -
roughly 5-10 per minute.
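In case it helps anyone reproduce the picture, the stuck PGs can be listed and
inspected with the standard commands (the pgid below is only an example):

  # list PGs stuck in an inactive state (peering/activating)
  ceph pg dump_stuck inactive

  # query one of them to see what it is waiting for
  ceph pg 11.2f3 query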
Unfortunately, we did not collect any useful logs,
and there were no errors in the OSD logs at the default log level.
We also noticed strange CPU utilization on the affected OSDs.
Not all OSDs were affected - about 1/3 of the OSDs on every host.
Each affected OSD was using 3.5-4 CPU cores,
and each of its 3 messenger threads was burning a full 100% of a core.
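We spotted this with a per-thread CPU view, roughly like this (the PID is a
placeholder for an affected ceph-osd process):

  # per-thread CPU usage of one affected OSD; the msgr-worker-* threads
  # were each pinned at ~100%
  top -H -p <osd_pid>

  # or, sampled once per second
  pidstat -t -p <osd_pid> 1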
We were only able to get out of that state by restarting all OSDs in the HDD pool.
After the restart, the OSDs finished peering as expected - within several seconds.
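On our systemd-based hosts the restart was just the standard unit restart
(the OSD id is a placeholder):

  # restart a single OSD
  systemctl restart ceph-osd@123

  # or all OSDs on a host at once
  systemctl restart ceph-osd.target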
Everything looks like PG overdose protection kicking in, but:
- We have about 150 PGs/OSD, while mon_max_pg_per_osd=400 and
osd_max_pg_per_osd_hard_ratio=4.0 (see the check sketched after this list)
- The "withhold creation of pg" message is written at log level 0, and there
were no such messages in the OSD logs
- Unexplained msgr CPU usage
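This is roughly how we checked the overdose hypothesis (osd.0 is just an
example; the daemon commands have to be run on the host carrying that OSD):

  # per-OSD PG counts are in the PGS column; ours are around 150
  ceph osd df

  # effective limits as seen by a running OSD
  ceph daemon osd.0 config get mon_max_pg_per_osd
  ceph daemon osd.0 config get osd_max_pg_per_osd_hard_ratio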
A couple of days later we started to add the next batch of nodes,
this time one by one, and again with the noin flag set.
And after the first node was added to the cluster, we got the same ~20% of PGs
stuck inactive.
Again, no unusual messages in the logs.
We already knew how to fix it, so we restarted the OSDs.
After peering completed and backfilling stabilized, we continued adding new
OSD nodes to the cluster.
And while backfilling was in progress, the inactive-PG issue did not reproduce
when we added the next 3 nodes.
We carry a number of small patches and do not want to file a bug before we are
sure that the root cause of this issue is not one of our patches.
So if anybody already knows about this kind of issue - please let me know.
What logging would you suggest enabling to see the details of the PG lifecycle
in the OSD?
Ceph version 12.2.8.
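Unless there is something more targeted, next time we will probably just bump
the usual debug levels at runtime and revert them afterwards (the values shown
for reverting are the stock Luminous defaults):

  # raise OSD and messenger logging on all OSDs
  ceph tell osd.* injectargs '--debug_osd 20/20 --debug_ms 1/1'

  # revert to the defaults afterwards
  ceph tell osd.* injectargs '--debug_osd 1/5 --debug_ms 0/5'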
This is the state we had the second time, after adding a single OSD node:
cluster:
id: ad99506a-05a5-11e8-975e-74d4351a7990
health: HEALTH_ERR
noin flag(s) set
38625525/570124661 objects misplaced (6.775%)
Reduced data availability: 4637 pgs inactive, 2483 pgs peering
Degraded data redundancy: 2875/570124661 objects degraded
(0.001%), 2 pgs degraded, 1 pg undersized
26 slow requests are blocked > 5 sec. Implicated osds 312,792
4199 stuck requests are blocked > 30 sec. Implicated osds
3,4,7,10,12,13,14,21,27,28,29,31,33,35,39,47,48,51,54,55,57,58,59,63,64,67,69,70,71,72,73,74,75,83,85,86,87,92,94,96,100,102,104,107,113,117,118,119,121,125,126,129,130,131,133,136,138,140,141,145,146,148,153,154,155,156,158,160,162,163,164,165,166,168,176,179,182,183,185,187,188,189,192,194,198,199,200,201,203,205,207,208,209,210,213,215,216,220,221,223,224,226,228,230,232,234,235,238,239,240,242,244,245,246,250,252,253,255,256,257,259,261,263,264,267,271,272,273,275,279,282,284,286,288,289,291,292,293,299,300,307,311,318,319,321,323,324,327,329,330,332,333,334,339,341,342,343,345,346,348,352,354,355,356,360,361,363,365,366,367,369,370,372,378,382,384,393,396,398,401,402,404,405,409,411,412,415,416,418,421,428,429,432,434,435,436,438,441,444,446,447,448,449,451,452,453,456,457,458,460,461,462,464,465,466,467,468,469,471,472,474,478,479,480,481,482,483,485,486,487,489,492,494,498,499,503,504,505,506,507,508,509,510,512,513,515,516,517,520,521,522,523,524,527,528,530,531,533,535,536,5
38,539,541,542,546,549,550,554,555,559,561,562,563,564,565,566,568,571,573,574,578,581,582,583,588,589,590,592,593,594,595,596,597,598,599,602,604,605,606,607,608,609,610,611,612,613,614,617,618,619,620,621,622,624,627,628,630,632,633,634,636,637,638,639,640,642,643,644,645,646,647,648,650,651,652,656,659,660,661,662,663,666,668,669,671,672,673,674,675,676,678,681,682,683,686,687,691,692,694,695,696,697,699,701,704,705,706,707,708,709,712,714,716,717,718,719,720,722,724,727,729,732,733,736,737,738,739,740,741,742,743,745,746,750,751,752,754,755,756,758,759,760,761,762,763,765,766,767,768,769,770,771,772,773,774,775,776,777,778,779,780,781,782,783,784,785,786,787,788,789,790,791,793,794,795,796
services:
mon: 3 daemons, quorum
BC-SR1-4R9-CEPH-MON1,BC-SR1-4R3-CEPH-MON1,BC-SR1-4R6-CEPH-MON1
mgr: BC-SR1-4R9-CEPH-MON1(active), standbys: BC-SR1-4R3-CEPH-MON1,
BC-SR1-4R6-CEPH-MON1
osd: 828 osds: 828 up, 798 in; 5355 remapped pgs
flags noin
rgw: 187 daemons active
data:
pools: 14 pools, 21888 pgs
objects: 53.44M objects, 741TiB
usage: 1.04PiB used, 5.55PiB / 6.59PiB avail
pgs: 21.203% pgs not active
2875/570124661 objects degraded (0.001%)
38625525/570124661 objects misplaced (6.775%)
15382 active+clean
1847 remapped+peering
1642 activating+remapped
1244 active+remapped+backfill_wait
640 peering
620 active+remapped+backfilling
511 activating
1 active+undersized+degraded+remapped+backfilling
1 activating+degraded
io:
client: 715MiB/s rd, 817MiB/s wr, 5.14kop/s rd, 5.74kop/s wr
recovery: 10.1GiB/s, 688objects/s
--
Best regards,
Aleksei Gutikov
Software Engineer | synesis.ru | Minsk. BY