Re: BlueStore fragmentation woes

On 5/25/23 22:12, Igor Fedotov wrote:

On 25/05/2023 20:36, Stefan Kooman wrote:
On 5/25/23 18:17, Igor Fedotov wrote:
Perhaps...

I don't like the idea of using the fragmentation score as a real index. IMO it's mostly a very imprecise first-pass marker to alert that something might be wrong, not a real quantitative, high-quality estimate.

Chiming in on the high fragmentation issue. We started collecting the "fragmentation_rating" of each OSD this afternoon. All OSDs that were provisioned a year ago have a fragmentation rating of ~0.9. Not sure how long they have been at this level.
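
A minimal sketch of how such a collection could look, assuming "ceph daemon osd.<id> bluestore allocator score block" is available via the admin sockets on each host and returns a JSON "fragmentation_rating" field (the socket paths below are the defaults; adjust for your environment):

#!/usr/bin/env python3
# Sketch: collect BlueStore fragmentation ratings from the OSDs on this host.
# Assumes default admin socket paths and that the "allocator score" command
# returns {"fragmentation_rating": <float>} on this release.
import glob
import json
import re
import subprocess

def local_osd_ids():
    # Derive OSD ids from the admin sockets present on this host.
    for sock in glob.glob("/var/run/ceph/ceph-osd.*.asok"):
        m = re.search(r"ceph-osd\.(\d+)\.asok", sock)
        if m:
            yield int(m.group(1))

def fragmentation_rating(osd_id):
    out = subprocess.check_output(
        ["ceph", "daemon", "osd.%d" % osd_id,
         "bluestore", "allocator", "score", "block"])
    return json.loads(out)["fragmentation_rating"]

for osd in sorted(local_osd_ids()):
    print("osd.%d\t%.3f" % (osd, fragmentation_rating(osd)))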

Could you please collect allocation probes from existing OSD logs? Just a few samples from different OSDs...

Here are probes from 10 OSDs on one host, but I have checked other nodes and they are similar:

CNT (allocations)	FRAG (fragments)	Size (bytes)	Ratio (FRAG/CNT)	Avg frag size (bytes, Size/FRAG)
21350923	37146899	317040259072	1.73982637659271	8534.77053554322
20951932	38122769	317841477632	1.8195347808498	8337.31352599283
21188454	37298950	278389411840	1.76034315670223	7463.73321072041
21605451	39369462	270427185152	1.82220042525379	6868.95810646333
19215230	36063713	290967818240	1.87682962941375	8068.16032059705
19293599	35464928	269238423552	1.83817068033807	7591.68109835159
19963538	36088151	315796836352	1.80770317365589	8750.70702159277
18030613	31753098	297826177024	1.76106591606176	9379.43683554909
17889602	31718012	299550142464	1.77298589426417	9444.16511551859
18475332	33264944	266053271552	1.80050588536109	7998.0074985847
18618154	31914219	254801883136	1.71414518324427	7983.96110323113
16437108	29421873	275350355968	1.78996651965784	9358.69568766067
17164338	28605353	249404649472	1.66655731202683	8718.81040838755
17895480	29658102	309047177216	1.65729569701399	10420.3288941416
19546560	34588509	301368737792	1.76954456436324	8712.97279081905
18525784	34806856	314875801600	1.87883309014075	9046.37297893266
18550989	35236438	273069948928	1.89943716747393	7749.64679823767
19085807	34605572	255512043520	1.81315738967705	7383.55209155335
17203820	31205542	277097357312	1.81387284916954	8879.74826112618
18003801	33723670	269696761856	1.87314167713807	7997.25420916525
18655425	33227176	306511810560	1.78109992133655	9224.7325069094
26380965	45627920	335281111040	1.72957736762093	7348.15680925188
24923956	44721109	328790982656	1.79430219664968	7352.03106559813
25312482	43035393	287792226304	1.70016488308021	6687.33817079351
25841471	46276699	288168476672	1.79079197929561	6227.07502693742
25618384	43785917	321591488512	1.70915999229303	7344.63294469772
26006097	45056206	298747666432	1.73252472295247	6630.55532088077
26684805	45196730	351100243968	1.69372532420604	7768.26650883814
24025872	42450135	353265467392	1.76685095966548	8321.89267223768
24080466	45510525	371726323712	1.88993539410741	8167.91991988666
23195936	45095051	326473826304	1.94409274969546	7239.68193990955
23653302	43312705	307549573120	1.83114835298683	7100.67803707942
21589455	40034670	322982109184	1.85436223378497	8067.56017182107
22469039	42042723	314323701760	1.87114023879704	7476.29266924504
23647633	43486098	370003841024	1.83891969230071	8508.55464254346
23750561	37387139	320471453696	1.57415814304344	8571.70305799542
23142315	38640274	329341046784	1.66968058294946	8523.25857689312
23539469	39573256	292528910336	1.68114480407353	7392.08596674481
23810938	37968499	277270380544	1.59458224619291	7302.64266027477
19361754	33610252	286391676928	1.73590946357443	8520.96190555191
20331818	34119736	256076865536	1.67814486633709	7505.24170339419
21017537	35862221	318755282944	1.70629988661374	8888.33078531305
21660731	42648077	329217507328	1.96891217567865	7719.39863380007
20708620	42285124	344562262016	2.04190931119505	8148.54562129225
21371937	43158447	312754188288	2.01939800777066	7246.65065654471
21447150	40034134	283613331456	1.86664120873869	7084.28790931259
18906469	36598724	302526169088	1.93577785465916	8266.03050663734
20086704	36824872	280208515072	1.83329589563325	7609.21898308296
20912511	40116356	340019290112	1.91829455582833	8475.82691987278
17728197	30717152	270751887360	1.73267208165613	8814.35516417668
16778676	30875765	267493560320	1.84017886751017	8663.54437922429
17700395	31528725	239652761600	1.78124414737637	7601.09270514428
17727766	31338207	232399462400	1.76774710361136	7415.85063880649
15488369	27225173	246367821824	1.75778179096844	9049.26561252705
16332731	29287976	227973730304	1.7932075168568	7783.86769724204
17043318	31659676	274151649280	1.85760049774346	8659.33211950748
21627836	34504152	279215091712	1.59535850003671	8092.2171833697
21244729	35619286	303324131328	1.67661757417569	8515.72744405938
22132156	38534232	281272401920	1.74109707160929	7299.28656473548
22035014	34627308	246920048640	1.57146748352418	7130.78962534425
20277457	33126067	265162657792	1.63364010585746	8004.65258347754
20669142	34587911	254815776768	1.67340816566067	7367.1918714027
21648239	34364823	292156514304	1.58741886580243	8501.61557078295
21117643	34737044	292367892480	1.64492997632359	8416.60253186771
20531946	37038043	292538568704	1.8039226773731	7898.32682855301
21393711	35682241	257189515264	1.66788459468299	7207.77361668512
21738966	34753281	252140285952	1.59866301828707	7255.1505554828
19197606	32922066	269381632000	1.71490476468785	8182.40361950553
20044574	33864896	245486792704	1.68947945713389	7249.00477190304
20601681	35851902	305202065408	1.74024158514055	8512.85561943129
Mean:			1.76995040322111	8014.69622126768


So the average fragment size is around 8 KiB, and the ratio of fragments to allocation requests is a bit below two.
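
For completeness, a rough sketch of how the columns above can be derived from the OSD logs; it assumes the allocation probe lines look like "allocation stats probe N: cnt: <allocations> frags: <fragments> size: <bytes>" (adjust the regex if the prefix differs on your release):

#!/usr/bin/env python3
# Sketch: parse allocation probes from OSD logs and print
# CNT, FRAG, Size, Ratio (fragments per request) and average fragment size.
import re
import sys

PROBE = re.compile(r"allocation stats probe \d+: "
                   r"cnt: (?P<cnt>\d+) frags: (?P<frags>\d+) size: (?P<size>\d+)")

for path in sys.argv[1:]:          # e.g. /var/log/ceph/ceph-osd.*.log
    with open(path) as f:
        for line in f:
            m = PROBE.search(line)
            if not m:
                continue
            cnt = int(m.group("cnt"))
            frags = int(m.group("frags"))
            size = int(m.group("size"))
            print("%d\t%d\t%d\t%.4f\t%.2f"
                  % (cnt, frags, size, frags / cnt, size / frags))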

And after reading your mails, it might not be a problem at all. But we will start collecting this information in the coming weeks.

We will be re-provisioning all our OSDs, so that might be a good time to look at the behavior and development of the "cnt versus frags" ratio.

After we completely emptied a host, and even after letting the OSDs run idle for a couple of hours, the fragmentation ratio would not drop below 0.27 for some OSDs, and stayed as high as 0.62 for others. Is it expected that this will not go back to ~zero?

You might be facing the issue fixed by https://github.com/ceph/ceph/pull/49885

Possibly.


I have read some tracker tickets that were mentioned in PRs [1,2]. The problem seems to reveal itself in the Pacific release. I wonder if this has something to do with the change of the default allocator from bitmap to hybrid in Pacific.

The BlueFS 4K allocation unit will not be backported to Pacific [3]. Would it make sense to skip re-provisioning OSDs on Pacific altogether and do the re-provisioning on the Quincy release, which has BlueFS 4K alloc size support [4]?
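
If it helps while weighing that decision, a small sketch to dump the allocator and allocation-size settings a running OSD currently has, assuming "ceph daemon osd.<id> config get <option>" works for these option names on your release. Note that bluestore_min_alloc_size is fixed at mkfs time, so the running config only tells you what a newly provisioned OSD would get:

#!/usr/bin/env python3
# Sketch: show allocator / allocation-size settings of a running OSD.
import json
import subprocess
import sys

OPTIONS = [
    "bluestore_allocator",
    "bluestore_min_alloc_size_hdd",
    "bluestore_min_alloc_size_ssd",
    "bluefs_shared_alloc_size",
]

osd_id = sys.argv[1]               # e.g. "0"
for opt in OPTIONS:
    out = subprocess.check_output(
        ["ceph", "daemon", "osd." + osd_id, "config", "get", opt])
    print("%s = %s" % (opt, json.loads(out)[opt]))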

Gr. Stefan

[1]: https://tracker.ceph.com/issues/58022
[2]: https://tracker.ceph.com/issues/57672
[3]: https://tracker.ceph.com/issues/58589
[4]: https://tracker.ceph.com/issues/58588
_______________________________________________
ceph-users mailing list -- ceph-users@xxxxxxx
To unsubscribe send an email to ceph-users-leave@xxxxxxx


