Re: How to calculate the nearfull ratio ?

Xavier Villaneau <xvillaneau+ceph@xxxxxxxxx> · Thu, 04 May 2017 13:58:20 +0000

Hello Loïc,

On Thu, May 4, 2017 at 8:30 AM Loic Dachary <loic@xxxxxxxxxxx> wrote:

Is there a way to calculate the optimum nearfull ratio for a given crushmap ?

This is a question that I was planning to cover in the calculations I was
working on for python-crush. I've currently shelved the work for a few weeks
but intend to look at it again as time frees up.

Basically, I see this as a five-fold uncertainty problem:
1. CRUSH mappings are pseudo-random and therefore (usually) uneven
2. Object distribution between placement groups has the exact same issue
3. Object size within a given pool can also vary greatly (from bytes to megabytes)
4. Failures and the following re-balancing are also random.
5. Finally, pools can occupy different and overlapping sets of OSDs, and hold independent sets of objects.

Thanks to your new CRUSH tools, I think #1 and #4 are solved respectively by the ability to:
- generate a CRUSH map for a precise (and even) distribution of PGs;
- test mappings for every scenario of N failures and find the worst-case scenario (very expensive calculation, but possible).
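
To make the second point concrete, here is a rough sketch of the brute-force search in Python. It does not use python-crush or real CRUSH: the place() function is a toy stand-in (rendezvous hashing over a flat list of OSDs), and all names and parameters are made up for illustration. It only shows the shape of the calculation: enumerate every set of N failed OSDs, remap, and keep the scenario with the most loaded surviving OSD.

```python
import hashlib
from itertools import combinations


def place(pg, osds, replicas):
    """Toy stand-in for CRUSH (an assumption, not the real algorithm):
    rank OSDs by a deterministic per-(pg, osd) hash and keep the first few."""
    def score(osd):
        return hashlib.sha256(f"{pg}:{osd}".encode()).hexdigest()
    return sorted(osds, key=score)[:replicas]


def pg_counts(osds, num_pgs, replicas):
    """Count how many PG replicas land on each OSD."""
    counts = {osd: 0 for osd in osds}
    for pg in range(num_pgs):
        for osd in place(pg, osds, replicas):
            counts[osd] += 1
    return counts


def worst_case(osds, num_pgs, replicas, failures):
    """Try every combination of `failures` dead OSDs and return
    (peak PG count on a surviving OSD, the failed set that causes it)."""
    worst = (0, ())
    for failed in combinations(osds, failures):
        alive = [o for o in osds if o not in failed]
        peak = max(pg_counts(alive, num_pgs, replicas).values())
        if peak > worst[0]:
            worst = (peak, failed)
    return worst


# Small hypothetical cluster: 8 OSDs, 128 PGs, 3 replicas, 2 failures.
osds = list(range(8))
healthy_peak = max(pg_counts(osds, 128, 3).values())
peak, failed = worst_case(osds, 128, 3, 2)
print("healthy peak:", healthy_peak)
print("worst peak:", peak, "when OSDs", failed, "fail")
print("growth factor:", peak / healthy_peak)
```

The growth factor is the kind of number a nearfull estimate could be built on, but how to turn it into an actual ratio is exactly the open question. The cost grows with the number of combinations of failed OSDs, which is why this gets expensive on real clusters.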

Issues #2 and #3 are trickier. The big picture is that a given amount of
data is placed more evenly the more objects there are, and there should
be a way to use statistics to quantify that. Variance in object size
then brings in more uncertainty, but I think that metric is difficult to
quantify outside of very specific use cases where object sizes are
known.
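
As a first approximation of the statistics for #2, assume every object has the same size and lands in one of the PGs uniformly and independently (an assumption on my part about how object-name hashing behaves, not a measured fact). Then the object count per PG is binomial, and its standard deviation relative to the mean is sqrt((k - 1) / n) for n objects and k PGs. The Python sketch below computes that and checks it against a small simulation; the function names are made up for illustration.

```python
import math
import random


def relative_spread(num_objects, num_pgs):
    """Standard deviation of the per-PG object count divided by its mean,
    for a Binomial(num_objects, 1/num_pgs) count: sqrt((k - 1) / n)."""
    return math.sqrt((num_pgs - 1) / num_objects)


def simulate(num_objects, num_pgs, seed=0):
    """Drop equal-size objects into PGs uniformly at random and return
    how far the fullest PG is above the average (1.0 means perfectly even)."""
    rng = random.Random(seed)
    counts = [0] * num_pgs
    for _ in range(num_objects):
        counts[rng.randrange(num_pgs)] += 1
    mean = num_objects / num_pgs
    return max(counts) / mean


# Same number of PGs, increasing number of objects.
for n in (1_000, 100_000, 1_000_000):
    print(n, relative_spread(n, 128), simulate(n, 128))
```

The spread shrinks as 1/sqrt(n), which is the "more objects, more even" effect. Variable object sizes (#3) are not covered by this model.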

Finally, this might all be made redundant by the new auto-rebalancing
feature that Sage is planning for Luminous. If we can assume even data
placement at all times, then #4 is the only thing we need to worry about.
That would be very different for performance-based placement, however.
And if pools have overlapping OSD sets, that could be fairly tricky too.

Maybe some other users here already have some rule of thumb or actual
calculations for that. I was planning to get into the statistical
calculations of data placement, assuming a single object size, as the next
step for the paper I am working on. Would there be a need for such
tools?

Regards,
-- 
Xavier Villaneau
Storage Software Eng. at Concurrent Computer Corp.

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com