On 05/04/2017 03:58 PM, Xavier Villaneau wrote:
> Hello Loïc,
>
> On Thu, May 4, 2017 at 8:30 AM Loic Dachary <loic@xxxxxxxxxxx> wrote:
>
> > Is there a way to calculate the optimum nearfull ratio for a given crushmap?
>
> This is a question that I was planning to cover in those calculations I was working on for python-crush. I've currently shelved the work for a few weeks but intend to look at it again as time frees up.

Of course! Now I see how the two are related. Thanks.

> Basically, I see this as a five-fold uncertainty problem:
> 1. CRUSH mappings are pseudo-random and therefore (usually) uneven.
> 2. Object distribution between placement groups has the exact same issue.
> 3. Object size within a given pool can also vary greatly (from bytes to megabytes).
> 4. Failures and the following re-balancing are also random.
> 5. Finally, pools can occupy different and overlapping sets of OSDs, and hold independent sets of objects.
>
> Thanks to your new CRUSH tools, I think #1 and #4 are solved respectively by the ability to:
> - generate a CRUSH map for a precise (and even) distribution of PGs;
> - test mappings for every scenario of N failures and find the worst-case scenario (very expensive calculation, but possible).
>
> Issues #2 and #3 are more tricky. The big picture is that a given amount of data is placed more evenly the more objects there are, and there should be a way to use statistics to quantify that. Variance in object size then brings in more uncertainty, but I think that metric is difficult to quantify outside of very specific use cases where object sizes are known.
>
> Finally, this might all be made redundant by the new auto-rebalancing feature that Sage is planning for Luminous. If we can assume even data placement at all times, then #4 is the only thing we need to worry about. For performance-based placement that would be very different however. And if pools have overlapping OSD sets, that could be fairly tricky too.
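
A minimal sketch of the statistics behind issue #2, under loud assumptions: every object has the same size, and each object lands in a PG uniformly at random (a plain balls-in-bins model, not CRUSH and not the python-crush API). The function names are hypothetical. With N objects and P placement groups, the per-PG object count is binomial, so its relative standard deviation is sqrt((P - 1) / N), which shrinks as N grows. The simulation reports how far the fullest PG sits above the mean.

```python
import math
import random
from collections import Counter


def relative_stddev(num_objects: int, num_pgs: int) -> float:
    """Relative std dev of the per-PG object count in the binomial model.

    Assumes equal-size objects placed uniformly at random (not CRUSH).
    """
    p = 1.0 / num_pgs
    mean = num_objects * p
    return math.sqrt(num_objects * p * (1.0 - p)) / mean


def simulate_max_over_mean(num_objects: int, num_pgs: int, seed: int = 0) -> float:
    """Place objects at random and return (fullest PG count) / (mean count)."""
    rng = random.Random(seed)
    counts = Counter(rng.randrange(num_pgs) for _ in range(num_objects))
    mean = num_objects / num_pgs
    return max(counts.values()) / mean


if __name__ == "__main__":
    pgs = 128  # hypothetical pool size, chosen only for illustration
    for n in (1_000, 10_000, 100_000):
        print(n, relative_stddev(n, pgs), simulate_max_over_mean(n, pgs))
```

The max-over-mean ratio is the quantity that matters for a nearfull threshold in this toy model, since the fullest PG is the one that hits the limit first. Variable object sizes (issue #3) would widen the spread further and are not modelled here.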
>
> Maybe some other users here already have some rule of thumb or actual calculations for that. I was planning to get into the statistical calculations of data placement assuming unique object size as the next step for the paper I am working on. Would there be a need for such tools?
>
> Regards,
> --
> Xavier Villaneau
> Storage Software Eng. at Concurrent Computer Corp.
>

--
Loïc Dachary, Artisan Logiciel Libre
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com