On 2014-03-07 13:03, Sage Weil wrote:
> On Fri, 7 Mar 2014, Li Wang wrote:
>> Sorry, it is (n/3)*(n/3)*(n/3)/Cn3 = n^3/(27*Cn3)
> Cn3 is "n choose 3"?
>
>>>>> Last night it occurred to me that this is almost just having
>>>>> pgp_num < pg_num, but I think that's not quite right either.
> Actually, maybe it is. Basically, say there are X combinations of 3 disks
> = n choose 3. Some fraction of these, say Y, are actually used by CRUSH.
> If we are to reduce that number, that implies that there are some PGs that
> are overlapping on the same set of disks. Which more or less reduces to
> the case where pgp_num < pg_num, or the hashpspool flag isn't set, or any
> other behavior that makes more than one PG line up on the same disk.
> Just using fewer PGs in the system, in fact, would help here. The main

Does this mean that we can calculate pgp_num from the reliability
requirement, osd_num and replica_num, instead of using a fixed value,
namely 100 PGs per OSD? In fact, when the number of OSDs in a failure
domain is small, 100 PGs per OSD can easily cover all of the OSDs, which
means data loss will occur whenever the down OSDs are in different
failure domains.

> problem is that doing this tends to make the distribution less uniform, so
> there is a tradeoff.
>
> There is a reliability model in ceph-tools.git at
>
> https://github.com/ceph/ceph-tools/tree/master/models/reliability
>
> that Mark Kampe built last year. Sadly I haven't looked at it closely so
> I'm not sure if it captures this.
>
> sage
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

--
Best regards,
slhhust

--
Best regards,
Lianghao Shen
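
For anyone who wants to play with the numbers in this thread, here is a rough Python sketch of the arithmetic. It is only an illustration under assumptions that are not stated in the thread: the n OSDs are split evenly over 3 failure domains (which is how the (n/3)^3 term is read here), each PG picks one OSD per domain uniformly and independently (a simplification, not how CRUSH actually places data), and the function names are made up for this example.

```python
from math import comb


def cross_domain_fraction(n: int, replicas: int = 3) -> float:
    """Fraction of all `replicas`-sized OSD sets that take exactly one OSD
    from each of `replicas` equal-sized failure domains.

    For replicas == 3 this is (n/3)^3 / C(n, 3) = n^3 / (27 * C(n, 3)).
    Assumption: n is divisible by `replicas` and the domains are equal-sized.
    """
    if n % replicas != 0:
        raise ValueError("n must be divisible by the replica count")
    per_domain = n // replicas
    return per_domain ** replicas / comb(n, replicas)


def expected_coverage(n: int, pg_num: int, replicas: int = 3) -> float:
    """Expected fraction of the valid cross-domain OSD sets that are used by
    at least one PG.

    Assumption: every PG chooses its OSD set uniformly at random and
    independently of the others, which only approximates CRUSH.
    """
    if n % replicas != 0:
        raise ValueError("n must be divisible by the replica count")
    valid_sets = (n // replicas) ** replicas
    return 1.0 - (1.0 - 1.0 / valid_sets) ** pg_num


if __name__ == "__main__":
    n = 9                      # hypothetical small cluster: 3 domains x 3 OSDs
    pg_num = 100 * n // 3      # the "100 PGs per OSD" rule with 3 replicas
    print("cross-domain fraction:", cross_domain_fraction(n))
    print("expected coverage:    ", expected_coverage(n, pg_num))
```

Under this toy model, the coverage value is the chance that a simultaneous loss of one OSD in each failure domain hits a set that some PG actually uses. Lowering pg_num (or pgp_num) reduces it, at the cost of the less uniform distribution mentioned above.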