> Yes, that also makes perfect sense, so the aforementioned 12500 objects
> for a 50GB image; on a 60TB cluster/pool with 72 disks/OSDs and 3-way
> replication that makes 2400 PGs, following the recommended formula.
>
>> > What amount of disks (OSDs) did you punch in for the following run?
>> >> Disk Modeling Parameters
>> >>     size:          3TiB
>> >>     FIT rate:      826 (MTBF = 138.1 years)
>> >>     NRE rate:      1.0E-16
>> >> RADOS parameters
>> >>     auto mark-out: 10 minutes
>> >>     recovery rate: 50MiB/s (40 seconds/drive)
>> > Blink???
>> > I guess that goes back to the number of disks, but to restore 2.25GB at
>> > 50MB/s with 40 seconds per drive...
>>
>> The surviving replicas for the placement groups that the failed OSDs
>> participated in will naturally be distributed across many OSDs in the
>> cluster; when the failed OSD is marked out, its replicas will be
>> remapped to many OSDs. It's not a 1:1 replacement like you might find
>> in a RAID array.
>>
> I completely get that part, however the total amount of data to be
> rebalanced after a single disk/OSD failure to fully restore redundancy is
> still 2.25TB (mistyped that as GB earlier) at the 75% utilization you
> assumed.
> What I'm still missing in this picture is how many disks (OSDs) you
> calculated this with. Maybe I'm just misreading the "40 seconds per drive"
> bit there. Because if that means each drive is only required to be active
> for 40 seconds to do its bit of recovery, we're talking 1100 drives. ^o^
> 1100 PGs would be another story.

To recreate the modeling:

git clone https://github.com/ceph/ceph-tools.git
cd ceph-tools/models/reliability/
python main.py -g

I used the following values:

Disk Type:            Enterprise
Size:                 3000 GiB
Primary FITs:         826
Secondary FITs:       826
NRE Rate:             1.0E-16

RAID Type:            RAID6
Replace (hours):      6
Rebuild (MiB/s):      500
Volumes:              11

RADOS Copies:         3
Mark-out (min):       10
Recovery (MiB/s):     50
Space Usage:          75%
Declustering (pg):    1100
Stripe length:        1100 (limited by pgs anyway)

RADOS sites:          1
Rep Latency (s):      0
Recovery (MiB/s):     10
Disaster (years):     1000
Site Recovery (days): 30

NRE Model:            Fail
Period (years):       1
Object Size:          4MB

It seems that the number of disks is not considered when calculating the
recovery window, only the number of PGs:

https://github.com/ceph/ceph-tools/blob/master/models/reliability/RadosRely.py#L68

I could also see the recovery rates varying based on the "osd max backfills"
tunable:

http://ceph.com/docs/master/rados/configuration/osd-config-ref/#backfilling

Doing both would improve the quality of the models generated by the tool
(rough sketches of both ideas are appended below).

-- 
Kyle
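
As a quick sanity check on the figures quoted at the top of the thread
(12500 objects for a 50GB image, 2400 PGs for 72 OSDs), here is the
arithmetic spelled out. The 4MB object size comes from the model inputs
above, and the (OSDs x 100) / replicas rule of thumb is the "recommended
formula" referred to in the quote:

# Sanity check of the figures quoted at the top of the thread.
# Assumes 4 MB RBD objects and the (OSDs * 100) / replicas rule of thumb.

image_size_gb = 50                 # 50 GB RBD image
object_size_mb = 4                 # object size from the model run
objects = image_size_gb * 1000 // object_size_mb
print(objects)                     # 12500 objects

osds = 72
replicas = 3
pgs = osds * 100 // replicas
print(pgs)                         # 2400 PGs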
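
The "40 seconds/drive" figure that caused the double-take works out once
the recovery is spread across the declustering width rather than rebuilt
onto a single replacement disk. A minimal sketch, using the 3000 GiB / 75% /
50 MiB/s / 1100 PG values from the model run above (the assumption that one
peer per PG moves its own share is the model's, paraphrased here):

# Rough reconstruction of the "40 seconds/drive" figure: with declustering,
# the failed OSD's data is recovered in parallel by roughly one peer per PG,
# each moving only its own small share at the per-OSD recovery rate.

MIB_PER_GIB = 1024
disk_gib = 3000          # drive size from the model run
space_usage = 0.75       # 75% full
pgs = 1100               # Declustering (pg) from the model run
rate_mib_s = 50          # per-OSD recovery rate

data_mib = disk_gib * MIB_PER_GIB * space_usage   # ~2.25 TiB to re-replicate
share_mib = data_mib / pgs                        # ~2 GiB per participating OSD
print(share_mib / rate_mib_s)                     # ~42 s, close to the reported 40 s/drive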
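
For concreteness, here is roughly what the recovery-window calculation
amounts to as described above. This is a paraphrase of the behaviour, not
the code at the linked line: only the PG count sets the parallelism, and
the cluster's OSD count never appears.

# Paraphrase of the modelled recovery window (not ceph-tools' exact code):
# it depends only on the failed drive's data, the per-OSD recovery rate and
# the declustering (PG) count, so 1100 PGs implies 1100-way parallel
# recovery even on a 72-OSD cluster.

def recovery_window_s(disk_bytes, full_fraction, rate_bytes_s, pgs):
    data = disk_bytes * full_fraction      # data that must be re-replicated
    return data / (rate_bytes_s * pgs)     # pg-way parallelism assumed

GiB, MiB = 2**30, 2**20
print(recovery_window_s(3000 * GiB, 0.75, 50 * MiB, 1100))   # ~41.9 s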
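
And a sketch of how the two suggested improvements could be folded in. The
cap on parallelism and the linear scaling against a baseline of 10
concurrent backfills are illustrative assumptions, not something the tool
or the Ceph docs prescribe:

# Sketch of the two suggested refinements (illustrative assumptions, not
# code from ceph-tools):
#   1. parallelism is capped by the surviving OSD count, not just the PG count
#   2. the per-OSD rate scales with the "osd max backfills" tunable
#      (linear scaling against a baseline of 10 is purely illustrative)

def recovery_window_v2_s(disk_bytes, full_fraction, rate_bytes_s,
                         pgs, osds, osd_max_backfills=10):
    data = disk_bytes * full_fraction
    parallelism = min(pgs, osds - 1)                  # at most the surviving OSDs
    rate = rate_bytes_s * osd_max_backfills / 10.0    # illustrative throttle model
    return data / (rate * parallelism)

GiB, MiB = 2**30, 2**20
# With 72 OSDs the effective parallelism is 71, not 1100, and the recovery
# window grows from ~42 s to roughly 11 minutes:
print(recovery_window_v2_s(3000 * GiB, 0.75, 50 * MiB, 1100, 72))   # ~649 s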