On Wed, Nov 6, 2013 at 6:42 PM, Darren Birkett <darren.birkett@xxxxxxxxx> wrote:
>
> On 6 November 2013 14:08, Andrey Korolyov <andrey@xxxxxxx> wrote:
>>
>> > We are looking at building high-density nodes for small-scale 'starter'
>> > deployments for our customers (maybe 4 or 5 nodes). High density in this
>> > case could mean a 2U chassis with 2x external 45-disk JBOD enclosures
>> > attached. That's 90 3TB disks/OSDs to be managed by a single node, which
>> > is about 243TB of potential usable space, and so (assuming up to 75%
>> > fillage) maybe 182TB of potential data 'loss' in the event of a node
>> > failure. On an uncongested, unused 10Gbps network, my back-of-a-beer-mat
>> > calculations say it would take about 45 hours to get the cluster back
>> > into an undegraded state - that is, back to the requisite number of
>> > copies of all objects.
>> >
>>
>> For such a large number of disks you should consider that controller
>> cache will not absorb the load, even with 1GB controller(s) - only a
>> tiered cache is really an option. Also, recovery will take much more
>> time even if you leave room for client I/O in the calculations, because
>> raw disks have very limited IOPS capacity: recovery will either take
>> much longer than a first-glance estimate suggests, or it will affect
>> regular operations. For S3/Swift that may be acceptable, but for VM
>> images it is not.
>
>
> Sure, but my argument was that you are never likely to actually let that
> entire recovery operation complete - you're going to replace the hardware
> and plug the disks back in and let them catch up by log replay/backfill.
> Assuming you don't ever actually expect to really lose all data on 90
> disks in one go...
>
> By tiered caching, do you mean using something like flashcache or bcache?

Exactly - just another layer to offload the CPU from I/O wait.
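
For what it's worth, here is a minimal Python sketch of the beer-mat
calculation quoted above. It assumes the 10Gbps link is the only
bottleneck and ignores protocol overhead, client I/O and the raw-disk
IOPS limits Andrey mentions, so it gives an optimistic lower bound, not
a prediction:

    # Rough sanity check of the back-of-a-beer-mat estimate above.
    # Assumptions (not from the thread): the 10Gbps link is the only
    # bottleneck; protocol overhead, client I/O and per-disk IOPS
    # limits are ignored.

    usable_tb    = 243          # ~90 x 3TB OSDs on the failed node
    fill_ratio   = 0.75         # "assuming up to 75% fillage"
    data_tb      = usable_tb * fill_ratio       # ~182 TB to re-replicate
    link_bytes_s = 10e9 / 8                     # 10 Gbps ~= 1.25 GB/s

    hours = data_tb * 1e12 / link_bytes_s / 3600
    print("data to recover: ~%.0f TB" % data_tb)     # ~182 TB
    print("best-case rebuild: ~%.1f hours" % hours)  # ~40.5 hours

That lands at roughly 40 hours of pure transfer, so the ~45 hour figure
looks about right once some replication overhead is added - and, per the
IOPS point above, the real number on spinning disks will be worse.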