> [...] (>10k OSDs, >60 PB of data).

6TB on average per OSD? Hopefully SSDs or RAID10 (or narrow,
3-5 drive, RAID5).

> It is entirely dedicated to object storage with S3 interface.
> Maintenance and its extension are getting more and more
> problematic and time consuming.

Ah, the joys of a single large unified storage pool :-).
https://www.sabi.co.uk/blog/0804apr.html?080417#080417

> We consider to split it to two or more completely separate
> clusters

I would have suggested doing that 1-2 years ago...

> create S3 layer of abstraction with some additional metadata
> that will allow us to use these 2+ physically independent
> instances as a one logical cluster.

That's what the bucket hierarchy in a Ceph cluster instance already
does. What your layer is going to do is either:

1) look up the object ID in a list of instances, and fetch the
   object from the instance that validates the object ID; or
2) maintain a huge table of all object IDs and which instances
   they are in.

But 1) is basically what CRUSH already does, and 2) means giving up
the Ceph "decentralized" philosophy based on CRUSH.

BTW, one old practice that few systems follow is to use as object
keys neither addresses nor identifiers but *both*: first access the
address, treating it as a hint, and check that the identifier
matches; if it does not, do a slower lookup using the identifier
part of the key to find the current address.

> Additionally, newest data is the most demanded data, so we
> have to spread it equally among clusters to avoid skews in
> cluster load.

I usually do the opposite, but that depends on your application.

My practice is to recognize that data is indeed usually stratified
by date, to regard filesystem instances as "silos", and to create a
new instance every few months or years, directing all new file
creation to the latest one. The older instances are then
progressively retired: their "active" data is copied onwards into
the new instance, and their "inactive" data goes to offline storage.
http://www.sabi.co.uk/blog/12-fou.html?121218b#121218b

If you really need to keep all data online forever, which is usually
not the case (that's why there are laws that expire matters after N
years), the second-best option is to keep the old silos powered up
indefinitely; they will need very little attention beyond refreshing
the hardware periodically and migrating the data to new instances
when that stops being economical.
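
To make the two routing options above concrete, here is a minimal
Python sketch of both; the per-cluster "instance" objects and their
has_object/get_object/put_object methods are hypothetical stand-ins
for whatever S3 or RADOS client each cluster exposes, not a real
API.

    class ValidatingRouter:
        """Option 1: ask each instance whether it recognizes the
        object ID and fetch from the first one that does (roughly
        what CRUSH already gives you inside one cluster)."""

        def __init__(self, instances):
            self.instances = instances          # list of per-cluster clients

        def get(self, object_id):
            for inst in self.instances:
                if inst.has_object(object_id):  # cheap validation/existence check
                    return inst.get_object(object_id)
            raise KeyError(object_id)


    class TableRouter:
        """Option 2: keep a global table mapping every object ID to
        the instance holding it, updated on every write."""

        def __init__(self, instances):
            self.instances = instances          # name -> per-cluster client
            self.table = {}                     # object_id -> instance name

        def put(self, object_id, data, instance_name):
            self.instances[instance_name].put_object(object_id, data)
            self.table[object_id] = instance_name

        def get(self, object_id):
            name = self.table[object_id]        # the "huge table" lookup
            return self.instances[name].get_object(object_id)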
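
The hint-plus-identifier key scheme could look like the following
sketch; ObjectKey, HintedStore and the dict-backed slot map are
illustrative names only, not anything Ceph or S3 provides.

    from dataclasses import dataclass

    @dataclass(frozen=True)
    class ObjectKey:
        address: str    # where the object was last known to live (a hint)
        object_id: str  # stable identifier, never reused

    class HintedStore:
        def __init__(self, slots):
            self.slots = slots                  # address -> (object_id, data)

        def get(self, key: ObjectKey):
            # Fast path: trust the address hint, then verify the identifier.
            hit = self.slots.get(key.address)
            if hit is not None and hit[0] == key.object_id:
                return hit[1]
            # Slow path: the object moved; search by identifier instead.
            for addr, (oid, data) in self.slots.items():
                if oid == key.object_id:
                    return data
            raise KeyError(key.object_id)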
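
And a rough illustration of the date-stratified silo practice:
writes only ever go to the newest silo, reads fall through from
newest to oldest (since the newest data is the most demanded), and
whole old silos are retired in bulk. The dict-backed SiloChain is
purely a stand-in for separate filesystem or cluster instances.

    from collections import OrderedDict

    class SiloChain:
        def __init__(self):
            self.silos = OrderedDict()          # label -> {name: data}, oldest first

        def new_silo(self, label):
            self.silos[label] = {}              # e.g. one per quarter or year

        def put(self, name, data):
            latest = next(reversed(self.silos)) # writes only hit the newest silo
            self.silos[latest][name] = data

        def get(self, name):
            for label in reversed(self.silos):  # newest-first lookup
                if name in self.silos[label]:
                    return self.silos[label][name]
            raise KeyError(name)

        def retire_oldest(self):
            # Copy the "active" part of the result onwards, archive the rest.
            label, contents = self.silos.popitem(last=False)
            return label, contents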