> I'm trying to understand the nuts and bolts of EC / CephFS
> We're running an EC4+2 pool on top of 72 x 7.2K rpm 10TB drives. Pretty
> slow bulk / archive storage.

OK, I did some more searching and found this:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021642.html

It confirms my understanding to some degree, but I'd still like to get more insight.

Gregory Farnum makes this comment:

"Unfortunately any logic like this would need to be handled in your application layer. Raw RADOS does not do object sharding or aggregation on its own. CERN did contribute the libradosstriper, which will break down your multi-gigabyte objects into more typical sizes, but a generic system for packing many small objects into larger ones is tough - the choices depend so much on likely access patterns and such. I would definitely recommend working out something like that, though!"

An idea about how to advance this stuff:

I can see that this would be "very hard" to do at the object level given the Ceph concepts, but a suggestion would be to do it at the CephFS/MDS level.

A basic thing that would "often" work is to have, at the directory level, a special type of "packed" object where multiple files go into the same CephFS object. In common access patterns people read through entire directories in the first place, so this would also limit IO on the overall system for tree traversals (think tar xvf linux.kernel.tar.gz, or a git checkout). I have no idea how CephFS deals with concurrent updates to entities, but in this situation concurrency would have to be dealt with at the packed-object level.

It would be harder to pack files across directories, since that is not the native way for the MDS to keep track of things.

A third way would be to inline data on the MDS more aggressively. How mature, well tested and efficient is that feature?
http://docs.ceph.com/docs/master/cephfs/experimental-features/

The unfortunate consequence of bumping the 2KB size up to the point where EC pools become efficient is that we would end up hitting the MDS way harder than we do today. 2KB seems like a safe limit.
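
To put a rough number on "the point where EC pools become efficient", here is a back-of-the-envelope sketch in Python. It is only a model under stated assumptions, not a description of how the OSDs actually account for space: I assume each file becomes one RADOS object, the object is striped over k data shards in units of `stripe_unit`, and every one of the k+m shards is rounded up to a whole allocation unit on disk. The defaults of a 4 KiB stripe unit and a 64 KiB allocation unit are assumptions that should be checked against the actual pool and OSD configuration; metadata overhead is ignored, and the function names are mine.

```python
import math


def ec_raw_bytes(file_size, k=4, m=2, stripe_unit=4096, alloc_unit=65536):
    """Estimate raw bytes used by one object in a k+m EC pool.

    Assumptions (not taken from the Ceph source): one file maps to one
    object, each shard is padded to a whole stripe unit and then to a
    whole allocation unit, and per-object metadata is ignored.
    """
    if file_size <= 0:
        return 0
    # Number of full-width stripes needed to hold the data.
    stripes = math.ceil(file_size / (k * stripe_unit))
    # Bytes written to each shard (data and coding shards are equal size).
    shard_bytes = stripes * stripe_unit
    # Round each shard up to the allocation unit of the backing store.
    shard_on_disk = math.ceil(shard_bytes / alloc_unit) * alloc_unit
    return (k + m) * shard_on_disk


def amplification(file_size, **kwargs):
    """Raw bytes consumed per byte of user data."""
    return ec_raw_bytes(file_size, **kwargs) / file_size


if __name__ == "__main__":
    for size in (2 * 1024, 16 * 1024, 64 * 1024, 256 * 1024, 4 * 1024 * 1024):
        raw = ec_raw_bytes(size)
        print(f"{size:>8} B file -> {raw:>8} B raw, x{amplification(size):.1f}")
```

Under these assumptions a 2 KiB file on EC4+2 occupies 6 x 64 KiB = 384 KiB of raw space, and the nominal 1.5x overhead is only reached at k x alloc_unit = 256 KiB per file. With a 4 KiB allocation unit the same 2 KiB file would occupy 24 KiB. Either way, the gap between the 2KB inline limit and the size where EC becomes efficient is large, which is the range where some kind of packing would help.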