Re: Understanding EC properties for CephFS / small files.

jesper@xxxxxxxx · Sun, 17 Feb 2019 08:46:28 +0100

> I'm trying to understand the nuts and bolts of EC / CephFS.
> We're running an EC4+2 pool on top of 72 x 7.2K rpm 10TB drives. Pretty
> slow bulk / archive storage.

OK, I did some more searching and found this:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2017-October/021642.html

It confirms my understanding to some degree, but I'd still like to get
more insight.
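To make the small-file problem concrete, here is a rough back-of-the-envelope model. In a k+m EC pool every object is split into k data chunks plus m coding chunks, each on a different OSD, so a large object costs (k+m)/k = 1.5x raw space on EC4+2 versus 3x with triple replication. For a small file, however, every one of the six shards is rounded up to the OSD allocation unit. The 64 KiB allocation unit below is an assumption (compare it with bluestore_min_alloc_size_hdd on your own OSDs), and the model ignores stripe_unit alignment, omap and other metadata.

```python
import math

KiB = 1024

def footprint_ec(size: int, k: int, m: int, alloc: int) -> int:
    """Approximate raw bytes for one object of `size` bytes in a k+m EC pool.

    Simplified model (assumption): the object is cut into k equal data chunks,
    m coding chunks of the same size are added, and each of the k+m shards is
    rounded up to at least one allocation unit `alloc` on its OSD.
    """
    chunk = math.ceil(size / k)                            # bytes per data chunk
    per_shard = max(1, math.ceil(chunk / alloc)) * alloc   # on-disk size per shard
    return (k + m) * per_shard

def footprint_replicated(size: int, copies: int, alloc: int) -> int:
    """Approximate raw bytes for `copies` full replicas, same rounding model."""
    return copies * max(1, math.ceil(size / alloc)) * alloc

ALLOC = 64 * KiB  # assumed allocation unit, not measured on our cluster
for size in (2 * KiB, 64 * KiB, 4096 * KiB):
    ec = footprint_ec(size, k=4, m=2, alloc=ALLOC)
    rep = footprint_replicated(size, copies=3, alloc=ALLOC)
    print(f"{size // KiB:>5} KiB file: EC4+2 {ec // KiB} KiB raw, "
          f"3x replica {rep // KiB} KiB raw")
```

Under these assumptions a 2 KiB file occupies 384 KiB raw on EC4+2 (192x its size) against 192 KiB with 3x replication, while a 4 MiB object occupies 6144 KiB on EC4+2 against 12288 KiB replicated. In this model EC4+2 only starts to save space once files are larger than about two allocation units (128 KiB here).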

Gregory Farnum made these comments:
"Unfortunately any logic like this would need to be handled in your
application layer. Raw RADOS does not do object sharding or aggregation on
its own.
CERN did contribute the libradosstriper, which will break down your
multi-gigabyte objects into more typical sizes, but a generic system for
packing many small objects into larger ones is tough; the choices depend
so much on likely access patterns and such.

I would definitely recommend working out something like that, though!
"
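Gregory's point about libradosstriper is about striping: a large logical object is cut into stripe units that are spread round-robin over a set of smaller RADOS objects. The sketch below shows that offset mapping, modelled on the stripe_unit / stripe_count / object_size parameters that CephFS file layouts use. It is my reading of the scheme, not code taken from libradosstriper, and map_offset / Extent are hypothetical names.

```python
from typing import NamedTuple

class Extent(NamedTuple):
    object_no: int   # index of the RADOS object the byte lands in
    object_off: int  # offset of the byte inside that object

def map_offset(offset: int, stripe_unit: int, stripe_count: int,
               object_size: int) -> Extent:
    """Map a logical file offset to (object number, offset within object)."""
    if object_size % stripe_unit:
        raise ValueError("object_size must be a multiple of stripe_unit")
    stripes_per_object = object_size // stripe_unit
    block_no = offset // stripe_unit            # which stripe unit overall
    stripe_no = block_no // stripe_count        # which stripe (row) it is in
    stripe_pos = block_no % stripe_count        # which object within the set
    object_set = stripe_no // stripes_per_object
    object_no = object_set * stripe_count + stripe_pos
    object_off = ((stripe_no % stripes_per_object) * stripe_unit
                  + offset % stripe_unit)
    return Extent(object_no, object_off)

MiB = 1024 * 1024
# 4 MiB objects, 4 MiB stripe units, stripe_count 1 (like the CephFS default layout).
print(map_offset(10 * MiB, stripe_unit=4 * MiB, stripe_count=1, object_size=4 * MiB))
# → Extent(object_no=2, object_off=2097152)
# Wider striping: 64 KiB units spread round-robin over 4 objects of 256 KiB.
print(map_offset(300 * 1024, stripe_unit=64 * 1024, stripe_count=4, object_size=256 * 1024))
# → Extent(object_no=0, object_off=110592)
```

Note that striping only helps with the large-object end of the problem; it does nothing for files that are already smaller than one stripe unit.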
Some ideas on how to move this forward:

I can see that this would be "very hard" to do at the object level,
given Ceph's design, but one suggestion would be to do it at the
CephFS/MDS level.

A basic approach that would "often" work would be to have a special
type of "packed" object at the directory level, where multiple files
go into the same CephFS object. In common access patterns people read
through entire directories anyway, so this would also limit I/O on the
overall system for tree traversals (think of untarring
linux.kernel.tar.gz, or a git checkout).
I have no idea how CephFS deals with concurrent updates to entities,
but in this scheme concurrency would be handled at the packed-object
level.
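A minimal sketch of the packed-object idea, using a hypothetical PackedDir class: the files of one directory are appended into a single blob (which would become one RADOS object), and a per-directory index of (offset, length) entries, which the MDS would presumably have to keep, locates each file. This only illustrates the idea; it is not an existing CephFS feature.

```python
import io
from typing import Dict, Tuple

class PackedDir:
    """Hypothetical "packed" object: many small files stored in one blob."""

    def __init__(self) -> None:
        self._blob = io.BytesIO()
        # name -> (offset, length) inside the blob; the MDS would own this.
        self.index: Dict[str, Tuple[int, int]] = {}

    def add(self, name: str, data: bytes) -> None:
        # Always append. Re-adding a name leaves the old bytes behind as
        # garbage until some compaction step (not shown) rewrites the blob.
        offset = self._blob.seek(0, io.SEEK_END)
        self._blob.write(data)
        self.index[name] = (offset, len(data))

    def read(self, name: str) -> bytes:
        # A single-file read is one ranged read of the packed object.
        offset, length = self.index[name]
        self._blob.seek(offset)
        return self._blob.read(length)

    def blob(self) -> bytes:
        # What would be written to RADOS as one object.
        return self._blob.getvalue()

d = PackedDir()
d.add("a.c", b"int a;\n")
d.add("b.c", b"int b;\n")
print(d.index)        # {'a.c': (0, 7), 'b.c': (7, 7)}
print(d.read("b.c"))  # b'int b;\n'
```

A tree traversal then touches one object per directory instead of one per file, which is where the I/O saving would come from; the cost is that concurrent writers to the same directory now contend on the same object.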

It would be harder to "pack files across directories", since that is
not how the MDS natively keeps track of things.

A third way would be to inline data on the MDS more "aggressively".
How mature, well tested and efficient is that feature?

http://docs.ceph.com/docs/master/cephfs/experimental-features/

The unfortunate consequence of bumping the 2KB limit up to the point
where EC pools become efficient is that we would end up hitting the MDS
much harder than we do today. 2KB seems like a safe limit.
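A toy model of that trade-off, assuming a hypothetical rule that a file is inlined whenever its size is at most the limit, and using invented file sizes rather than anything measured on our cluster. It shows that raising the limit moves not just more bytes but more files (and therefore more I/O) onto the MDS.

```python
from typing import Iterable, Tuple

def split_by_inline_limit(sizes: Iterable[int], limit: int) -> Tuple[int, int, int]:
    """Return (files inlined, bytes inlined, files left in the data pool).

    Hypothetical model: a file is inlined on the MDS when size <= limit.
    """
    inlined_files = inlined_bytes = pool_files = 0
    for size in sizes:
        if size <= limit:
            inlined_files += 1
            inlined_bytes += size
        else:
            pool_files += 1
    return inlined_files, inlined_bytes, pool_files

# Invented example file sizes in bytes, for illustration only.
example_sizes = [500, 1500, 3000, 20_000, 60_000, 200_000, 5_000_000]
for limit in (2 * 1024, 64 * 1024):
    files, nbytes, pool = split_by_inline_limit(example_sizes, limit)
    print(f"limit {limit:>6} B: {files} files / {nbytes} B on MDS, {pool} files in pool")
```

With these sample sizes a 2 KiB limit inlines 2 files (2000 bytes), while a 64 KiB limit inlines 5 files (85000 bytes), leaving only the two largest files in the EC pool.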

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com