Re: newstore direction

John Spray <jspray@xxxxxxxxxx> · Tue, 20 Oct 2015 01:48:05 +0100

On Mon, Oct 19, 2015 at 8:49 PM, Sage Weil <sweil@xxxxxxxxxx> wrote:
>  - We have to size the kv backend storage (probably still an XFS
> partition) vs the block storage.  Maybe we do this anyway (put metadata on
> SSD!) so it won't matter.  But what happens when we are storing gobs of
> rgw index data or cephfs metadata?  Suddenly we are pulling storage out of
> a different pool and those aren't currently fungible.

This is the part that concerns me -- for the other parts one "just" has
to get the code right, but this problem could linger and be something we
have to keep explaining to users indefinitely.  It reminds me of cases
in other systems where users had to make an educated guess about inode
size up front, depending on whether they expected to store a lot of
xattrs efficiently.

In practice it's rare for users to make these kinds of decisions well
up front: the sizing really needs to be adjustable later, ideally
automatically.  That could be pretty straightforward if the KV part
was stored directly on block storage, instead of having XFS in the
mix.  I'm not quite up to date with the state of the art in this area: are
there any reasonable alternatives for the KV part that would consume
some defined range of a block device from userspace, instead of
sitting on top of a filesystem?
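
To make the question concrete, here is a rough sketch of what "a defined
range of a block device from userspace" could look like.  Everything in
it is hypothetical: the RangeStore class, its methods, and the temporary
file standing in for a raw device are illustrative assumptions, not
anything in newstore or in an existing KV library.  It only shows
bounds-checked I/O confined to one region, plus a resize that moves the
boundary later.

```python
import os
import tempfile


class RangeStore:
    """Hypothetical: I/O confined to [base, base + length) of a device fd."""

    def __init__(self, fd, base, length):
        self.fd = fd
        self.base = base
        self.length = length

    def _check(self, offset, size):
        # Reject any access that would leave the assigned range.
        if offset < 0 or size < 0 or offset + size > self.length:
            raise ValueError("access outside the assigned range")

    def write(self, offset, data):
        self._check(offset, len(data))
        os.pwrite(self.fd, data, self.base + offset)

    def read(self, offset, size):
        self._check(offset, size)
        return os.pread(self.fd, size, self.base + offset)

    def resize(self, new_length):
        # Bookkeeping only in this sketch; a real store would first have
        # to confirm the neighbouring region is free (or migrate data).
        self.length = new_length


def demo():
    # A sparse temporary file stands in for the block device.
    with tempfile.TemporaryFile() as f:
        f.truncate(1 << 20)
        kv = RangeStore(f.fileno(), base=0, length=4096)
        kv.write(0, b"key=value")
        try:
            kv.write(4090, b"too long for range")
            overflowed = False
        except ValueError:
            overflowed = True
        kv.resize(8192)  # adjust the KV/data split after the fact
        kv.write(4090, b"fits after resize")
        return kv.read(0, 9), overflowed, kv.read(4090, 17)
```

The point of the sketch is that the boundary is just a number the
daemon owns, so it could be moved later without repartitioning, which
is what sitting on an XFS partition makes awkward.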

John