On Wed, 18 May 2016, Chandan Kumar Singh wrote:
> I can understand the complexities now. Still, there is a little bit of
> surprise element when large number of xattrs can be stored in leveldb
> but not omaps. Are these xattrs not being erasure coded and
> distributed over nodes? When EC pools are space efficient alternative
> to replicated pools, not having omaps defeats the purpose for anyone
> who uses omaps extensively. I can guess that some users might be
> storing the key-value kind of metadata in some external store.

That's exactly the issue: xattrs are replicated across all OSDs in the PG
(and also appear in pg log entries). We didn't implement a way to erasure
code key/value data (and it's not obvious how one should do so). For now,
heavy omap users should just stick to replication.

sage

>
> On Wed, May 18, 2016 at 2:11 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Tue, 17 May 2016, Chandan Kumar Singh wrote:
> >> Thanks. Why are omaps not allowed for objects in EC pools?
> >
> > Because it doesn't make much sense to erasure code small key=value pairs
> > over lots of nodes. Values are too small to be individually encoded
> > sensibly, and packing them together would require a layer of complexity.
> > It could presumably be done, but we didn't do it, and have yet to hear
> > from someone who really needs it.
> >
> > sage
> >
> >>
> >> On Tue, May 17, 2016 at 7:22 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> > On Tue, 17 May 2016, Chandan Kumar Singh wrote:
> >> >> Hi
> >> >>
> >> >> While migrating to EC pools, I came to know that it does not support
> >> >> omaps but it allows thousands of xattrs (XFS). Are these xattrs being
> >> >> stored in a key-value store or in XFS file system?
> >> >
> >> > They are stored in XFS, until there are more than a handful, after which
> >> > point they get stored in leveldb. But they are *also* stored in every pg
> >> > log event that modifies the object, so you should definitely not (ab)use
> >> > xattrs the way you would use omap and expect the system to behave/perform!
> >> > They are meant to be small and few.
> >> >
> >> > sage
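
To put rough numbers on the point that small key/value pairs are "too small to be individually encoded sensibly", here is a minimal back-of-the-envelope sketch. It is not Ceph code. The function names, the k=4/m=2 profile, the 4096-byte stripe unit, and the simplified padding model (each value padded up to a whole stripe) are all assumptions chosen for illustration.

```python
import math


def replicated_bytes(value_len: int, replicas: int = 3) -> int:
    """Raw bytes used when a value is simply copied to every replica."""
    return value_len * replicas


def erasure_coded_bytes(value_len: int, k: int = 4, m: int = 2,
                        stripe_unit: int = 4096) -> int:
    """Raw bytes used if one value were erasure coded on its own.

    Assumed model (hypothetical, for illustration only): the value is
    padded to a whole number of stripes of k * stripe_unit bytes, and
    each stripe produces k data chunks plus m coding chunks, each of
    stripe_unit bytes.
    """
    stripe_width = k * stripe_unit
    stripes = max(1, math.ceil(value_len / stripe_width))
    return stripes * (k + m) * stripe_unit


if __name__ == "__main__":
    for size in (64, 4096, 4 * 1024 * 1024):
        rep = replicated_bytes(size)
        ec = erasure_coded_bytes(size)
        print(f"{size:>8} B value: replicated={rep} B, erasure coded={ec} B")
```

Under these assumptions a 4 MiB object gets the expected 1.5x overhead from a 4+2 profile, while a value of a few dozen bytes costs a full padded stripe on every shard, far more than three plain copies. Packing many values into one stripe would avoid that, which is the extra layer of complexity mentioned above.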