Re: Fwd: How do EC pools support thousands of xattrs (XFS) but no omaps?

On Wed, 18 May 2016, Chandan Kumar Singh wrote:
> I understand the complexities now. Still, it is a little surprising
> that a large number of xattrs can be stored in leveldb but omaps
> cannot. Are these xattrs not erasure coded and distributed over
> nodes? Since EC pools are a space-efficient alternative to replicated
> pools, not having omaps defeats the purpose for anyone who uses omaps
> extensively. I can guess that some users might be storing key-value
> metadata in some external store.

That's exactly the issue: xattrs are replicated across all OSDs in the PG 
(and also appear in pg log entries).  We didn't implement a way to 
erasure code key/value data (and it's not obvious how one should do so).

For now, heavy omap users should just stick to replication.

sage
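
[Editor's sketch, not part of the original message.] The space argument
above can be made concrete with some rough arithmetic. The Python below
is only an illustration under stated assumptions: the function names,
the 4+2 profile, and the object and xattr sizes are all hypothetical,
chunk alignment and padding are ignored, and the extra copies held in
pg log entries are not counted.

```python
def replicated_footprint(payload_bytes: int, copies: int) -> int:
    """Raw bytes used when every copy holds the full payload."""
    return payload_bytes * copies


def erasure_coded_footprint(payload_bytes: int, k: int, m: int) -> int:
    """Raw bytes used by an idealized k+m erasure code.

    Each of the k+m shards holds roughly payload/k bytes
    (ceiling division; real chunk alignment is ignored here).
    """
    shard_bytes = -(-payload_bytes // k)
    return shard_bytes * (k + m)


def xattr_footprint_on_ec(xattr_bytes: int, k: int, m: int) -> int:
    """Raw bytes used by xattrs on an EC pool.

    Per the thread, xattrs are copied in full to every OSD in the PG,
    so they cost k+m full copies rather than (k+m)/k.
    """
    return replicated_footprint(xattr_bytes, k + m)


if __name__ == "__main__":
    k, m = 4, 2                      # hypothetical EC profile
    data = 4 * 1024 * 1024           # hypothetical 4 MiB object
    xattrs = 64 * 1024               # hypothetical 64 KiB of xattrs

    print("object data on EC :", erasure_coded_footprint(data, k, m))
    print("xattrs on EC      :", xattr_footprint_on_ec(xattrs, k, m))
    print("xattrs on 3x repl.:", replicated_footprint(xattrs, 3))
```

Under these assumptions the object data costs 1.5x its size on the EC
pool, while the xattrs cost 6x, which is more than the 3x they would
cost on a three-replica pool. That is one way to see why xattrs are
meant to be small and few.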


> 
> On Wed, May 18, 2016 at 2:11 AM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> > On Tue, 17 May 2016, Chandan Kumar Singh wrote:
> >> Thanks. Why are omaps not allowed for objects in EC pools?
> >
> > Because it doesn't make much sense to erasure code small key=value pairs
> > over lots of nodes.  Values are too small to be individually encoded
> > sensibly, and packing them together would require a layer of complexity.
> > It could presumably be done, but we didn't do it, and have yet to hear
> > from someone who really needs it.
> >
> > sage
> >
> >
> >
> >>
> >> On Tue, May 17, 2016 at 7:22 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> >> > On Tue, 17 May 2016, Chandan Kumar Singh wrote:
> >> >> Hi
> >> >>
> >> >> While migrating to EC pools, I learned that they do not support
> >> >> omaps but do allow thousands of xattrs (XFS). Are these xattrs
> >> >> stored in a key-value store or in the XFS file system?
> >> >
> >> > They are stored in XFS, until there are more than a handful, after which
> >> > point they get stored in leveldb.  But they are *also* stored in every pg
> >> > log event that modifies the object, so you should definitely not (ab)use
> >> > xattrs the way you would use omap and expect the system to behave/perform!
> >> > They are meant to be small and few.
> >> >
> >> > sage
> >>
> >>
> 
> 