Re: [ceph-users] keyvaluestore backend metadata overhead

Hi Chris,

[Moving this thread to ceph-devel, which is probably a bit more 
appropriate.]

On Thu, 29 Jan 2015, Chris Pacejo wrote:
> Hi, we've been experimenting with the keyvaluestore backend, and have found
> that, on every object write (e.g. with `rados put`), a single transaction is
> issued containing an additional 9 KeyValueDB writes, beyond those which
> constitute the object data.  Given the key names, these are clearly all
> metadata of some sort, but this poses a problem when the objects themselves
> are very small.  Given the default strip block size of 4 KiB, with objects
> of size 36 KiB or less, half or more of all key-value store writes are
> metadata writes.  With objects of size 4 KiB or less, the metadata overhead
> grows to 90%+.
> 
> Is there any way to reduce the number of metadata rows which must be written
> with each object?
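
The overhead figures quoted above follow from simple arithmetic. Here is a minimal sketch, assuming (as the report describes) one KeyValueDB write per 4 KiB strip of object data plus a fixed 9 metadata writes per object. The constants and function names are illustrative and do not come from the Ceph source.

```python
import math

# Extra KeyValueDB writes per object write, as observed in the report above
# (taken from the thread, not from the Ceph source).
METADATA_WRITES = 9
# Default strip size quoted in the thread: 4 KiB.
STRIP_SIZE = 4 * 1024


def data_writes(object_size: int, strip_size: int = STRIP_SIZE) -> int:
    """Strips written for an object, assuming one KV write per strip."""
    return math.ceil(object_size / strip_size)


def metadata_fraction(object_size: int,
                      metadata_writes: int = METADATA_WRITES,
                      strip_size: int = STRIP_SIZE) -> float:
    """Fraction of all KV writes in the transaction that are metadata."""
    data = data_writes(object_size, strip_size)
    return metadata_writes / (metadata_writes + data)


if __name__ == "__main__":
    for kib in (4, 8, 36, 40, 64):
        frac = metadata_fraction(kib * 1024)
        print(f"{kib:3d} KiB object: {frac:.0%} metadata")
```

A 36 KiB object has exactly 9 data strips, so it is the largest size at which metadata still makes up half of the writes. Any object of 4 KiB or less has a single strip, giving 9/10 = 90%, and an empty object with no data strips would push the ratio to 100%.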

There is a level (or two) of indirection in KeyValueStore's 
GenericObjectMap that is there to allow object cloning.  I wonder if we 
will want to facilitate a backend that doesn't implement clone and can 
only be used for pools that disallow clone and snap operations.
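
To show why that indirection costs extra writes, here is a toy model, assuming a layout loosely like the one described: an object name maps to a header, data keys hang off the header, and a clone shares a frozen parent header instead of copying keys. `ToyObjectMap`, `Header`, and the write counter are hypothetical illustrations, not Ceph's GenericObjectMap API, and the real on-disk layout and write counts differ.

```python
from dataclasses import dataclass, field
from typing import Dict, Optional


@dataclass
class Header:
    """Hypothetical per-object header: a sequence id plus an optional parent."""
    seq: int
    parent: Optional["Header"] = None
    # Keys written directly under this header (stand-in for KV rows
    # prefixed by the header's seq).
    keys: Dict[str, bytes] = field(default_factory=dict)


class ToyObjectMap:
    """A toy name -> header -> keys map, NOT Ceph's GenericObjectMap.

    The extra hop from object name to header is what lets clone() share
    data: the clone and the original both point at a frozen parent header.
    """

    def __init__(self) -> None:
        self.names: Dict[str, Header] = {}
        self.next_seq = 0
        self.kv_writes = 0  # counts simulated backend writes

    def _new_header(self, parent: Optional[Header] = None) -> Header:
        self.next_seq += 1
        self.kv_writes += 1  # writing the header record itself
        return Header(self.next_seq, parent)

    def _lookup_header(self, name: str) -> Header:
        if name not in self.names:
            self.names[name] = self._new_header()
            self.kv_writes += 1  # writing the name -> header mapping
        return self.names[name]

    def put(self, name: str, key: str, value: bytes) -> None:
        header = self._lookup_header(name)
        header.keys[key] = value
        self.kv_writes += 1  # the actual data row

    def get(self, name: str, key: str) -> Optional[bytes]:
        header: Optional[Header] = self.names.get(name)
        # Walk up the parent chain until the key is found.
        while header is not None:
            if key in header.keys:
                return header.keys[key]
            header = header.parent
        return None

    def clone(self, src: str, dst: str) -> None:
        parent = self.names[src]
        # Both names get fresh, empty headers sharing the old one as parent;
        # no data rows are copied.
        self.names[src] = self._new_header(parent)
        self.names[dst] = self._new_header(parent)
        self.kv_writes += 2  # rewrite both name -> header mappings
```

Even in this toy, a brand-new single-strip object costs two bookkeeping writes on top of its one data row, before any OSD-level metadata is added. A backend limited to pools that disallow clone and snap could key data rows directly by object name and skip that hop.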

There is also some key consolidation in the OSD layer we talked about in 
the Wednesday performance call that will cut this down some!

> (Alternatively, if there is a way to convince the OSD to issue multiple
> concurrent write transactions, that would also help.  But even with
> "keyvaluestore op threads" set as high as 64, and `rados bench` issuing 64
> concurrent writes, we never see more than a single active write transaction
> on the (multithread-capable) backend.  Is there some other option we're
> missing?)

sage