-Greg
On Wed, Sep 12, 2018 at 9:24 PM Benjamin Cherian <benjamin.cherian@xxxxxxxxx> wrote:
Greg, Paul,

Thank you for the feedback. This has been very enlightening. One last question (for now at least): are there any expected performance impacts from having I/O to multiple pools from the same client? (Given how RGW and CephFS store metadata, I would hope not, but I thought I'd ask.) Based on everything that has been described, it makes sense to keep metadata-heavy objects (i.e., objects with a large fraction of kv data) in a replicated pool while putting the large blobs in an EC pool.

Thanks again,
Ben

On Wed, Sep 12, 2018 at 1:05 PM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
On Tue, Sep 11, 2018 at 5:32 PM Benjamin Cherian <benjamin.cherian@xxxxxxxxx> wrote:

Ok, that’s good to know. I was planning on using an EC pool. Maybe I'll store some of the larger kv pairs in their own objects or move the metadata into its own replicated pool entirely. If the storage mechanism is the same, is there a reason xattrs are supported and omap is not? (Or is there some hidden cost to storing kv pairs in an EC pool I’m unaware of, e.g., does the kv data get replicated across all OSDs being used for a PG or something?)

Yeah, if you're on an EC pool there isn't a good way to erasure-code key-value data. So we willingly replicate xattrs across all the nodes (since they are presumed to be small and limited in number; I think we actually have total limits, but I'm not sure) but don't support omap at all (as it's presumed to be a lot of data).

Do note that if small objects are a large proportion of your data, you might prefer to put them in a replicated pool: in an EC pool you'd need very small chunk sizes to get any non-replication happening, and for something in the 10KB range at a reasonable k+m you'd be dominated by metadata size anyway.

-Greg

Thanks,
Ben

On Tue, Sep 11, 2018 at 1:46 PM Patrick Donnelly <pdonnell@xxxxxxxxxx> wrote:
On Tue, Sep 11, 2018 at 12:43 PM, Benjamin Cherian <benjamin.cherian@xxxxxxxxx> wrote:
> On Tue, Sep 11, 2018 at 10:44 AM Gregory Farnum <gfarnum@xxxxxxxxxx> wrote:
>>
>> <snip>
>> In general, if the key-value storage is of unpredictable or non-trivial
>> size, you should use omap.
>>
>> At the bottom layer where the data is actually stored, they're likely to
>> be in the same places (if using BlueStore, they are the same — in FileStore,
>> a rados xattr *might* be in the local FS xattrs, or it might not). It is
>> somewhat more likely that something stored in an xattr will get pulled into
>> memory at the same time as the object's internal metadata, but that only
>> happens if it's quite small (think the xfs or ext4 xattr rules).
>
>
> Based on this description, if I'm planning on using BlueStore, there is no
> particular reason to ever prefer using xattrs over omap (outside of ease of
> use in the API), correct?
You may prefer xattrs on BlueStore if the metadata is small and you
may need to store it in an EC pool. omap is not supported on
EC pools.
--
Patrick Donnelly
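
The guidance in this thread (omap for key-value data of unpredictable or non-trivial size, xattrs for small metadata, no omap on EC pools) can be summarised as a small decision helper. This is only a sketch: the names `PoolType`, `SMALL_KV_BYTES` and `choose_kv_store` are hypothetical, and the 1 KiB cutoff is an assumption, since nobody in the thread gives a number for "small".

```python
from enum import Enum


class PoolType(Enum):
    REPLICATED = "replicated"
    ERASURE_CODED = "erasure-coded"


# Hypothetical cutoff for "small" metadata; the thread does not give a figure.
SMALL_KV_BYTES = 1024


def choose_kv_store(pool_type, kv_bytes, size_is_predictable=True):
    """Suggest where one object's key-value data could live.

    Returns "xattr", "omap", or "separate-replicated-pool".
    """
    small = size_is_predictable and kv_bytes <= SMALL_KV_BYTES

    if pool_type is PoolType.ERASURE_CODED:
        # omap is not supported on EC pools. xattrs are supported, but they
        # are replicated to every shard, so only small metadata fits there.
        return "xattr" if small else "separate-replicated-pool"

    # Replicated pool: both mechanisms work. The thread recommends omap once
    # the data is of unpredictable or non-trivial size.
    return "xattr" if small else "omap"


if __name__ == "__main__":
    print(choose_kv_store(PoolType.ERASURE_CODED, 200))
    print(choose_kv_store(PoolType.ERASURE_CODED, 5_000_000))
    print(choose_kv_store(PoolType.REPLICATED, 200, size_is_predictable=False))
```

The "separate-replicated-pool" outcome corresponds to Ben's idea of moving large or unbounded metadata out of the EC pool entirely.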
_______________________________________________ ceph-users mailing list ceph-users@xxxxxxxxxxxxxx http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
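
Greg's remark about small objects in EC pools can be made concrete with some rough arithmetic. The sketch below uses a deliberately simplified model: it assumes an object is padded up to a whole number of stripes of width k * stripe_unit and that each of the k+m shards then stores one stripe_unit per stripe. The function names and the 4096-byte stripe unit are assumptions for illustration, and the model ignores the per-object metadata and allocator granularity that Greg says dominate at this size.

```python
import math


def ec_raw_bytes(object_bytes, k, m, stripe_unit):
    """Raw bytes stored across all k+m shards under the simplified model."""
    stripe_width = k * stripe_unit
    # Even a tiny object occupies at least one full stripe.
    stripes = max(1, math.ceil(object_bytes / stripe_width))
    shard_bytes = stripes * stripe_unit
    return (k + m) * shard_bytes


def replicated_raw_bytes(object_bytes, copies):
    """Raw bytes stored when the object is kept as full copies."""
    return copies * object_bytes


if __name__ == "__main__":
    obj = 10 * 1024  # an object in the 10KB range, as in the thread
    for k, m in [(4, 2), (8, 3)]:
        raw = ec_raw_bytes(obj, k, m, stripe_unit=4096)  # stripe unit is an assumption
        print(f"EC {k}+{m}: {raw} bytes raw, {raw / obj:.1f}x")
    rep = replicated_raw_bytes(obj, copies=3)
    print(f"3x replication: {rep} bytes raw, {rep / obj:.1f}x")
```

Under these assumptions a 10 KiB object costs 24576 raw bytes (2.4x) at 4+2 and 45056 raw bytes (4.4x) at 8+3, against 30720 raw bytes (3.0x) for three replicas, so the nominal space advantage of erasure coding largely disappears unless the stripe unit is made much smaller.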