BTW, it would be better to support a Bucket Index Plugin mechanism. We
could use HBase/Cassandra as an optional back-end in production, since
HBase/Cassandra are already deployed on our servers.

2017-01-31 8:19 GMT+08:00 Martin Millnert <martin@xxxxxxxxxxx>:
> Hi Mark,
>
> On Mon, Jan 30, 2017 at 06:09:44PM -0600, Mark Nelson wrote:
>>
>>
>> On 01/30/2017 05:56 PM, Martin Millnert wrote:
>> > Hi,
>> >
>> > we're running RGW with indexes placed on fast pools, whereas the data
>> > is placed on slow pools. The write throughput is approximately 10% of
>> > rados bench, and I guess we're hitting some locking/synchronization
>> > behaviour, because parallelisation doesn't really speed it up.
>> > I've seen the sharding option and the indexless option, but neither
>> > of these seems like /the/ fix to me.
>> > My limited knowledge of the RGW code makes me guess it's due to the
>> > indexes, and possibly even additional metadata that RGW keeps
>> > up-to-date?
>> >
>> > Assuming that the index objects must be downloaded, rewritten, and
>> > re-uploaded to the index pools by RGW (and that this should be
>> > locking?), the thought that I've had for a while now is:
>> > How hard is it to abstract these index operations and add support for
>> > actually using PostgreSQL as a backend?
>>
>> The indexes are definitely something to watch! It would be very
>> interesting to see how your write benchmark does with indexless buckets.
>
> Yeah, we'll try to get some comparative data to try to find the
> bottleneck(s), though what I want is fast indexes. :-)
>
>> What we've seen in the past is that when the bucket indexes are slow
>> (in that particular case because the bucket index pool only had 8
>> PGs!), RGW spends most of its time waiting on bucket index updates.
>> You can grab a couple of GDB stacktraces of the RGW process to see if
>> this is happening in your case.
>
> Thanks for the pointer, I'll see what we can do.
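[For reference, bucket index sharding mentioned above is controlled by a
ceph.conf option; a minimal sketch, assuming a Jewel-era release — the
shard count is illustrative, not a recommendation, and the option only
affects buckets created after the change:]

```ini
# ceph.conf sketch -- shard count here is purely illustrative.
# New buckets get their index pre-split across this many shard objects,
# spreading index updates over more PGs/OSDs. Existing buckets are
# unaffected.
[client.rgw]
rgw_override_bucket_index_max_shards = 16
```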
>
>> > I don't mean to say bad words vis-a-vis a completely self-contained
>> > data management system, etc., but the (more) right(*) (optional) tool
>> > for the job has some quality to it too. :-)
>> >
>> > I'd be willing to take a look at it, I guess.
>> >
>> > Thoughts?
>>
>> I think first figure out if it's actually the bucket indexes.
>
> Sure, data first makes very good sense. :-)
>
>> Like you said, there's more metadata associated with RGW, so at least
>> some of what you are seeing could be metadata getting pushed out of the
>> inode and into separate extents. Are you writing out lots of really
>> small objects, or fewer larger objects?
>
> We're throughput-limited when uploading single-chunk or multi-chunk
> large objects (10+ MB); this is the main concern.
> The data pool is on a wide EC profile (10+4), where rados bench
> saturates the CPU on the hosts but gives 10x the throughput. I have to
> double-check whether our config does additional striping/chunking of
> incoming objects, and how this interacts with EC.
>
> Thanks,
> Martin
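[On the striping/chunking question: a back-of-the-envelope sketch of how
a large upload fans out across rados objects and EC chunks. It assumes
the Jewel-era default 4 MiB RGW stripe size (rgw_obj_stripe_size) and
the 10+4 EC profile from the thread; the helper name and the 40 MiB
example object are illustrative, not from any real config:]

```python
# Sketch: how RGW striping plus EC slicing multiply the per-upload
# write count. Assumes rgw_obj_stripe_size = 4 MiB and an EC 10+4 pool.
MiB = 1 << 20

def ec_layout(object_size, stripe_size=4 * MiB, k=10, m=4):
    """Return (rados_objects, chunks_per_object, chunk_size, raw_bytes)."""
    rados_objects = -(-object_size // stripe_size)  # ceil division
    chunk_size = -(-stripe_size // k)               # per-OSD shard size
    raw_bytes = rados_objects * (k + m) * chunk_size  # bytes hitting disks
    return rados_objects, k + m, chunk_size, raw_bytes

objs, chunks, chunk_size, raw = ec_layout(40 * MiB)
print(objs, chunks)            # a 40 MiB upload -> 10 rados objects,
                               # each split into 14 EC chunks
print(raw / (40 * MiB))        # ~1.4x raw write amplification for 10+4
```

[So a single 40 MiB PUT turns into on the order of 140 chunk writes
before replication of the index ops is even counted, which is one reason
per-upload throughput can sit well below aggregate rados bench numbers.]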