Hi Mark,

On Mon, Jan 30, 2017 at 06:09:44PM -0600, Mark Nelson wrote:
>
>
> On 01/30/2017 05:56 PM, Martin Millnert wrote:
> > Hi,
> >
> > we're running RGW with the indexes placed on fast pools, whereas the
> > data is placed on slow pools. The write throughput is approximately
> > 10% of rados bench, and I guess we're hitting some
> > locking/synchronization feature, because parallelisation doesn't
> > really speed it up.
> > I've seen the sharding option and the indexless option, but neither
> > one of these seems like /the/ fix to me.
> > My limited knowledge of the RGW code makes me guess it's due to the
> > indexes, and possibly even additional metadata that RGW keeps
> > up-to-date?
> >
> > Assuming that the index objects must be downloaded, rewritten, and
> > re-uploaded to the index pools by RGW (and that this should be
> > locking?), the thought that I've had for a while now is:
> > how hard is it to abstract these index operations and add support
> > for actually using PostgreSQL as a backend?
>
> The indexes are definitely something to watch! It would be very
> interesting to see how your write benchmark does with indexless
> buckets.

Yeah, we'll try to get some comparative data to find the bottleneck(s),
though what I really want is fast indexes. :-)

> What we've seen in the past is that when the bucket indexes are slow
> (in that particular case because the bucket index pool only had 8
> PGs!), RGW spends most of its time waiting on bucket index updates.
> You can grab a couple of GDB stacktraces of the RGW process to see if
> this is happening in your case.

Thanks for the pointer, I'll see what we can do.

> > I don't mean to say bad words vis-a-vis a completely self-contained
> > data management system, etc., but the (more) right(*) (optional)
> > tool for the job has some quality to it too. :-)
> >
> > I'd be willing to take a look at it, I guess.
> >
> > Thoughts?
>
> I think first figure out if it's actually the bucket indexes.

Sure, data first makes very good sense. :-)

> Like you said, there's more metadata associated with RGW, so at least
> some of what you are seeing could be metadata getting pushed out of
> the inode and into separate extents. Are you writing out lots of
> really small objects or fewer larger objects?

We're throughput-limited when uploading single-chunk or multi-chunk
large objects (10+ MB); this is the main concern. The data pool is a
wide EC pool (10+4), where rados bench saturates the CPUs on the hosts
but gives 10x the throughput we see through RGW.
We still have to double-check whether our config does additional
striping/chunking of incoming objects, and how that interacts with EC.

Thanks,
Martin
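
P.S. For completeness, what I have in mind for the comparative numbers
is raw rados bench against the index pool vs. the EC data pool. Just a
sketch; the pool names below are our setup's defaults, and rados bench
writes whole 4 MB objects, so it won't exercise the omap-heavy index
workload the way RGW does:

  # Raw RADOS write throughput, index pool vs. EC data pool
  # (60-second runs, 16 concurrent ops, keep objects for a later read test).
  rados bench -p default.rgw.buckets.index 60 write -t 16 --no-cleanup
  rados bench -p default.rgw.buckets.data  60 write -t 16 --no-cleanup

  # Remove the benchmark objects afterwards.
  rados -p default.rgw.buckets.index cleanup
  rados -p default.rgw.buckets.data  cleanup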
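
For the stacktraces, something along these lines (assuming the daemon
binary is named radosgw and debug symbols are installed; attaching
briefly pauses the process, so I'd do this on one gateway at a time):

  # Dump backtraces of all radosgw threads, then detach.
  # Repeat a few times during an upload to see where the time goes
  # (e.g. threads waiting on bucket index updates).
  gdb -p "$(pidof radosgw)" -batch -ex "thread apply all bt" \
      > rgw-backtrace-$(date +%s).txt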
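
And for reference, the sharding knob I was referring to is the one
below; as far as I understand it only applies to newly created buckets,
which is part of why it doesn't feel like /the/ fix to me (the value 16
is just an example):

  # ceph.conf on the RGW nodes: pre-split the index of newly created
  # buckets across 16 shard objects instead of a single one.
  [global]
  rgw override bucket index max shards = 16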