Hi Mark,

On Mon, Jan 30, 2017 at 06:09:44PM -0600, Mark Nelson wrote:
>
>
> On 01/30/2017 05:56 PM, Martin Millnert wrote:
> > Hi,
> >
> > we're running RGW with the indexes placed on fast pools, whereas the
> > data is placed on slow pools. The write throughput is approximately
> > 10% of rados bench, and I guess we're hitting some
> > locking/synchronization feature, because parallelisation doesn't
> > really speed it up.
> > I've seen the sharding option and the indexless option, but neither
> > one of these seems like /the/ fix to me.
> > My limited knowledge of the RGW code makes me guess it's due to the
> > indexes, and possibly even additional metadata that RGW keeps
> > up-to-date?
> >
> > Assuming that the index objects must be downloaded, rewritten, and
> > re-uploaded to the index pools by RGW (and that this should be
> > locking?), the thought that I've had for a while now is:
> > how hard is it to abstract these index operations and add support
> > for actually using PostgreSQL as a backend?
>
> The indexes are definitely something to watch! It would be very
> interesting to see how your write benchmark does with indexless
> buckets.

Yeah, we'll try to get some comparative data to find the bottleneck(s),
though what I really want is fast indexes. :-)

> What we've seen in the past is that when the bucket indexes are slow
> (in that particular case because the bucket index pool only had 8
> PGs!), RGW spends most of its time waiting on bucket index updates.
> You can grab a couple of GDB stacktraces of the RGW process to see if
> this is happening in your case.

Thanks for the pointer, I'll see what we can do.

> > I don't mean to say bad words vis-a-vis a completely self-contained
> > data management system, etc., but the (more) right(*) (optional)
> > tool for the job has some quality to it too. :-)
> >
> > I'd be willing to take a look at it, I guess.
> >
> > Thoughts?
>
> I think first figure out if it's actually the bucket indexes.

Sure, data first makes very good sense. :-)

> Like you said, there's more metadata associated with RGW, so at least
> some of what you are seeing could be metadata getting pushed out of
> the inode and into separate extents. Are you writing out lots of
> really small objects or fewer larger objects?

We're throughput-limited when uploading single-chunk or multi-chunk
large objects (10+ MB); this is the main concern. The data pool is a
wide EC pool (10+4), where rados bench saturates the CPUs on the hosts
but gives 10x the throughput we see through RGW.
We still have to double-check whether our config does additional
striping/chunking of incoming objects, and how that interacts with EC.

Thanks,
Martin
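
P.S. For completeness, what I have in mind for the comparative numbers
is raw rados bench against the index pool vs. the EC data pool. Just a
sketch; the pool names below are our setup's defaults, and rados bench
writes whole 4 MB objects, so it won't exercise the omap-heavy index
workload the way RGW does:

  # Raw RADOS write throughput, index pool vs. EC data pool
  # (60-second runs, 16 concurrent ops, keep objects for a later read test).
  rados bench -p default.rgw.buckets.index 60 write -t 16 --no-cleanup
  rados bench -p default.rgw.buckets.data  60 write -t 16 --no-cleanup

  # Remove the benchmark objects afterwards.
  rados -p default.rgw.buckets.index cleanup
  rados -p default.rgw.buckets.data  cleanup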
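
For the stacktraces, something along these lines (assuming the daemon
binary is named radosgw and debug symbols are installed; attaching
briefly pauses the process, so I'd do this on one gateway at a time):

  # Dump backtraces of all radosgw threads, then detach.
  # Repeat a few times during an upload to see where the time goes
  # (e.g. threads waiting on bucket index updates).
  gdb -p "$(pidof radosgw)" -batch -ex "thread apply all bt" \
      > rgw-backtrace-$(date +%s).txt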
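
And for reference, the sharding knob I was referring to is the one
below; as far as I understand it only applies to newly created buckets,
which is part of why it doesn't feel like /the/ fix to me (the value 16
is just an example):

  # ceph.conf on the RGW nodes: pre-split the index of newly created
  # buckets across 16 shard objects instead of a single one.
  [global]
  rgw override bucket index max shards = 16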