Hi,

We use radosgw to store millions of HTML/css/fonts/imgs and we currently use
one bucket per file type, so we are clearly interested in single-bucket
performance. We noticed that under heavy write load the radosgw is slow to
respond. We didn't do any performance analysis, as our main concern is read
speed.

My 2 cents on the subject!

--
Yann ROBIN
YouScribe

-----Original Message-----
From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Mark Nelson
Sent: Monday, February 18, 2013 14:19
To: Erdem Agaoglu
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: radosgw performance

On 02/18/2013 04:53 AM, Erdem Agaoglu wrote:
> Answering in reverse order:
>
> We are using mod_fastcgi almost exactly in the documented way.
>
> I have deleted and recreated these pools with the following pg_nums:
>
>   .rgw.buckets  1024
>   .rgw            64
>   .rgw.gc         64
>   .rgw.control    64
>   .users.uid      64
>   .users          64
>
> We currently have 32 OSDs but are planning to increase that to 100+ shortly.
>
> The default 'admin socket' /var/run/ceph/$cluster-$name.asok does not
> exist for rgw, and 'lsof -p <pid_of_rgw>' does not show any.
>
> Correlating logs revealed some interesting stuff though. With 'debug ms = 1'
> the rgw logs show a few more lines (summarizing for readability):
>
> 01:17:08.371741 TID 20 get_obj_state: rctx=0x7f8624002520 obj=b:key ...
> 01:17:08.371772 TID 1 -- rgw/1005379 --> osd.19/19507 -- osd_op(cli:316 datafile [getxattrs,stat]...
> 01:17:08.373125 TID 20 prepare_atomic_for_write_impl: state is not atomic. state=0x7f8624010378
> 01:17:08.373241 TID 1 -- rgw/1005379 --> osd.3/7602 -- osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op]...
> 01:17:08.741306 TID 1 -- rgw/1005379 --> osd.19/19507 -- osd_op(cli:387 datafile [create 0~0 ...
> 01:17:08.745096 TID 1 -- rgw/1005379 --> osd.3/7602 -- osd_op(cli:388 .dir.4470.1 [call rgw.bucket_complete_op]...
> 01:17:08.745175 TID 2 req 58:0.373932:s3:PUT /b/key:put_obj:http status=200
>
> This shows the 0.37 sec PUT operation is almost all bucket_prepare_op for
> .dir.4470.1 on osd.3, showing client.6515.0:317, which is shortened to
> cli:317. The relevant parts of the debug log on osd.3 go like this (again
> summarizing):
>
> 01:17:08.379578 TID1 1 -- osd.3/7602 <== client.6515 rgw/1005379 114 ==== osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op] ...
> 01:17:08.379666 TID1 15 osd.3 288 enqueue_op ... latency 0.000243 osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op] ...
> 01:17:08.675843 TID2 10 osd.3 288 dequeue_op ... latency 0.296420 osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op] .. pg[10.11...]
> 01:17:08.675987 TID2 10 osd.3 pg_epoch: 288 pg[10.11...] taking ondisk_read_lock
> 01:17:08.736329 TID2 10 osd.3 pg_epoch: 288 pg[10.11...] do_osd_op d91dcc11/.dir.4470.1/head//10 [call rgw.bucket_prepare_op]
> 01:17:08.736884 TID2 1 -- osd.3/7602 --> osd.26 osd.26/3582 -- osd_sub_op(cli:317 10.11 d91dcc11/.dir.4470.1/head//10 ...
> 01:17:08.738700 TID3 1 -- osd.3/7602 <== osd.26 osd.26/3582 134 ==== osd_sub_op_reply(cli:317 10.11 ...
> 01:17:08.738783 TID3 15 osd.3 288 enqueue_op ... latency 0.000229 osd_sub_op_reply(cli:317 10.11 d91dcc11/.dir.4470.1/head//10 ...
> 01:17:08.746120 TID2 10 osd.3 288 dequeue_op ... latency 0.007566 osd_sub_op_reply(cli:317 10.11 d91dcc11/.dir.4470.1/head//10 ...
> 01:17:08.746417 TID4 1 -- osd.3/7602 --> rgw/1005379 -- osd_op_reply(317 .dir.4470.1 [call rgw.bucket_prepare_op] ack = 0) ...
>
> This shows the osd receiving the request from rgw and almost immediately
> queuing it. About 0.29 secs pass before this operation gets a chance to
> actually run. It then takes the ondisk_read_lock, which adds roughly another
> 0.06 seconds. Replication to the other osd is very fast, and after a few
> milliseconds rgw is notified that the bucket_prepare_op is complete. All of
> this makes sense, since 'rados bench' does not use any bucket_prepare or
> bucket_complete and is therefore faster.
>
> What I read here at an architectural level is that all rgw operations on a
> bucket must pass through the .dir.<bucket-id> object, which happens to be a
> scalability killer for large buckets, or for buckets with a lot of activity
> for that matter.
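
For what it's worth, you can see exactly which pg and OSDs that index object
maps to with 'ceph osd map'. The object name below is taken from your trace,
and I'm assuming the bucket index lives in .rgw.buckets here; adjust both if
your setup differs:

# ceph osd map .rgw.buckets .dir.4470.1

If I'm reading your logs right, that should report pg 10.11 with osd.3 as the
primary and osd.26 as the replica, i.e. the one placement group that every
bucket_prepare_op/bucket_complete_op for this bucket gets funneled through.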

Out of curiosity, do you see single-bucket performance being a problem for
your use case? Sooner or later I'm going to start digging into RGW
performance again (it looks like you are getting a good head start!). I've
been wondering about the use cases people have and how much we should be
worrying about single-bucket vs multi-bucket performance issues relative to
general RGW performance optimizations.

> For a large number of operations, it won't matter how many OSDs you have,
> since this object will always sit in a single pg. Is that correct? Can
> changing (increasing) op threads or disk threads make any difference in
> this matter? Or is it possible to disable this operation in any way?
>
> Thanks in advance
>
>
> On Mon, Feb 18, 2013 at 6:52 AM, Yehuda Sadeh <yehuda@xxxxxxxxxxx> wrote:
>
> On Sun, Feb 17, 2013 at 2:30 PM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:
> > Hi Yehuda,
> >
> > Thanks for the fast reply.
> > We are running 0.56.3, on Ubuntu 12.04.
> >
> > I disabled the ops log using 'rgw enable ops log = false' but I missed
> > 'debug rgw = 0', so what I did was simply 'log file = /dev/null' :)
> >
> > Anyway, I tried 'debug rgw = 0' but it didn't change anything. To see
> > what's going on in rgw I set 'debug rgw = 2' and reran the test.
> >
> > # cat radosgw.log |grep put_obj |grep status=200
> > ...
> > 2013-02-17 13:51:37.415981 7f54167f4700 2 req 613:0.327243:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object611:put_obj:http status=200
> > 2013-02-17 13:51:37.431779 7f54137ee700 2 req 614:0.318996:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object612:put_obj:http status=200
> > 2013-02-17 13:51:37.447688 7f53f37ae700 2 req 615:0.319085:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object613:put_obj:http status=200
> > 2013-02-17 13:51:37.460531 7f53fbfbf700 2 req 581:0.887859:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object579:put_obj:http status=200
> > 2013-02-17 13:51:37.468215 7f5411feb700 2 req 616:0.326575:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object614:put_obj:http status=200
> > 2013-02-17 13:51:37.480233 7f54267fc700 2 req 617:0.335292:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object615:put_obj:http status=200
> > 2013-02-17 13:51:37.503042 7f54147f0700 2 req 618:0.330277:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object616:put_obj:http status=200
> > 2013-02-17 13:51:37.519647 7f5413fef700 2 req 619:0.306762:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object617:put_obj:http status=200
> > 2013-02-17 13:51:37.520274 7f5427fff700 2 req 620:0.307374:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object618:put_obj:http status=200
> > ...
> >
> > If I read this correctly, requests take 0.32 secs on average. Again, if
> > I'm looking at things the right way, that would make about 376 secs for
> > the roughly 1175 requests, and if I divide that by 20 parallel requests
> > that would give 18.8 secs, which means for a 20 sec test most of the
> > time is spent between radosgw and rados.
>
> It would be interesting to look at the messenger log (debug ms = 1) and
> correlate it with the rgw log (debug rgw = 20). You'll need to look at a
> specific request that's slow and filter by the thread id (the hex number
> in the 3rd column). It'll contain most of the interesting stuff, except
> for the response messages.
>
> > I'm not sure how to check apache <-> radosgw, or if that's relevant. But
> > I can say there is no problem between rest-client <-> apache, as I am
> > getting the same throughput with a bunch of other clients.
> >
> > I noticed that earlier, but rgw does not have any admin socket; I guess
> > I'm running defaults in this matter. Is there an option I'm missing to
> > get an admin socket?
>
> 'admin socket', by default, points at /var/run/ceph/$cluster-$name.asok.
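
On the admin socket bit: if the socket isn't showing up at that default path
for radosgw, it's worth pointing it somewhere explicitly in the gateway's
section of ceph.conf and making sure the rgw process can write there.
Something along these lines should do it (the section name and path below
are only examples; use whatever your rgw instance is actually called):

[client.radosgw.gateway]
    admin socket = /var/run/ceph/radosgw.asok

After restarting radosgw you should then be able to query it with:

# ceph --admin-daemon /var/run/ceph/radosgw.asok perf dump

('help' instead of 'perf dump' lists everything the socket supports), which
beats grepping logs once you want to keep an eye on what the gateway is
doing.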

> > Since I was able to increase rados pool performance but not the radosgw
> > performance, the only thing that makes sense to me is that I forgot to
> > delete some configuration or something while I deleted the original .rgw
> > pools. All I did was 'ceph osd pool delete'; is there something more to
> > clean?
>
> There are a bunch of pools there. 'rados lspools' will give you the list;
> everything that starts with .rgw is relevant.
>
> By the way, what fastcgi module are you using?
>
> Yehuda
>
>
> --
> erdem agaoglu

--
Mark Nelson
Performance Engineer
Inktank

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com