Hi,

We use radosgw to store millions of HTML/css/fonts/imgs and we currently use
one bucket per file type, so we are clearly interested in single-bucket
performance. We noticed that under heavy write load the radosgw is slow to
respond. We didn't do any performance analysis, as our main concern is read
speed.

My 2 cents on the subject!

--
Yann ROBIN
YouScribe

-----Original Message-----
From: ceph-users-bounces@xxxxxxxxxxxxxx [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Mark Nelson
Sent: Monday, February 18, 2013 14:19
To: Erdem Agaoglu
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: radosgw performance

On 02/18/2013 04:53 AM, Erdem Agaoglu wrote:
> Answering in reverse order:
>
> We are using mod_fastcgi almost exactly in the documented way.
>
> I have deleted and recreated these pools with the following pg_nums:
>
>   .rgw.buckets  1024
>   .rgw            64
>   .rgw.gc         64
>   .rgw.control    64
>   .users.uid      64
>   .users          64
>
> We currently have 32 OSDs but are planning to increase that to 100+ shortly.
>
> The default 'admin socket' /var/run/ceph/$cluster-$name.asok does not
> exist for rgw, and 'lsof -p <pid_of_rgw>' does not show any.
>
> Correlating logs revealed some interesting stuff though. With 'debug ms = 1'
> the rgw logs show a few more lines (summarizing for readability):
>
> 01:17:08.371741 TID 20 get_obj_state: rctx=0x7f8624002520 obj=b:key ...
> 01:17:08.371772 TID 1 -- rgw/1005379 --> osd.19/19507 -- osd_op(cli:316 datafile [getxattrs,stat]...
> 01:17:08.373125 TID 20 prepare_atomic_for_write_impl: state is not atomic. state=0x7f8624010378
> 01:17:08.373241 TID 1 -- rgw/1005379 --> osd.3/7602 -- osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op]...
> 01:17:08.741306 TID 1 -- rgw/1005379 --> osd.19/19507 -- osd_op(cli:387 datafile [create 0~0 ...
> 01:17:08.745096 TID 1 -- rgw/1005379 --> osd.3/7602 -- osd_op(cli:388 .dir.4470.1 [call rgw.bucket_complete_op]...
> 01:17:08.745175 TID 2 req 58:0.373932:s3:PUT /b/key:put_obj:http status=200
>
> This shows the 0.37 sec PUT operation is almost all bucket_prepare_op for
> .dir.4470.1 on osd.3, showing client.6515.0:317, which is shortened to
> cli:317. The relevant parts of the debug log on osd.3 go like this (again
> summarizing):
>
> 01:17:08.379578 TID1 1 -- osd.3/7602 <== client.6515 rgw/1005379 114 ==== osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op] ...
> 01:17:08.379666 TID1 15 osd.3 288 enqueue_op ... latency 0.000243 osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op] ...
> 01:17:08.675843 TID2 10 osd.3 288 dequeue_op ... latency 0.296420 osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op] .. pg[10.11...]
> 01:17:08.675987 TID2 10 osd.3 pg_epoch: 288 pg[10.11...] taking ondisk_read_lock
> 01:17:08.736329 TID2 10 osd.3 pg_epoch: 288 pg[10.11...] do_osd_op d91dcc11/.dir.4470.1/head//10 [call rgw.bucket_prepare_op]
> 01:17:08.736884 TID2 1 -- osd.3/7602 --> osd.26 osd.26/3582 -- osd_sub_op(cli:317 10.11 d91dcc11/.dir.4470.1/head//10 ...
> 01:17:08.738700 TID3 1 -- osd.3/7602 <== osd.26 osd.26/3582 134 ==== osd_sub_op_reply(cli:317 10.11 ...
> 01:17:08.738783 TID3 15 osd.3 288 enqueue_op ... latency 0.000229 osd_sub_op_reply(cli:317 10.11 d91dcc11/.dir.4470.1/head//10 ...
> 01:17:08.746120 TID2 10 osd.3 288 dequeue_op ... latency 0.007566 osd_sub_op_reply(cli:317 10.11 d91dcc11/.dir.4470.1/head//10 ...
> 01:17:08.746417 TID4 1 -- osd.3/7602 --> rgw/1005379 -- osd_op_reply(317 .dir.4470.1 [call rgw.bucket_prepare_op] ack = 0) ...
>
> This shows the osd receiving the request from rgw and almost immediately
> queuing it. About 0.29 secs pass before this operation gets a chance to
> actually run. It then takes the ondisk_read_lock, which adds roughly another
> 0.06 seconds. Replication to the other osd is very fast, and after a few
> milliseconds rgw is notified that the bucket_prepare_op is complete. All of
> this makes sense, since 'rados bench' does not use any bucket_prepare or
> bucket_complete and is therefore faster.
>
> What I read here at an architectural level is that all rgw operations on a
> bucket must pass through the .dir.<bucket-id> object, which happens to be a
> scalability killer for large buckets, or for buckets with a lot of activity
> for that matter.
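
For what it's worth, you can see exactly which pg and OSDs that index object
maps to with 'ceph osd map'. The object name below is taken from your trace,
and I'm assuming the bucket index lives in .rgw.buckets here; adjust both if
your setup differs:

# ceph osd map .rgw.buckets .dir.4470.1

If I'm reading your logs right, that should report pg 10.11 with osd.3 as the
primary and osd.26 as the replica, i.e. the one placement group that every
bucket_prepare_op/bucket_complete_op for this bucket gets funneled through.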

Out of curiosity, do you see single-bucket performance being a problem for
your use case? Sooner or later I'm going to start digging into RGW
performance again (it looks like you are getting a good head start!). I've
been wondering about the use cases people have and how much we should be
worrying about single-bucket vs multi-bucket performance issues relative to
general RGW performance optimizations.

> For a large number of operations, it won't matter how many OSDs you have,
> since this object will always sit in a single pg. Is that correct? Can
> changing (increasing) op threads or disk threads make any difference in
> this matter? Or is it possible to disable this operation in any way?
>
> Thanks in advance
>
>
> On Mon, Feb 18, 2013 at 6:52 AM, Yehuda Sadeh <yehuda@xxxxxxxxxxx> wrote:
>
> On Sun, Feb 17, 2013 at 2:30 PM, Erdem Agaoglu <erdem.agaoglu@xxxxxxxxx> wrote:
> > Hi Yehuda,
> >
> > Thanks for the fast reply.
> > We are running 0.56.3, on Ubuntu 12.04.
> >
> > I disabled the ops log using 'rgw enable ops log = false' but I missed
> > 'debug rgw = 0', so what I did was simply 'log file = /dev/null' :)
> >
> > Anyway, I tried 'debug rgw = 0' but it didn't change anything. To see
> > what's going on in rgw I set 'debug rgw = 2' and reran the test.
> >
> > # cat radosgw.log |grep put_obj |grep status=200
> > ...
> > 2013-02-17 13:51:37.415981 7f54167f4700 2 req 613:0.327243:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object611:put_obj:http status=200
> > 2013-02-17 13:51:37.431779 7f54137ee700 2 req 614:0.318996:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object612:put_obj:http status=200
> > 2013-02-17 13:51:37.447688 7f53f37ae700 2 req 615:0.319085:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object613:put_obj:http status=200
> > 2013-02-17 13:51:37.460531 7f53fbfbf700 2 req 581:0.887859:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object579:put_obj:http status=200
> > 2013-02-17 13:51:37.468215 7f5411feb700 2 req 616:0.326575:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object614:put_obj:http status=200
> > 2013-02-17 13:51:37.480233 7f54267fc700 2 req 617:0.335292:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object615:put_obj:http status=200
> > 2013-02-17 13:51:37.503042 7f54147f0700 2 req 618:0.330277:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object616:put_obj:http status=200
> > 2013-02-17 13:51:37.519647 7f5413fef700 2 req 619:0.306762:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object617:put_obj:http status=200
> > 2013-02-17 13:51:37.520274 7f5427fff700 2 req 620:0.307374:s3:PUT /mybucket/benchmark_data_ceph-10_31019_object618:put_obj:http status=200
> > ...
> >
> > If I read this correctly, requests take 0.32 secs on average. Again, if
> > I'm looking at things the right way, that would make about 376 secs for
> > the roughly 1175 requests, and if I divide that by 20 parallel requests
> > that would give 18.8 secs, which means for a 20 sec test most of the
> > time is spent between radosgw and rados.
>
> It would be interesting to look at the messenger log (debug ms = 1) and
> correlate it with the rgw log (debug rgw = 20). You'll need to look at a
> specific request that's slow and filter by the thread id (the hex number
> in the 3rd column). It'll contain most of the interesting stuff, except
> for the response messages.
>
> > I'm not sure how to check apache <-> radosgw, or if that's relevant. But
> > I can say there is no problem between rest-client <-> apache, as I am
> > getting the same throughput with a bunch of other clients.
> >
> > I noticed that earlier, but rgw does not have any admin socket; I guess
> > I'm running defaults in this matter. Is there an option I'm missing to
> > get an admin socket?
>
> 'admin socket', by default, points at /var/run/ceph/$cluster-$name.asok.
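
On the admin socket bit: if the socket isn't showing up at that default path
for radosgw, it's worth pointing it somewhere explicitly in the gateway's
section of ceph.conf and making sure the rgw process can write there.
Something along these lines should do it (the section name and path below
are only examples; use whatever your rgw instance is actually called):

[client.radosgw.gateway]
    admin socket = /var/run/ceph/radosgw.asok

After restarting radosgw you should then be able to query it with:

# ceph --admin-daemon /var/run/ceph/radosgw.asok perf dump

('help' instead of 'perf dump' lists everything the socket supports), which
beats grepping logs once you want to keep an eye on what the gateway is
doing.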

> > Since I was able to increase rados pool performance but not the radosgw
> > performance, the only thing that makes sense to me is that I forgot to
> > delete some configuration or something while I deleted the original .rgw
> > pools. All I did was 'ceph osd pool delete'; is there something more to
> > clean?
>
> There are a bunch of pools there. 'rados lspools' will give you the list;
> everything that starts with .rgw is relevant.
>
> By the way, what fastcgi module are you using?
>
> Yehuda
>
>
> --
> erdem agaoglu

--
Mark Nelson
Performance Engineer
Inktank

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com