Re: radosgw performance

Mark Nelson <mark.nelson@xxxxxxxxxxx> · Mon, 18 Feb 2013 07:19:29 -0600

On 02/18/2013 04:53 AM, Erdem Agaoglu wrote:
Answering in reverse order:

We are using mod_fastcgi almost exactly in the documented way.

I have deleted and recreated these pools with the following pg_nums
.rgw.buckets 1024
.rgw 64
.rgw.gc 64
.rgw.control 64
.users.uid 64
.users 64
We currently have 32 OSDs but planning to increase the number 100+ shortly.

Default 'admin socket' /var/run/ceph/$cluster-$name.asok does not exists
for rgw and 'lsof -p <pid_of_rgw>' does not show any.

Correlating logs revealed some interesting stuff though. With 'debug ms
= 1' rgw logs show a few lines more. (summarizing for readability)

01:17:08.371741 TID 20 get_obj_state: rctx=0x7f8624002520 obj=b:key ...
01:17:08.371772 TID  1 -- rgw/1005379 --> osd.19/19507 -- osd_op(cli:316
datafile [getxattrs,stat]...
01:17:08.373125 TID 20 prepare_atomic_for_write_impl: state is not
atomic. state=0x7f8624010378
01:17:08.373241 TID  1 -- rgw/1005379 --> osd.3/7602 -- osd_op(cli:317
.dir.4470.1 [call rgw.bucket_prepare_op]...
01:17:08.741306 TID  1 -- rgw/1005379 --> osd.19/19507 -- osd_op(cli:387
datafile [create 0~0 ...
01:17:08.745096 TID  1 -- rgw/1005379 --> osd.3/7602 -- osd_op(cli:388
.dir.4470.1 [call rgw.bucket_complete_op]...
01:17:08.745175 TID  2 req 58:0.373932:s3:PUT /b/key:put_obj:http status=200

This shows the 0.37 sec PUT operation is almost all bucket_prepare_op
for .dir.4470.1 on osd.3 showing client.6515.0:317 which is shorthened
as cli:317. The relevant parts of debug log on osd.3 goes like (again
summarizing):

01:17:08.379578 TID1  1 -- osd.3/7602 <== client.6515 rgw/1005379 114
==== osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op] ...
01:17:08.379666 TID1 15 osd.3 288 enqueue_op ... latency 0.000243
osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op] ...
01:17:08.675843 TID2 10 osd.3 288 dequeue_op ... latency 0.296420
osd_op(cli:317 .dir.4470.1 [call rgw.bucket_prepare_op] .. pg[10.11...]
01:17:08.675987 TID2 10 osd.3 pg_epoch: 288 pg[10.11...]  taking
ondisk_read_lock
01:17:08.736329 TID2 10 osd.3 pg_epoch: 288 pg[10.11...] do_osd_op
d91dcc11/.dir.4470.1/head//10 [call rgw.bucket_prepare_op]
01:17:08.736884 TID2  1 -- osd.3/7602 --> osd.26 osd.26/3582 --
osd_sub_op(cli:317 10.11 d91dcc11/.dir.4470.1/head//10 ...
01:17:08.738700 TID3  1 -- osd.3/7602 <== osd.26 osd.26/3582 134 ====
osd_sub_op_reply(cli:317 10.11 ...
01:17:08.738783 TID3 15 osd.3 288 enqueue_op ... latency 0.000229
osd_sub_op_reply(cli:317 10.11 d91dcc11/.dir.4470.1/head//10 ...
01:17:08.746120 TID2 10 osd.3 288 dequeue_op ... latency 0.007566
osd_sub_op_reply(cli:317 10.11 d91dcc11/.dir.4470.1/head//10 ...
01:17:08.746417 TID4  1 -- osd.3/7602 --> rgw/1005379 --
osd_op_reply(317 .dir.4470.1 [call rgw.bucket_prepare_op] ack = 0) ...

Here shows osd receiving the request from rgw and almost immediately
queuing it. 0.29 secs pass while this operation gets a chance to
actually work. It tries to get a read lock, which causes like a 0.06
second delay. Replication to another osd is super fast and after a few
milliseconds, rgw is notified that the bucket_prepare_op is complete.
All of these makes sense since 'rados bench' does not use any
bucket_prepare or bucket_complete hence faster.

What i read here in an architectural level is that all rgw operations
must pass through .dir.<bucket-id> file, which happens to be a
scalability killer for large buckets. Or buckets with a lot of activity
for that matter.

Out of curiosity, do you see single-bucket performance being a problem 
for your use case?  Sooner or later I'm going to start digging into RGW 
performance again (It looks like you are getting a good head start!). 
I've been wondering about the use cases users have and how much we 
should be worrying about single-bucket vs multi-bucket performance 
issues relative to general RGW performance optimizations.

For a large number of operations, it won't matter how
much OSD's you'd have since this file will always sit in a single pg. Is
that correct? Can changing (increasing) op threads or disk threads make
any difference in that matter? Or is it possible to disable this
operation in any way?

Thanks in advance

On Mon, Feb 18, 2013 at 6:52 AM, Yehuda Sadeh <yehuda@xxxxxxxxxxx
<mailto:yehuda@xxxxxxxxxxx>> wrote:

    On Sun, Feb 17, 2013 at 2:30 PM, Erdem Agaoglu
    <erdem.agaoglu@xxxxxxxxx <mailto:erdem.agaoglu@xxxxxxxxx>> wrote:
     > Hi Yehuda,
     >
     > Thanks for fast reply.
     > We are running 0.56.3. On ubuntu 12.04.
     >
     > I disabled ops log using
     > 'rgw enable ops log = false'
     > but i missed 'debug rgw = 0' so what i did was simply 'log file =
    /dev/null'
     > :)
     >
     > Anyways. i tried 'debug rgw = 0' but it didn't change anything.
    To see
     > what's going on in rgw i did 'debug rgw=2' and rerun the test.
     >
     > # cat radosgw.log |grep put_obj |grep status=200
     > ...
     > 2013-02-17 13:51:37.415981 7f54167f4700  2 req 613:0.327243:s3:PUT
     > /mybucket/benchmark_data_ceph-10_31019_object611:put_obj:http
    status=200
     > 2013-02-17 13:51:37.431779 7f54137ee700  2 req 614:0.318996:s3:PUT
     > /mybucket/benchmark_data_ceph-10_31019_object612:put_obj:http
    status=200
     > 2013-02-17 13:51:37.447688 7f53f37ae700  2 req 615:0.319085:s3:PUT
     > /mybucket/benchmark_data_ceph-10_31019_object613:put_obj:http
    status=200
     > 2013-02-17 13:51:37.460531 7f53fbfbf700  2 req 581:0.887859:s3:PUT
     > /mybucket/benchmark_data_ceph-10_31019_object579:put_obj:http
    status=200
     > 2013-02-17 13:51:37.468215 7f5411feb700  2 req 616:0.326575:s3:PUT
     > /mybucket/benchmark_data_ceph-10_31019_object614:put_obj:http
    status=200
     > 2013-02-17 13:51:37.480233 7f54267fc700  2 req 617:0.335292:s3:PUT
     > /mybucket/benchmark_data_ceph-10_31019_object615:put_obj:http
    status=200
     > 2013-02-17 13:51:37.503042 7f54147f0700  2 req 618:0.330277:s3:PUT
     > /mybucket/benchmark_data_ceph-10_31019_object616:put_obj:http
    status=200
     > 2013-02-17 13:51:37.519647 7f5413fef700  2 req 619:0.306762:s3:PUT
     > /mybucket/benchmark_data_ceph-10_31019_object617:put_obj:http
    status=200
     > 2013-02-17 13:51:37.520274 7f5427fff700  2 req 620:0.307374:s3:PUT
     > /mybucket/benchmark_data_ceph-10_31019_object618:put_obj:http
    status=200
     > ...
     >
     > If i read this correctly requests take 0.32 secs on average.
    Again, if i'm
     > looking at things the right way, that would make 320 secs for
    like 1175
     > requests, and if i divide that by 20 paralel requests that would
    give 18.8
     > secs, which means for a 20 sec test most of the time is passed at
    radosgw
     > <-> rados.

    It would be interesting looking at the messenger log (debug ms = 1)
    and correlate it with the rgw log (debug rgw = 20). You'll need to
    look at a specific request that's slow and filter by the thread id
    (the hex number in the 3rd column). It'll contain most of the
    interesting stuff, except for the response messages.
     >
     > i'm not sure how to check apache <-> radosgw or if that's
    relevant. But i
     > can say there is no problem between rest-client <-> apache as i
    am getting
     > the same throughput with a bunch of other clients.
     >
     > i noticed that earlier but rgw does not have any admin socket, i
    guess i'm
     > running defaults in this matter. Is there an option missing to
    have an admin
     > socket?

    'admin socket', by default points at /var/run/ceph/$cluster-$name.asok.

     >
     > Since i was able to increase rados pool performance but not the
    radosgw
     > performance, the only thing that makes sense to me is that i
    forgot the
     > delete some configuration or something while i delete the
    original .rgw.
     > pools. All i did was 'ceph osd pool delete' is there something
    more to
     > clean?

    there are a bunch of pools there. 'rados lspools' will give you the
    list, everything that starts with .rgw is relevant.

    By the way, what fastcgi module are you using?

    Yehuda

--
erdem agaoglu

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com

--
Mark Nelson
Performance Engineer
Inktank
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com