Re: radosgw buffer overflow

On Fri, Nov 14, 2014 at 10:13 PM, Mustafa Muhammad
<mustafaa.alhamdaani@xxxxxxxxx> wrote:
> On Thu, Nov 13, 2014 at 12:34 PM, Mustafa Muhammad
> <mustafaa.alhamdaani@xxxxxxxxx> wrote:
>> On Wed, Nov 12, 2014 at 9:43 PM, Yehuda Sadeh <yehuda@xxxxxxxxxx> wrote:
>>> On Tue, Nov 11, 2014 at 5:19 AM, Mustafa Muhammad
>>> <mustafaa.alhamdaani@xxxxxxxxx> wrote:
>>>> On Tue, Nov 11, 2014 at 3:44 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
>>>>> Mustafa,
>>>>>
>>>>> You can get rid of these messages by setting rgw_obj_chunk_size >= the
>>>>> object size you are testing with. It will also improve performance.
>>>>
>>>> Thank you for answering. I am using multiple objects (ranging from 500
>>>> MB to 2 GB), so should I put "rgw obj chunk size = 4G" in
>>>> /etc/ceph/ceph.conf? Is 4G OK, and what is the upper limit?
>>>
>>> No, you shouldn't set the chunk size that high. This will
>>> effectively disable striping and cause significant memory
>>> consumption per thread.
>>
>> Ok, thank you.
>> One more thing: I am trying to set "rgw thread pool size" to more than
>> 1024. At 1024 it works, but anything above that doesn't (even
>> 1025).

It sounds to me like an issue with libfcgi (the library that radosgw
links with to connect to apache). There used to be an issue in that
library where it used select() instead of poll() and did not correctly
handle more than 1024 fds. Check whether there's an updated
library with a fix for that.
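
For illustration, here is a minimal sketch in C of why 1024 would be the
cutoff, assuming a glibc system where FD_SETSIZE is 1024. An fd_set is a
fixed-size bitmap, so FD_SET() on a descriptor >= FD_SETSIZE writes out of
bounds; a fortified build aborts instead, which would be consistent with the
__fortify_fail frame under OS_Accept() in the backtrace quoted below (I have
not checked this against the libfcgi source). The helper names
fd_fits_in_fd_set and wait_readable are made up for this example and are not
part of libfcgi or radosgw.

```c
#include <poll.h>
#include <sys/select.h>

/* Hypothetical helper: returns 1 if fd can be stored in an fd_set
 * without indexing past the end of its fixed-size bitmap, else 0. */
int fd_fits_in_fd_set(int fd)
{
    return fd >= 0 && fd < FD_SETSIZE;
}

/* Hypothetical helper: waits up to timeout_ms for fd to become readable
 * using poll(), which takes an array of descriptors and therefore has no
 * FD_SETSIZE cap. Returns poll()'s result: > 0 if an event (including an
 * error condition) was reported, 0 on timeout, -1 on failure. */
int wait_readable(int fd, int timeout_ms)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN, .revents = 0 };
    return poll(&pfd, 1, timeout_ms);
}
```

A select()-based accept loop would have to reject any descriptor for which
fd_fits_in_fd_set() returns 0, whereas a poll()-based loop does not need
that check.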

Yehuda

>>
>> I asked in #ceph and #ceph-devel and got no answer.
>> Also, where can I find the civetweb log and configuration?
>>
>> P.S. ulimit for apache is very high, so it is not a problem.
>> Thanks
>> Mustafa
> Ping :)
>
>>>
>>> Yehuda
>>>
>>>>
>>>>>
>>>>> For CivetWeb you just need to set 'rgw_frontends="civetweb port=8080"'. You
>>>>> can tune some of the rgw_ configs with it. I find the most useful one with
>>>>> civetweb is 'rgw_thread_pool_size', which maps to 'num_threads' in the civetweb
>>>>> config. I find a value of '128' good enough, but you can play around.
>>>> It worked, thank you, but it is very slow compared to apache (though
>>>> lighter) and nginx. After changing the object chunk size, it improved a lot
>>>> (from about 100 MB/s to about 100 MB/s), but it is still slower than nginx (about
>>>> 150~200 MB/s).
>>>>
>>>>>
>>>>> Yehuda,
>>>>> I didn't find any way to disable access logs in CivetWeb. I set all the
>>>>> *enable_logs parameters to false.
>>>>> I am not able to properly set up multiple fcgi instances on the same host; any
>>>>> information would be useful.
>>>>>
>>>>>
>>>>> On Tue, Nov 11, 2014 at 4:40 PM, Mustafa Muhammad
>>>>> <mustafaa.alhamdaani@xxxxxxxxx> wrote:
>>>>>>
>>>>>> On Tue, Nov 11, 2014 at 1:49 AM, Yehuda Sadeh <yehuda@xxxxxxxxxx> wrote:
>>>>>> > On Mon, Nov 10, 2014 at 12:45 PM, Mustafa Muhammad
>>>>>> > <mustafaa.alhamdaani@xxxxxxxxx> wrote:
>>>>>> >> Hi,
>>>>>> >> I am using radosgw to connect to my ceph cluster. I am testing it, and
>>>>>> >> with a large number of requests I get:
>>>>>> >> *** buffer overflow detected ***: /bin/radosgw terminated
>>>>>> >> in the syslog.
>>>>>> >> I use CentOS 7, and these are some of the last lines of the log:
>>>>>> >>
>>>>>> >>  ceph version 0.80.5 (38b73c67d375a2552d8ed67843c8a65c2c0feba6)
>>>>>> >>  1: /bin/radosgw() [0x5daaf6]
>>>>>> >>  2: (()+0xf130) [0x7f177cd4e130]
>>>>>> >>  3: (gsignal()+0x39) [0x7f177bf905c9]
>>>>>> >>  4: (abort()+0x148) [0x7f177bf91cd8]
>>>>>> >>  5: (()+0x75dd7) [0x7f177bfd0dd7]
>>>>>> >>  6: (__fortify_fail()+0x37) [0x7f177c0688f7]
>>>>>> >>  7: (()+0x10bac0) [0x7f177c066ac0]
>>>>>> >>  8: (()+0x10d867) [0x7f177c068867]
>>>>>> >>  9: (OS_Accept()+0xc1) [0x7f177d4a18b1]
>>>>>> >>  10: (FCGX_Accept_r()+0x9c) [0x7f177d49f91c]
>>>>>> >>  11: (RGWFCGXProcess::run()+0x1c8) [0x4c9318]
>>>>>> >>  12: (RGWProcessControlThread::entry()+0xe) [0x4cc25e]
>>>>>> >>  13: (()+0x7df3) [0x7f177cd46df3]
>>>>>> >>  14: (clone()+0x6d) [0x7f177c05101d]
>>>>>> >>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is
>>>>>> >> needed to interpret this.
>>>>>> >>
>>>>>> >> --- logging levels ---
>>>>>> >>    0/ 5 none
>>>>>> >>    0/ 1 lockdep
>>>>>> >>    0/ 1 context
>>>>>> >>    1/ 1 crush
>>>>>> >>    1/ 5 mds
>>>>>> >>    1/ 5 mds_balancer
>>>>>> >>    1/ 5 mds_locker
>>>>>> >>    1/ 5 mds_log
>>>>>> >>    1/ 5 mds_log_expire
>>>>>> >>    1/ 5 mds_migrator
>>>>>> >>    0/ 1 buffer
>>>>>> >>    0/ 1 timer
>>>>>> >>    0/ 1 filer
>>>>>> >>    0/ 1 striper
>>>>>> >>    0/ 1 objecter
>>>>>> >>    0/ 5 rados
>>>>>> >>    0/ 5 rbd
>>>>>> >>    0/ 5 journaler
>>>>>> >>    0/ 5 objectcacher
>>>>>> >>    0/ 5 client
>>>>>> >>    0/ 5 osd
>>>>>> >>    0/ 5 optracker
>>>>>> >>    0/ 5 objclass
>>>>>> >>    1/ 3 filestore
>>>>>> >>    1/ 3 keyvaluestore
>>>>>> >>    1/ 3 journal
>>>>>> >>    0/ 5 ms
>>>>>> >>    1/ 5 mon
>>>>>> >>    0/10 monc
>>>>>> >>    1/ 5 paxos
>>>>>> >>    0/ 5 tp
>>>>>> >>    1/ 5 auth
>>>>>> >>    1/ 5 crypto
>>>>>> >>    1/ 1 finisher
>>>>>> >>    1/ 5 heartbeatmap
>>>>>> >>    1/ 5 perfcounter
>>>>>> >>    1/ 5 rgw
>>>>>> >>    1/ 5 javaclient
>>>>>> >>    1/ 5 asok
>>>>>> >>    1/ 1 throttle
>>>>>> >>   -2/-2 (syslog threshold)
>>>>>> >>   -1/-1 (stderr threshold)
>>>>>> >>   max_recent     10000
>>>>>> >>   max_new         1000
>>>>>> >>   log_file /var/log/ceph/radosgw.log
>>>>>> >> --- end dump of recent events ---
>>>>>> >
>>>>>> > This might be an issue with the fastcgi library that radosgw uses (not
>>>>>> > sure which one and what version is used in centos 7). How many
>>>>>> > concurrent requests does it handle when it fails? You can try testing
>>>>>> > it with the standalone web server (civetweb), see how it behaves.
>>>>>> I think I am using fcgi 2.4.0. I use nginx with "fastcgi_buffering
>>>>>> off;" so it doesn't touch the disks.
>>>>>> Sometimes it handles 4000 connections, sometimes 1000.
>>>>>> I want to test civetweb but couldn't find any info about how to do so.
>>>>>> Can you please give me a link to docs or something?
>>>>>>
>>>>>> Thank you.
>>>>>>
>>>>>> Mustafa
>>>>>> >>
>>>>>> >> P.S. I get lots of errors like:
>>>>>> >> RGWObjManifest::operator++(): result: ofs=20971520 stripe_ofs=20971520
>>>>>> >> part_ofs=0 rule->part_size=104857600
>>>>>> >
>>>>>> > This is just an overly verbose log message, not necessarily pointing at
>>>>>> > anything wrong.
>>>>>> >
>>>>>> > Thanks,
>>>>>> > Yehuda
>>>>>> --
>>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>>> More majordomo info at  http://vger.kernel.org/majordomo-info.html
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> -Pushpesh
>>>>>
>>>>
>>>>
>>>>
>>>> Now about the buffer overflow, should I file a bug?