Segmentation fault RadosGW

Hello,

I'm trying to back up HDFS to Ceph/radosgw/S3, but I keep running into different problems. Currently I'm fighting a segfault in radosgw.

Some details about my setup:

* nginx, because apache2 doesn't return a "Content-Length: 0" header on HEAD requests, as required by Hadoop (http://tracker.ceph.com/issues/897)
* fastcgi setup as in the howto (*.s3.domain.com)
* Hadoop is using s3.domain.com, HTTP only.
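
For reference, here is a rough sketch of how the HEAD behaviour mentioned above can be checked from a client. It is only an illustration: the helper names `head_headers` and `has_zero_content_length` are made up for this example, and the object path in the usage comment is a placeholder.

```python
import http.client


def head_headers(host, path="/", port=80, timeout=10):
    """Send a plain-HTTP HEAD request and return (status, headers dict)."""
    conn = http.client.HTTPConnection(host, port, timeout=timeout)
    try:
        conn.request("HEAD", path)
        resp = conn.getresponse()
        return resp.status, dict(resp.getheaders())
    finally:
        conn.close()


def has_zero_content_length(headers):
    """True if the response carries an explicit 'Content-Length: 0' header."""
    for name, value in headers.items():
        # Header names are case-insensitive, so compare in lower case.
        if name.lower() == "content-length":
            return value.strip() == "0"
    return False


# Hypothetical usage against the gateway (object path is a placeholder):
#   status, headers = head_headers("s3.domain.com", "/hadoop-backup-profile/some-empty-object")
#   print(status, has_zero_content_length(headers))
```

An unauthenticated request will most likely just get an error status back, so this is mainly useful for comparing which headers the two web servers emit.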

...
2014-05-15 17:08:32.288974 7fc00efdd700  0 WARNING: couldn't find acl header for bucket, generating default
2014-05-15 17:08:32.291511 7fc00efdd700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fc00efdd700

 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: /usr/bin/radosgw() [0x5c4f4a]
 2: (()+0xfcb0) [0x7fc11aed1cb0]
 3: (()+0x9184e) [0x7fc11a17d84e]
 4: (ceph::buffer::ptr::append(char const*, unsigned int)+0x43) [0x7fc11bee4f43]
 5: (ceph::buffer::list::append(char const*, unsigned int)+0x91) [0x7fc11bee7681]
 6: (RGWRados::copy_obj_data(void*, std::string const&, void**, long, rgw_obj&, rgw_obj&, long*, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, RGWObjCategory, std::string*, rgw_err*)+0x59e) [0x5247de]
 7: (RGWRados::copy_obj(void*, std::string const&, std::string const&, std::string const&, req_info*, std::string const&, rgw_obj&, rgw_obj&, RGWBucketInfo&, RGWBucketInfo&, long*, long const*, long const*, char const*, char const*, bool, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, RGWObjCategory, std::string*, rgw_err*, void (*)(long, void*), void*)+0x1dc2) [0x532f32]
 8: (RGWCopyObj::execute()+0x2bc) [0x555efc]
 9: /usr/bin/radosgw() [0x4c7a5c]
 10: (RGWFCGXProcess::handle_request(RGWRequest*)+0x9c) [0x4c873c]
 11: (RGWProcess::RGWWQ::_process(RGWRequest*)+0x37) [0x4c9827]
 12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x7fc11beccd86]
 13: (ThreadPool::WorkThread::entry()+0x10) [0x7fc11beceb90]
 14: (()+0x7e9a) [0x7fc11aec9e9a]
 15: (clone()+0x6d) [0x7fc11a1e03fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2014-05-15 17:02:01.616519 7fc037f3f700  2 req 1133:0.000202:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:reading permissions
 -9999> 2014-05-15 17:02:01.616636 7fc037f3f700  1 -- 10.0.16.101:0/1055174 --> 10.0.16.102:6855/33477 -- osd_op(client.4275.0:31662 default.4270.1_/_distcp_logs_z6cahx [getxattrs,stat,read 0~524288] 11.68ce2873 ack+read e146) v4 -- ?+0 0x7fc0c001c4e0 con 0x7fc0b801d650
 -9998> 2014-05-15 17:02:01.619689 7fc04bf67700  1 ====== starting new request req=0x7fc0981b8630 =====
 -9997> 2014-05-15 17:02:01.619709 7fc04bf67700  2 req 1134:0.000020::PUT /hadoop-backup-profile/block_-7196675373495747436::initializing
 -9996> 2014-05-15 17:02:01.619739 7fc04bf67700  2 req 1134:0.000050:s3:PUT /hadoop-backup-profile/block_-7196675373495747436::getting op
 -9995> 2014-05-15 17:02:01.619721 7fc1134ee700  1 -- 10.0.16.101:0/1055174 <== osd.23 10.0.16.102:6855/33477 2227 ==== osd_op_reply(31662 default.4270.1_/_distcp_logs_z6cahx [getxattrs,stat,read 0~1] v0'0 uv626 ondisk = 0) v6 ==== 286+0+919 (384034339 0 2521691918) 0x7fc0d87c4b00 con 0x7fc0b801d650
 -9994> 2014-05-15 17:02:01.619746 7fc04bf67700  2 req 1134:0.000057:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:authorizing
 -9993> 2014-05-15 17:02:01.619823 7fc037f3f700  2 req 1133:0.003506:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:init op
 -9992> 2014-05-15 17:02:01.619834 7fc037f3f700  2 req 1133:0.003517:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:verifying op mask
 -9991> 2014-05-15 17:02:01.619837 7fc037f3f700  2 req 1133:0.003520:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:verifying op permissions
 -9990> 2014-05-15 17:02:01.619842 7fc037f3f700  5 Searching permissions for uid=hadoop-backup mask=49
 -9989> 2014-05-15 17:02:01.619843 7fc037f3f700  5 Found permission: 15
 -9988> 2014-05-15 17:02:01.619845 7fc037f3f700  5 Searching permissions for group=1 mask=49
 -9987> 2014-05-15 17:02:01.619847 7fc037f3f700  5 Permissions for group not found
 -9986> 2014-05-15 17:02:01.619848 7fc037f3f700  5 Searching permissions for group=2 mask=49
 -9985> 2014-05-15 17:02:01.619850 7fc037f3f700  5 Permissions for group not found
 -9984> 2014-05-15 17:02:01.619851 7fc037f3f700  5 Getting permissions id=hadoop-backup owner=hadoop-backup perm=1
 -9983> 2014-05-15 17:02:01.619852 7fc037f3f700  2 req 1133:0.003535:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:verifying op params
 -9982> 2014-05-15 17:02:01.619857 7fc037f3f700  2 req 1133:0.003539:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:executing
 -9981> 2014-05-15 17:02:01.619885 7fc04bf67700  2 req 1134:0.000196:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:reading permissions
 -9980> 2014-05-15 17:02:01.619918 7fc037f3f700  2 req 1133:0.003600:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:http status=200
 -9979> 2014-05-15 17:02:01.619925 7fc037f3f700  1 ====== req done req=0x7fc0981c6c30 http_status=200 ======
 -9978> 2014-05-15 17:02:01.619952 7fc04bf67700  2 req 1134:0.000263:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:init op
 -9977> 2014-05-15 17:02:01.619959 7fc04bf67700  2 req 1134:0.000270:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:verifying op mask
 -9976> 2014-05-15 17:02:01.619962 7fc04bf67700  2 req 1134:0.000273:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:verifying op permissions
 -9975> 2014-05-15 17:02:01.619965 7fc04bf67700  5 Searching permissions for uid=hadoop-backup mask=50
...
... around 9965 lines later ...
...
   -10> 2014-05-15 17:08:32.289014 7fc00efdd700  5 Searching permissions for group=2 mask=50
    -9> 2014-05-15 17:08:32.289017 7fc00efdd700  5 Permissions for group not found
    -8> 2014-05-15 17:08:32.289018 7fc00efdd700  5 Getting permissions id=hadoop-backup owner=hadoop-backup perm=2
    -7> 2014-05-15 17:08:32.289027 7fc00efdd700  2 req 1534:0.001177:s3:PUT /hadoop-backup-profile/_distcp_logs_bedoz8%2Fpart-00008:copy_obj:verifying op params
    -6> 2014-05-15 17:08:32.289031 7fc00efdd700  2 req 1534:0.001182:s3:PUT /hadoop-backup-profile/_distcp_logs_bedoz8%2Fpart-00008:copy_obj:executing
    -5> 2014-05-15 17:08:32.289084 7fc00efdd700  5 Copy object hadoop-backup-profile(@{i=.rgw.buckets.index}.rgw.buckets[default.4270.1]):__distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 => hadoop-backup-profile(@{i=.rgw.buckets.index}.rgw.buckets[default.4270.1]):__distcp_logs_bedoz8/part-00008
    -4> 2014-05-15 17:08:32.289111 7fc00efdd700  1 -- 10.0.16.101:0/1055174 --> 10.0.16.101:6800/63927 -- osd_op(client.4275.0:33009 default.4270.1___distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 @11:default.4270.1__distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 [getxattrs,stat] 11.9b035c33 ack+read e146) v4 -- ?+0 0x7fc0ec12c590 con 0x7fc0f801ca30
    -3> 2014-05-15 17:08:32.289712 7fc1134ee700  1 -- 10.0.16.101:0/1055174 <== osd.0 10.0.16.101:6800/63927 3160 ==== osd_op_reply(33009 default.4270.1___distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 [getxattrs,stat] v0'0 uv513 ondisk = 0) v6 ==== 305+0+992 (2899522085 0 307831001) 0x7fc0ec020010 con 0x7fc0f801ca30
    -2> 2014-05-15 17:08:32.289848 7fc00efdd700  1 -- 10.0.16.101:0/1055174 --> 10.0.16.101:6800/63927 -- osd_op(client.4275.0:33010 default.4270.1___distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 @11:default.4270.1__distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 [cmpxattr user.rgw.idtag (18) op 1 mode 1,read 0~0] 11.9b035c33 ack+read e146) v4 -- ?+0 0x7fc0ec130b00 con 0x7fc0f801ca30
    -1> 2014-05-15 17:08:32.290330 7fc1134ee700  1 -- 10.0.16.101:0/1055174 <== osd.0 10.0.16.101:6800/63927 3161 ==== osd_op_reply(33010 default.4270.1___distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 [cmpxattr (18) op 1 mode 1,read 0~0] v0'0 uv513 ondisk = 1) v6 ==== 305+0+0 (1026422808 0 0) 0x7fc0ec130b00 con 0x7fc0f801ca30
     0> 2014-05-15 17:08:32.291511 7fc00efdd700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fc00efdd700

 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: /usr/bin/radosgw() [0x5c4f4a]
 2: (()+0xfcb0) [0x7fc11aed1cb0]
 3: (()+0x9184e) [0x7fc11a17d84e]
 4: (ceph::buffer::ptr::append(char const*, unsigned int)+0x43) [0x7fc11bee4f43]
 5: (ceph::buffer::list::append(char const*, unsigned int)+0x91) [0x7fc11bee7681]
 6: (RGWRados::copy_obj_data(void*, std::string const&, void**, long, rgw_obj&, rgw_obj&, long*, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, RGWObjCategory, std::string*, rgw_err*)+0x59e) [0x5247de]
 7: (RGWRados::copy_obj(void*, std::string const&, std::string const&, std::string const&, req_info*, std::string const&, rgw_obj&, rgw_obj&, RGWBucketInfo&, RGWBucketInfo&, long*, long const*, long const*, char const*, char const*, bool, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, RGWObjCategory, std::string*, rgw_err*, void (*)(long, void*), void*)+0x1dc2) [0x532f32]
 8: (RGWCopyObj::execute()+0x2bc) [0x555efc]
 9: /usr/bin/radosgw() [0x4c7a5c]
 10: (RGWFCGXProcess::handle_request(RGWRequest*)+0x9c) [0x4c873c]
 11: (RGWProcess::RGWWQ::_process(RGWRequest*)+0x37) [0x4c9827]
 12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x7fc11beccd86]
 13: (ThreadPool::WorkThread::entry()+0x10) [0x7fc11beceb90]
 14: (()+0x7e9a) [0x7fc11aec9e9a]
 15: (clone()+0x6d) [0x7fc11a1e03fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent     10000
  max_new         1000
  log_file /var/log/ceph/client.radosgw.ceph1.log
--- end dump of recent events ---
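
The NOTE in the trace says the executable is needed to interpret the raw addresses. As a starting point, here is a minimal sketch (not part of any Ceph tooling) that pulls the frames out of a backtrace like the one above and builds an `addr2line` command line for the frames that appear to live in /usr/bin/radosgw. The function names, the `SAMPLE` excerpt and the `0x7f0000000000` cut-off used to skip shared-library frames are my own assumptions, and the result is only useful if debug symbols for radosgw are available.

```python
import re

# Matches frame lines such as " 8: (RGWCopyObj::execute()+0x2bc) [0x555efc]"
FRAME_RE = re.compile(r"^\s*(\d+):\s+(.*?)\s+\[(0x[0-9a-fA-F]+)\]\s*$")


def parse_frames(trace_text):
    """Return (frame_number, location, address) tuples found in the text."""
    frames = []
    for line in trace_text.splitlines():
        m = FRAME_RE.match(line)
        if m:
            frames.append((int(m.group(1)), m.group(2), int(m.group(3), 16)))
    return frames


def addr2line_command(frames, binary="/usr/bin/radosgw", limit=0x7F0000000000):
    """Build an addr2line argument list for frames below `limit`.

    Assumption: low addresses belong to the main executable, while
    addresses at or above `limit` are in shared libraries and are skipped.
    """
    addrs = [hex(addr) for _, _, addr in frames if addr < limit]
    return ["addr2line", "-C", "-f", "-e", binary] + addrs


# Short excerpt of the backtrace above, used only to exercise the parser.
SAMPLE = """\
 1: /usr/bin/radosgw() [0x5c4f4a]
 2: (()+0xfcb0) [0x7fc11aed1cb0]
 8: (RGWCopyObj::execute()+0x2bc) [0x555efc]
"""

if __name__ == "__main__":
    print(" ".join(addr2line_command(parse_frames(SAMPLE))))
```

Feeding the whole log file to `parse_frames` should also work, since the other log lines do not match the frame pattern.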



Any ideas or suggestions on how to debug or fix this?

Thanks a lot,

Fabian
