Hello,

I'm trying to back up HDFS to Ceph/radosgw/S3, but I keep running into different problems. Currently I'm fighting a segfault in radosgw. Some details about my setup:

* nginx, because apache2 doesn't return a "Content-Length: 0" header on HEAD requests as required by Hadoop (http://tracker.ceph.com/issues/897)
* fastcgi setup as in the howto (*.s3.domain.com)
* Hadoop is using s3.domain.com, plain HTTP only

[...]
2014-05-15 17:08:32.288974 7fc00efdd700  0 WARNING: couldn't find acl header for bucket, generating default
2014-05-15 17:08:32.291511 7fc00efdd700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fc00efdd700

 ceph version 0.80.1 (a38fe1169b6d2ac98b427334c12d7cf81f809b74)
 1: /usr/bin/radosgw() [0x5c4f4a]
 2: (()+0xfcb0) [0x7fc11aed1cb0]
 3: (()+0x9184e) [0x7fc11a17d84e]
 4: (ceph::buffer::ptr::append(char const*, unsigned int)+0x43) [0x7fc11bee4f43]
 5: (ceph::buffer::list::append(char const*, unsigned int)+0x91) [0x7fc11bee7681]
 6: (RGWRados::copy_obj_data(void*, std::string const&, void**, long, rgw_obj&, rgw_obj&, long*, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, RGWObjCategory, std::string*, rgw_err*)+0x59e) [0x5247de]
 7: (RGWRados::copy_obj(void*, std::string const&, std::string const&, std::string const&, req_info*, std::string const&, rgw_obj&, rgw_obj&, RGWBucketInfo&, RGWBucketInfo&, long*, long const*, long const*, char const*, char const*, bool, std::map<std::string, ceph::buffer::list, std::less<std::string>, std::allocator<std::pair<std::string const, ceph::buffer::list> > >&, RGWObjCategory, std::string*, rgw_err*, void (*)(long, void*), void*)+0x1dc2) [0x532f32]
 8: (RGWCopyObj::execute()+0x2bc) [0x555efc]
 9: /usr/bin/radosgw() [0x4c7a5c]
 10: (RGWFCGXProcess::handle_request(RGWRequest*)+0x9c) [0x4c873c]
 11: (RGWProcess::RGWWQ::_process(RGWRequest*)+0x37) [0x4c9827]
 12: (ThreadPool::worker(ThreadPool::WorkThread*)+0x4e6) [0x7fc11beccd86]
 13: (ThreadPool::WorkThread::entry()+0x10) [0x7fc11beceb90]
 14: (()+0x7e9a) [0x7fc11aec9e9a]
 15: (clone()+0x6d) [0x7fc11a1e03fd]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.

--- begin dump of recent events ---
-10000> 2014-05-15 17:02:01.616519 7fc037f3f700  2 req 1133:0.000202:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:reading permissions
 -9999> 2014-05-15 17:02:01.616636 7fc037f3f700  1 -- 10.0.16.101:0/1055174 --> 10.0.16.102:6855/33477 -- osd_op(client.4275.0:31662 default.4270.1_/_distcp_logs_z6cahx [getxattrs,stat,read 0~524288] 11.68ce2873 ack+read e146) v4 -- ?+0 0x7fc0c001c4e0 con 0x7fc0b801d650
 -9998> 2014-05-15 17:02:01.619689 7fc04bf67700  1 ====== starting new request req=0x7fc0981b8630 =====
 -9997> 2014-05-15 17:02:01.619709 7fc04bf67700  2 req 1134:0.000020::PUT /hadoop-backup-profile/block_-7196675373495747436::initializing
 -9996> 2014-05-15 17:02:01.619739 7fc04bf67700  2 req 1134:0.000050:s3:PUT /hadoop-backup-profile/block_-7196675373495747436::getting op
 -9995> 2014-05-15 17:02:01.619721 7fc1134ee700  1 -- 10.0.16.101:0/1055174 <== osd.23 10.0.16.102:6855/33477 2227 ==== osd_op_reply(31662 default.4270.1_/_distcp_logs_z6cahx [getxattrs,stat,read 0~1] v0'0 uv626 ondisk = 0) v6 ==== 286+0+919 (384034339 0 2521691918) 0x7fc0d87c4b00 con 0x7fc0b801d650
 -9994> 2014-05-15 17:02:01.619746 7fc04bf67700  2 req 1134:0.000057:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:authorizing
 -9993> 2014-05-15 17:02:01.619823 7fc037f3f700  2 req 1133:0.003506:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:init op
 -9992> 2014-05-15 17:02:01.619834 7fc037f3f700  2 req 1133:0.003517:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:verifying op mask
 -9991> 2014-05-15 17:02:01.619837 7fc037f3f700  2 req 1133:0.003520:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:verifying op permissions
 -9990> 2014-05-15 17:02:01.619842 7fc037f3f700  5 Searching permissions for uid=hadoop-backup mask=49
 -9989> 2014-05-15 17:02:01.619843 7fc037f3f700  5 Found permission: 15
 -9988> 2014-05-15 17:02:01.619845 7fc037f3f700  5 Searching permissions for group=1 mask=49
 -9987> 2014-05-15 17:02:01.619847 7fc037f3f700  5 Permissions for group not found
 -9986> 2014-05-15 17:02:01.619848 7fc037f3f700  5 Searching permissions for group=2 mask=49
 -9985> 2014-05-15 17:02:01.619850 7fc037f3f700  5 Permissions for group not found
 -9984> 2014-05-15 17:02:01.619851 7fc037f3f700  5 Getting permissions id=hadoop-backup owner=hadoop-backup perm=1
 -9983> 2014-05-15 17:02:01.619852 7fc037f3f700  2 req 1133:0.003535:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:verifying op params
 -9982> 2014-05-15 17:02:01.619857 7fc037f3f700  2 req 1133:0.003539:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:executing
 -9981> 2014-05-15 17:02:01.619885 7fc04bf67700  2 req 1134:0.000196:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:reading permissions
 -9980> 2014-05-15 17:02:01.619918 7fc037f3f700  2 req 1133:0.003600:s3:GET /hadoop-backup-profile/%2F_distcp_logs_z6cahx:get_obj:http status=200
 -9979> 2014-05-15 17:02:01.619925 7fc037f3f700  1 ====== req done req=0x7fc0981c6c30 http_status=200 ======
 -9978> 2014-05-15 17:02:01.619952 7fc04bf67700  2 req 1134:0.000263:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:init op
 -9977> 2014-05-15 17:02:01.619959 7fc04bf67700  2 req 1134:0.000270:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:verifying op mask
 -9976> 2014-05-15 17:02:01.619962 7fc04bf67700  2 req 1134:0.000273:s3:PUT /hadoop-backup-profile/block_-7196675373495747436:put_obj:verifying op permissions
 -9975> 2014-05-15 17:02:01.619965 7fc04bf67700  5 Searching permissions for uid=hadoop-backup mask=50

[... around 9965 lines later ...]

   -10> 2014-05-15 17:08:32.289014 7fc00efdd700  5 Searching permissions for group=2 mask=50
    -9> 2014-05-15 17:08:32.289017 7fc00efdd700  5 Permissions for group not found
    -8> 2014-05-15 17:08:32.289018 7fc00efdd700  5 Getting permissions id=hadoop-backup owner=hadoop-backup perm=2
    -7> 2014-05-15 17:08:32.289027 7fc00efdd700  2 req 1534:0.001177:s3:PUT /hadoop-backup-profile/_distcp_logs_bedoz8%2Fpart-00008:copy_obj:verifying op params
    -6> 2014-05-15 17:08:32.289031 7fc00efdd700  2 req 1534:0.001182:s3:PUT /hadoop-backup-profile/_distcp_logs_bedoz8%2Fpart-00008:copy_obj:executing
    -5> 2014-05-15 17:08:32.289084 7fc00efdd700  5 Copy object hadoop-backup-profile(@{i=.rgw.buckets.index}.rgw.buckets[default.4270.1]):__distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 => hadoop-backup-profile(@{i=.rgw.buckets.index}.rgw.buckets[default.4270.1]):__distcp_logs_bedoz8/part-00008
    -4> 2014-05-15 17:08:32.289111 7fc00efdd700  1 -- 10.0.16.101:0/1055174 --> 10.0.16.101:6800/63927 -- osd_op(client.4275.0:33009 default.4270.1___distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 @11:default.4270.1__distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 [getxattrs,stat] 11.9b035c33 ack+read e146) v4 -- ?+0 0x7fc0ec12c590 con 0x7fc0f801ca30
    -3> 2014-05-15 17:08:32.289712 7fc1134ee700  1 -- 10.0.16.101:0/1055174 <== osd.0 10.0.16.101:6800/63927 3160 ==== osd_op_reply(33009 default.4270.1___distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 [getxattrs,stat] v0'0 uv513 ondisk = 0) v6 ==== 305+0+992 (2899522085 0 307831001) 0x7fc0ec020010 con 0x7fc0f801ca30
    -2> 2014-05-15 17:08:32.289848 7fc00efdd700  1 -- 10.0.16.101:0/1055174 --> 10.0.16.101:6800/63927 -- osd_op(client.4275.0:33010 default.4270.1___distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 @11:default.4270.1__distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 [cmpxattr user.rgw.idtag (18) op 1 mode 1,read 0~0] 11.9b035c33 ack+read e146) v4 -- ?+0 0x7fc0ec130b00 con 0x7fc0f801ca30
    -1> 2014-05-15 17:08:32.290330 7fc1134ee700  1 -- 10.0.16.101:0/1055174 <== osd.0 10.0.16.101:6800/63927 3161 ==== osd_op_reply(33010 default.4270.1___distcp_logs_bedoz8/_temporary/_attempt_201405121058_39180_m_000008_0/part-00008 [cmpxattr (18) op 1 mode 1,read 0~0] v0'0 uv513 ondisk = 1) v6 ==== 305+0+0 (1026422808 0 0) 0x7fc0ec130b00 con 0x7fc0f801ca30
     0> 2014-05-15 17:08:32.291511 7fc00efdd700 -1 *** Caught signal (Segmentation fault) **
 in thread 7fc00efdd700
 [backtrace identical to the one above: buffer::ptr::append <- buffer::list::append <- RGWRados::copy_obj_data <- RGWRados::copy_obj <- RGWCopyObj::execute, frames 1-15]

--- logging levels ---
   0/ 5 none
   0/ 1 lockdep
   0/ 1 context
   1/ 1 crush
   1/ 5 mds
   1/ 5 mds_balancer
   1/ 5 mds_locker
   1/ 5 mds_log
   1/ 5 mds_log_expire
   1/ 5 mds_migrator
   0/ 1 buffer
   0/ 1 timer
   0/ 1 filer
   0/ 1 striper
   0/ 1 objecter
   0/ 5 rados
   0/ 5 rbd
   0/ 5 journaler
   0/ 5 objectcacher
   0/ 5 client
   0/ 5 osd
   0/ 5 optracker
   0/ 5 objclass
   1/ 3 filestore
   1/ 3 keyvaluestore
   1/ 3 journal
   0/ 5 ms
   1/ 5 mon
   0/10 monc
   1/ 5 paxos
   0/ 5 tp
   1/ 5 auth
   1/ 5 crypto
   1/ 1 finisher
   1/ 5 heartbeatmap
   1/ 5 perfcounter
   1/ 5 rgw
   1/ 5 javaclient
   1/ 5 asok
   1/ 1 throttle
  -2/-2 (syslog threshold)
  -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/client.radosgw.ceph1.log
--- end dump of recent events ---

Any ideas/suggestions on how to debug or fix this?

Thanks a lot,
Fabian
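PS: in case it helps anyone reproducing my setup, this is roughly the check I used for the HEAD header problem from issue 897 (just a sketch; the endpoint and bucket names are from my configuration, adjust them to yours):

```shell
# check_head_header: verify that a HEAD response on an S3 URL carries
# "Content-Length: 0", which Hadoop's S3 client requires (ceph issue #897).
check_head_header() {
    # curl -s: silent; -I: send a HEAD request and print only response headers
    if curl -sI "$1" | grep -qi '^Content-Length: *0'; then
        echo "OK: Content-Length: 0 present"
    else
        echo "MISSING: no Content-Length: 0 header"
    fi
}

# Example (placeholder endpoint from my setup):
# check_head_header "http://s3.domain.com/hadoop-backup-profile/"
```

With apache2 as frontend this printed the MISSING line for me; after switching to nginx it printed OK.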