It seems this may be fixed in a later kernel; see:
http://code.google.com/p/leveldb/issues/detail?id=97 and
https://git.kernel.org/?p=linux/kernel/git/josef/btrfs-next.git;a=commit;h=d468abec6b9fd7132d012d33573ecb8056c7c43f

Sorry if this email reached anyone more than once - I had some trouble
with HTML vs. plain text in Gmail (ceph-list doesn't allow HTML email).

On Sat, Feb 9, 2013 at 6:21 PM, John Axel Eriksson <john@xxxxxxxxx> wrote:
> This sounds very much like what we've been experiencing. Actually,
> come to think of it, when I enabled more logging a month or so ago,
> after one OSD crashed, I vaguely remember thinking "it seems to have
> something to do with leveldb". I guess it can be circumvented by
> disabling compression on btrfs (though I don't know for sure). Thing
> is, that's the reason we chose btrfs in the first place... the
> savings are huge for us - I think we only need around 20% of the
> storage we'd otherwise need with compression enabled. It depends on
> the data you store, and we store data that compresses really, really
> well.
>
> I guess I'll have to keep looking for answers - maybe someone else on
> the list knows more?
>
> Thanks!
>
> On Sat, Feb 9, 2013 at 5:41 PM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
>> On Saturday, February 9, 2013 at 6:23 AM, John Axel Eriksson wrote:
>>> Three times now - twice on one OSD, once on another - we've had the
>>> OSD crash. Restarting it wouldn't help; it would crash with the
>>> same error. The only way I found to get it up again was to reformat
>>> both the journal disk and the disk Ceph is using for storage...
>>> basically recreating the OSD. This has got me thinking it's some
>>> sort of filesystem corruption, but I can't be sure.
>>>
>>> Thing is, the first two times this happened on 0.48.3 (argonaut),
>>> and this last time it happened on 0.56.2 - I upgraded hoping this
>>> issue was fixed.
>>>
>>> There is another possibility than Ceph itself - we're using btrfs
>>> on the Ceph disks. We're using it because in general we haven't
>>> seen any problems; we've been running Ceph on these disks for six
>>> months without issue. We also really need the compression btrfs can
>>> do (we're saving vast amounts of space this way because of the
>>> nature of the data we're storing).
>>>
>>> The kernel is, and has been for a long time now,
>>> 3.6.2-030602-generic. I think we started out on 3.5.x but pretty
>>> quickly went to 3.6.2. The disks are formatted like so:
>>>
>>> mkfs.btrfs -l 32k -n 32k /dev/xvdf
>>>
>>> Otherwise the nodes are running Ubuntu 12.04.1 LTS. This is all
>>> running on EC2. Thanks for any help I can get!
>>>
>>> I know it may not be verbose enough, but this is the log I got from
>>> this last crash:
>>
>> This log indicates the problem is corruption in the integrated
>> leveldb database. And you mention using btrfs compression, so I
>> point you to http://tracker.ceph.com/issues/2563. :( I don't know
>> anything more than that; maybe somebody else on the team knows
>> more… Sam?
>> -Greg
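Following up on the compression workaround I mentioned above: btrfs
compression is a mount-time option, so turning it off for new writes
is just a remount. A rough sketch - this assumes the data disk was
mounted with something like compress=lzo, that noatime is in use, and
that the sysvinit-style ceph service script is present; our exact
mount options may differ, and extents already on disk stay compressed:

  sudo service ceph stop osd.2
  sudo umount /var/lib/ceph/osd/ceph-2
  sudo mount -o noatime /dev/xvdf /var/lib/ceph/osd/ceph-2   # no compress= this time
  sudo service ceph start osd.2

Remember to drop the compress option from fstab too, or it will come
back on the next boot.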
>>
>>> 2013-02-09 13:18:08.685989 7f3f92949780 1 journal _open /mnt/osd.2.journal fd 7: 1048576000 bytes, block size 4096 bytes, directio = 1, aio = 0
>>> 2013-02-09 13:18:08.693418 7f3f92949780 0 filestore(/var/lib/ceph/osd/ceph-2) mkjournal created journal on /mnt/osd.2.journal
>>> 2013-02-09 13:18:08.693481 7f3f92949780 -1 created new journal /mnt/osd.2.journal for object store /var/lib/ceph/osd/ceph-2
>>> 2013-02-09 13:18:21.926143 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount FIEMAP ioctl is supported and appears to work
>>> 2013-02-09 13:18:21.926214 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
>>> 2013-02-09 13:18:21.926704 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount detected btrfs
>>> 2013-02-09 13:18:21.926881 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs CLONE_RANGE ioctl is supported
>>> 2013-02-09 13:18:21.996613 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs SNAP_CREATE is supported
>>> 2013-02-09 13:18:21.998330 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs SNAP_DESTROY is supported
>>> 2013-02-09 13:18:21.999840 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs START_SYNC is supported (transid 549552)
>>> 2013-02-09 13:18:22.032267 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs WAIT_SYNC is supported
>>> 2013-02-09 13:18:22.045994 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs SNAP_CREATE_V2 is supported
>>> 2013-02-09 13:18:22.104523 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount syncfs(2) syscall fully supported (by glibc and kernel)
>>> 2013-02-09 13:18:22.104811 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount found snaps <4282852,4282856>
>>> 2013-02-09 13:18:22.323175 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount: enabling PARALLEL journal mode: btrfs, SNAP_CREATE_V2 detected and 'filestore btrfs snap' mode is enabled
>>> 2013-02-09 13:18:23.041769 7f09b4dc7700 -1 *** Caught signal (Aborted) **
>>> in thread 7f09b4dc7700
>>>
>>> ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
>>> 1: /usr/bin/ceph-osd() [0x7828da]
>>> 2: (()+0xfcb0) [0x7f09b8bc8cb0]
>>> 3: (gsignal()+0x35) [0x7f09b7587425]
>>> 4: (abort()+0x17b) [0x7f09b758ab8b]
>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f09b7ed969d]
>>> 6: (()+0xb5846) [0x7f09b7ed7846]
>>> 7: (()+0xb5873) [0x7f09b7ed7873]
>>> 8: (()+0xb596e) [0x7f09b7ed796e]
>>> 9: (std::__throw_length_error(char const*)+0x57) [0x7f09b7e84907]
>>> 10: (()+0x9eaa2) [0x7f09b7ec0aa2]
>>> 11: (char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)+0x35) [0x7f09b7ec2495]
>>> 12: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&)+0x1d) [0x7f09b7ec261d]
>>> 13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x47) [0x769137]
>>> 14: (leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&)+0x92) [0x777b62]
>>> 15: (leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*)+0x482) [0x7639a2]
>>> 16: (leveldb::DBImpl::BackgroundCompaction()+0x2b0) [0x7641a0]
>>> 17: (leveldb::DBImpl::BackgroundCall()+0x68) [0x764c48]
>>> 18: /usr/bin/ceph-osd() [0x77dbef]
>>> 19: (()+0x7e9a) [0x7f09b8bc0e9a]
>>> 20: (clone()+0x6d) [0x7f09b7644cbd]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>
>>> --- begin dump of recent events ---
>>> -35> 2013-02-09 13:18:21.898622 7f09b972d780 5 asok(0x1c4d000) register_command perfcounters_dump hook 0x1c42010
>>> -34> 2013-02-09 13:18:21.898746 7f09b972d780 5 asok(0x1c4d000) register_command 1 hook 0x1c42010
>>> -33> 2013-02-09 13:18:21.898765 7f09b972d780 5 asok(0x1c4d000) register_command perf dump hook 0x1c42010
>>> -32> 2013-02-09 13:18:21.898789 7f09b972d780 5 asok(0x1c4d000) register_command perfcounters_schema hook 0x1c42010
>>> -31> 2013-02-09 13:18:21.898799 7f09b972d780 5 asok(0x1c4d000) register_command 2 hook 0x1c42010
>>> -30> 2013-02-09 13:18:21.898807 7f09b972d780 5 asok(0x1c4d000) register_command perf schema hook 0x1c42010
>>> -29> 2013-02-09 13:18:21.898812 7f09b972d780 5 asok(0x1c4d000) register_command config show hook 0x1c42010
>>> -28> 2013-02-09 13:18:21.898820 7f09b972d780 5 asok(0x1c4d000) register_command config set hook 0x1c42010
>>> -27> 2013-02-09 13:18:21.898824 7f09b972d780 5 asok(0x1c4d000) register_command log flush hook 0x1c42010
>>> -26> 2013-02-09 13:18:21.898826 7f09b972d780 5 asok(0x1c4d000) register_command log dump hook 0x1c42010
>>> -25> 2013-02-09 13:18:21.898833 7f09b972d780 5 asok(0x1c4d000) register_command log reopen hook 0x1c42010
>>> -24> 2013-02-09 13:18:21.900486 7f09b972d780 0 ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061), process ceph-osd, pid 3948
>>> -23> 2013-02-09 13:18:21.901111 7f09b972d780 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6800/3948 need_addr=1
>>> -22> 2013-02-09 13:18:21.901159 7f09b972d780 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6801/3948 need_addr=1
>>> -21> 2013-02-09 13:18:21.901179 7f09b972d780 1 accepter.accepter.bind my_inst.addr is 0.0.0.0:6802/3948 need_addr=1
>>> -20> 2013-02-09 13:18:21.902977 7f09b972d780 1 finished global_init_daemonize
>>> -19> 2013-02-09 13:18:21.907341 7f09b972d780 5 asok(0x1c4d000) init /var/run/ceph/ceph-osd.2.asok
>>> -18> 2013-02-09 13:18:21.907404 7f09b972d780 5 asok(0x1c4d000) bind_and_listen /var/run/ceph/ceph-osd.2.asok
>>> -17> 2013-02-09 13:18:21.907470 7f09b972d780 5 asok(0x1c4d000) register_command 0 hook 0x1c410b0
>>> -16> 2013-02-09 13:18:21.907487 7f09b972d780 5 asok(0x1c4d000) register_command version hook 0x1c410b0
>>> -15> 2013-02-09 13:18:21.907499 7f09b972d780 5 asok(0x1c4d000) register_command git_version hook 0x1c410b0
>>> -14> 2013-02-09 13:18:21.907508 7f09b972d780 5 asok(0x1c4d000) register_command help hook 0x1c420c0
>>> -13> 2013-02-09 13:18:21.907581 7f09b55c8700 5 asok(0x1c4d000) entry start
>>> -12> 2013-02-09 13:18:21.926143 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount FIEMAP ioctl is supported and appears to work
>>> -11> 2013-02-09 13:18:21.926214 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount FIEMAP ioctl is disabled via 'filestore fiemap' config option
>>> -10> 2013-02-09 13:18:21.926704 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount detected btrfs
>>> -9> 2013-02-09 13:18:21.926881 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs CLONE_RANGE ioctl is supported
>>> -8> 2013-02-09 13:18:21.996613 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs SNAP_CREATE is supported
>>> -7> 2013-02-09 13:18:21.998330 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs SNAP_DESTROY is supported
>>> -6> 2013-02-09 13:18:21.999840 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs START_SYNC is supported (transid 549552)
>>> -5> 2013-02-09 13:18:22.032267 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs WAIT_SYNC is supported
>>> -4> 2013-02-09 13:18:22.045994 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount btrfs SNAP_CREATE_V2 is supported
>>> -3> 2013-02-09 13:18:22.104523 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount syncfs(2) syscall fully supported (by glibc and kernel)
>>> -2> 2013-02-09 13:18:22.104811 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount found snaps <4282852,4282856>
>>> -1> 2013-02-09 13:18:22.323175 7f09b972d780 0 filestore(/var/lib/ceph/osd/ceph-2) mount: enabling PARALLEL journal mode: btrfs, SNAP_CREATE_V2 detected and 'filestore btrfs snap' mode is enabled
>>> 0> 2013-02-09 13:18:23.041769 7f09b4dc7700 -1 *** Caught signal (Aborted) **
>>> in thread 7f09b4dc7700
>>>
>>> ceph version 0.56.2 (586538e22afba85c59beda49789ec42024e7a061)
>>> 1: /usr/bin/ceph-osd() [0x7828da]
>>> 2: (()+0xfcb0) [0x7f09b8bc8cb0]
>>> 3: (gsignal()+0x35) [0x7f09b7587425]
>>> 4: (abort()+0x17b) [0x7f09b758ab8b]
>>> 5: (__gnu_cxx::__verbose_terminate_handler()+0x11d) [0x7f09b7ed969d]
>>> 6: (()+0xb5846) [0x7f09b7ed7846]
>>> 7: (()+0xb5873) [0x7f09b7ed7873]
>>> 8: (()+0xb596e) [0x7f09b7ed796e]
>>> 9: (std::__throw_length_error(char const*)+0x57) [0x7f09b7e84907]
>>> 10: (()+0x9eaa2) [0x7f09b7ec0aa2]
>>> 11: (char* std::string::_S_construct<char const*>(char const*, char const*, std::allocator<char> const&, std::forward_iterator_tag)+0x35) [0x7f09b7ec2495]
>>> 12: (std::basic_string<char, std::char_traits<char>, std::allocator<char> >::basic_string(char const*, unsigned long, std::allocator<char> const&)+0x1d) [0x7f09b7ec261d]
>>> 13: (leveldb::InternalKeyComparator::FindShortestSeparator(std::string*, leveldb::Slice const&) const+0x47) [0x769137]
>>> 14: (leveldb::TableBuilder::Add(leveldb::Slice const&, leveldb::Slice const&)+0x92) [0x777b62]
>>> 15: (leveldb::DBImpl::DoCompactionWork(leveldb::DBImpl::CompactionState*)+0x482) [0x7639a2]
>>> 16: (leveldb::DBImpl::BackgroundCompaction()+0x2b0) [0x7641a0]
>>> 17: (leveldb::DBImpl::BackgroundCall()+0x68) [0x764c48]
>>> 18: /usr/bin/ceph-osd() [0x77dbef]
>>> 19: (()+0x7e9a) [0x7f09b8bc0e9a]
>>> 20: (clone()+0x6d) [0x7f09b7644cbd]
>>> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>>>
>>> --- logging levels ---
>>>    0/ 5 none
>>>    0/ 1 lockdep
>>>    0/ 1 context
>>>    1/ 1 crush
>>>    1/ 5 mds
>>>    1/ 5 mds_balancer
>>>    1/ 5 mds_locker
>>>    1/ 5 mds_log
>>>    1/ 5 mds_log_expire
>>>    1/ 5 mds_migrator
>>>    0/ 1 buffer
>>>    0/ 1 timer
>>>    0/ 1 filer
>>>    0/ 1 striper
>>>    0/ 1 objecter
>>>    0/ 5 rados
>>>    0/ 5 rbd
>>>    0/ 5 journaler
>>>    0/ 5 objectcacher
>>>    0/ 5 client
>>>    0/ 5 osd
>>>    0/ 5 optracker
>>>    0/ 5 objclass
>>>    1/ 3 filestore
>>>    1/ 3 journal
>>>    0/ 5 ms
>>>    1/ 5 mon
>>>    0/10 monc
>>>    0/ 5 paxos
>>>    0/ 5 tp
>>>    1/ 5 auth
>>>    1/ 5 crypto
>>>    1/ 1 finisher
>>>    1/ 5 heartbeatmap
>>>    1/ 5 perfcounter
>>>    1/ 5 rgw
>>>    1/ 5 hadoop
>>>    1/ 5 javaclient
>>>    1/ 5 asok
>>>    1/ 1 throttle
>>>   -2/-2 (syslog threshold)
>>>   -1/-1 (stderr threshold)
>>>   max_recent 100000
>>>   max_new 1000
>>>   log_file /var/log/ceph/ceph-osd.2.log
>>> --- end dump of recent events ---
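P.S. For anyone curious, "basically recreating the OSD" (as described
above) amounted to roughly the following. A sketch only - it uses the
device and paths from the log, assumes the journal path is set in
ceph.conf, assumes the sysvinit-style service script, and skips cephx
key handling:

  sudo service ceph stop osd.2
  sudo mkfs.btrfs -l 32k -n 32k /dev/xvdf        # wipe and reformat the data disk
  sudo mount /dev/xvdf /var/lib/ceph/osd/ceph-2
  sudo ceph-osd -i 2 --mkfs --mkjournal          # recreate the object store and the journal
  sudo service ceph start osd.2

Once the daemon is back up, the cluster backfills the recreated OSD
from its peers.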