cluster [ERR] osd.NN: inconsistent clone_overlap found for oid xxxxxxxx/rbd_data and OSD crashes

Hello,

We recently had an outage on our Ceph storage cluster caused by what I believe to be a bug in Ceph. At the time of the incident, all MONs, OSDs and clients (except for one) were running Ceph Hammer 0.94.6.

To set the stage for describing the incident, here is the hierarchy of rbd volumes/snapshots/clones involved:
root: volumes/volume-35429025-844f-42bb-8fb1-7071cabf0e9a
child:  volumes/volume-35429025-844f-42bb-8fb1-7071cabf0e9a@snapshot-37cfc5c6-312c-45f1-a586-0398b04a2e35
child:    volumes/volume-02557bbd-d513-4c7c-9035-73dac6704a93
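
For context, a chain like this is what the standard librbd snapshot + clone workflow produces. I did not create these volumes by hand, so the exact calls are an assumption on my part, but a minimal sketch of how such a parent -> protected snapshot -> child chain normally comes about would be:

// Minimal sketch (assumption, not our actual tooling) of how a parent ->
// protected snapshot -> clone chain like the one above is normally created
// via librbd. Error handling omitted for brevity.
#include <rados/librados.hpp>
#include <rbd/librbd.hpp>

int main() {
  librados::Rados cluster;
  cluster.init(NULL);                 // default client id
  cluster.conf_read_file(NULL);       // /etc/ceph/ceph.conf
  cluster.connect();

  librados::IoCtx ioctx;
  cluster.ioctx_create("volumes", ioctx);

  librbd::RBD rbd;
  librbd::Image parent;
  rbd.open(ioctx, parent, "volume-35429025-844f-42bb-8fb1-7071cabf0e9a");

  // Snapshot the parent and protect the snapshot so it can be cloned.
  parent.snap_create("snapshot-37cfc5c6-312c-45f1-a586-0398b04a2e35");
  parent.snap_protect("snapshot-37cfc5c6-312c-45f1-a586-0398b04a2e35");
  parent.close();

  // Clone the protected snapshot into the child volume (layering required).
  int order = 0;                      // 0 = use the default object size
  rbd.clone(ioctx, "volume-35429025-844f-42bb-8fb1-7071cabf0e9a",
            "snapshot-37cfc5c6-312c-45f1-a586-0398b04a2e35",
            ioctx, "volume-02557bbd-d513-4c7c-9035-73dac6704a93",
            RBD_FEATURE_LAYERING, &order);

  cluster.shutdown();
  return 0;
}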

This is a long-lived cluster, so I am not sure under which version the above arrangement of volumes was created; I believe we started out on Firefly at some point.

At approximately 09:22 a request was issued to delete snapshot-37cfc5c6-312c-45f1-a586-0398b04a2e35.
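
I do not know the exact call sequence the deletion used, so the sketch below is again an assumption on my part, but as far as I understand a snapshot delete via librbd normally boils down to an unprotect followed by a remove:

// Assumed shape of the snapshot deletion, error handling reduced to the bare
// minimum. As far as I understand, snap_unprotect() is supposed to fail with
// -EBUSY while the snapshot still has children.
#include <rados/librados.hpp>
#include <rbd/librbd.hpp>

int main() {
  librados::Rados cluster;
  cluster.init(NULL);
  cluster.conf_read_file(NULL);
  cluster.connect();

  librados::IoCtx ioctx;
  cluster.ioctx_create("volumes", ioctx);

  librbd::RBD rbd;
  librbd::Image parent;
  rbd.open(ioctx, parent, "volume-35429025-844f-42bb-8fb1-7071cabf0e9a");

  int r = parent.snap_unprotect("snapshot-37cfc5c6-312c-45f1-a586-0398b04a2e35");
  if (r == 0)
    r = parent.snap_remove("snapshot-37cfc5c6-312c-45f1-a586-0398b04a2e35");

  parent.close();
  cluster.shutdown();
  return r == 0 ? 0 : 1;
}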

Some time after this, an «rbd -p volumes disk-usage» command was also issued from a 10.2.0 client. I don’t know whether it is relevant, but thought it was worth noting.

A short while later, these messages occurred in the logs:
2016-05-16 09:27:47.858897 osd.38 [xxx]:6805/2576 995 : cluster [ERR] osd.38: inconsistent clone_overlap found for oid 46b786d5/rbd_data.26b3cb5fbccda4.0000000000000185/head//29 clone 0
2016-05-16 09:27:46.011693 osd.48 [xxx]:6801/2362 697 : cluster [ERR] osd.48: inconsistent clone_overlap found for oid 9e88fdc5/rbd_data.26b3cb5fbccda4.0000000000000009/head//29 clone 0
2016-05-16 09:27:48.280009 osd.48 [xxx]:6801/2362 698 : cluster [ERR] osd.48: inconsistent clone_overlap found for oid 81532b89/rbd_data.26b3cb5fbccda4.00000000000001ce/head//29 clone 0
2016-05-16 09:27:49.706609 osd.44 [xxx]:6801/2434 613 : cluster [ERR] osd.44: inconsistent clone_overlap found for oid bddaa5a1/rbd_data.26b3cb5fbccda4.0000000000000304/head//29 clone 0
2016-05-16 09:27:48.999400 osd.37 [xxx]:6803/2538 902 : cluster [ERR] osd.37: inconsistent clone_overlap found for oid ca18f5ca/rbd_data.26b3cb5fbccda4.000000000000026b/head//29 clone 0
2016-05-16 09:27:49.014679 osd.36 [xxx]:6807/2598 1015 : cluster [ERR] osd.36: inconsistent clone_overlap found for oid eca5f5dc/rbd_data.26b3cb5fbccda4.000000000000026c/head//29 clone 0
2016-05-16 09:27:49.235251 osd.36 [xxx]:6807/2598 1016 : cluster [ERR] osd.36: inconsistent clone_overlap found for oid 9fb5b7dc/rbd_data.26b3cb5fbccda4.00000000000002a1/head//29 clone 0
2016-05-16 09:27:49.526915 osd.50 [xxx]:6805/2410 693 : cluster [ERR] osd.50: inconsistent clone_overlap found for oid ab5eff09/rbd_data.26b3cb5fbccda4.00000000000002c6/head//29 clone 0
2016-05-16 09:27:50.336825 osd.36 [xxx]:6807/2598 1017 : cluster [ERR] osd.36: inconsistent clone_overlap found for oid d7922b2b/rbd_data.26b3cb5fbccda4.0000000000000392/head//29 clone 0
2016-05-16 09:27:50.037706 osd.38 [xxx]:6805/2576 996 : cluster [ERR] osd.38: inconsistent clone_overlap found for oid d70156d5/rbd_data.26b3cb5fbccda4.000000000000034f/head//29 clone 0
2016-05-16 09:27:51.875372 osd.44 [xxx]:6801/2434 614 : cluster [ERR] osd.44: inconsistent clone_overlap found for oid e839050d/rbd_data.26b3cb5fbccda4.00000000000004f0/head//29 clone 0

At around 10:00, OSDs started to crash. After a long time of debugging we figured out that the OSD crashes were tied to write operations related to the volume at the bottom of the ancestry outlined above.

The tricky thing is that while we were figuring this out, the crashing OSDs caused a ripple effect throughout the cluster: when the first OSD was eventually marked down+out, the next one to handle the requests would start crashing, and so on. This had a devastating effect on our cluster, and we were effectively more or less down for 12-17 hours while trying to find the root cause of the problem.

During the search for the root cause we first upgraded the cluster to Hammer 0.94.7 and then to Jewel 10.2.1, so all of these versions are still affected by the issue.

There are several separate problems here:
1) Why did the attempt to delete the snapshot (apparently) succeed, and why did it cause the error messages above?
2) Would it be useful for the OSDs to try to detect requests that crash them and, at some point, refuse to process them?
3) We still have the «affected» volume in our cluster and cannot do anything with it. Attempts to delete or otherwise modify it still cause OSDs to crash, and we need some way of removing it.

Crash backtrace captured with gdb on one of the OSDs:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x7fc2e4f9f700 (LWP 10954)]
0x00007fc30d45aaaa in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
(gdb) bt
#0  0x00007fc30d45aaaa in ?? () from /usr/lib/x86_64-linux-gnu/libstdc++.so.6
#1  0x0000000000889e99 in operator-- (this=<synthetic pointer>) at /usr/include/c++/4.8/bits/stl_tree.h:204
#2  operator* (this=<optimized out>) at /usr/include/c++/4.8/bits/stl_iterator.h:163
#3  operator-> (this=<optimized out>) at /usr/include/c++/4.8/bits/stl_iterator.h:173
#4  ReplicatedPG::make_writeable (this=this@entry=0x1926a400, ctx=ctx@entry=0x40c7800) at osd/ReplicatedPG.cc:5508
#5  0x000000000088b973 in ReplicatedPG::prepare_transaction (this=this@entry=0x1926a400, ctx=ctx@entry=0x40c7800) at osd/ReplicatedPG.cc:5703
#6  0x000000000088bfba in ReplicatedPG::execute_ctx (this=this@entry=0x1926a400, ctx=ctx@entry=0x40c7800) at osd/ReplicatedPG.cc:2269
#7  0x0000000000891b99 in ReplicatedPG::do_op (this=0x1926a400, op=std::tr1::shared_ptr (count 4, weak 0) 0x2762af00) at osd/ReplicatedPG.cc:1752
#8  0x000000000082b30a in ReplicatedPG::do_request (this=0x1926a400, op=std::tr1::shared_ptr (count 4, weak 0) 0x2762af00, handle=...)
    at osd/ReplicatedPG.cc:1296
#9  0x0000000000697e35 in OSD::dequeue_op (this=0x41ee000, pg=..., op=std::tr1::shared_ptr (count 4, weak 0) 0x2762af00, handle=...) at osd/OSD.cc:8503
#10 0x0000000000698388 in OSD::ShardedOpWQ::_process (this=0x41ef028, thread_index=<optimized out>, hb=<optimized out>) at osd/OSD.cc:8397
#11 0x0000000000ba1975 in ShardedThreadPool::shardedthreadpool_worker (this=0x41ee640, thread_index=<optimized out>) at common/WorkQueue.cc:340
#12 0x0000000000ba3a90 in ShardedThreadPool::WorkThreadSharded::entry (this=<optimized out>) at common/WorkQueue.h:627
#13 0x00007fc30e694182 in start_thread (arg=0x7fc2e4f9f700) at pthread_create.c:312
#14 0x00007fc30cbff47d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:111
(gdb) frame 8
#8  0x000000000082b30a in ReplicatedPG::do_request (this=0x1926a400, op=std::tr1::shared_ptr (count 4, weak 0) 0x2762af00, handle=...)
    at osd/ReplicatedPG.cc:1296
1296	osd/ReplicatedPG.cc: No such file or directory.
(gdb) print *this

You can find the output of print *this here: https://gist.github.com/fnordahl/e29321a1b027a89b56080540348532d7
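
For what it is worth, my reading of the backtrace is that ReplicatedPG::make_writeable() dereferences clone_overlap.rbegin() while the SnapSet's clone_overlap map is empty or inconsistent: frames #1-#3 are the operator--/operator*/operator-> chain you get from dereferencing a reverse iterator, and frame #0 would then be libstdc++'s tree-decrement helper. That would also line up with the «inconsistent clone_overlap» errors above. A stand-alone toy program (not Ceph code, just my assumption about the failure shape) would look like this:

// Toy model (NOT Ceph code) of the suspected failure: SnapSet::clone_overlap
// is roughly a map keyed by clone/snap id, and dereferencing rbegin() of an
// empty std::map is undefined behaviour. With libstdc++ the reverse iterator
// decrements end(), which should end up following a null parent pointer
// inside the library's _Rb_tree_decrement and crash with SIGSEGV.
#include <cstdio>
#include <map>

int main() {
  std::map<long, unsigned long> clone_overlap;   // empty, like an inconsistent SnapSet

  // Equivalent of "clone_overlap.rbegin()->first" on an empty map.
  std::printf("newest clone id: %ld\n", clone_overlap.rbegin()->first);
  return 0;
}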


PS:
I attempted to create a tracker issue for this, but my logon currently does not work and the password recovery procedure also seems to be broken.

--
Frode Nordahl
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



