Thanks Somnath,

I'll try moving my testing to master tomorrow to see if that improves
the stability at all.

Bryan

On 8/3/16, 4:50 PM, "Somnath Roy" <Somnath.Roy@xxxxxxxxxxx> wrote:

>Probably, it is better to move to the latest master and reproduce this
>defect; a lot of stuff has changed between the two.
>This is a good test case, and I doubt any of us is testing with fsck()
>enabled on mount/unmount.
>
>Thanks & Regards
>Somnath
>
>-----Original Message-----
>From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of
>Stillwell, Bryan J
>Sent: Wednesday, August 03, 2016 3:41 PM
>To: ceph-users@xxxxxxxxxxxxxx
>Subject: Multi-device BlueStore OSDs multiple fsck failures
>
>I've been doing some benchmarking of BlueStore in 10.2.2 over the last
>few days and have come across a failure that keeps happening after
>stressing the cluster fairly heavily. Some of the OSDs started failing,
>and attempts to restart them don't log anything to /var/log/ceph/, so I
>tried starting them manually and ran into these error messages:
>
># /usr/bin/ceph-osd --cluster=ceph -i 4 -f --setuser ceph --setgroup ceph
>2016-08-02 22:52:01.190226 7f97d75e1800 -1 WARNING: the following
>dangerous and experimental features are enabled: *
>2016-08-02 22:52:01.190340 7f97d75e1800 -1 WARNING: the following
>dangerous and experimental features are enabled: *
>2016-08-02 22:52:01.190497 7f97d75e1800 -1 WARNING: experimental feature
>'bluestore' is enabled Please be aware that this feature is experimental,
>untested, unsupported, and may result in data corruption, data loss,
>and/or irreparable damage to your cluster. Do not use feature with
>important data.
>
>starting osd.4 at :/0 osd_data /var/lib/ceph/osd/ceph-4/
>/var/lib/ceph/osd/ceph-4/journal
>2016-08-02 22:52:01.194461 7f97d75e1800 -1 WARNING: the following
>dangerous and experimental features are enabled: *
>2016-08-02 22:52:01.237619 7f97d75e1800 -1 WARNING: experimental feature
>'rocksdb' is enabled Please be aware that this feature is experimental,
>untested, unsupported, and may result in data corruption, data loss,
>and/or irreparable damage to your cluster. Do not use feature with
>important data.
>
>2016-08-02 22:52:01.501405 7f97d75e1800 -1
>bluestore(/var/lib/ceph/osd/ceph-4/) a#20:bac03f87:::4_454:head# nid
>67134 already in use
>2016-08-02 22:52:01.629900 7f97d75e1800 -1
>bluestore(/var/lib/ceph/osd/ceph-4/) 9#20:e64f44a7:::4_258:head# nid
>78351 already in use
>2016-08-02 22:52:01.967599 7f97d75e1800 -1
>bluestore(/var/lib/ceph/osd/ceph-4/) fsck free extent
>256983760896~1245184 intersects allocated blocks
>2016-08-02 22:52:01.967605 7f97d75e1800 -1
>bluestore(/var/lib/ceph/osd/ceph-4/) fsck overlap: [256984940544~65536]
>2016-08-02 22:52:01.978635 7f97d75e1800 -1
>bluestore(/var/lib/ceph/osd/ceph-4/) fsck free extent 258455044096~196608
>intersects allocated blocks
>2016-08-02 22:52:01.978640 7f97d75e1800 -1
>bluestore(/var/lib/ceph/osd/ceph-4/) fsck overlap: [258455175168~65536]
>2016-08-02 22:52:01.978647 7f97d75e1800 -1
>bluestore(/var/lib/ceph/osd/ceph-4/) fsck leaked some space; free+used =
>[0~252138684416,252138815488~4844945408,256984940544~1470103552,
>258455175168~5732719067136] != expected 0~5991174242304
>2016-08-02 22:52:02.987479 7f97d75e1800 -1
>bluestore(/var/lib/ceph/osd/ceph-4/) mount fsck found 5 errors
>2016-08-02 22:52:02.987488 7f97d75e1800 -1 osd.4 0 OSD:init: unable to
>mount object store
>2016-08-02 22:52:02.987498 7f97d75e1800 -1 ** ERROR: osd init failed:
>(5) Input/output error
>
>
>Here's another one:
>
># /usr/bin/ceph-osd --cluster=ceph -i 11 -f --setuser ceph --setgroup ceph
>2016-08-03 22:16:49.052319 7f0e4d949800 -1 WARNING: the following
>dangerous and experimental features are enabled: *
>2016-08-03 22:16:49.052445 7f0e4d949800 -1 WARNING: the following
>dangerous and experimental features are enabled: *
>2016-08-03 22:16:49.052690 7f0e4d949800 -1 WARNING: experimental feature
>'bluestore' is enabled Please be aware that this feature is experimental,
>untested, unsupported, and may result in data corruption, data loss,
>and/or irreparable damage to your cluster. Do not use feature with
>important data.
>
>starting osd.11 at :/0 osd_data /var/lib/ceph/osd/ceph-11/
>/var/lib/ceph/osd/ceph-11/journal
>2016-08-03 22:16:49.056779 7f0e4d949800 -1 WARNING: the following
>dangerous and experimental features are enabled: *
>2016-08-03 22:16:49.095695 7f0e4d949800 -1 WARNING: experimental feature
>'rocksdb' is enabled Please be aware that this feature is experimental,
>untested, unsupported, and may result in data corruption, data loss,
>and/or irreparable damage to your cluster. Do not use feature with
>important data.
>
>2016-08-03 22:16:49.821451 7f0e4d949800 -1
>bluestore(/var/lib/ceph/osd/ceph-11/) 6#20:2eed99bf:::4_257:head# nid
>72869 already in use
>2016-08-03 22:16:49.961943 7f0e4d949800 -1
>bluestore(/var/lib/ceph/osd/ceph-11/) fsck free extent 257123155968~65536
>intersects allocated blocks
>2016-08-03 22:16:49.961950 7f0e4d949800 -1
>bluestore(/var/lib/ceph/osd/ceph-11/) fsck overlap: [257123155968~65536]
>2016-08-03 22:16:49.962012 7f0e4d949800 -1
>bluestore(/var/lib/ceph/osd/ceph-11/) fsck leaked some space; free+used =
>[0~241963433984,241963499520~5749210742784] != expected 0~5991174242304
>2016-08-03 22:16:50.855099 7f0e4d949800 -1
>bluestore(/var/lib/ceph/osd/ceph-11/) mount fsck found 3 errors
>2016-08-03 22:16:50.855109 7f0e4d949800 -1 osd.11 0 OSD:init: unable to
>mount object store
>2016-08-03 22:16:50.855118 7f0e4d949800 -1 ** ERROR: osd init failed:
>(5) Input/output error
>
>
>I currently have a total of 12 OSDs down (out of 46), all of which
>appear to be experiencing this problem.
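>
>In case it matters for reproducing this: fsck runs during OSD startup
>here because I have it enabled in ceph.conf. A minimal sketch of that
>config, assuming I have the BlueStore option names right (worth
>verifying against the code on whatever branch you're running):
>
>[osd]
># Run a full consistency check whenever the object store is mounted or
># unmounted; startup is slower, but it's what caught the errors above.
>bluestore fsck on mount = true
>bluestore fsck on umount = true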
>
>Here are more details of the cluster (currently just a single node):
>
>2x Xeon E5-2699 v4 @ 2.20GHz
>128GiB memory
>2x LSI Logic SAS3008 HBAs
>3x Intel DC P3700 NVMe cards
>48x 6TB HDDs
>OS: Ubuntu 14.04.4 w/ Xenial HWE kernel (4.4.0-29-generic)
>
>I've split it up so that each NVMe card handles the BlueStore wal and
>db partitions of 16 OSDs.
>
>The testing has been done with 'rados bench' and 'cosbench' using a
>10+3 erasure coding config. Overall the performance is quite good (I'm
>seeing about 74.6 MB/s per disk), but these failures halt my testing
>each time I run into them, and then I have to rebuild the cluster to
>continue testing.
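>
>For reference, the wal/db split is just a matter of pointing each OSD
>at partitions on its NVMe card. A rough sketch of what that looks like
>for one OSD; the option names reflect my reading of the experimental
>BlueStore code and the device/partition numbers are made up, so treat
>both as assumptions:
>
>[osd.4]
># Main data on the HDD; db and wal on partitions of the shared NVMe.
>bluestore block path = /dev/sdc2
>bluestore block db path = /dev/nvme0n1p5
>bluestore block wal path = /dev/nvme0n1p6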
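>
>The erasure coding setup was along these lines (from memory, so the
>profile name, pool name, and PG counts here are illustrative):
>
># 10+3 erasure code profile and a pool that uses it
>ceph osd erasure-code-profile set ec10-3 k=10 m=3
>ceph osd pool create bench 2048 2048 erasure ec10-3
>
># Sustained large-object writes; --no-cleanup leaves the data in place
>rados bench -p bench 600 write --no-cleanup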
>
>Let me know if there's any additional information you guys would like
>me to gather.
>
>Thanks,
>Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com