Re: Multi-device BlueStore OSDs multiple fsck failures


 



It is probably better to move to the latest master and try to reproduce this defect there; a lot of stuff has changed in between.
This is a good test case, and I doubt any of us have been testing with fsck() enabled on mount/unmount.
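
(For reference, fsck on mount/unmount is controlled by the BlueStore options below. This is a minimal ceph.conf sketch; I am quoting the option names from memory, so please verify them against your build.)

[osd]
    # run an fsck every time the OSD mounts its BlueStore store
    bluestore fsck on mount = true
    # and again when the store is unmounted cleanly
    bluestore fsck on umount = true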

Thanks & Regards
Somnath

-----Original Message-----
From: ceph-users [mailto:ceph-users-bounces@xxxxxxxxxxxxxx] On Behalf Of Stillwell, Bryan J
Sent: Wednesday, August 03, 2016 3:41 PM
To: ceph-users@xxxxxxxxxxxxxx
Subject:  Multi-device BlueStore OSDs multiple fsck failures

I've been doing some benchmarking of BlueStore in 10.2.2 over the last few days, and I've come across a failure that keeps happening after stressing the cluster fairly heavily.  Some of the OSDs started failing, and attempts to restart them don't log anything in /var/log/ceph/, so I tried starting them manually and ran into these error messages:

# /usr/bin/ceph-osd --cluster=ceph -i 4 -f --setuser ceph --setgroup ceph
2016-08-02 22:52:01.190226 7f97d75e1800 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-08-02 22:52:01.190340 7f97d75e1800 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-08-02 22:52:01.190497 7f97d75e1800 -1 WARNING: experimental feature 'bluestore' is enabled Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster.  Do not use feature with important data.

starting osd.4 at :/0 osd_data /var/lib/ceph/osd/ceph-4/ /var/lib/ceph/osd/ceph-4/journal
2016-08-02 22:52:01.194461 7f97d75e1800 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-08-02 22:52:01.237619 7f97d75e1800 -1 WARNING: experimental feature 'rocksdb' is enabled Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster.  Do not use feature with important data.

2016-08-02 22:52:01.501405 7f97d75e1800 -1 bluestore(/var/lib/ceph/osd/ceph-4/)  a#20:bac03f87:::4_454:head# nid 67134 already in use
2016-08-02 22:52:01.629900 7f97d75e1800 -1 bluestore(/var/lib/ceph/osd/ceph-4/)  9#20:e64f44a7:::4_258:head# nid 78351 already in use
2016-08-02 22:52:01.967599 7f97d75e1800 -1 bluestore(/var/lib/ceph/osd/ceph-4/) fsck free extent 256983760896~1245184 intersects allocated blocks
2016-08-02 22:52:01.967605 7f97d75e1800 -1 bluestore(/var/lib/ceph/osd/ceph-4/) fsck overlap: [256984940544~65536]
2016-08-02 22:52:01.978635 7f97d75e1800 -1 bluestore(/var/lib/ceph/osd/ceph-4/) fsck free extent 258455044096~196608 intersects allocated blocks
2016-08-02 22:52:01.978640 7f97d75e1800 -1 bluestore(/var/lib/ceph/osd/ceph-4/) fsck overlap: [258455175168~65536]
2016-08-02 22:52:01.978647 7f97d75e1800 -1 bluestore(/var/lib/ceph/osd/ceph-4/) fsck leaked some space; free+used = [0~252138684416,252138815488~4844945408,256984940544~1470103552,258455175168~5732719067136] != expected 0~5991174242304
2016-08-02 22:52:02.987479 7f97d75e1800 -1 bluestore(/var/lib/ceph/osd/ceph-4/) mount fsck found 5 errors
2016-08-02 22:52:02.987488 7f97d75e1800 -1 osd.4 0 OSD:init: unable to mount object store
2016-08-02 22:52:02.987498 7f97d75e1800 -1  ** ERROR: osd init failed: (5) Input/output error


Here's another one:

# /usr/bin/ceph-osd --cluster=ceph -i 11 -f --setuser ceph --setgroup ceph
2016-08-03 22:16:49.052319 7f0e4d949800 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-08-03 22:16:49.052445 7f0e4d949800 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-08-03 22:16:49.052690 7f0e4d949800 -1 WARNING: experimental feature 'bluestore' is enabled Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster.  Do not use feature with important data.

starting osd.11 at :/0 osd_data /var/lib/ceph/osd/ceph-11/ /var/lib/ceph/osd/ceph-11/journal
2016-08-03 22:16:49.056779 7f0e4d949800 -1 WARNING: the following dangerous and experimental features are enabled: *
2016-08-03 22:16:49.095695 7f0e4d949800 -1 WARNING: experimental feature 'rocksdb' is enabled Please be aware that this feature is experimental, untested, unsupported, and may result in data corruption, data loss, and/or irreparable damage to your cluster.  Do not use feature with important data.

2016-08-03 22:16:49.821451 7f0e4d949800 -1 bluestore(/var/lib/ceph/osd/ceph-11/)  6#20:2eed99bf:::4_257:head# nid 72869 already in use
2016-08-03 22:16:49.961943 7f0e4d949800 -1 bluestore(/var/lib/ceph/osd/ceph-11/) fsck free extent 257123155968~65536 intersects allocated blocks
2016-08-03 22:16:49.961950 7f0e4d949800 -1 bluestore(/var/lib/ceph/osd/ceph-11/) fsck overlap: [257123155968~65536]
2016-08-03 22:16:49.962012 7f0e4d949800 -1 bluestore(/var/lib/ceph/osd/ceph-11/) fsck leaked some space; free+used = [0~241963433984,241963499520~5749210742784] != expected 0~5991174242304
2016-08-03 22:16:50.855099 7f0e4d949800 -1 bluestore(/var/lib/ceph/osd/ceph-11/) mount fsck found 3 errors
2016-08-03 22:16:50.855109 7f0e4d949800 -1 osd.11 0 OSD:init: unable to mount object store
2016-08-03 22:16:50.855118 7f0e4d949800 -1  ** ERROR: osd init failed: (5) Input/output error


I currently have a total of 12 OSDs down (out of 46) which all appear to be experiencing this problem.

Here are more details of the cluster (currently just a single node):

2x Xeon E5-2699 v4 @ 2.20GHz
128GiB memory
2x LSI Logic SAS3008 HBAs
3x Intel DC P3700 NVMe cards
48x 6TB HDDs
OS: Ubuntu 14.04.4 w/ Xenial HWE kernel (4.4.0-29-generic)

I've split it up so that each NVMe card handles the BlueStore wal and db partitions of 16 OSDs.
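
Roughly, each OSD is configured along these lines (just a sketch; the NVMe partition paths below are made up for illustration, and I'm going from memory on the exact option names, so double-check them):

[osd.4]
    # object data lives on the HDD; the RocksDB db and wal live on NVMe partitions
    bluestore block db path = /dev/nvme0n1p5
    bluestore block wal path = /dev/nvme0n1p6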

The testing has been done with 'rados bench' and 'cosbench' using a 10+3 erasure coding config.  Overall the performance is quite good (I'm seeing about 74.6 MB/s per disk), but these failures halt my testing each time they occur, and I then have to rebuild the cluster before I can continue.
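
For reference, the pool and benchmark were set up roughly like this (a sketch only; the profile and pool names are placeholders, and this is the Jewel-era syntax as I remember it, so verify before copying):

# 10+3 erasure-coded pool, failure domain dropped to osd since this is a single node
ceph osd erasure-code-profile set ec-10-3 k=10 m=3 ruleset-failure-domain=osd
ceph osd pool create bench 2048 2048 erasure ec-10-3
# sustained write load against the pool
rados bench -p bench 600 write -t 32 --no-cleanup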

Let me know if there's any additional information you guys would like me to gather.

Thanks,
Bryan

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


