Hi.
I have been experiencing the same issues on both nodes over the past 2 days
(never on both nodes at the same time). The issue seems to occur after
some time when copying a large number of files to CephFS from my client
node (I don't use RBD yet).
These are new HP servers and the memory shows no issues in memtest. I use
an SSD for the OS and ordinary drives for the OSDs. I don't think the
issue is drive-related; it would be too much of a coincidence to have 6
drives with bad blocks across both nodes.
I will also disable the snapshots and report back after a few days.
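For anyone following along, the snapshot setting quoted in the thread below goes in ceph.conf on each OSD host. A minimal sketch, assuming the option name given in the linked post; restart the OSDs after changing it:

```ini
[osd]
# Disable Btrfs snapshotting in the filestore to work around the
# kernel Btrfs snapshot bugs discussed in the thread below.
# (Snapshotting is the default when the filestore sits on Btrfs.)
filestore btrfs snap = false
```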
Thx Jiri
On 5/01/2015 01:33, Dyweni - Ceph-Users wrote:
On 2015-01-04 08:21, Jiri Kanicky wrote:
More googling took me to the following post:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040279.html
Linux 3.14.1 is affected by serious Btrfs regression(s) that were
fixed in later releases.
Unfortunately even the latest Linux can crash and corrupt a Btrfs file
system if OSDs are using snapshots (which is the default). Due to
kernel bugs related to Btrfs snapshots I also lost some OSDs, until I
found that snapshotting can be disabled with
"filestore btrfs snap = false".
I am wondering if this can be the problem.
Very interesting... I think I was just hit with that over night. :)
Yes, I would definitely recommend turning off snapshots. I'm going to
do that myself now.
Have you tested the memory in your server lately? Memtest86+ on the
RAM, and badblocks on the SSD swap partition?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com