Hi.
I have been experiencing the same issues on both nodes over the past 2 days
(never on both nodes at the same time). The issue seems to occur after
some time when copying a large number of files to CephFS from my client
node (I don't use RBD yet).
These are new HP servers and the memory shows no issues in memtest. I use
an SSD for the OS and ordinary drives for the OSDs. I don't think the
issue is drive-related; it would be too much of a coincidence to have 6
drives with bad blocks across both nodes.
I will also disable the snapshots and report back after a few days.
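For anyone following along, the snapshot setting quoted in the thread below goes in ceph.conf on each OSD host. A minimal sketch, assuming the option name given in the linked post; restart the OSDs after changing it:

```ini
[osd]
# Disable Btrfs snapshotting in the filestore to work around the
# kernel Btrfs snapshot bugs discussed in the thread below.
# (Snapshotting is the default when the filestore sits on Btrfs.)
filestore btrfs snap = false
```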
Thx Jiri
On 5/01/2015 01:33, Dyweni - Ceph-Users wrote:
On 2015-01-04 08:21, Jiri Kanicky wrote:
More googling took me to the following post:
http://lists.ceph.com/pipermail/ceph-users-ceph.com/2014-June/040279.html
Linux 3.14.1 is affected by serious Btrfs regression(s) that were
fixed in later releases.
Unfortunately even the latest Linux can crash and corrupt a Btrfs file
system if OSDs are using snapshots (which is the default). Due to
kernel bugs related to Btrfs snapshots I also lost some OSDs, until I
found that snapshotting can be disabled with
"filestore btrfs snap = false".
I am wondering if this can be the problem.
Very interesting... I think I was just hit with that over night. :)
Yes, I would definitely recommend turning off snapshots. I'm going to
do that myself now.
Have you tested the memory in your server lately? Memtest86+ on the
RAM, and badblocks on the SSD swap partition?
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com