On 01/04/15 16:25, Jiri Kanicky wrote:
> Hi.
>
> I have been experiencing the same issues on both nodes over the past
> 2 days (never both nodes at the same time). It seems the issue occurs
> after some time when copying a large number of files to CephFS on my
> client node (I don't use RBD yet).
>
> These are new HP servers and the memory does not seem to have any
> issues in memtest. I use an SSD for the OS and normal drives for the
> OSDs. I think the issue is not related to the drives, as it would be
> too much of a coincidence to have 6 drives with bad blocks on both
> nodes.

The kernel can't allocate enough memory for btrfs, see this:

Jan 4 17:11:06 ceph1 kernel: [756636.535661] kworker/0:2: page allocation failure: order:1, mode:0x204020

and this:

Jan 4 17:11:06 ceph1 kernel: [756636.536112] BTRFS: error (device sdb1) in create_pending_snapshot:1334: errno=-12 Out of memory

OSDs need a lot of memory: 1GB during normal operation and probably
around 2GB during resynchronisations (at least my monitoring very
rarely detects them going past this limit). So you probably had a
short spike of memory usage (some of which can't be moved to swap:
kernel memory and mlocked memory).

Even if you don't use Btrfs, if you want to avoid headaches when
replacing / repairing / ... OSDs, you probably want to put at least
4GB in your servers instead of 2GB.

I didn't realize there were Btrfs-specific configuration options until
now; there are:

filestore btrfs snap
filestore btrfs clone range

I believed that the single write for both the journal and store
updates on Btrfs depended on snapshots, but "clone range" may hint
that this is supported independently.

Could anyone familiar with Ceph internals elaborate on the
consequences of (de)activating the two configuration options above
(expected performance gains? additional Ceph features?)? A sample
ceph.conf snippet with these options is included at the end of this
message for reference.

Best regards,

Lionel Bouton
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
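
Here is roughly how I would expect the two options to appear in the
[osd] section of ceph.conf if someone wanted to experiment with them.
This is only a sketch based on my reading of the option names: the
section placement and the defaults (I believe both are on by default)
are my assumptions, not something I have verified.

    [osd]
        # Take btrfs snapshots of the filestore when syncing, so the
        # journal and the object store can be kept consistent.
        filestore btrfs snap = true
        # Use the btrfs clone-range ioctl when copying object data
        # instead of reading and rewriting it.
        filestore btrfs clone range = true

My expectation is that setting either of these to false makes the
filestore fall back to the generic (non-btrfs) code paths, which is
exactly the performance/feature trade-off I'm asking about above.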