Re: OSDs with btrfs are down

On Sun, Jan 4, 2015 at 8:10 AM, Lionel Bouton <lionel+ceph@xxxxxxxxxxx> wrote:
> On 01/04/15 16:25, Jiri Kanicky wrote:
>> Hi.
>>
>> I have been experiencing the same issue on both nodes over the past 2
>> days (never on both nodes at the same time). It seems to occur after
>> some time when copying a large number of files to CephFS from my
>> client node (I don't use RBD yet).
>>
>> These are new HP servers and the memory shows no errors in memtest. I
>> use SSDs for the OS and regular drives for the OSDs. I don't think the
>> issue is related to the drives; it would be too much of a coincidence
>> to have 6 drives with bad blocks across both nodes.
>
> The kernel can't allocate enough memory for btrfs; see this:
>
> Jan  4 17:11:06 ceph1 kernel: [756636.535661] kworker/0:2: page
> allocation failure: order:1, mode:0x204020
>
> and this:
>
> Jan  4 17:11:06 ceph1 kernel: [756636.536112] BTRFS: error (device sdb1)
> in create_pending_snapshot:1334: errno=-12 Out of memory
>
> OSDs need a lot of memory: 1GB during normal operation and probably
> around 2GB during resynchronisations (at least my monitoring very rarely
> detects them going past that limit). So you probably had a short spike
> in memory usage (some of which can't be moved to swap: kernel memory and
> mlocked memory).
>
> Even if you don't use Btrfs, if you want to avoid headaches when
> replacing / repairing / ... OSDs, you probably want to put at least 4GB
> in your servers instead of 2GB.
>
>
>
> I didn't realize until now that there were BTRFS configuration options; there are:
> filestore btrfs snap
> filestore btrfs clone range
>
> I believed that the single write for both the journal and store updates
> in BTRFS depended on snapshots, but "clone range" may hint that this is
> supported independently.
>
> Could anyone familiar with Ceph internals elaborate on the consequences
> of (de)activating the two configuration options above (expected
> performance gains? additional Ceph features?)?
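
For reference, both of those are filestore options, so they would go in
the [osd] section of ceph.conf; a minimal sketch (the values shown are
only to illustrate the syntax, not recommendations):

[osd]
filestore btrfs snap = true
filestore btrfs clone range = true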

"filestore btrfs snap" controls whether to use btrfs snapshots to keep
the journal and backing store in check. WIth that option disabled it
handles things in basically the same way we do with xfs.
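
To make that concrete: the snapshot primitive btrfs exposes to
userspace is an ioctl. A rough sketch (this is not Ceph's actual code,
and the paths are made up):

/* Take an atomic snapshot of a btrfs subvolume via ioctl, roughly the
 * primitive the filestore builds its commit points on. Illustrative
 * only; error handling kept minimal. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

int main(void)
{
    /* Hypothetical OSD layout: snapshot "current" into its parent dir. */
    int src = open("/var/lib/ceph/osd/ceph-0/current", O_RDONLY);
    int dst = open("/var/lib/ceph/osd/ceph-0", O_RDONLY);
    if (src < 0 || dst < 0) {
        perror("open");
        return 1;
    }

    struct btrfs_ioctl_vol_args args;
    memset(&args, 0, sizeof(args));
    args.fd = src;                          /* source subvolume fd */
    strncpy(args.name, "snap_123", BTRFS_PATH_NAME_MAX);

    /* The snapshot is atomic, which is what gives the filestore a
     * consistent point to replay the journal against. */
    if (ioctl(dst, BTRFS_IOC_SNAP_CREATE, &args) < 0) {
        perror("BTRFS_IOC_SNAP_CREATE");
        return 1;
    }
    close(src);
    close(dst);
    return 0;
}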

"filestore btrfs clone range" I believe controls how we do RADOS
object clones. With this option enabled we use the btrfs clone range
ioctl (? I think that's the interface); without it we do our own
copies, again basically the same as we do with xfs.
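
For the curious, a minimal sketch of that clone range interface
(filenames hypothetical). The ioctl shares the source file's extents
with the destination instead of copying bytes, which is why it is
cheaper than the manual copy fallback:

/* Clone one file's extents into another via BTRFS_IOC_CLONE_RANGE.
 * Both files must live on the same btrfs filesystem. Illustrative
 * only. */
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/ioctl.h>
#include <unistd.h>
#include <linux/btrfs.h>

int main(void)
{
    int src = open("object_a", O_RDONLY);
    int dst = open("object_b", O_WRONLY | O_CREAT, 0644);
    if (src < 0 || dst < 0) {
        perror("open");
        return 1;
    }

    struct btrfs_ioctl_clone_range_args args;
    memset(&args, 0, sizeof(args));
    args.src_fd      = src;
    args.src_offset  = 0;
    args.src_length  = 0;   /* 0 = clone up to the end of the source */
    args.dest_offset = 0;

    /* Reflink: dst now references src's extents, no data is copied. */
    if (ioctl(dst, BTRFS_IOC_CLONE_RANGE, &args) < 0)
        perror("BTRFS_IOC_CLONE_RANGE");

    close(src);
    close(dst);
    return 0;
}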
-Greg
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


