High 0.94.5 OSD memory use at 8GB RAM/TB raw disk during recovery

Laurent GUERBY <laurent@xxxxxxxxxx> · Tue, 01 Dec 2015 01:52:02 +0100

Hi,

We lost a disk today in our Ceph cluster, so we added a new machine with
4 disks to replace the capacity. We also activated the straw1 tunable
(we tried straw2 too, but quickly backed out that change).

During recovery, OSDs started crashing on all of our machines because
OSD RAM usage goes very high, e.g.:

24078 root      20   0 27.784g 0.026t  10888 S   5.9 84.9  16:23.63 /usr/bin/ceph-osd --cluster=ceph -i 41 -f
/dev/sda1       2.7T  2.2T  514G  82% /var/lib/ceph/osd/ceph-41

That's about 8 GB of resident RAM per TB of disk, well above
the ~2-4 GB RAM/TB we provisioned.
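
As a rough check of that ratio, here is a minimal Python sketch that converts the resident size from the top line and the disk size from the df line into a RAM-per-disk figure. It assumes both tools print binary units (1t = 1024g), and the helper names parse_size and ram_per_tb are made up for illustration. For the single OSD sampled above it comes out a bit under 10 GiB per TiB, the same order as the figure quoted.

```python
# Rough check of resident RAM per TB of OSD disk, using the figures quoted above.
# Assumption: top and "df -h" both print binary units (k/m/g/t = powers of 1024).

UNITS = {"k": 1024, "m": 1024**2, "g": 1024**3, "t": 1024**4}


def parse_size(text):
    """Convert a string such as '0.026t' or '2.7T' to a number of bytes."""
    text = text.strip()
    suffix = text[-1].lower()
    if suffix not in UNITS:
        raise ValueError("unknown unit in %r" % text)
    return float(text[:-1]) * UNITS[suffix]


def ram_per_tb(resident, disk):
    """Return resident memory in GiB per TiB of disk."""
    gib = parse_size(resident) / UNITS["g"]
    tib = parse_size(disk) / UNITS["t"]
    return gib / tib


if __name__ == "__main__":
    # RES column of the ceph-osd process and size of /dev/sda1 from the sample.
    print("%.1f GiB RAM per TiB of disk" % ram_per_tb("0.026t", "2.7T"))
```

Running the same calculation for each OSD would show whether the high ratio is uniform or limited to a few daemons.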

We rebuilt 0.94.5 with the three memory-related commits below, but
it didn't change anything.

Right now our cluster is unable to fully restart and recover with the
machines and RAM we have been working with for the past year.

Any idea on what to look for?

Thanks in advance,

Sincerely,

Laurent

commit 296bec72649884447b59e785c345c53994df9e09
Author: xiexingguo <258156334@xxxxxx>
Date:   Mon Oct 26 18:38:01 2015 +0800

    FileStore: potential memory leak if _fgetattrs fails

    Memory leak happens if _fgetattrs encounters some error and simply returns.
    Fixes: #13597
    Signed-off-by: xie xingguo <xie.xingguo@xxxxxxxxxx>

    (cherry picked from commit ace7dd096b58a88e25ce16f011aed09269f2a2b4)

commit 16aa14ab0208df568e64e2a4f7fe7692eaf6b469
Author: Xinze Chi <xmdxcxz@xxxxxxxxx>
Date:   Sun Aug 2 18:36:40 2015 +0800

    bug fix: osd: do not cache unused buffer in attrs

    attrs only reference the origin bufferlist (decode from MOSDPGPush or
    ECSubReadReply message) whose size is much greater than attrs in recovery.
    If obc cache it (get_obc maybe cache the attr), this causes the whole origin
    bufferlist would not be free until obc is evicted from obc cache. So rebuild
    the bufferlist before cache it.

    Fixes: #12565
    Signed-off-by: Ning Yao <zay11022@xxxxxxxxx>
    Signed-off-by: Xinze Chi <xmdxcxz@xxxxxxxxx>
    (cherry picked from commit c5895d3fad9da0ab7f05f134c49e22795d5c61f3)

commit 51ea1ca7f4a7763bfeb110957cd8a6f33b8a1422
Author: xiexingguo <258156334@xxxxxx>
Date:   Thu Oct 29 20:04:11 2015 +0800

    Objecter: pool_op callback may hang forever.

    pool_op callback may hang forever due to osdmap update during reply handling.
    Fixes: #13642
    Signed-off-by: xie xingguo <xie.xingguo@xxxxxxxxxx>

    (cherry picked from commit 00c6fa9e31975a935ed2bb33a099e2b4f02ad7f2)

_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com