Re: High 0.94.5 OSD memory use at 8GB RAM/TB raw disk during recovery


 



> On Nov 30, 2015, at 6:52 PM, Laurent GUERBY <laurent@xxxxxxxxxx> wrote:
> 
> Hi,
> 
> We lost a disk today in our ceph cluster, so we added a new machine with
> 4 disks to replace the capacity, and we also activated the straw1 tunable
> (we tried straw2 as well but quickly backed out that change).
> 
> During recovery OSD started crashing on all of our machines
> the issue being OSD RAM usage that goes very high, eg:
> 
> 24078 root      20   0 27.784g 0.026t  10888 S   5.9 84.9
> 16:23.63 /usr/bin/ceph-osd --cluster=ceph -i 41 -f
> /dev/sda1       2.7T  2.2T  514G  82% /var/lib/ceph/osd/ceph-41
> 
> That's about 8GB of resident RAM per TB of disk, way above
> the ~2-4 GB RAM/TB we provisioned.

We had something vaguely similar (not nearly that dramatic, though!) happen to us. During a recovery (actually, I think it was rebalancing after upgrading from an earlier version of ceph), our OSDs consumed so much memory that they kept getting killed by the OOM killer, and we couldn't keep the cluster up long enough to get back to healthy.

A solution for us was to enable zswap; previously we had been running with no swap at all. 

If you are running a kernel newer than 3.11 (you may want something more recent than that, as I believe there were significant zswap fixes after 3.17), enabling zswap lets the kernel compress swapped-out pages into a memory pool before they ever touch disk. The default maximum pool size is 20% of RAM. There is extra CPU time spent compressing/decompressing, but it's much faster than going to disk, and OSD data appears to be quite compressible. In our case nothing actually made it to disk, but a swap file must be enabled for zswap to do its work.

https://www.kernel.org/doc/Documentation/vm/zswap.txt
http://askubuntu.com/questions/471912/zram-vs-zswap-vs-zcache-ultimate-guide-when-to-use-which-one

Add "zswap.enabled=1" to your kernel boot parameters and reboot.
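If you boot with GRUB, the parameter goes on the kernel command line in /etc/default/grub. The exact file and regeneration command vary by distro, so treat this as a sketch (the max_pool_percent tweak is optional; 20 is the default):

```shell
# /etc/default/grub -- Debian/Ubuntu-style sketch; adjust for your distro.
# Append zswap.enabled=1 to the existing kernel command line, e.g.:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet zswap.enabled=1"
# Optionally raise the compressed pool cap above the 20% default:
#   GRUB_CMDLINE_LINUX_DEFAULT="quiet zswap.enabled=1 zswap.max_pool_percent=25"

update-grub    # or: grub2-mkconfig -o /boot/grub2/grub.cfg on RHEL-family
reboot
```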

If you have no swap file/partition/device at all, you need to create one for zswap to actually do anything. Here is an example, but use whatever size, location, and process you prefer:

dd if=/dev/zero of=/var/swap bs=1M count=8192
chmod 600 /var/swap
mkswap /var/swap
swapon /var/swap

Consider adding it to /etc/fstab:
/var/swap	swap	swap	defaults 0 0 
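After rebooting with the swap file active, you can confirm zswap is actually doing something. These paths assume a kernel built with CONFIG_ZSWAP and, for the statistics, a mounted debugfs:

```shell
swapon --show                                  # the swap file should be listed
cat /sys/module/zswap/parameters/enabled       # prints "Y" when zswap is active
cat /sys/module/zswap/parameters/max_pool_percent   # pool cap, defaults to 20
# Compression stats (as root, debugfs mounted at /sys/kernel/debug):
grep -r . /sys/kernel/debug/zswap/
```

If pool_total_size and stored_pages in the debugfs stats are growing during recovery, zswap is absorbing the pressure; pages only hit the backing swap file once the pool cap is reached.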

This got us through the rebalancing. The OSDs eventually returned to normal, but we've just left zswap enabled with no apparent problems. I don't know that it will be enough for your situation, but it might help. 

Ryan
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


