On 05.03.18 at 13:13, Ronny Aasen wrote:
> I had some similar issues when I started my proof of concept; especially
> the snapshot deletion I remember well.
>
> The rule of thumb for filestore, which I assume you are running, is 1 GB of
> RAM per TB of OSD. So with 8 x 4 TB OSDs you are looking at 32 GB of RAM for
> the OSDs, plus some GB for the mon service, plus some GB for the OS itself.
>
> I suspect that if you inspect your dmesg log and memory graphs, you will find
> that the out-of-memory killer ends your OSDs when the snap deletion (or any
> other high-load task) runs.
>
> I ended up reducing the number of OSDs per node, since the old mainboard I
> used was maxed out for memory.

Well, thanks for the broad hint. Somehow I assumed we met the recommendations,
but of course you are right. We'll check whether our boards support 48 GB of
RAM.

Unfortunately, there are currently no corresponding messages in the logs, but
I can't rule out that there were some earlier.

> Corruptions occurred for me as well, and they were normally associated with
> disks dying or giving read errors. Ceph often managed to fix them, but
> sometimes I had to just remove the failing OSD disk.
>
> Have some graphs to look at. Personally I used munin/munin-node, since it
> was just an apt-get away from functioning graphs.
>
> I also used smartmontools to send me emails about failing disks, and
> smartctl to check all disks for errors.

I'll check the S.M.A.R.T. data. I am wondering whether scrubbing errors are
always caused by disk problems, or whether they can also be triggered by
flapping OSDs or other circumstances.

> Good luck with ceph!

Thank you!
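
P.S. For the archives, this is roughly how I intend to check for earlier OOM
kills on the OSD nodes; just a sketch, assuming the nodes run systemd and the
persistent journal reaches back far enough:

    # search the kernel ring buffer for OOM killer activity
    dmesg -T | grep -iE 'out of memory|oom-killer|killed process'

    # the persistent journal usually reaches further back than dmesg
    journalctl -k | grep -iE 'out of memory|oom-killer|killed process'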
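
For the S.M.A.R.T. checks, something along these lines; the device name is
only an example, and disks behind a RAID controller may need smartctl's -d
option:

    # quick health verdict and full attribute dump for one disk
    smartctl -H /dev/sdb
    smartctl -a /dev/sdb

    # kick off a short self-test and read the result afterwards
    smartctl -t short /dev/sdb
    smartctl -l selftest /dev/sdb

And to get the mails Ronny mentioned, a single smartd.conf line should do (the
address is a placeholder):

    # /etc/smartd.conf: monitor all detected disks, mail on problems
    DEVICESCAN -a -m storage-admins@example.com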
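
And regarding the scrub errors, this is how I plan to inspect an inconsistent
PG before blaming (or repairing) anything; the PG id below is just a
placeholder:

    # list the PGs that scrubbing flagged as inconsistent
    ceph health detail

    # show which object copies are inconsistent and on which OSDs they live
    rados list-inconsistent-obj 2.1f --format=json-pretty

    # only after checking the disks underneath
    ceph pg repair 2.1f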