On 05.03.18 at 13:13, Ronny Aasen wrote:
> I had some similar issues when I started my proof of concept; especially
> the snapshot deletion I remember well.
>
> The rule of thumb for filestore, which I assume you are running, is 1 GB of
> RAM per TB of OSD. So with 8 x 4 TB OSDs you are looking at 32 GB of RAM for
> the OSDs, plus some GB for the mon service, plus some GB for the OS itself.
>
> I suspect that if you inspect your dmesg log and memory graphs, you will find
> that the out-of-memory killer ends your OSDs when the snap deletion (or any
> other high-load task) runs.
>
> I ended up reducing the number of OSDs per node, since the old mainboard I
> used was maxed out for memory.

Well, thanks for the broad hint. Somehow I assumed we met the recommendations,
but of course you are right. We'll check whether our boards support 48 GB of
RAM.

Unfortunately, there are currently no corresponding messages in the logs, but
I can't rule out that there were some earlier.

> Corruptions occurred for me as well, and they were normally associated with
> disks dying or giving read errors. Ceph often managed to fix them, but
> sometimes I had to just remove the failing OSD disk.
>
> Have some graphs to look at. Personally I used munin/munin-node, since it
> was just an apt-get away from functioning graphs.
>
> I also used smartmontools to send me emails about failing disks, and
> smartctl to check all disks for errors.

I'll check the S.M.A.R.T. data. I am wondering whether scrubbing errors are
always caused by disk problems, or whether they can also be triggered by
flapping OSDs or other circumstances.

> Good luck with ceph!

Thank you!
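
P.S. For the archives, this is roughly how I intend to check for earlier OOM
kills on the OSD nodes; just a sketch, assuming the nodes run systemd and the
persistent journal reaches back far enough:

    # search the kernel ring buffer for OOM killer activity
    dmesg -T | grep -iE 'out of memory|oom-killer|killed process'

    # the persistent journal usually reaches further back than dmesg
    journalctl -k | grep -iE 'out of memory|oom-killer|killed process'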
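
For the S.M.A.R.T. checks, something along these lines; the device name is
only an example, and disks behind a RAID controller may need smartctl's -d
option:

    # quick health verdict and full attribute dump for one disk
    smartctl -H /dev/sdb
    smartctl -a /dev/sdb

    # kick off a short self-test and read the result afterwards
    smartctl -t short /dev/sdb
    smartctl -l selftest /dev/sdb

And to get the mails Ronny mentioned, a single smartd.conf line should do (the
address is a placeholder):

    # /etc/smartd.conf: monitor all detected disks, mail on problems
    DEVICESCAN -a -m storage-admins@example.com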
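
And regarding the scrub errors, this is how I plan to inspect an inconsistent
PG before blaming (or repairing) anything; the PG id below is just a
placeholder:

    # list the PGs that scrubbing flagged as inconsistent
    ceph health detail

    # show which object copies are inconsistent and on which OSDs they live
    rados list-inconsistent-obj 2.1f --format=json-pretty

    # only after checking the disks underneath
    ceph pg repair 2.1f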