Hi,

we are relatively new to Ceph and are seeing some issues; I'd like to know how likely they are to occur when operating a Ceph cluster.

Currently our setup consists of three servers acting as OSD and MON nodes. Each server has two Intel Xeon L5420 CPUs (yes, I know, not state of the art, but we thought it would be sufficient for a proof of concept; maybe we were wrong?) and 24 GB RAM and runs 8 OSDs on 4 TB hard disks. Four OSDs share one SSD for journaling. We started on Kraken and recently upgraded to Luminous. The next two OSD servers and three separate MONs are ready for deployment. Please find attached our ceph.conf.

Current usage looks like this:

  data:
    pools:   1 pools, 768 pgs
    objects: 5240k objects, 18357 GB
    usage:   59825 GB used, 29538 GB / 89364 GB avail

We have only one pool, which is used exclusively for RBD. We started filling it with data and creating snapshots in January and kept doing so until mid-February. Everything worked like a charm until we started removing old snapshots. While we were removing snapshots for the first time, OSDs started flapping, even though there was no other load on the cluster. For idle periods we solved this by adding

  osd snap trim priority = 1
  osd snap trim sleep = 0.1

to ceph.conf. However, when we remove big snapshots while there is load from other operations, the OSD flapping still occurs.

Last week our first scrub errors appeared. Repairing the first one was no big deal. The second one, however, was, because the OSD instructed to repair started crashing: first osd.17 on Friday, and today osd.11.

  ceph1:~# ceph pg repair 0.1b2
  instructing pg 0.1b2 on osd.17 to repair

  ceph1:~# ceph pg repair 0.1b2
  instructing pg 0.1b2 on osd.11 to repair

I am still researching the crashes, but I would already be thankful for any input. Any opinions, hints and advice would really be appreciated.

Best Regards
Jan
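P.S.: In case it helps, I have been looking at the inconsistency with something like the commands below (just a rough sketch using the PG and OSD IDs from above; please correct me if this is not the right approach on Luminous):

  ceph1:~# ceph health detail                        # lists the inconsistent PGs
  ceph1:~# ceph pg map 0.1b2                         # shows which OSDs hold the PG
  ceph1:~# rados list-inconsistent-obj 0.1b2 --format=json-pretty
  ceph1:~# less /var/log/ceph/ceph-osd.17.log        # backtrace of the crashing OSD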
[global]
fsid = c59e56df-2043-4c92-9492-25f05f268d9f
mon_initial_members = ceph1, ceph2, ceph3
mon_host = 10.10.100.21,10.10.100.22,10.10.100.23
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
public network = 10.10.100.0/24

[osd]
osd journal size = 0
osd snap trim priority = 1
osd snap trim sleep = 0.1

[client]
rbd default features = 3
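Regarding the two snap trim settings above: as far as I understand, the runtime equivalent (without restarting the OSDs) would be something like the following; corrections welcome if that is not how it should be done on Luminous.

  ceph1:~# ceph tell osd.* injectargs '--osd_snap_trim_priority 1 --osd_snap_trim_sleep 0.1'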