I think you can find related info in the OSD logs: /var/log/ceph/osd/ceph-osd*
That should help us figure out why the daemons are being marked down.
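If it helps, here is a rough way to pull out whatever the failing OSDs logged
last (just a sketch: it assumes the logs live under the path above and that
the daemons left an assert/backtrace or heartbeat complaint behind, so adjust
the glob and the patterns to your setup):

# Show the last lines each OSD wrote before it went down.
for f in /var/log/ceph/osd/ceph-osd*; do
    echo "==== $f"
    tail -n 50 "$f"
done

# Look for the usual tell-tale strings when OSDs crash or get marked down.
grep -nE 'FAILED assert|Caught signal|suicide|heartbeat_map|wrongly marked me down' \
    /var/log/ceph/osd/ceph-osd* | tail -n 100

If nothing obvious shows up, you could also raise the log level (e.g. debug
osd = 20, plus debug keyvaluestore = 20 if your build has that subsystem)
under [osd] in ceph.conf, restart one of the affected OSDs, and re-run a short
rados bench so the next failure is captured in detail.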
On Tue, Jan 20, 2015 at 2:48 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
> Hi All,
>
> I am trying to configure rocksdb as the objectstore backend on a cluster
> running ceph version 0.91-375-g2a4cbfc. I built ceph using 'make-debs.sh',
> which builds the source with the --with-rocksdb option. I was able to get
> the cluster up and running with rocksdb as the backend; however, as soon as
> I started dumping data onto the cluster with rados bench, the cluster became
> unstable after just 10 seconds of write I/O. Some OSD daemons get marked
> down randomly for no apparent reason. Even if I bring all the daemons back
> up, after some time some of them get marked down again at random; recovery
> I/O does the job this time that the external I/O did before. What could be
> the problem, and what is the solution for this behaviour?
>
> Some more details:
>
> 1. The setup is 3 OSD nodes with 10 SanDisk Optimus Eco (400GB) drives
>    each. The drives were working fine with the filestore backend.
> 2. 3 monitors and 1 client from which I am running rados bench.
> 3. Ubuntu 14.04 on each node (kernel 3.13.0-24-generic).
> 4. I create the OSDs on each node using the script below (with different
>    OSD numbers on each node):
>
> ##################################
> #!/bin/bash
> sudo stop ceph-osd-all
> ps -eaf | grep osd | awk '{print $2}' | xargs sudo kill -9
> osd_num=(0 1 2 3 4 5 6 7 8 9)
> drives=(sdb1 sdc1 sdd1 sde1 sdf1 sdg1 sdh1 sdi1 sdj1 sdk1)
> node="rack6-storage-1"
> for ((i=0;i<10;i++))
> do
>     sudo ceph osd rm ${osd_num[i]}
>     sudo ceph osd crush rm osd.${osd_num[i]}
>     sudo ceph auth del osd.${osd_num[i]}
>     sudo umount -f /var/lib/ceph/osd/ceph-${osd_num[i]}
>     ceph osd create
>     sudo rm -rf /var/lib/ceph/osd/ceph-${osd_num[i]}
>     sudo mkdir -p /var/lib/ceph/osd/ceph-${osd_num[i]}
>     sudo mkfs.xfs -f -i size=2048 /dev/${drives[i]}
>     sudo mount -o rw,noatime,inode64,logbsize=256k,delaylog /dev/${drives[i]} /var/lib/ceph/osd/ceph-${osd_num[i]}
>     sudo ceph osd crush add osd.${osd_num[i]} 1 root=default host=$node
>     sudo ceph-osd --id ${osd_num[i]} -d --mkkey --mkfs --osd-data /var/lib/ceph/osd/ceph-${osd_num[i]}
>     ceph auth add osd.${osd_num[i]} osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-${osd_num[i]}/keyring
>     sudo ceph-osd -i ${osd_num[i]}
> done
> ###################################
>
> 5. Some configs that might be relevant:
>
> #########
> enable_experimental_unrecoverable_data_corrupting_features = keyvaluestore
> osd_objectstore = keyvaluestore
> keyvaluestore_backend = rocksdb
> keyvaluestore queue max ops = 500
> keyvaluestore queue max bytes = 100
> keyvaluestore header cache size = 2048
> keyvaluestore op threads = 10
> keyvaluestore_max_expected_write_size = 4096000
> leveldb_write_buffer_size = 33554432
> leveldb_cache_size = 536870912
> leveldb_bloom_size = 0
> leveldb_max_open_files = 10240
> leveldb_compression = false
> leveldb_paranoid = false
> leveldb_log = /dev/null
> leveldb_compact_on_mount = false
> rocksdb_write_buffer_size = 33554432
> rocksdb_cache_size = 536870912
> rocksdb_bloom_size = 0
> rocksdb_max_open_files = 10240
> rocksdb_compression = false
> rocksdb_paranoid = false
> rocksdb_log = /dev/null
> rocksdb_compact_on_mount = false
> #########
>
> 6. Objects get stored in *.sst files, so rocksdb seems to be configured
>    correctly:
>
> ls -l /var/lib/ceph/osd/ceph-20/current/ | more
> total 3169352
> -rw-r--r-- 1 root root 2128430 Jan 20 00:04 000031.sst
> -rw-r--r-- 1 root root 2128430 Jan 20 00:04 000033.sst
> -rw-r--r-- 1 root root 2128431 Jan 20 00:04 000035.sst
> ............
>
> 7. This is the current state of the cluster:
>
> ################
> monmap e1: 3 mons at {rack6-ramp-1=10.x.x.x:6789/0,rack6-ramp-2=10.x.x.x:6789/0,rack6-ramp-3=10.x.x.x:6789/0}
>            election epoch 16, quorum 0,1,2 rack6-ramp-1,rack6-ramp-2,rack6-ramp-3
> osdmap e547: 30 osds: 8 up, 8 in
> pgmap v1059: 512 pgs, 1 pools, 18252 MB data, 4563 objects
>             22856 MB used, 2912 GB / 2934 GB avail
>             1587/13689 objects degraded (11.593%)
>             419/13689 objects misplaced (3.061%)
>             26/4563 unfound (0.570%)
> #################
>
> I would be happy to provide any other information that is needed.
>
> --
> -Pushpesh

--
Best Regards,
Wheat