Yeah, thank you. I think your cluster failed to read from/write to rocksdb. But
your config disables the rocksdb log file, so you can set:

    rocksdb_info_log_level = debug
    rocksdb_log = /var/log/ceph/ceph-osd-rocksdb.log

This log should explain the details, I hope.

On Tue, Jan 20, 2015 at 6:09 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
> Haomai,
>
> PFA logs with debug_keyvaluestore=20/20, and perf dump output.
>
> On Tue, Jan 20, 2015 at 2:28 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>> Sorry, could you add debug_keyvaluestore=20/20 to your ceph.conf and
>> run again to capture the dump logs?
>>
>> From a quick look at the log, it seems that keyvaluestore failed to
>> submit a transaction to rocksdb.
>>
>> Additionally, running "ceph --admin-daemon=/var/run/ceph/[ceph-osd.*.pid]
>> perf dump" would help to verify the assumption.
>>
>> Thanks!
>>
>> On Tue, Jan 20, 2015 at 4:53 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
>>> Haomai,
>>>
>>> PFA the complete logs of one of the OSD daemons. While attempting to
>>> start all OSD daemons, I captured the log of one of them; it is
>>> pasted here: http://pastebin.com/SRBJknCM .
>>>
>>> On Tue, Jan 20, 2015 at 12:34 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>>>> I think you can find related info in the logs: /var/log/ceph/osd/ceph-osd*
>>>>
>>>> They should help us figure it out.
>>>>
>>>> On Tue, Jan 20, 2015 at 2:48 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
>>>>> Hi All,
>>>>>
>>>>> I am trying to configure rocksdb as the objectstore backend on a cluster
>>>>> with ceph version 0.91-375-g2a4cbfc. I built ceph using 'make-debs.sh',
>>>>> which builds the source with the --with-rocksdb option. I was able to get
>>>>> the cluster up and running with rocksdb as the backend; however, as soon
>>>>> as I started writing data to the cluster using rados bench, the cluster
>>>>> became miserable after just 10 seconds of write I/O. Some OSD daemons were
>>>>> marked down randomly for no apparent reason.
>>>>> Even if I make all the daemons
>>>>> start/come up again, after some time some daemons are marked down again
>>>>> randomly. This time, recovery I/O does the job that external I/O did
>>>>> before. What could be the possible cause of, and solution for, this
>>>>> behaviour?
>>>>>
>>>>> Some more details:
>>>>>
>>>>> 1. The setup is 3 OSD nodes with 10 SanDisk Optimus Eco (400GB) drives
>>>>> each. The drives were working fine with the filestore backend.
>>>>> 2. 3 monitors and 1 client from which I am running rados bench.
>>>>> 3. Ubuntu 14.04 on each node (3.13.0-24-generic).
>>>>> 4. I create the OSDs on each node using the script below (of course with
>>>>> different OSD numbers):
>>>>> ##################################
>>>>> #!/bin/bash
>>>>> sudo stop ceph-osd-all
>>>>> ps -eaf | grep osd | awk '{print $2}' | xargs sudo kill -9
>>>>> osd_num=(0 1 2 3 4 5 6 7 8 9)
>>>>> drives=(sdb1 sdc1 sdd1 sde1 sdf1 sdg1 sdh1 sdi1 sdj1 sdk1)
>>>>> node="rack6-storage-1"
>>>>> for ((i=0;i<10;i++))
>>>>> do
>>>>>     sudo ceph osd rm ${osd_num[i]}
>>>>>     sudo ceph osd crush rm osd.${osd_num[i]}
>>>>>     sudo ceph auth del osd.${osd_num[i]}
>>>>>     sudo umount -f /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     ceph osd create
>>>>>     sudo rm -rf /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     sudo mkdir -p /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     sudo mkfs.xfs -f -i size=2048 /dev/${drives[i]}
>>>>>     sudo mount -o rw,noatime,inode64,logbsize=256k,delaylog \
>>>>>         /dev/${drives[i]} /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     sudo ceph osd crush add osd.${osd_num[i]} 1 root=default host=$node
>>>>>     sudo ceph-osd --id ${osd_num[i]} -d --mkkey --mkfs \
>>>>>         --osd-data /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     ceph auth add osd.${osd_num[i]} osd 'allow *' mon 'allow profile osd' \
>>>>>         -i /var/lib/ceph/osd/ceph-${osd_num[i]}/keyring
>>>>>     sudo ceph-osd -i ${osd_num[i]}
>>>>> done
>>>>> ###################################
>>>>>
>>>>> 5.
>>>>> Some configs that might be relevant are as follows:
>>>>> #########
>>>>> enable_experimental_unrecoverable_data_corrupting_features = keyvaluestore
>>>>> osd_objectstore = keyvaluestore
>>>>> keyvaluestore_backend = rocksdb
>>>>> keyvaluestore queue max ops = 500
>>>>> keyvaluestore queue max bytes = 100
>>>>> keyvaluestore header cache size = 2048
>>>>> keyvaluestore op threads = 10
>>>>> keyvaluestore_max_expected_write_size = 4096000
>>>>> leveldb_write_buffer_size = 33554432
>>>>> leveldb_cache_size = 536870912
>>>>> leveldb_bloom_size = 0
>>>>> leveldb_max_open_files = 10240
>>>>> leveldb_compression = false
>>>>> leveldb_paranoid = false
>>>>> leveldb_log = /dev/null
>>>>> leveldb_compact_on_mount = false
>>>>> rocksdb_write_buffer_size = 33554432
>>>>> rocksdb_cache_size = 536870912
>>>>> rocksdb_bloom_size = 0
>>>>> rocksdb_max_open_files = 10240
>>>>> rocksdb_compression = false
>>>>> rocksdb_paranoid = false
>>>>> rocksdb_log = /dev/null
>>>>> rocksdb_compact_on_mount = false
>>>>> #########
>>>>>
>>>>> 6. Objects get stored in *.sst files, so it seems rocksdb is configured
>>>>> correctly:
>>>>>
>>>>> ls -l /var/lib/ceph/osd/ceph-20/current/ | more
>>>>> total 3169352
>>>>> -rw-r--r-- 1 root root 2128430 Jan 20 00:04 000031.sst
>>>>> -rw-r--r-- 1 root root 2128430 Jan 20 00:04 000033.sst
>>>>> -rw-r--r-- 1 root root 2128431 Jan 20 00:04 000035.sst
>>>>> ............
>>>>>
>>>>> 7.
>>>>> This is the current state of the cluster:
>>>>> ################
>>>>> monmap e1: 3 mons at
>>>>> {rack6-ramp-1=10.x.x.x:6789/0,rack6-ramp-2=10.x.x.x:6789/0,rack6-ramp-3=10.x.x.x:6789/0}
>>>>> election epoch 16, quorum 0,1,2 rack6-ramp-1,rack6-ramp-2,rack6-ramp-3
>>>>> osdmap e547: 30 osds: 8 up, 8 in
>>>>> pgmap v1059: 512 pgs, 1 pools, 18252 MB data, 4563 objects
>>>>>     22856 MB used, 2912 GB / 2934 GB avail
>>>>>     1587/13689 objects degraded (11.593%)
>>>>>     419/13689 objects misplaced (3.061%)
>>>>>     26/4563 unfound (0.570%)
>>>>> #################
>>>>>
>>>>> I would be happy to provide any other information that is needed.
>>>>>
>>>>> --
>>>>> -Pushpesh
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Wheat
>>>
>>> --
>>> -Pushpesh
>>
>> --
>> Best Regards,
>>
>> Wheat
>
> --
> -Pushpesh

--
Best Regards,

Wheat
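[Archive note] Taken together, the advice in this thread amounts to one change in the poster's config: the posted ceph.conf sends the rocksdb log to /dev/null, which is why there is nothing to diagnose the failed transaction submits with. A minimal sketch of the suggested [osd] section follows; the log path is the one Haomai proposed, and placing these keys in the [osd] section is an assumption:

```
[osd]
# Instead of discarding rocksdb output as in the posted config:
#   rocksdb_log = /dev/null
# route it to a real file and raise the verbosity, as suggested above:
rocksdb_log = /var/log/ceph/ceph-osd-rocksdb.log
rocksdb_info_log_level = debug
# Haomai also asked for keyvaluestore debug output:
debug_keyvaluestore = 20/20
```

After restarting the OSDs and re-running rados bench, the new rocksdb log plus the admin-socket "perf dump" output from the thread should show whether transactions are failing at the rocksdb layer.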