I have registered a related issue (http://tracker.ceph.com/issues/10583); we need to promote db backend errors to Ceph's log.

On Tue, Jan 20, 2015 at 11:54 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
> Hi,
>
> Obviously, we can find lots of IO errors in your rocksdb log:
>
> 2015/01/20-19:08:13.452758 7f3a94b63700 (Original Log Time 2015/01/20-19:08:13.449529) [default] compacted to: files[5 6 50 492 361 0 0 ], 10822321443458.2 MB/sec, level 1, files in(4, 6) out(28526) MB in(135.9, 9.1) out(3803360014280.2), read-write-amplify(27979046152.7) write-amplify(27979046151.6) IO error: /var/lib/ceph/osd/ceph-0/current/059210.sst: Too many open files
> 2015/01/20-19:08:13.452760 7f3a94b63700 Waiting after background compaction error: IO error: /var/lib/ceph/osd/ceph-0/current/059210.sst: Too many open files, Accumulated background error counts: 2
> 2015/01/20-19:08:14.946634 7f3a94b63700 [WARN] Compaction error: IO error: /var/lib/ceph/osd/ceph-0/current/105226.sst: Too many open files
> 2015/01/20-19:08:14.946643 7f3a94b63700 (Original Log Time 2015/01/20-19:08:14.941764) [default] compacted to: files[6 6 50 492 361 0 0 ], 13401580825960.6 MB/sec, level 1, files in(6, 6) out(46014) MB in(205.9, 9.1) out(6136418344252.7), read-write-amplify(29808966236.2) write-amplify(29808966235.2) IO error: /var/lib/ceph/osd/ceph-0/current/105226.sst: Too many open files
> 2015/01/20-19:08:14.946646 7f3a94b63700 Waiting after background compaction error: IO error: /var/lib/ceph/osd/ceph-0/current/105226.sst: Too many open files, Accumulated background error counts: 3
> 2015/01/20-19:08:16.459162 7f3a94b63700 [WARN] Compaction error: IO error: /var/lib/ceph/osd/ceph-0/current/149702.sst: Too many open files
>
> Because you set "rocksdb_max_open_files = 10240" in your ceph.conf, rocksdb is allowed to open up to 10240 files. Once ceph-osd hits the OS fd limit, rocksdb will fail to open more files and raise an exception.
>
> So you need to increase the OS fd limit to at least "rocksdb_max_open_files" + "estimated network sockets in osd" + "filestore_fd_cache_size".
>
> I'm not sure this is the only cause of your problem given the limited info, but I hope it's the root cause. :-)
>
> Thanks for your patience!
>
> On Tue, Jan 20, 2015 at 9:42 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
>> Haomai,
>>
>> PFA logs from a fresh setup, with all the debug settings.
>>
>> This is what I used to dump some data:
>>
>> rados -p benchpool1 bench 300 write -b 4194304 -t 8 --no-cleanup
>>
>> On Tue, Jan 20, 2015 at 3:59 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>>> Yeah, thank you.
>>>
>>> I think your cluster is failing to read/write from rocksdb. But your config disables the rocksdb log file, so you can change it to:
>>> "rocksdb_info_log_level=debug"
>>> "rocksdb_log=/var/log/ceph/ceph-osd-rocksdb.log"
>>>
>>> This log should explain the details, I hope.
>>>
>>> On Tue, Jan 20, 2015 at 6:09 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
>>>> Haomai,
>>>>
>>>> PFA logs with debug_keyvaluestore=20/20, and the perf dump output.
>>>>
>>>> On Tue, Jan 20, 2015 at 2:28 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>>>>> Sorry, could you add debug_keyvaluestore=20/20 to your ceph.conf and run again to capture the dump logs?
>>>>>
>>>>> From a quick look at the log, it seems that keyvaluestore failed to submit the transaction to rocksdb.
>>>>>
>>>>> Additionally, running "ceph --admin-daemon=/var/run/ceph/[ceph-osd.*.pid] perf dump" will help to verify that assumption.
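Regarding the fd-limit advice quoted above: here is a minimal, untested sketch of how one could check how many fds an OSD is actually using and then raise the limit. The 65536 value is only an assumed example, chosen with headroom over rocksdb_max_open_files (10240) plus network sockets and fd caches; adjust for your setup.

#!/bin/bash
# Sketch only: inspect and raise the open-file limit for ceph-osd.

# Current limit and actual fd usage of one running ceph-osd
pid=$(pidof ceph-osd | awk '{print $1}')
grep 'Max open files' /proc/${pid}/limits
ls /proc/${pid}/fd | wc -l

# Raise the limit persistently for login sessions
# (note: this may not apply to daemons started by upstart/init scripts)
echo '* soft nofile 65536' | sudo tee -a /etc/security/limits.conf
echo '* hard nofile 65536' | sudo tee -a /etc/security/limits.conf

# Or raise it in the shell that launches the OSDs by hand, before starting them
ulimit -n 65536

If I remember correctly, ceph.conf also has a "max open files" option that asks the startup path to raise the limit for the daemons; it is worth checking whether your version honours it.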
>>>>> Thanks!
>>>>>
>>>>> On Tue, Jan 20, 2015 at 4:53 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
>>>>>> Haomai,
>>>>>>
>>>>>> PFA the complete logs of one of the OSD daemons. While attempting to start all OSD daemons, I captured the log of one of them; it is pasted here: http://pastebin.com/SRBJknCM .
>>>>>>
>>>>>> On Tue, Jan 20, 2015 at 12:34 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>>>>>>> I think you can find related info in the logs: /var/log/ceph/osd/ceph-osd*
>>>>>>>
>>>>>>> That should help us figure it out.
>>>>>>>
>>>>>>> On Tue, Jan 20, 2015 at 2:48 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
>>>>>>>> Hi All,
>>>>>>>>
>>>>>>>> I am trying to configure rocksdb as the objectstore backend on a cluster with ceph version 0.91-375-g2a4cbfc. I built ceph using 'make-debs.sh', which builds the source with the --with-rocksdb option. I was able to get the cluster up and running with rocksdb as the backend; however, as soon as I started dumping data onto the cluster with radosbench, the cluster became miserable after just 10 seconds of write I/O. Some OSD daemons were marked down randomly for no apparent reason. Even if I bring all the daemons back up, after some time some daemons get marked down again at random. Recovery I/O now triggers the same behaviour that external I/O did before. What could be the possible problem and solution for this behaviour?
>>>>>>>>
>>>>>>>> Some more details:
>>>>>>>>
>>>>>>>> 1. The setup is 3 OSD nodes with 10 SanDisk Optimus Eco (400GB) drives each. The drives were working fine with the filestore backend.
>>>>>>>> 2. 3 monitors and 1 client from which I am running RadosBench.
>>>>>>>> 3. Ubuntu 14.04 on each node (3.13.0-24-generic).
>>>>>>>> 4. I create the OSDs on each node using the script below (of course with different OSD numbers):
>>>>>>>> ##################################
>>>>>>>> #!/bin/bash
>>>>>>>> sudo stop ceph-osd-all
>>>>>>>> ps -eaf | grep osd | awk '{print $2}' | xargs sudo kill -9
>>>>>>>> osd_num=(0 1 2 3 4 5 6 7 8 9)
>>>>>>>> drives=(sdb1 sdc1 sdd1 sde1 sdf1 sdg1 sdh1 sdi1 sdj1 sdk1)
>>>>>>>> node="rack6-storage-1"
>>>>>>>> for ((i=0;i<10;i++))
>>>>>>>> do
>>>>>>>>     sudo ceph osd rm ${osd_num[i]}
>>>>>>>>     sudo ceph osd crush rm osd.${osd_num[i]}
>>>>>>>>     sudo ceph auth del osd.${osd_num[i]}
>>>>>>>>     sudo umount -f /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>>>>     ceph osd create
>>>>>>>>     sudo rm -rf /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>>>>     sudo mkdir -p /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>>>>     sudo mkfs.xfs -f -i size=2048 /dev/${drives[i]}
>>>>>>>>     sudo mount -o rw,noatime,inode64,logbsize=256k,delaylog /dev/${drives[i]} /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>>>>     sudo ceph osd crush add osd.${osd_num[i]} 1 root=default host=$node
>>>>>>>>     sudo ceph-osd --id ${osd_num[i]} -d --mkkey --mkfs --osd-data /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>>>>     ceph auth add osd.${osd_num[i]} osd 'allow *' mon 'allow profile osd' -i /var/lib/ceph/osd/ceph-${osd_num[i]}/keyring
>>>>>>>>     sudo ceph-osd -i ${osd_num[i]}
>>>>>>>> done
>>>>>>>> ###################################
>>>>>>>>
>>>>>>>> 5. Some configs that might be relevant are as follows:
>>>>>>>> #########
>>>>>>>> enable_experimental_unrecoverable_data_corrupting_features = keyvaluestore
>>>>>>>> osd_objectstore = keyvaluestore
>>>>>>>> keyvaluestore_backend = rocksdb
>>>>>>>> keyvaluestore queue max ops = 500
>>>>>>>> keyvaluestore queue max bytes = 100
>>>>>>>> keyvaluestore header cache size = 2048
>>>>>>>> keyvaluestore op threads = 10
>>>>>>>> keyvaluestore_max_expected_write_size = 4096000
>>>>>>>> leveldb_write_buffer_size = 33554432
>>>>>>>> leveldb_cache_size = 536870912
>>>>>>>> leveldb_bloom_size = 0
>>>>>>>> leveldb_max_open_files = 10240
>>>>>>>> leveldb_compression = false
>>>>>>>> leveldb_paranoid = false
>>>>>>>> leveldb_log = /dev/null
>>>>>>>> leveldb_compact_on_mount = false
>>>>>>>> rocksdb_write_buffer_size = 33554432
>>>>>>>> rocksdb_cache_size = 536870912
>>>>>>>> rocksdb_bloom_size = 0
>>>>>>>> rocksdb_max_open_files = 10240
>>>>>>>> rocksdb_compression = false
>>>>>>>> rocksdb_paranoid = false
>>>>>>>> rocksdb_log = /dev/null
>>>>>>>> rocksdb_compact_on_mount = false
>>>>>>>> #########
>>>>>>>>
>>>>>>>> 6. Objects get stored in *.sst files, so rocksdb seems to be configured correctly:
>>>>>>>>
>>>>>>>> ls -l /var/lib/ceph/osd/ceph-20/current/ | more
>>>>>>>> total 3169352
>>>>>>>> -rw-r--r-- 1 root root 2128430 Jan 20 00:04 000031.sst
>>>>>>>> -rw-r--r-- 1 root root 2128430 Jan 20 00:04 000033.sst
>>>>>>>> -rw-r--r-- 1 root root 2128431 Jan 20 00:04 000035.sst
>>>>>>>> ............
>>>>>>>>
>>>>>>>> 7. This is the current state of the cluster:
>>>>>>>> ################
>>>>>>>> monmap e1: 3 mons at {rack6-ramp-1=10.x.x.x:6789/0,rack6-ramp-2=10.x.x.x:6789/0,rack6-ramp-3=10.x.x.x:6789/0}
>>>>>>>>     election epoch 16, quorum 0,1,2 rack6-ramp-1,rack6-ramp-2,rack6-ramp-3
>>>>>>>> osdmap e547: 30 osds: 8 up, 8 in
>>>>>>>> pgmap v1059: 512 pgs, 1 pools, 18252 MB data, 4563 objects
>>>>>>>>     22856 MB used, 2912 GB / 2934 GB avail
>>>>>>>>     1587/13689 objects degraded (11.593%)
>>>>>>>>     419/13689 objects misplaced (3.061%)
>>>>>>>>     26/4563 unfound (0.570%)
>>>>>>>> #################
>>>>>>>>
>>>>>>>> I would be happy to provide any other information that is needed.
>>>>>>>>
>>>>>>>> --
>>>>>>>> -Pushpesh

--
Best Regards,

Wheat
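For anyone revisiting this thread, the diagnostic settings suggested above boil down to roughly the following ceph.conf fragment. The [osd] section placement and the osd.0 id below are illustrative assumptions, and the admin socket path follows the usual /var/run/ceph/$cluster-$name.asok naming rather than the exact path quoted earlier.

[osd]
    debug keyvaluestore = 20/20
    rocksdb_info_log_level = debug
    rocksdb_log = /var/log/ceph/ceph-osd-rocksdb.log

With an OSD running, its perf counters can then be dumped over the admin socket, e.g.:

ceph --admin-daemon /var/run/ceph/ceph-osd.0.asok perf dump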