Yeah, thank you. I think your cluster failed to read from/write to rocksdb. But
your config disables the rocksdb log file, so you can set:

    rocksdb_info_log_level = debug
    rocksdb_log = /var/log/ceph/ceph-osd-rocksdb.log

This log should explain the details, I hope.

On Tue, Jan 20, 2015 at 6:09 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
> Haomai,
>
> PFA logs with debug_keyvaluestore=20/20, and perf dump output.
>
> On Tue, Jan 20, 2015 at 2:28 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>> Sorry, could you add debug_keyvaluestore=20/20 to your ceph.conf and
>> run again to capture the dump logs?
>>
>> From a quick look at the log, it seems that keyvaluestore failed to
>> submit a transaction to rocksdb.
>>
>> Additionally, running "ceph --admin-daemon=/var/run/ceph/[ceph-osd.*.pid]
>> perf dump" would help to verify the assumption.
>>
>> Thanks!
>>
>> On Tue, Jan 20, 2015 at 4:53 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
>>> Haomai,
>>>
>>> PFA the complete logs of one of the OSD daemons. While attempting to
>>> start all OSD daemons, I captured the log of one of them; it is
>>> pasted here: http://pastebin.com/SRBJknCM .
>>>
>>> On Tue, Jan 20, 2015 at 12:34 PM, Haomai Wang <haomaiwang@xxxxxxxxx> wrote:
>>>> I think you can find related info in the logs: /var/log/ceph/osd/ceph-osd*
>>>>
>>>> They should help us figure it out.
>>>>
>>>> On Tue, Jan 20, 2015 at 2:48 PM, pushpesh sharma <pushpesh.eck@xxxxxxxxx> wrote:
>>>>> Hi All,
>>>>>
>>>>> I am trying to configure rocksdb as the objectstore backend on a cluster
>>>>> with ceph version 0.91-375-g2a4cbfc. I built ceph using 'make-debs.sh',
>>>>> which builds the source with the --with-rocksdb option. I was able to get
>>>>> the cluster up and running with rocksdb as the backend; however, as soon
>>>>> as I started writing data to the cluster using rados bench, the cluster
>>>>> became miserable after just 10 seconds of write I/O. Some OSD daemons were
>>>>> marked down randomly for no apparent reason.
>>>>> Even if I make all the daemons
>>>>> start/come up again, after some time some daemons are marked down again
>>>>> randomly. This time, recovery I/O does the job that external I/O did
>>>>> before. What could be the possible cause of, and solution for, this
>>>>> behaviour?
>>>>>
>>>>> Some more details:
>>>>>
>>>>> 1. The setup is 3 OSD nodes with 10 SanDisk Optimus Eco (400GB) drives
>>>>> each. The drives were working fine with the filestore backend.
>>>>> 2. 3 monitors and 1 client from which I am running rados bench.
>>>>> 3. Ubuntu 14.04 on each node (3.13.0-24-generic).
>>>>> 4. I create the OSDs on each node using the script below (of course with
>>>>> different OSD numbers):
>>>>> ##################################
>>>>> #!/bin/bash
>>>>> sudo stop ceph-osd-all
>>>>> ps -eaf | grep osd | awk '{print $2}' | xargs sudo kill -9
>>>>> osd_num=(0 1 2 3 4 5 6 7 8 9)
>>>>> drives=(sdb1 sdc1 sdd1 sde1 sdf1 sdg1 sdh1 sdi1 sdj1 sdk1)
>>>>> node="rack6-storage-1"
>>>>> for ((i=0;i<10;i++))
>>>>> do
>>>>>     sudo ceph osd rm ${osd_num[i]}
>>>>>     sudo ceph osd crush rm osd.${osd_num[i]}
>>>>>     sudo ceph auth del osd.${osd_num[i]}
>>>>>     sudo umount -f /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     ceph osd create
>>>>>     sudo rm -rf /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     sudo mkdir -p /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     sudo mkfs.xfs -f -i size=2048 /dev/${drives[i]}
>>>>>     sudo mount -o rw,noatime,inode64,logbsize=256k,delaylog \
>>>>>         /dev/${drives[i]} /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     sudo ceph osd crush add osd.${osd_num[i]} 1 root=default host=$node
>>>>>     sudo ceph-osd --id ${osd_num[i]} -d --mkkey --mkfs \
>>>>>         --osd-data /var/lib/ceph/osd/ceph-${osd_num[i]}
>>>>>     ceph auth add osd.${osd_num[i]} osd 'allow *' mon 'allow profile osd' \
>>>>>         -i /var/lib/ceph/osd/ceph-${osd_num[i]}/keyring
>>>>>     sudo ceph-osd -i ${osd_num[i]}
>>>>> done
>>>>> ###################################
>>>>>
>>>>> 5.
>>>>> Some configs that might be relevant are as follows:
>>>>> #########
>>>>> enable_experimental_unrecoverable_data_corrupting_features = keyvaluestore
>>>>> osd_objectstore = keyvaluestore
>>>>> keyvaluestore_backend = rocksdb
>>>>> keyvaluestore queue max ops = 500
>>>>> keyvaluestore queue max bytes = 100
>>>>> keyvaluestore header cache size = 2048
>>>>> keyvaluestore op threads = 10
>>>>> keyvaluestore_max_expected_write_size = 4096000
>>>>> leveldb_write_buffer_size = 33554432
>>>>> leveldb_cache_size = 536870912
>>>>> leveldb_bloom_size = 0
>>>>> leveldb_max_open_files = 10240
>>>>> leveldb_compression = false
>>>>> leveldb_paranoid = false
>>>>> leveldb_log = /dev/null
>>>>> leveldb_compact_on_mount = false
>>>>> rocksdb_write_buffer_size = 33554432
>>>>> rocksdb_cache_size = 536870912
>>>>> rocksdb_bloom_size = 0
>>>>> rocksdb_max_open_files = 10240
>>>>> rocksdb_compression = false
>>>>> rocksdb_paranoid = false
>>>>> rocksdb_log = /dev/null
>>>>> rocksdb_compact_on_mount = false
>>>>> #########
>>>>>
>>>>> 6. Objects get stored in *.sst files, so it seems rocksdb is configured
>>>>> correctly:
>>>>>
>>>>> ls -l /var/lib/ceph/osd/ceph-20/current/ | more
>>>>> total 3169352
>>>>> -rw-r--r-- 1 root root 2128430 Jan 20 00:04 000031.sst
>>>>> -rw-r--r-- 1 root root 2128430 Jan 20 00:04 000033.sst
>>>>> -rw-r--r-- 1 root root 2128431 Jan 20 00:04 000035.sst
>>>>> ............
>>>>>
>>>>> 7.
>>>>> This is the current state of the cluster:
>>>>> ################
>>>>> monmap e1: 3 mons at
>>>>> {rack6-ramp-1=10.x.x.x:6789/0,rack6-ramp-2=10.x.x.x:6789/0,rack6-ramp-3=10.x.x.x:6789/0}
>>>>> election epoch 16, quorum 0,1,2 rack6-ramp-1,rack6-ramp-2,rack6-ramp-3
>>>>> osdmap e547: 30 osds: 8 up, 8 in
>>>>> pgmap v1059: 512 pgs, 1 pools, 18252 MB data, 4563 objects
>>>>>     22856 MB used, 2912 GB / 2934 GB avail
>>>>>     1587/13689 objects degraded (11.593%)
>>>>>     419/13689 objects misplaced (3.061%)
>>>>>     26/4563 unfound (0.570%)
>>>>> #################
>>>>>
>>>>> I would be happy to provide any other information that is needed.
>>>>>
>>>>> --
>>>>> -Pushpesh
>>>>> --
>>>>> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
>>>>> the body of a message to majordomo@xxxxxxxxxxxxxxx
>>>>> More majordomo info at http://vger.kernel.org/majordomo-info.html
>>>>
>>>> --
>>>> Best Regards,
>>>>
>>>> Wheat
>>>
>>> --
>>> -Pushpesh
>>
>> --
>> Best Regards,
>>
>> Wheat
>
> --
> -Pushpesh

--
Best Regards,

Wheat
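[Archive note] Taken together, the advice in this thread amounts to one change in the poster's config: the posted ceph.conf sends the rocksdb log to /dev/null, which is why there is nothing to diagnose the failed transaction submits with. A minimal sketch of the suggested [osd] section follows; the log path is the one Haomai proposed, and placing these keys in the [osd] section is an assumption:

```
[osd]
# Instead of discarding rocksdb output as in the posted config:
#   rocksdb_log = /dev/null
# route it to a real file and raise the verbosity, as suggested above:
rocksdb_log = /var/log/ceph/ceph-osd-rocksdb.log
rocksdb_info_log_level = debug
# Haomai also asked for keyvaluestore debug output:
debug_keyvaluestore = 20/20
```

After restarting the OSDs and re-running rados bench, the new rocksdb log plus the admin-socket "perf dump" output from the thread should show whether transactions are failing at the rocksdb layer.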