Try increasing the limit on open files (ulimit -n <some big number>) before starting the cluster/OSD.

-----Original Message-----
From: ceph-devel-owner@xxxxxxxxxxxxxxx [mailto:ceph-devel-owner@xxxxxxxxxxxxxxx] On Behalf Of Vish (Vishwanath) Maram-SSI
Sent: Wednesday, October 28, 2015 11:52 PM
To: ceph-devel-owner@xxxxxxxxxxxxxxx
Cc: ceph-devel@xxxxxxxxxxxxxxx
Subject: [Newstore] FIO Read's from Cient causes OSD *** Caught signal (Aborted) **

Hi,

We are observing an OSD crash whenever we run FIO reads from a client. The setup is very simple:

1. One OSD running "ceph version 9.1.0-420-ge3921a8 (e3921a8396870be4a38ce1f1b6c35bc0829dbb68)", built and installed from source pulled from Git.
2. One client running the same version of Ceph.
3. FIO version fio-2.2.10-16-gd223.
4. Ceph conf, given below.
5. Crash details from the log file, given below.
6. FIO script, given below.

Ceph conf:

[global]
fsid = 9eda02e2-04b7-4eed-a85a-8471ea51528d
mon_initial_members = msl-dsma-spoc08
mon_host = 10.10.10.190
auth_cluster_required = cephx
auth_service_required = cephx
auth_client_required = cephx
auth_supported = none

#Needed for Newstore
osd_objectstore = newstore
enable experimental unrecoverable data corrupting features = newstore, rocksdb
newstore_backend = rocksdb

#Debug - Start Removed for now to debug
#newstore_max_dir_size = 4096
#newstore_sync_io = true
#newstore_sync_transaction = true
#newstore_sync_submit_transaction = true
#newstore_sync_wal_apply = true
#newstore_overlay_max = 0
#Debug - End

#Needed for Newstore
filestore_xattr_use_omap = true

osd pool default size = 1
rbd cache = false
debug_lockdep = 0/0
debug_context = 0/0
debug_crush = 0/0
debug_buffer = 0/0
debug_timer = 0/0
debug_filer = 0/0
debug_objecter = 0/0
debug_rados = 0/0
debug_rbd = 0/0
debug_journaler = 0/0
debug_objectcatcher = 0/0
debug_client = 0/0
debug_osd = 0/0
debug_optracker = 0/0
debug_objclass = 0/0
debug_filestore = 0/0
debug_journal = 0/0
debug_ms = 0/0
debug_monc = 0/0
debug_tp = 0/0
debug_auth = 0/0
debug_finisher = 0/0
debug_heartbeatmap = 0/0
debug_perfcounter = 0/0
debug_asok = 0/0
debug_throttle = 0/0
debug_mon = 0/0
debug_paxos = 0/0
debug_rgw = 0/0
osd_op_threads = 5
osd_op_num_threads_per_shard = 1
osd_op_num_shards = 25
#osd_op_num_sharded_pool_threads = 25
filestore_op_threads = 4
ms_nocrc = true
filestore_fd_cache_size = 64
filestore_fd_cache_shards = 32
cephx sign messages = false
cephx require signatures = false
ms_dispatch_throttle_bytes = 0
throttler_perf_counter = false

[osd]
osd_client_message_size_cap = 0
osd_client_message_cap = 0
osd_enable_op_tracker = false

Crash details from the log:

-194> 2015-10-28 10:54:40.792957 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915510
-193> 2015-10-28 10:54:40.792959 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916590
-192> 2015-10-28 10:54:40.792962 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914990
-191> 2015-10-28 10:54:40.792965 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916490
-190> 2015-10-28 10:54:40.792968 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916090
-189> 2015-10-28 10:54:40.792971 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915c10
-188> 2015-10-28 10:54:40.792975 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba917190
-187> 2015-10-28 10:54:40.792977 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916810
-186> 2015-10-28 10:54:40.792980 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914790
-185> 2015-10-28 10:54:40.792983 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915e10
-184> 2015-10-28 10:54:40.792986 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915f10
-183> 2015-10-28 10:54:40.792988 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915f90
-182> 2015-10-28 10:54:40.792992 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914510
...
-10> 2015-10-28 10:55:45.240480 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
-9> 2015-10-28 10:55:45.240830 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
-8> 2015-10-28 10:55:45.241135 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
-7> 2015-10-28 10:55:45.241418 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
-6> 2015-10-28 10:55:45.241674 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
-5> 2015-10-28 10:55:45.241913 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
-4> 2015-10-28 10:55:45.242150 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
-3> 2015-10-28 10:55:45.242391 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
-2> 2015-10-28 10:55:45.242614 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
-1> 2015-10-28 10:55:45.242885 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
0> 2015-10-28 10:55:54.323685 7f156e354700 -1 *** Caught signal (Aborted) **
 in thread 7f156e354700

 ceph version 9.1.0-420-ge3921a8 (e3921a8396870be4a38ce1f1b6c35bc0829dbb68)
 1: (()+0x80b70a) [0x7f159c91670a]
 2: (()+0x10340) [0x7f159afef340]
 3: (gsignal()+0x39) [0x7f1599116cc9]
 4: (abort()+0x148) [0x7f159911a0d8]
 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1599a21535]
 6: (()+0x5e6d6) [0x7f1599a1f6d6]
 7: (()+0x5e703) [0x7f1599a1f703]
 8: (()+0x5e922) [0x7f1599a1f922]
 9: (ceph::buffer::list::iterator_impl<false>::copy(unsigned int, char*)+0xa5) [0x7f159ca12955]
 10: (void decode<unsigned long, unsigned long>(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >&, ceph::buffer::list::iterator&)+0x2e) [0x7f159c600fae]
 11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x2aa4) [0x7f159c5b0314]
 12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x97) [0x7f159c5cdf77]
 13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x890) [0x7f159c5cedb0]
 14: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x3552) [0x7f159c5d3732]
 15: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0x7f159c56d835]
 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x7f159c45d02d]
 17: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x7f159c45d24d]
 18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8a9) [0x7f159c481649]
 19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x7f159c9f5caf]
 20: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f159c9f7bb0]
 21: (()+0x8182) [0x7f159afe7182]
 22: (clone()+0x6d) [0x7f15991da47d]
 NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
--- logging levels ---
  0/ 5 none
  0/ 0 lockdep
  0/ 0 context
  0/ 0 crush
  1/ 5 mds
  1/ 5 mds_balancer
  1/ 5 mds_locker
  1/ 5 mds_log
  1/ 5 mds_log_expire
  1/ 5 mds_migrator
  0/ 0 buffer
  0/ 0 timer
  0/ 0 filer
  0/ 1 striper
  0/ 0 objecter
  0/ 0 rados
  0/ 0 rbd
  0/ 5 rbd_replay
  0/ 0 journaler
  0/ 5 objectcacher
  0/ 0 client
  0/ 0 osd
  0/ 0 optracker
  0/ 0 objclass
  0/ 0 filestore
  1/ 3 keyvaluestore
  0/ 0 journal
  0/ 0 ms
  0/ 0 mon
  0/ 0 monc
  0/ 0 paxos
  0/ 0 tp
  0/ 0 auth
  1/ 5 crypto
  0/ 0 finisher
  0/ 0 heartbeatmap
  0/ 0 perfcounter
  0/ 0 rgw
  1/10 civetweb
  1/ 5 javaclient
  0/ 0 asok
  0/ 0 throttle
  0/ 0 refs
  1/ 5 xio
  1/ 5 compressor
  1/ 5 newstore
 -2/-2 (syslog threshold)
 -1/-1 (stderr threshold)
  max_recent 10000
  max_new 1000
  log_file /var/log/ceph/ceph-osd.0.log
--- end dump of recent events ---

FIO script:

######################################################################
# Example test for the RBD engine.
#
# Runs a 4k random write test against a RBD via librbd
#
# NOTE: Make sure you have either a RBD named 'fio_test' or change
#       the rbdname parameter.
######################################################################
[global]
#logging
#write_iops_log=write_iops_log
#write_bw_log=write_bw_log
#write_lat_log=write_lat_log
#ioengine=libaio
ioengine=rbd
clientname=admin
direct=1
pool=pool1
rbdname=im1
rw=randread
#or: rw=randwrite
bs=8k
numjobs=16
time_based=1
runtime=300
ramp_time=60

[rbd_iodepth32]
iodepth=128

Any pointers as to what could be causing this would be greatly appreciated.

Thanks,
-Vish
--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html
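The reply at the top suggests raising the open-file limit (ulimit -n) before starting the OSD. As a minimal sketch, assuming the OSD is started from a Python launcher script instead of an interactive shell, the same effect can be had with the standard resource module: raise the launcher's soft RLIMIT_NOFILE, and every process it starts afterwards inherits that limit. The function name raise_nofile_limit is hypothetical, and whether the open-file limit is actually behind this abort is the reply's suggestion, not something the log above confirms.

```python
import resource


def raise_nofile_limit(target=None):
    """Raise this process's soft RLIMIT_NOFILE, the limit `ulimit -n` sets.

    Processes started afterwards (e.g. a ceph-osd launched via subprocess)
    inherit the new soft limit. An unprivileged process may raise its soft
    limit only up to the hard limit, so a finite hard limit caps `target`.
    With target=None the soft limit is raised to the hard limit.
    Returns the resulting (soft, hard) pair.
    """
    inf = resource.RLIM_INFINITY
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if soft == inf:
        return soft, hard  # already unlimited, nothing to raise
    if target is None:
        if hard == inf:
            return soft, hard  # no finite ceiling to raise to; pass a target
        target = hard
    elif hard != inf:
        target = min(target, hard)  # never ask for more than the hard limit
    if target > soft:
        resource.setrlimit(resource.RLIMIT_NOFILE, (target, hard))
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    print("RLIMIT_NOFILE (soft, hard):", raise_nofile_limit())
```

After calling it, start ceph-osd from the same process (for example with subprocess.run) so the daemon inherits the raised soft limit. Going above the hard limit requires privileges, which is why the sketch caps the target at the hard limit rather than letting setrlimit fail.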