Hi Vish-

This is not too surprising, but I am inclined to ignore it for now: I'm
in the midst of a major rewrite anyway to use a raw block device instead
of the file system.

sage

On Wed, 28 Oct 2015, Vish (Vishwanath) Maram-SSI wrote:

> Hi,
>
> We are observing a crash of the OSD whenever we run FIO reads from a
> client. The setup is very simple:
>
> 1. One OSD running "ceph version 9.1.0-420-ge3921a8
>    (e3921a8396870be4a38ce1f1b6c35bc0829dbb68)", built and installed
>    from the Git source.
> 2. One client with the same version of Ceph.
> 3. FIO version: fio-2.2.10-16-gd223
> 4. ceph.conf as given below
> 5. Crash details from the OSD log as given below
> 6. FIO job file as given below
>
> CEPH Conf -
>
> [global]
> fsid = 9eda02e2-04b7-4eed-a85a-8471ea51528d
> mon_initial_members = msl-dsma-spoc08
> mon_host = 10.10.10.190
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> auth_supported = none
>
> #Needed for Newstore
> osd_objectstore = newstore
> enable experimental unrecoverable data corrupting features = newstore, rocksdb
> newstore_backend = rocksdb
>
> #Debug - Start: removed for now to debug
> #newstore_max_dir_size = 4096
> #newstore_sync_io = true
> #newstore_sync_transaction = true
> #newstore_sync_submit_transaction = true
> #newstore_sync_wal_apply = true
> #newstore_overlay_max = 0
> #Debug - End
>
> #Needed for Newstore
>
> filestore_xattr_use_omap = true
>
> osd pool default size = 1
> rbd cache = false
>
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
> osd_op_threads = 5
> osd_op_num_threads_per_shard = 1
> osd_op_num_shards = 25
> #osd_op_num_sharded_pool_threads = 25
> filestore_op_threads = 4
> ms_nocrc = true
> filestore_fd_cache_size = 64
> filestore_fd_cache_shards = 32
> cephx sign messages = false
> cephx require signatures = false
> ms_dispatch_throttle_bytes = 0
> throttler_perf_counter = false
>
> [osd]
> osd_client_message_size_cap = 0
> osd_client_message_cap = 0
> osd_enable_op_tracker = false
>
> Crash details from the log:
>
> -194> 2015-10-28 10:54:40.792957 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915510
> -193> 2015-10-28 10:54:40.792959 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916590
> -192> 2015-10-28 10:54:40.792962 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914990
> -191> 2015-10-28 10:54:40.792965 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916490
> -190> 2015-10-28 10:54:40.792968 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916090
> -189> 2015-10-28 10:54:40.792971 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915c10
> -188> 2015-10-28 10:54:40.792975 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba917190
> -187> 2015-10-28 10:54:40.792977 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916810
> -186> 2015-10-28 10:54:40.792980 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914790
> -185> 2015-10-28 10:54:40.792983 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915e10
> -184> 2015-10-28 10:54:40.792986 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915f10
> -183> 2015-10-28 10:54:40.792988 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915f90
> -182> 2015-10-28 10:54:40.792992 7f15862e8700 2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914510
> ...
> -10> 2015-10-28 10:55:45.240480 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> -9> 2015-10-28 10:55:45.240830 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> -8> 2015-10-28 10:55:45.241135 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> -7> 2015-10-28 10:55:45.241418 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> -6> 2015-10-28 10:55:45.241674 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> -5> 2015-10-28 10:55:45.241913 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> -4> 2015-10-28 10:55:45.242150 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> -3> 2015-10-28 10:55:45.242391 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> -2> 2015-10-28 10:55:45.242614 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> -1> 2015-10-28 10:55:45.242885 7f1577366700 5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
> 0> 2015-10-28 10:55:54.323685 7f156e354700 -1 *** Caught signal (Aborted) **
> in thread 7f156e354700
>
> ceph version 9.1.0-420-ge3921a8 (e3921a8396870be4a38ce1f1b6c35bc0829dbb68)
> 1: (()+0x80b70a) [0x7f159c91670a]
> 2: (()+0x10340) [0x7f159afef340]
> 3: (gsignal()+0x39) [0x7f1599116cc9]
> 4: (abort()+0x148) [0x7f159911a0d8]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1599a21535]
> 6: (()+0x5e6d6) [0x7f1599a1f6d6]
> 7: (()+0x5e703) [0x7f1599a1f703]
> 8: (()+0x5e922) [0x7f1599a1f922]
> 9: (ceph::buffer::list::iterator_impl<false>::copy(unsigned int, char*)+0xa5) [0x7f159ca12955]
> 10: (void decode<unsigned long, unsigned long>(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >&, ceph::buffer::list::iterator&)+0x2e) [0x7f159c600fae]
> 11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x2aa4) [0x7f159c5b0314]
> 12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x97) [0x7f159c5cdf77]
> 13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x890) [0x7f159c5cedb0]
> 14: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x3552) [0x7f159c5d3732]
> 15: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0x7f159c56d835]
> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x7f159c45d02d]
> 17: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x7f159c45d24d]
> 18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8a9) [0x7f159c481649]
> 19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x7f159c9f5caf]
> 20: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f159c9f7bb0]
> 21: (()+0x8182) [0x7f159afe7182]
> 22: (clone()+0x6d) [0x7f15991da47d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> --- logging levels ---
> 0/ 5 none
> 0/ 0 lockdep
> 0/ 0 context
> 0/ 0 crush
> 1/ 5 mds
> 1/ 5 mds_balancer
> 1/ 5 mds_locker
> 1/ 5 mds_log
> 1/ 5 mds_log_expire
> 1/ 5 mds_migrator
> 0/ 0 buffer
> 0/ 0 timer
> 0/ 0 filer
> 0/ 1 striper
> 0/ 0 objecter
> 0/ 0 rados
> 0/ 0 rbd
> 0/ 5 rbd_replay
> 0/ 0 journaler
> 0/ 5 objectcacher
> 0/ 0 client
> 0/ 0 osd
> 0/ 0 optracker
> 0/ 0 objclass
> 0/ 0 filestore
> 1/ 3 keyvaluestore
> 0/ 0 journal
> 0/ 0 ms
> 0/ 0 mon
> 0/ 0 monc
> 0/ 0 paxos
> 0/ 0 tp
> 0/ 0 auth
> 1/ 5 crypto
> 0/ 0 finisher
> 0/ 0 heartbeatmap
> 0/ 0 perfcounter
> 0/ 0 rgw
> 1/10 civetweb
> 1/ 5 javaclient
> 0/ 0 asok
> 0/ 0 throttle
> 0/ 0 refs
> 1/ 5 xio
> 1/ 5 compressor
> 1/ 5 newstore
> -2/-2 (syslog threshold)
> -1/-1 (stderr threshold)
> max_recent 10000
> max_new 1000
> log_file /var/log/ceph/ceph-osd.0.log
> --- end dump of recent events ---
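
A note on the backtrace: frames 9 and 10 show the generic
map<uint64_t,uint64_t> decoder running off the end of a bufferlist.
ceph::buffer::list::iterator throws buffer::end_of_buffer when asked to
copy more bytes than remain, and because nothing on the do_osd_ops()
path catches it, the exception reaches std::terminate() and the OSD
aborts (hence the __verbose_terminate_handler frames). Here is a minimal
standalone sketch of that failure mode; this is illustrative C++, not
Ceph's encoding.h, and the count-then-pairs layout is a simplification:

#include <cstdint>
#include <cstring>
#include <iostream>
#include <map>
#include <stdexcept>
#include <vector>

// Toy stand-in for ceph::buffer::list::iterator: copy() throws when the
// buffer runs out, the way iterator_impl::copy() throws
// buffer::end_of_buffer in frame 9 of the backtrace.
struct byte_reader {
  const std::vector<unsigned char>& buf;
  size_t off = 0;
  void copy(size_t n, void* dst) {
    if (off + n > buf.size())
      throw std::runtime_error("end_of_buffer");
    std::memcpy(dst, buf.data() + off, n);
    off += n;
  }
};

// Simplified shape of the generic map decoder (frame 10): a u32 element
// count followed by fixed-width key/value pairs.
void decode_map(std::map<uint64_t, uint64_t>& m, byte_reader& r) {
  uint32_t n;
  r.copy(sizeof(n), &n);
  while (n--) {
    uint64_t k, v;
    r.copy(sizeof(k), &k);
    r.copy(sizeof(v), &v);
    m[k] = v;
  }
}

int main() {
  // The header claims two entries but the pair bytes are missing, so the
  // decoder asks for more data than the buffer holds and throws. In the
  // OSD nothing on this path catches the exception, so it ends in
  // std::terminate()/abort() instead of a catch block like this one.
  std::vector<unsigned char> truncated = {2, 0, 0, 0};
  byte_reader r{truncated};
  std::map<uint64_t, uint64_t> m;
  try {
    decode_map(m, r);
  } catch (const std::exception& e) {
    std::cout << "decode failed: " << e.what() << "\n";
  }
  return 0;
}

The point is just that a short or corrupt encoded value on this path
surfaces as a process abort rather than an error returned to the client.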
>
> FIO Script -
>
> ######################################################################
> # Example test for the RBD engine.
> #
> # Runs an 8k random read test against an RBD via librbd.
> #
> # NOTE: Make sure you have either an RBD named 'fio_test' or change
> # the rbdname parameter.
> ######################################################################
> [global]
> #logging
> #write_iops_log=write_iops_log
> #write_bw_log=write_bw_log
> #write_lat_log=write_lat_log
> #ioengine=libaio
> ioengine=rbd
> clientname=admin
> direct=1
> pool=pool1
> rbdname=im1
> rw=randread
> #rw=randwrite
> bs=8k
> numjobs=16
> time_based=1
> runtime=300
> ramp_time=60
>
> [rbd_iodepth32]
> iodepth=128
>
> Any pointers as to what the issue could be would be greatly appreciated.
>
> Thanks,
> -Vish
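
On reproduction: the read path fio's rbd engine drives can also be
exercised directly through librbd, which helps rule fio in or out. A
minimal sketch, assuming the admin client and the pool1/im1 names from
the job file above, with error handling reduced to asserts:

#include <rados/librados.hpp>
#include <rbd/librbd.hpp>
#include <cassert>
#include <iostream>

int main() {
  // Connect as client.admin using the default ceph.conf search path.
  librados::Rados cluster;
  assert(cluster.init("admin") == 0);
  assert(cluster.conf_read_file(nullptr) == 0);
  assert(cluster.connect() == 0);

  // Open the pool and image named in the fio job file.
  librados::IoCtx io_ctx;
  assert(cluster.ioctx_create("pool1", io_ctx) == 0);
  librbd::RBD rbd;
  librbd::Image image;
  assert(rbd.open(io_ctx, image, "im1") == 0);

  // Issue one 8k read at offset 0, the same block size the job file uses.
  ceph::bufferlist bl;
  ssize_t r = image.read(0, 8192, bl);
  std::cout << "read returned " << r << " bytes" << std::endl;

  image.close();
  cluster.shutdown();
  return 0;
}

Built with something like "g++ rbd_read.cc -lrados -lrbd", a single read
through this that still aborts the OSD would confirm fio itself is
incidental.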