Re: [Newstore] FIO Reads from Client causes OSD *** Caught signal (Aborted) **

Hi Vish-

This is not too surprising, but I am inclined to ignore it for now: I'm in 
the midst of a major rewrite anyway to use a raw block device instead of 
the file system.

sage

On Wed, 28 Oct 2015, Vish (Vishwanath) Maram-SSI wrote:

> Hi,
> 
> We are observing an OSD crash whenever we run FIO reads from a client. The setup is very simple and is described below:
> 
> 1. One OSD with Ceph version "ceph version 9.1.0-420-ge3921a8 (e3921a8396870be4a38ce1f1b6c35bc0829dbb68)"; the code was pulled from Git and compiled/installed.
> 2. One client with the same version of Ceph.
> 3. FIO version: fio-2.2.10-16-gd223
> 4. Ceph conf as given below
> 5. Crash log details from the log file as below
> 6. FIO script as given below
> 
> Ceph conf:
> 
> [global]
> fsid = 9eda02e2-04b7-4eed-a85a-8471ea51528d
> mon_initial_members = msl-dsma-spoc08
> mon_host = 10.10.10.190
> auth_cluster_required = cephx
> auth_service_required = cephx
> auth_client_required = cephx
> auth_supported = none
> 
> #Needed for Newstore 
> osd_objectstore = newstore
> enable experimental unrecoverable data corrupting features = newstore, rocksdb
> newstore_backend = rocksdb
> 
> #Debug - Start: removed for now to debug
> #newstore_max_dir_size = 4096
> #newstore_sync_io = true
> #newstore_sync_transaction = true
> #newstore_sync_submit_transaction = true
> #newstore_sync_wal_apply = true
> #newstore_overlay_max = 0
> #Debug - End
> 
> #Needed for Newstore
> 
> filestore_xattr_use_omap = true
> 
> osd pool default size = 1
> rbd cache = false
> 
> 
> debug_lockdep = 0/0
> debug_context = 0/0
> debug_crush = 0/0
> debug_buffer = 0/0
> debug_timer = 0/0
> debug_filer = 0/0
> debug_objecter = 0/0
> debug_rados = 0/0
> debug_rbd = 0/0
> debug_journaler = 0/0
> debug_objectcatcher = 0/0
> debug_client = 0/0
> debug_osd = 0/0
> debug_optracker = 0/0
> debug_objclass = 0/0
> debug_filestore = 0/0
> debug_journal = 0/0
> debug_ms = 0/0
> debug_monc = 0/0
> debug_tp = 0/0
> debug_auth = 0/0
> debug_finisher = 0/0
> debug_heartbeatmap = 0/0
> debug_perfcounter = 0/0
> debug_asok = 0/0
> debug_throttle = 0/0
> debug_mon = 0/0
> debug_paxos = 0/0
> debug_rgw = 0/0
> osd_op_threads = 5
> osd_op_num_threads_per_shard = 1
> osd_op_num_shards = 25
> #osd_op_num_sharded_pool_threads = 25
> filestore_op_threads = 4
> ms_nocrc = true
> filestore_fd_cache_size = 64
> filestore_fd_cache_shards = 32
> cephx sign messages = false
> cephx require signatures = false
> ms_dispatch_throttle_bytes = 0
> throttler_perf_counter = false
> 
> [osd]
> osd_client_message_size_cap = 0
> osd_client_message_cap = 0
> osd_enable_op_tracker = false 
> 
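As a sanity check on the conf above, whether the running OSD actually picked up the experimental newstore backend can be confirmed through the admin socket (a sketch; osd.0 and a default admin socket path are assumptions for this single-OSD setup):

```shell
# Confirm which objectstore backend the running OSD is using.
# osd.0 is an assumption for this single-OSD setup.
ceph daemon osd.0 config get osd_objectstore
ceph daemon osd.0 config get newstore_backend
```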
> Crash details from the log:
>   -194> 2015-10-28 10:54:40.792957 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915510
>   -193> 2015-10-28 10:54:40.792959 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916590
>   -192> 2015-10-28 10:54:40.792962 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914990
>   -191> 2015-10-28 10:54:40.792965 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916490
>   -190> 2015-10-28 10:54:40.792968 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916090
>   -189> 2015-10-28 10:54:40.792971 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915c10
>   -188> 2015-10-28 10:54:40.792975 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba917190
>   -187> 2015-10-28 10:54:40.792977 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba916810
>   -186> 2015-10-28 10:54:40.792980 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914790
>   -185> 2015-10-28 10:54:40.792983 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915e10
>   -184> 2015-10-28 10:54:40.792986 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915f10
>   -183> 2015-10-28 10:54:40.792988 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba915f90
>  -182> 2015-10-28 10:54:40.792992 7f15862e8700  2 newstore(/var/lib/ceph/osd/ceph-0) _do_wal_transaction prepared aio 0x7f15ba914510
>  ...
>  
>    -10> 2015-10-28 10:55:45.240480 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>     -9> 2015-10-28 10:55:45.240830 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>     -8> 2015-10-28 10:55:45.241135 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>     -7> 2015-10-28 10:55:45.241418 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>     -6> 2015-10-28 10:55:45.241674 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>     -5> 2015-10-28 10:55:45.241913 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>     -4> 2015-10-28 10:55:45.242150 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>     -3> 2015-10-28 10:55:45.242391 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>     -2> 2015-10-28 10:55:45.242614 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>     -1> 2015-10-28 10:55:45.242885 7f1577366700  5 newstore(/var/lib/ceph/osd/ceph-0) queue_transactions existing 0x7f15a0ac1180 osr(1.b1 0x7f15a025acf0)
>      0> 2015-10-28 10:55:54.323685 7f156e354700 -1 *** Caught signal (Aborted) **
> in thread 7f156e354700
> 
> ceph version 9.1.0-420-ge3921a8 (e3921a8396870be4a38ce1f1b6c35bc0829dbb68)
> 1: (()+0x80b70a) [0x7f159c91670a]
> 2: (()+0x10340) [0x7f159afef340]
> 3: (gsignal()+0x39) [0x7f1599116cc9]
> 4: (abort()+0x148) [0x7f159911a0d8]
> 5: (__gnu_cxx::__verbose_terminate_handler()+0x155) [0x7f1599a21535]
> 6: (()+0x5e6d6) [0x7f1599a1f6d6]
> 7: (()+0x5e703) [0x7f1599a1f703]
> 8: (()+0x5e922) [0x7f1599a1f922]
> 9: (ceph::buffer::list::iterator_impl<false>::copy(unsigned int, char*)+0xa5) [0x7f159ca12955]
> 10: (void decode<unsigned long, unsigned long>(std::map<unsigned long, unsigned long, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, unsigned long> > >&, ceph::buffer::list::iterator&)+0x2e) [0x7f159c600fae]
> 11: (ReplicatedPG::do_osd_ops(ReplicatedPG::OpContext*, std::vector<OSDOp, std::allocator<OSDOp> >&)+0x2aa4) [0x7f159c5b0314]
> 12: (ReplicatedPG::prepare_transaction(ReplicatedPG::OpContext*)+0x97) [0x7f159c5cdf77]
> 13: (ReplicatedPG::execute_ctx(ReplicatedPG::OpContext*)+0x890) [0x7f159c5cedb0]
> 14: (ReplicatedPG::do_op(std::shared_ptr<OpRequest>&)+0x3552) [0x7f159c5d3732]
> 15: (ReplicatedPG::do_request(std::shared_ptr<OpRequest>&, ThreadPool::TPHandle&)+0x705) [0x7f159c56d835]
> 16: (OSD::dequeue_op(boost::intrusive_ptr<PG>, std::shared_ptr<OpRequest>, ThreadPool::TPHandle&)+0x3bd) [0x7f159c45d02d]
> 17: (PGQueueable::RunVis::operator()(std::shared_ptr<OpRequest>&)+0x5d) [0x7f159c45d24d]
> 18: (OSD::ShardedOpWQ::_process(unsigned int, ceph::heartbeat_handle_d*)+0x8a9) [0x7f159c481649]
> 19: (ShardedThreadPool::shardedthreadpool_worker(unsigned int)+0x85f) [0x7f159c9f5caf]
> 20: (ShardedThreadPool::WorkThreadSharded::entry()+0x10) [0x7f159c9f7bb0]
> 21: (()+0x8182) [0x7f159afe7182]
> 22: (clone()+0x6d) [0x7f15991da47d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
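Following the NOTE above, raw frame offsets such as the 0x80b70a in frame 1 can be resolved against the object that contains them (a sketch; the /usr/bin/ceph-osd path is an assumption, and debug symbols or a debug build are required):

```shell
# Disassemble with interleaved source to map frame offsets to code.
# /usr/bin/ceph-osd is an assumed path; install debug symbols first.
objdump -rdS /usr/bin/ceph-osd > ceph-osd.asm
# Or resolve a single offset directly:
addr2line -Cfe /usr/bin/ceph-osd 0x80b70a
```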
> 
> --- logging levels ---
>    0/ 5 none
>    0/ 0 lockdep
>    0/ 0 context
>    0/ 0 crush
>    1/ 5 mds
>    1/ 5 mds_balancer
>    1/ 5 mds_locker
>    1/ 5 mds_log
>    1/ 5 mds_log_expire
>    1/ 5 mds_migrator
>    0/ 0 buffer
>    0/ 0 timer
>    0/ 0 filer
>    0/ 1 striper
>    0/ 0 objecter
>    0/ 0 rados
>    0/ 0 rbd
>    0/ 5 rbd_replay
>    0/ 0 journaler
>    0/ 5 objectcacher
>    0/ 0 client
>    0/ 0 osd
>    0/ 0 optracker
>    0/ 0 objclass
>    0/ 0 filestore
>    1/ 3 keyvaluestore
>    0/ 0 journal
>    0/ 0 ms
>    0/ 0 mon
>    0/ 0 monc
>    0/ 0 paxos
>    0/ 0 tp
>    0/ 0 auth
>    1/ 5 crypto
>    0/ 0 finisher
>    0/ 0 heartbeatmap
>    0/ 0 perfcounter
>    0/ 0 rgw
>    1/10 civetweb
>    1/ 5 javaclient
>    0/ 0 asok
>    0/ 0 throttle
>    0/ 0 refs
>    1/ 5 xio
>    1/ 5 compressor
>    1/ 5 newstore
>   -2/-2 (syslog threshold)
>   -1/-1 (stderr threshold)
>   max_recent     10000
>   max_new         1000
>   log_file /var/log/ceph/ceph-osd.0.log
> --- end dump of recent events ---
> 
> FIO script:
> 
> ######################################################################
> # Example test for the RBD engine.
> # 
> # Runs a 4k random write test against a RBD via librbd
> #
> # NOTE: Make sure you have either a RBD named 'fio_test' or change
> #       the rbdname parameter.
> ######################################################################
> [global]
> #logging
> #write_iops_log=write_iops_log
> #write_bw_log=write_bw_log
> #write_lat_log=write_lat_log
> #ioengine=libaio
> ioengine=rbd
> clientname=admin
> direct=1
> pool=pool1
> rbdname=im1
> # rw takes a single value here
> rw=randread
> # (or rw=randwrite)
> bs=8k
> numjobs=16
> time_based=1
> runtime=300
> ramp_time=60
> 
> [rbd_iodepth32]
> iodepth=128
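Assuming the job file above is saved as rbd.fio (the filename is an assumption), it can be syntax-checked and then run with:

```shell
# Validate the job file without submitting any I/O, then run it.
# rbd.fio is an assumed filename; fio must be built with rbd support.
fio --parse-only rbd.fio
fio rbd.fio
```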
> 
> Any pointers as to what could be the issue will be greatly appreciated.
> 
> Thanks,
> -Vish
> --
> To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
> 
> 
