Your fs is throwing an EIO on open.
-Sam

On Fri, Apr 29, 2016 at 8:54 AM, Garg, Pankaj <Pankaj.Garg@xxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I had a fully functional Ceph cluster with 3 x86 Nodes and 3 ARM64 nodes,
> each with 12 HDD Drives and 2 SSD Drives. All these were initially running
> Hammer, and then were successfully updated to Infernalis (9.2.0).
>
> I recently deleted all my OSDs and swapped my drives with new ones on the
> x86 Systems, and the ARM servers were swapped with different ones (keeping
> drives same).
>
> I again provisioned the OSDs, keeping the same cluster and Ceph versions as
> before. But now, every time I try to run RADOS bench, my OSDs start crashing
> (on both ARM and x86 servers).
>
> I’m not sure why this is happening on all 6 systems. On the x86, it’s the
> same Ceph bits as before, and the only thing different is the new drives.
>
> It’s the same stack (pasted below) on all the OSDs too.
>
> Can anyone provide any clues?
>
> Thanks
>
> Pankaj
>
> -14> 2016-04-28 08:09:45.423950 7f1ef05b1700 1 -- 192.168.240.117:6820/14377 <== osd.93 192.168.240.116:6811/47080 1236 ==== osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26) v1 ==== 981+0+4759 (3923326827 0 3705383247) 0x5634cbabc400 con 0x5634c5168420
> -13> 2016-04-28 08:09:45.423981 7f1ef05b1700 5 -- op tracker -- seq: 29404, time: 2016-04-28 08:09:45.423882, event: header_read, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
> -12> 2016-04-28 08:09:45.423991 7f1ef05b1700 5 -- op tracker -- seq: 29404, time: 2016-04-28 08:09:45.423884, event: throttled, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
> -11> 2016-04-28 08:09:45.423996 7f1ef05b1700 5 -- op tracker -- seq: 29404, time: 2016-04-28 08:09:45.423942, event: all_read, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
> -10> 2016-04-28 08:09:45.424001 7f1ef05b1700 5 -- op tracker -- seq: 29404, time: 0.000000, event: dispatched, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
> -9> 2016-04-28 08:09:45.424014 7f1ef05b1700 5 -- op tracker -- seq: 29404, time: 2016-04-28 08:09:45.424014, event: queued_for_pg, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
> -8> 2016-04-28 08:09:45.561827 7f1f15799700 5 osd.102 12284 tick_without_osd_lock
> -7> 2016-04-28 08:09:45.973944 7f1f0801a700 1 -- 192.168.240.117:6821/14377 <== osd.73 192.168.240.115:0/26572 1306 ==== osd_ping(ping e12284 stamp 2016-04-28 08:09:45.971751) v2 ==== 47+0+0 (846632602 0 0) 0x5634c8305c00 con 0x5634c58dd760
> -6> 2016-04-28 08:09:45.973995 7f1f0801a700 1 -- 192.168.240.117:6821/14377 --> 192.168.240.115:0/26572 -- osd_ping(ping_reply e12284 stamp 2016-04-28 08:09:45.971751) v2 -- ?+0 0x5634c7ba8000 con 0x5634c58dd760
> -5> 2016-04-28 08:09:45.974300 7f1f0981d700 1 -- 10.18.240.117:6821/14377 <== osd.73 192.168.240.115:0/26572 1306 ==== osd_ping(ping e12284 stamp 2016-04-28 08:09:45.971751) v2 ==== 47+0+0 (846632602 0 0) 0x5634c8129400 con 0x5634c58dcf20
> -4> 2016-04-28 08:09:45.974337 7f1f0981d700 1 -- 10.18.240.117:6821/14377 --> 192.168.240.115:0/26572 -- osd_ping(ping_reply e12284 stamp 2016-04-28 08:09:45.971751) v2 -- ?+0 0x5634c617d600 con 0x5634c58dcf20
> -3> 2016-04-28 08:09:46.174079 7f1f11f92700 0 filestore(/var/lib/ceph/osd/ceph-102) write couldn't open 287.6f9_head/287/ae33fef9/benchmark_data_ceph7_17591_object39895/head: (117) Structure needs cleaning
> -2> 2016-04-28 08:09:46.174103 7f1f11f92700 0 filestore(/var/lib/ceph/osd/ceph-102) error (117) Structure needs cleaning not handled on operation 0x5634c885df9e (16590.1.0, or op 0, counting from 0)
> -1> 2016-04-28 08:09:46.174109 7f1f11f92700 0 filestore(/var/lib/ceph/osd/ceph-102) unexpected error code
> 0> 2016-04-28 08:09:46.178707 7f1f11791700 -1 os/FileStore.cc: In function 'int FileStore::lfn_open(coll_t, const ghobject_t&, bool, FDRef*, Index*)' thread 7f1f11791700 time 2016-04-28 08:09:46.173250
>
> os/FileStore.cc: 335: FAILED assert(!m_filestore_fail_eio || r != -5)
>
> ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x5634c02ec7eb]
> 2: (FileStore::lfn_open(coll_t, ghobject_t const&, bool, std::shared_ptr<FDCache::FD>*, Index*)+0x1191) [0x5634bffb2d01]
> 3: (FileStore::_write(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list const&, unsigned int)+0xf0) [0x5634bffbb7b0]
> 4: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x2901) [0x5634bffc6f51]
> 5: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x5634bffcc404]
> 6: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x1a9) [0x5634bffcc5c9]
> 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0x5634c02de10e]
> 8: (ThreadPool::WorkThread::entry()+0x10) [0x5634c02defd0]
> 9: (()+0x8182) [0x7f1f1f91a182]
> 10: (clone()+0x6d) [0x7f1f1dc6147d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
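Sam's one-line diagnosis rests on two numbers in the log: the (117) on the -3> and -2> write-path lines, and the -5 in the failed assert at os/FileStore.cc:335. The Python sketch below decodes both. It is a minimal illustration, assuming Linux errno numbering (EIO = 5, EUCLEAN = 117 on x86_64 and ARM64) and the negative-errno return convention that the assert itself shows. The helper names are hypothetical and not part of any Ceph tool, and `fail_eio` merely stands in for the `m_filestore_fail_eio` member named in the assert.

```python
import errno
import os
import re

# Hypothetical pattern: a numeric errno printed in parentheses,
# e.g. "(117) Structure needs cleaning". Not something Ceph ships.
ERRNO_IN_LOG = re.compile(r"\((\d+)\)\s")


def errno_from_log_line(line):
    """Return the first parenthesised errno number in a FileStore log line, or None."""
    match = ERRNO_IN_LOG.search(line)
    return int(match.group(1)) if match else None


def errno_name(code):
    """Map a (possibly negative) errno value to its symbolic name on this platform."""
    return errno.errorcode.get(abs(code), "UNKNOWN")


def lfn_open_assert_holds(fail_eio, r):
    """Mirror of the quoted check: assert(!m_filestore_fail_eio || r != -5).

    fail_eio is a hypothetical stand-in for m_filestore_fail_eio.
    Ceph returns negative errno values, so -5 is -EIO (Linux numbering).
    Returns False exactly when the assert would fire.
    """
    return (not fail_eio) or r != -errno.EIO


if __name__ == "__main__":
    line = ("filestore(/var/lib/ceph/osd/ceph-102) write couldn't open "
            "287.6f9_head/287/ae33fef9/benchmark_data_ceph7_17591_object39895/head: "
            "(117) Structure needs cleaning")
    code = errno_from_log_line(line)
    print(code, errno_name(code), os.strerror(code))
    print(-5, errno_name(-5), os.strerror(5))
    # The assert can only fail when fail-on-EIO is on and the open returned -EIO.
    for fail_eio, r in [(True, -errno.EIO), (False, -errno.EIO), (True, -117)]:
        print(f"fail_eio={fail_eio} r={r}: assert holds = {lfn_open_assert_holds(fail_eio, r)}")
```

Because the assert failed, both operands of the `||` were false: fail-on-EIO was enabled and `lfn_open` got back -EIO. The EUCLEAN (117) error on the other thread (7f1f11f92700) was reported separately as an "unexpected error code". Both are errors returned by the filesystem underneath the OSD, consistent with Sam's reply.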