Re: OSD Crashes

I can see that. What would that be symptomatic of, though? And how would it be happening on 6 different systems and on multiple OSDs?

-----Original Message-----
From: Samuel Just [mailto:sjust@xxxxxxxxxx] 
Sent: Friday, April 29, 2016 8:57 AM
To: Garg, Pankaj
Cc: ceph-users@xxxxxxxxxxxxxx
Subject: Re: OSD Crashes

Your fs is throwing an EIO on open.
-Sam
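
A minimal way to confirm that outside of Ceph is to open one of the affected object files directly and look at the errno. The sketch below is generic POSIX code, not anything from the Ceph tree; the path argument is assumed to be a file under the OSD's data directory (e.g. somewhere below /var/lib/ceph/osd/ceph-102/current/) that the log complains about.

    // check_open.cc -- report the errno from open() on a given path.
    // errno 5 is EIO; errno 117 is EUCLEAN ("Structure needs cleaning"),
    // i.e. the filesystem itself has detected on-disk corruption.
    #include <cerrno>
    #include <cstdio>
    #include <cstring>
    #include <fcntl.h>
    #include <unistd.h>

    int main(int argc, char **argv) {
        if (argc != 2) {
            std::fprintf(stderr, "usage: %s <path-to-object-file>\n", argv[0]);
            return 2;
        }
        int fd = open(argv[1], O_RDONLY);
        if (fd < 0) {
            std::fprintf(stderr, "open(%s) failed: errno=%d (%s)\n",
                         argv[1], errno, std::strerror(errno));
            return 1;
        }
        std::printf("open(%s) succeeded\n", argv[1]);
        close(fd);
        return 0;
    }

If open() fails the same way with Ceph out of the picture, the next places to look are dmesg and the filesystem itself (for XFS, xfs_repair -n runs a check without modifying anything); errno 117 is what XFS returns once it has detected corruption on disk.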

On Fri, Apr 29, 2016 at 8:54 AM, Garg, Pankaj <Pankaj.Garg@xxxxxxxxxxxxxxxxxx> wrote:
> Hi,
>
> I had a fully functional Ceph cluster with 3 x86 nodes and 3 ARM64
> nodes, each with 12 HDD drives and 2 SSD drives. All of these were
> initially running Hammer and were then successfully upgraded to Infernalis (9.2.0).
>
> I recently deleted all my OSDs. On the x86 systems I swapped the drives
> for new ones; the ARM servers were swapped for different machines
> (keeping the same drives).
>
> I then provisioned the OSDs again, keeping the same cluster and Ceph
> versions as before. But now, every time I run RADOS bench, my
> OSDs start crashing (on both the ARM and x86 servers).
>
> I’m not sure why this is happening on all 6 systems. On the x86 side it’s
> the same Ceph bits as before; the only thing different is the new drives.
>
> It’s the same stack trace (pasted below) on all the OSDs too.
>
> Can anyone provide any clues?
>
>
>
> Thanks
>
> Pankaj
>
>   -14> 2016-04-28 08:09:45.423950 7f1ef05b1700  1 -- 192.168.240.117:6820/14377 <== osd.93 192.168.240.116:6811/47080 1236 ==== osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26) v1 ==== 981+0+4759 (3923326827 0 3705383247) 0x5634cbabc400 con 0x5634c5168420
>    -13> 2016-04-28 08:09:45.423981 7f1ef05b1700  5 -- op tracker -- seq: 29404, time: 2016-04-28 08:09:45.423882, event: header_read, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
>    -12> 2016-04-28 08:09:45.423991 7f1ef05b1700  5 -- op tracker -- seq: 29404, time: 2016-04-28 08:09:45.423884, event: throttled, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
>    -11> 2016-04-28 08:09:45.423996 7f1ef05b1700  5 -- op tracker -- seq: 29404, time: 2016-04-28 08:09:45.423942, event: all_read, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
>    -10> 2016-04-28 08:09:45.424001 7f1ef05b1700  5 -- op tracker -- seq: 29404, time: 0.000000, event: dispatched, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
>     -9> 2016-04-28 08:09:45.424014 7f1ef05b1700  5 -- op tracker -- seq: 29404, time: 2016-04-28 08:09:45.424014, event: queued_for_pg, op: osd_repop(client.2794263.0:37721 284.6d4 284/afa8fed4/benchmark_data_x86Ceph1_147212_object37720/head v 12284'26)
>     -8> 2016-04-28 08:09:45.561827 7f1f15799700  5 osd.102 12284 tick_without_osd_lock
>     -7> 2016-04-28 08:09:45.973944 7f1f0801a700  1 -- 192.168.240.117:6821/14377 <== osd.73 192.168.240.115:0/26572 1306 ==== osd_ping(ping e12284 stamp 2016-04-28 08:09:45.971751) v2 ==== 47+0+0 (846632602 0 0) 0x5634c8305c00 con 0x5634c58dd760
>     -6> 2016-04-28 08:09:45.973995 7f1f0801a700  1 -- 192.168.240.117:6821/14377 --> 192.168.240.115:0/26572 -- osd_ping(ping_reply e12284 stamp 2016-04-28 08:09:45.971751) v2 -- ?+0 0x5634c7ba8000 con 0x5634c58dd760
>     -5> 2016-04-28 08:09:45.974300 7f1f0981d700  1 -- 10.18.240.117:6821/14377 <== osd.73 192.168.240.115:0/26572 1306 ==== osd_ping(ping e12284 stamp 2016-04-28 08:09:45.971751) v2 ==== 47+0+0 (846632602 0 0) 0x5634c8129400 con 0x5634c58dcf20
>     -4> 2016-04-28 08:09:45.974337 7f1f0981d700  1 -- 10.18.240.117:6821/14377 --> 192.168.240.115:0/26572 -- osd_ping(ping_reply e12284 stamp 2016-04-28 08:09:45.971751) v2 -- ?+0 0x5634c617d600 con 0x5634c58dcf20
>     -3> 2016-04-28 08:09:46.174079 7f1f11f92700  0 filestore(/var/lib/ceph/osd/ceph-102) write couldn't open 287.6f9_head/287/ae33fef9/benchmark_data_ceph7_17591_object39895/head: (117) Structure needs cleaning
>     -2> 2016-04-28 08:09:46.174103 7f1f11f92700  0 filestore(/var/lib/ceph/osd/ceph-102)  error (117) Structure needs cleaning not handled on operation 0x5634c885df9e (16590.1.0, or op 0, counting from 0)
>     -1> 2016-04-28 08:09:46.174109 7f1f11f92700  0 filestore(/var/lib/ceph/osd/ceph-102) unexpected error code
>      0> 2016-04-28 08:09:46.178707 7f1f11791700 -1 os/FileStore.cc: In function 'int FileStore::lfn_open(coll_t, const ghobject_t&, bool, FDRef*, Index*)' thread 7f1f11791700 time 2016-04-28 08:09:46.173250
>
> os/FileStore.cc: 335: FAILED assert(!m_filestore_fail_eio || r != -5)
>
> ceph version 9.2.1 (752b6a3020c3de74e07d2a8b4c5e48dab5a6b6fd)
> 1: (ceph::__ceph_assert_fail(char const*, char const*, int, char const*)+0x8b) [0x5634c02ec7eb]
> 2: (FileStore::lfn_open(coll_t, ghobject_t const&, bool, std::shared_ptr<FDCache::FD>*, Index*)+0x1191) [0x5634bffb2d01]
> 3: (FileStore::_write(coll_t, ghobject_t const&, unsigned long, unsigned long, ceph::buffer::list const&, unsigned int)+0xf0) [0x5634bffbb7b0]
> 4: (FileStore::_do_transaction(ObjectStore::Transaction&, unsigned long, int, ThreadPool::TPHandle*)+0x2901) [0x5634bffc6f51]
> 5: (FileStore::_do_transactions(std::list<ObjectStore::Transaction*, std::allocator<ObjectStore::Transaction*> >&, unsigned long, ThreadPool::TPHandle*)+0x64) [0x5634bffcc404]
> 6: (FileStore::_do_op(FileStore::OpSequencer*, ThreadPool::TPHandle&)+0x1a9) [0x5634bffcc5c9]
> 7: (ThreadPool::worker(ThreadPool::WorkThread*)+0xa5e) [0x5634c02de10e]
> 8: (ThreadPool::WorkThread::entry()+0x10) [0x5634c02defd0]
> 9: (()+0x8182) [0x7f1f1f91a182]
> 10: (clone()+0x6d) [0x7f1f1dc6147d]
> NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to interpret this.
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
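
A note on the assert itself: "os/FileStore.cc: 335: FAILED assert(!m_filestore_fail_eio || r != -5)" reads as a deliberate guard rather than a bug. Below is a rough sketch of the logic the message implies, not the actual FileStore source; m_filestore_fail_eio corresponds to the filestore_fail_eio option, which defaults to true.

    #include <cassert>
    #include <cerrno>

    // Sketch of the guard implied by the assert message (not real FileStore
    // code).  r is the negative errno returned by the open path, e.g. -EIO.
    bool m_filestore_fail_eio = true;  // filestore_fail_eio defaults to true

    void check_open_result(int r) {
        // With filestore_fail_eio enabled, an EIO (-5) coming up from the
        // underlying filesystem is treated as fatal: the OSD aborts instead
        // of passing the error back to the client.
        assert(!m_filestore_fail_eio || r != -EIO);
    }

In other words, the crash is the OSD stopping on purpose once the filesystem underneath it starts returning I/O errors, which matches the "(117) Structure needs cleaning" failures on the same writes a few lines earlier in the log.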
