Re: Failed assert when starting new OSDs in 0.60


Thanks Greg.

I quit playing with it because every time I restarted the cluster (service ceph -a restart), I lost more OSDs: the first time it was 1, the second time 10, the third time 13. All 13 down OSDs show the same stack trace.

 - Travis


On Mon, Apr 29, 2013 at 11:56 AM, Gregory Farnum <greg@xxxxxxxxxxx> wrote:
This sounds vaguely familiar to me, and I see
http://tracker.ceph.com/issues/4052, which is marked as "Can't
reproduce". I think this may be fixed in "next" and "master", but
I'm not sure. For more than that I'd have to defer to Sage or Sam.
-Greg
Software Engineer #42 @ http://inktank.com | http://ceph.com


On Sat, Apr 27, 2013 at 6:43 PM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
> Hey folks,
>
> I'm helping put together a new test/experimental cluster, and hit this today
> when bringing the cluster up for the first time (using mkcephfs).
>
> After doing the normal "service ceph -a start", I noticed one OSD was down,
> and a lot of PGs were stuck creating. I tried restarting the down OSD, but
> it wouldn't come up. It always had this error:
>
>     -1> 2013-04-27 18:11:56.179804 b6fcd000  2 osd.1 0 boot
>      0> 2013-04-27 18:11:56.402161 b6fcd000 -1 osd/PG.cc: In function
> 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
> ceph::bufferlist*)' thread b6fcd000 time 2013-04-27 18:11:56.399089
> osd/PG.cc: 2556: FAILED assert(values.size() == 1)
>
>  ceph version 0.60-401-g17a3859 (17a38593d60f5f29b9b66c13c0aaa759762c6d04)
>  1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
> ceph::buffer::list*)+0x1ad) [0x2c3c0a]
>  2: (OSD::load_pgs()+0x357) [0x28cba0]
>  3: (OSD::init()+0x741) [0x290a16]
>  4: (main()+0x1427) [0x2155c0]
>  5: (__libc_start_main()+0x99) [0xb69bcf42]
>  NOTE: a copy of the executable, or `objdump -rdS <executable>` is needed to
> interpret this.
>
>
> I then did a full cluster restart, and now I have ten OSDs down, each
> showing the same exception/failed assert.
>
> Anybody seen this?
>
> I know I'm running an unusual version: it's compiled from source and was
> provided to me. The OSDs are all on ARM, and the mon is x86_64. I'm just
> looking to see if anyone has seen this particular
> load_pgs()/peek_map_epoch() stack trace before.
>
>  - Travis
>
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>

