Hi Guys,
Any additional thoughts on this? There was a bit of information shared off-list I wanted to bring back:
Sam mentioned that the metadata looked odd, and suspected "some form of 32bit shenanigans in the key name construction".
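For illustration only, here is a toy Python sketch of what a 32-bit problem in key name construction could look like. This is not Ceph's actual code: make_key() and the "epoch_" prefix are hypothetical names, and the assumption is simply that an integer gets truncated to 32 bits before it is formatted into a key.

```python
# Toy illustration only: NOT Ceph's actual key construction.
# make_key() and the "epoch_" prefix are hypothetical names.

def make_key(value: int, bits: int) -> str:
    """Build a key name from an integer truncated to `bits` bits."""
    truncated = value & ((1 << bits) - 1)
    return f"epoch_{truncated:016x}"

small = 1234
large = (1 << 32) + 1234  # does not fit in 32 bits

# Small values produce the same key name either way.
same_for_small = make_key(small, 32) == make_key(small, 64)
# Once a value exceeds 32 bits, the two widths disagree on the name.
same_for_large = make_key(large, 32) == make_key(large, 64)
# With 32-bit truncation, two distinct values also map to one key.
collides = make_key(large, 32) == make_key(small, 32)
```

If something like this were happening, a key written under one name and looked up under another would simply not be found, which is one way a lookup could return no values. Whether that is what is going on here is still only a suspicion.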
"Hmm. Based on the omap and logs, the omap directory is simply a bunch
of updates behind. Was the node rebooted as part of the osd restart?
FS is xfs? What are your fs mount options?"
From ceph.conf:
osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k"
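In case it helps when comparing mount options across nodes, a minimal Python sketch that splits a line like the one above into individual options. parse_mount_options() is a hypothetical helper written for this message, not part of Ceph, and it assumes the simple name = "comma,separated" form shown here.

```python
def parse_mount_options(conf_line: str) -> dict:
    """Split a 'name = "opt,opt=val"' line into a dict of options."""
    # Everything after the first '=' is the quoted option string.
    _, _, value = conf_line.partition("=")
    options = {}
    for item in value.strip().strip('"').split(","):
        key, _, val = item.partition("=")
        # Flags without a value (e.g. noatime) are stored as True.
        options[key.strip()] = val.strip() or True
    return options

line = 'osd mount options xfs = "rw,noatime,inode64,logbufs=8,logbsize=256k"'
opts = parse_mount_options(line)
```

Here opts ends up with five entries: the flags rw, noatime and inode64, plus logbufs and logbsize with their values kept as strings.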
On Tue, Apr 30, 2013 at 12:17 PM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
On the OSD node:
root@cepha0:~# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.10
Release: 12.10
Codename: quantal
root@cepha0:~# dpkg -l "*leveldb*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================================-========================-========================-==================================================================================
ii libleveldb1:armhf 0+20120530.gitdd0d562-2 armhf fast key-value storage library
root@cepha0:~# uname -a
Linux cepha0 3.5.0-27-highbank #46-Ubuntu SMP Mon Mar 25 23:19:40 UTC 2013 armv7l armv7l armv7l GNU/Linux
On the MON node:
# lsb_release -a
No LSB modules are available.
Distributor ID: Ubuntu
Description: Ubuntu 12.10
Release: 12.10
Codename: quantal
# uname -a
Linux 3.5.0-27-generic #46-Ubuntu SMP Mon Mar 25 19:58:17 UTC 2013 x86_64 x86_64 x86_64 GNU/Linux
# dpkg -l "*leveldb*"
Desired=Unknown/Install/Remove/Purge/Hold
| Status=Not/Inst/Conf-files/Unpacked/halF-conf/Half-inst/trig-aWait/Trig-pend
|/ Err?=(none)/Reinst-required (Status,Err: uppercase=bad)
||/ Name Version Architecture Description
+++-======================================-========================-========================-==================================================================================
un leveldb-doc <none> (no description available)
ii libleveldb-dev:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library (development files)
ii libleveldb1:amd64 0+20120530.gitdd0d562-2 amd64 fast key-value storage library
On Tue, Apr 30, 2013 at 12:11 PM, Samuel Just <sam.just@xxxxxxxxxxx> wrote:
What version of leveldb is installed? Ubuntu/version?
-Sam
On Tue, Apr 30, 2013 at 8:50 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
> Interestingly, the down OSD does not get marked out after 5 minutes.
> Probably that is already fixed by http://tracker.ceph.com/issues/4822.
>
>
> On Tue, Apr 30, 2013 at 11:42 AM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
>>
>> Hi Sam,
>>
>> I was prepared to write in and say that the problem had gone away. I
>> tried restarting several OSDs last night in the hopes of capturing the
>> problem on an OSD that hadn't failed yet, but didn't have any luck. So I
>> did indeed re-create the cluster from scratch (using mkcephfs), and what do
>> you know -- everything worked. I got everything in a nice stable state,
>> then decided to do a full cluster restart, just to be sure. Sure enough,
>> one OSD failed to come up, and has the same stack trace. So I believe I
>> have the log you want -- just from the OSD that failed, right?
>>
>> Question -- any feeling for what parts of the log you need? It's 688MB
>> uncompressed (two hours!), so I'd like to be able to trim some off for you
>> before making it available. Do you only need/want the part from after the
>> OSD was restarted? Or perhaps the corruption happens on OSD shutdown and
>> you need some before that? If you are fine with that large of a file, I can
>> just make that available too. Let me know.
>>
>> - Travis
>>
>>
>> On Mon, Apr 29, 2013 at 6:26 PM, Travis Rhoden <trhoden@xxxxxxxxx> wrote:
>>>
>>> Hi Sam,
>>>
>>> No problem, I'll leave that debugging turned up high, and do a mkcephfs
>>> from scratch and see what happens. Not sure if it will happen again or not.
>>> =)
>>>
>>> Thanks again.
>>>
>>> - Travis
>>>
>>>
>>> On Mon, Apr 29, 2013 at 5:51 PM, Samuel Just <sam.just@xxxxxxxxxxx>
>>> wrote:
>>>>
>>>> Hmm, I need logging from when the corruption happened. If this is
>>>> reproducible, can you enable that logging on a clean osd (or better, a
>>>> clean cluster) until the assert occurs?
>>>> -Sam
>>>>
>>>> On Mon, Apr 29, 2013 at 2:45 PM, Travis Rhoden <trhoden@xxxxxxxxx>
>>>> wrote:
>>>> > Also, I can note that it does not take a full cluster restart to
>>>> > trigger
>>>> > this. If I just restart an OSD that was up/in previously, the same
>>>> > error
>>>> > can happen (though not every time). So restarting OSD's for me is a
>>>> > bit
>>>> > like Russian roulette. =) Even though restarting an OSD may not
>>>> > always
>>>> > result in the error, it seems that once it happens that OSD is gone
>>>> > for
>>>> > good. No amount of restart has brought any of the dead ones back.
>>>> >
>>>> > I'd really like to get to the bottom of it. Let me know if I can do
>>>> > anything to help.
>>>> >
>>>> > I may also have to try completely wiping/rebuilding to see if I can
>>>> > make
>>>> > this thing usable.
>>>> >
>>>> >
>>>> > On Mon, Apr 29, 2013 at 2:38 PM, Travis Rhoden <trhoden@xxxxxxxxx>
>>>> > wrote:
>>>> >>
>>>> >> Hi Sam,
>>>> >>
>>>> >> Thanks for being willing to take a look.
>>>> >>
>>>> >> I applied the debug settings on one host that has 3 out of 3 OSDs with
>>>> >> this
>>>> >> problem. Then tried to start them up. Here are the resulting logs:
>>>> >>
>>>> >> https://dl.dropboxusercontent.com/u/23122069/cephlogs.tgz
>>>> >>
>>>> >> - Travis
>>>> >>
>>>> >>
>>>> >> On Mon, Apr 29, 2013 at 1:04 PM, Samuel Just <sam.just@xxxxxxxxxxx>
>>>> >> wrote:
>>>> >>>
>>>> >>> You appear to be missing pg metadata for some reason. If you can
>>>> >>> reproduce it with
>>>> >>> debug osd = 20
>>>> >>> debug filestore = 20
>>>> >>> debug ms = 1
>>>> >>> on all of the OSDs, I should be able to track it down.
>>>> >>>
>>>> >>> I created a bug: #4855.
>>>> >>>
>>>> >>> Thanks!
>>>> >>> -Sam
>>>> >>>
>>>> >>> On Mon, Apr 29, 2013 at 9:52 AM, Travis Rhoden <trhoden@xxxxxxxxx>
>>>> >>> wrote:
>>>> >>> > Thanks Greg.
>>>> >>> >
>>>> >>> > I quit playing with it because every time I restarted the cluster
>>>> >>> > (service
>>>> >>> > ceph -a restart), I lost more OSDs. First time it was 1, 2nd 10,
>>>> >>> > 3rd
>>>> >>> > time
>>>> >>> > 13... All 13 down OSDs show the same stacktrace.
>>>> >>> >
>>>> >>> > - Travis
>>>> >>> >
>>>> >>> >
>>>> >>> > On Mon, Apr 29, 2013 at 11:56 AM, Gregory Farnum
>>>> >>> > <greg@xxxxxxxxxxx>
>>>> >>> > wrote:
>>>> >>> >>
>>>> >>> >> This sounds vaguely familiar to me, and I see
>>>> >>> >> http://tracker.ceph.com/issues/4052, which is marked as "Can't
>>>> >>> >> reproduce" — I think maybe this is fixed in "next" and "master",
>>>> >>> >> but
>>>> >>> >> I'm not sure. For more than that I'd have to defer to Sage or
>>>> >>> >> Sam.
>>>> >>> >> -Greg
>>>> >>> >> Software Engineer #42 @ http://inktank.com | http://ceph.com
>>>> >>> >>
>>>> >>> >>
>>>> >>> >> On Sat, Apr 27, 2013 at 6:43 PM, Travis Rhoden
>>>> >>> >> <trhoden@xxxxxxxxx>
>>>> >>> >> wrote:
>>>> >>> >> > Hey folks,
>>>> >>> >> >
>>>> >>> >> > I'm helping put together a new test/experimental cluster, and
>>>> >>> >> > hit
>>>> >>> >> > this
>>>> >>> >> > today
>>>> >>> >> > when bringing the cluster up for the first time (using
>>>> >>> >> > mkcephfs).
>>>> >>> >> >
>>>> >>> >> > After doing the normal "service ceph -a start", I noticed one
>>>> >>> >> > OSD
>>>> >>> >> > was
>>>> >>> >> > down,
>>>> >>> >> > and a lot of PGs were stuck creating. I tried restarting the
>>>> >>> >> > down
>>>> >>> >> > OSD,
>>>> >>> >> > but
>>>> >>> >> > it would not come up. It always had this error:
>>>> >>> >> >
>>>> >>> >> > -1> 2013-04-27 18:11:56.179804 b6fcd000 2 osd.1 0 boot
>>>> >>> >> > 0> 2013-04-27 18:11:56.402161 b6fcd000 -1 osd/PG.cc: In
>>>> >>> >> > function
>>>> >>> >> > 'static epoch_t PG::peek_map_epoch(ObjectStore*, coll_t,
>>>> >>> >> > hobject_t&,
>>>> >>> >> > ceph::bufferlist*)' thread b6fcd000 time 2013-04-27
>>>> >>> >> > 18:11:56.399089
>>>> >>> >> > osd/PG.cc: 2556: FAILED assert(values.size() == 1)
>>>> >>> >> >
>>>> >>> >> > ceph version 0.60-401-g17a3859
>>>> >>> >> > (17a38593d60f5f29b9b66c13c0aaa759762c6d04)
>>>> >>> >> > 1: (PG::peek_map_epoch(ObjectStore*, coll_t, hobject_t&,
>>>> >>> >> > ceph::buffer::list*)+0x1ad) [0x2c3c0a]
>>>> >>> >> > 2: (OSD::load_pgs()+0x357) [0x28cba0]
>>>> >>> >> > 3: (OSD::init()+0x741) [0x290a16]
>>>> >>> >> > 4: (main()+0x1427) [0x2155c0]
>>>> >>> >> > 5: (__libc_start_main()+0x99) [0xb69bcf42]
>>>> >>> >> > NOTE: a copy of the executable, or `objdump -rdS <executable>`
>>>> >>> >> > is
>>>> >>> >> > needed to
>>>> >>> >> > interpret this.
>>>> >>> >> >
>>>> >>> >> >
>>>> >>> >> > I then did a full cluster restart, and now I have ten OSDs down
>>>> >>> >> > --
>>>> >>> >> > each
>>>> >>> >> > showing the same exception/failed assert.
>>>> >>> >> >
>>>> >>> >> > Anybody seen this?
>>>> >>> >> >
>>>> >>> >> > I know I'm running a weird version -- it's compiled from
>>>> >>> >> > source, and
>>>> >>> >> > was
>>>> >>> >> > provided to me. The OSDs are all on ARM, and the mon is
>>>> >>> >> > x86_64.
>>>> >>> >> > Just
>>>> >>> >> > looking to see if anyone has seen this particular stack trace
>>>> >>> >> > of
>>>> >>> >> > load_pgs()/peek_map_epoch() before....
>>>> >>> >> >
>>>> >>> >> > - Travis
>>>> >>> >> >
>>>> >>> >> > _______________________________________________
>>>> >>> >> > ceph-users mailing list
>>>> >>> >> > ceph-users@xxxxxxxxxxxxxx
>>>> >>> >> > http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com
>>>> >>> >> >
>>>> >>> >
>>>> >>> >
>>>> >>
>>>> >>
>>>> >
>>>
>>>
>>
>