Re: emperor -> firefly 0.80.7 upgrade problem

Gregory Farnum <greg@xxxxxxxxxxx> · Mon, 3 Nov 2014 09:56:07 -0800

On Mon, Nov 3, 2014 at 7:46 AM, Chad Seys <cwseys@xxxxxxxxxxxxxxxx> wrote:
> Hi All,
>    I upgraded from emperor to firefly.  Initial upgrade went smoothly and all
> placement groups were active+clean.
>   Next I executed
> 'ceph osd crush tunables optimal'
>   to upgrade CRUSH mapping.

Okay...you know that's a data movement command, right? So you should
expect it to impact operations. (Although not the crashes you're
witnessing.)

>   Now I keep having OSDs go down or have requests blocked for long periods of
> time.
>   I restart the down OSDs and recovery eventually stops, but with hundreds
> of "incomplete" and "down+incomplete" PGs remaining.
>   The ceph web page says "If you see this state [incomplete], report a bug,
> and try to start any failed OSDs that may contain the needed information."
> Well, all the OSDs are up, though some have blocked requests.
>
> Also, the logs of the OSDs which go down have this message:
> 2014-11-02 21:46:33.615829 7ffcf0421700  0 -- 192.168.164.192:6810/31314 >> 192.168.164.186:6804/20934 pipe(0x2faa0280 sd=261 :6810 s=2 pgs=919 cs=25 l=0 c=0x2ed022c0).fault with nothing to send, going to standby
> 2014-11-02 21:49:11.440142 7ffce4cf3700  0 -- 192.168.164.192:6810/31314 >> 192.168.164.186:6804/20934 pipe(0xe512a00 sd=249 :6810 s=0 pgs=0 cs=0 l=0 c=0x2a308b00).accept connect_seq 26 vs existing 25 state standby
> 2014-11-02 21:51:20.085676 7ffcf6e3e700 -1 osd/PG.cc: In function 'PG::RecoveryState::Crashed::Crashed(boost::statechart::state<PG::RecoveryState::Crashed, PG::RecoveryState::RecoveryMachine>::my_context)' thread 7ffcf6e3e700 time 2014-11-02 21:51:20.052242
> osd/PG.cc: 5424: FAILED assert(0 == "we got a bad state machine event")

These failures are usually the result of adjusting tunables without
having upgraded all the machines in the cluster, although they should
also be fixed in v0.80.7. Are you still seeing crashes, or just the PG
state issues?
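
One way to check for a partially upgraded cluster is to compare the
versions the OSD daemons report. The following is a minimal sketch, not
a definitive tool. It assumes the output of 'ceph tell osd.* version'
has been saved as text with one "osd.N: {...}" line per daemon; that
line format is an assumption, and the sample data and the helper names
group_by_version and is_uniform are hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical sample input. Real text would be captured from the cluster,
# e.g. from 'ceph tell osd.* version'; the line format here is assumed.
SAMPLE = """\
osd.0: {"version":"ceph version 0.80.7"}
osd.1: {"version":"ceph version 0.80.7"}
osd.2: {"version":"ceph version 0.72.2"}
"""

# Capture the daemon name and the dotted version number that follows
# the literal text "ceph version".
LINE_RE = re.compile(r'^(osd\.\d+):.*?ceph version (\d+(?:\.\d+)*)')


def group_by_version(text):
    """Map each reported version to the daemons reporting it, in input order."""
    groups = defaultdict(list)
    for line in text.splitlines():
        match = LINE_RE.match(line.strip())
        if match:  # lines that do not fit the assumed format are skipped
            groups[match.group(2)].append(match.group(1))
    return dict(groups)


def is_uniform(groups):
    """True only when exactly one version was reported."""
    return len(groups) == 1


if __name__ == "__main__":
    groups = group_by_version(SAMPLE)
    for version, daemons in sorted(groups.items()):
        print(version, ", ".join(daemons))
    if not is_uniform(groups):
        print("mixed versions: finish the upgrade before changing tunables")
```

This only covers OSDs that answered the query; monitors and any daemons
that were down at the time would need to be checked separately.
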
-Greg