On Wed, Feb 22, 2012 at 12:25 PM, Jens Rehpöhler <jens.rehpoehler@xxxxxxxx> wrote:
> Hi Gregory,
>
> On 22.02.2012 18:12, Gregory Farnum wrote:
>> On Feb 22, 2012, at 1:53 AM, "Jens Rehpöhler" <jens.rehpoehler@xxxxxxxx> wrote:
>>
>>> Some additions: meanwhile we are at this state:
>>>
>>> 2012-02-22 10:38:49.587403 pg v1044553: 2046 pgs: 2036 active+clean,
>>> 10 active+clean+inconsistent; 2110 GB data, 4061 GB used, 25732 GB /
>>> 29794 GB avail
>>>
>>> The active+recovering+remapped+backfill state disappeared after a
>>> restart of a crashed OSD.
>>>
>>> The OSD crashed after issuing the command "ceph pg repair 106.3".
>>>
>>> The repeating message is also there:
>> Hmm. These messages indicate there are requests that came in that
>> never got answered -- or else that the tracking code isn't quite right
>> (it's new functionality). What version are you running?
> We use:
>
> root@fcmsnode0:~# ceph -v
> ceph version 0.42-62-gd6de0bb
> (commit:d6de0bb83bcac238b3a6a376915e06fb7129b2c8)
>
> Kernel is 3.2.1
>
> I accidentally updated one of our OSDs to 0.42, so we updated the whole
> cluster.
>
> The OSD repeatedly crashed while issuing the "repair" command. The
> inconsistent PGs are all on the same (newly added) node.

Oh, that's interesting. Are all the other nodes in the cluster up and in?

In the next version or two we will have a lot more capability to look into
what's happening with stuck PGs like this, but for the moment we need a log.
If all the other nodes in the system are up, can you restart this new OSD
with "debug osd = 20" and "debug ms = 1" added to its config?
-Greg
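
As a point of reference, a minimal sketch of the change Greg is asking for
might look like the following. The osd id "osd.12" is a placeholder (the
actual daemon name is not given in the thread), and the section placement
assumes a ceph.conf managed by the stock init script of that era; only the
two debug settings themselves come from Greg's request:

    # ceph.conf -- only the two debug lines come from Greg's request;
    # the section name / osd id is a placeholder
    [osd.12]
            debug osd = 20
            debug ms = 1

After adding those lines, restarting just that daemon (the exact invocation
may vary by distribution and deployment) would be something like:

    /etc/init.d/ceph restart osd.12

The verbose output then lands in that OSD's log file, typically under
/var/log/ceph/, which is the log Greg needs in order to see what happened
to the unanswered requests.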