On Thu, Feb 23, 2012 at 9:14 PM, Gregory Farnum
<gregory.farnum@xxxxxxxxxxxxx> wrote:
> On Wed, Feb 22, 2012 at 12:25 PM, Jens Rehpöhler
> <jens.rehpoehler@xxxxxxxx> wrote:
>> Hi Gregory,
>>
>> On 22.02.2012 18:12, Gregory Farnum wrote:
>>> On Feb 22, 2012, at 1:53 AM, "Jens Rehpöhler" <jens.rehpoehler@xxxxxxxx> wrote:
>>>
>>>> Some additions: meanwhile we are at the state:
>>>>
>>>> 2012-02-22 10:38:49.587403 pg v1044553: 2046 pgs: 2036 active+clean,
>>>> 10 active+clean+inconsistent; 2110 GB data, 4061 GB used, 25732 GB /
>>>> 29794 GB avail
>>>>
>>>> The active+recovering+remapped+backfill disappeared after a restart of a
>>>> crashed OSD.
>>>>
>>>> The OSD crashed after issuing the command "ceph pg repair 106.3".
>>>>
>>>> The repeating message is also there:
>>> Hmm. These messages indicate there are requests that came in that
>>> never got answered -- or else that the tracking code isn't quite right
>>> (it's new functionality). What version are you running?
>> We use:
>>
>> root@fcmsnode0:~# ceph -v
>> ceph version 0.42-62-gd6de0bb
>> (commit:d6de0bb83bcac238b3a6a376915e06fb7129b2c8)
>>
>> Kernel is 3.2.1
>>
>> I accidentally updated one of our OSDs to 0.42, so we updated the whole
>> cluster.
>>
>> The OSD crashed repeatedly when we issued the "repair" command. The
>> inconsistent PGs are all on the same (newly added) node.
>
> Oh, that's interesting. Are all the other nodes in the cluster up and in?
>
> In the next version or two we will have a lot more capability to look
> into what's happening with stuck PGs like this, but for the moment we
> need a log. If all the other nodes in the system are up, can you
> restart this new OSD with "debug osd = 20" and "debug ms = 1" added to
> its config?
> -Greg

Actually, I suspect this might be related to that bug you reported
with the messenger. If you like you can just cherry-pick
244b70296622906f01cfa3d48c931aa08e663a75 (currently HEAD on the next
branch) onto your current install and see if that fixes things...
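
If you go the logging route from the quoted request instead, here is a
minimal sketch of what the ceph.conf change might look like. The section
name [osd.N] is a placeholder: substitute the id of the newly added OSD,
which this thread does not state. Only the two debug lines come from the
request itself; where the file lives and how the daemon gets restarted
depend on your setup.

```ini
[osd.N]
        ; N is a placeholder for the new OSD's id (not given in this thread)
        debug osd = 20
        debug ms = 1
```

Putting the options under that one OSD's section, rather than under
[osd], keeps the extra logging confined to the daemon that is crashing.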
-Greg