ceph mount not working anymore

Thanks Sage!  What had happened prior to me upgrading was that I added an
erasure coded pool, but all my OSDs began to crash.  The ec profile didn't
seem to cause the crash, so I left it, but once I removed the pool, the
crashes stopped.

Do you guys want any of the core dumps, or is anything short of a debug
build with symbols and gdb on my system going to help?

Joshua


On Fri, Jul 11, 2014 at 7:02 AM, Sage Weil <sweil at redhat.com> wrote:

> On Thu, 10 Jul 2014, Joshua McClintock wrote:
> >         { "rule_id": 1,
> >           "rule_name": "erasure-code",
> >           "ruleset": 1,
> >           "type": 3,
>
> The presence of the erasure code CRUSH rules is what is preventing the
> kernel client from mounting.  Upgrade to a newer kernel (3.14 I think?),
> or remove the EC pools and these CRUSH rules to allow the old client to
> mount.
>
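> For example, something along these lines should do it (the pool name
> 'ecpool' below is only a placeholder; the rule name 'erasure-code' is the
> one from your crush dump):
>
>  ceph osd lspools
>  ceph osd pool delete ecpool ecpool --yes-i-really-really-mean-it
>  ceph osd crush rule ls
>  ceph osd crush rule rm erasure-code
>
> Checking 'uname -r' on the client will tell you whether it is already on a
> new enough kernel.
>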
> >   "tunables": { "choose_local_tries": 0,
> >       "choose_local_fallback_tries": 0,
> >       "choose_total_tries": 50,
> >       "chooseleaf_descend_once": 1,
> >       "profile": "bobtail",
> >       "optimal_tunables": 0,
> >       "legacy_tunables": 0,
> >       "require_feature_tunables": 1,
> >       "require_feature_tunables2": 1}}
>
> Looks like fields for tunables3 and the v2 and v3 rules are missing from
> the dump; I'll fix that.
>
> Thanks!
> sage
>
>
> >
> >
> >
> > On Thu, Jul 10, 2014 at 8:16 PM, Sage Weil <sweil at redhat.com> wrote:
> >       That is CEPH_FEATURE_CRUSH_V2.  Can you attach the output of
> >
> >        ceph osd crush dump
> >
> >       Thanks!
> >       sage
> >
> >
> >       On Thu, 10 Jul 2014, Joshua McClintock wrote:
> >
> >       > Yes, I changed some of the mount options on my osds (xfs mount
> >       > options), but I think this may be the answer from dmesg; sorta
> >       > looks like a version mismatch:
> >       >
> >       > libceph: loaded (mon/osd proto 15/24)
> >       >
> >       > ceph: loaded (mds proto 32)
> >       >
> >       > libceph: mon0 192.168.0.14:6789 feature set mismatch, my 4a042aca < server's 104a042aca, missing 1000000000
> >       >
> >       > libceph: mon0 192.168.0.14:6789 socket error on read
> >       >
> >       > libceph: mon2 192.168.0.16:6789 feature set mismatch, my 4a042aca < server's 104a042aca, missing 1000000000
> >       >
> >       > libceph: mon2 192.168.0.16:6789 socket error on read
> >       >
> >       > libceph: mon1 192.168.0.15:6789 feature set mismatch, my 4a042aca < server's 104a042aca, missing 1000000000
> >       >
> >       > libceph: mon1 192.168.0.15:6789 socket error on read
> >       >
> >       > libceph: mon0 192.168.0.14:6789 feature set mismatch, my 4a042aca < server's 104a042aca, missing 1000000000
> >       >
> >       > libceph: mon0 192.168.0.14:6789 socket error on read
> >       >
> >       > libceph: mon2 192.168.0.16:6789 feature set mismatch, my 4a042aca < server's 104a042aca, missing 1000000000
> >       >
> >       > libceph: mon2 192.168.0.16:6789 socket error on read
> >       >
> >       > libceph: mon1 192.168.0.15:6789 feature set mismatch, my 4a042aca < server's 104a042aca, missing 1000000000
> >       >
> >       > libceph: mon1 192.168.0.15:6789 socket error on read
> >       >
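> >       > (The "missing" value is just the feature bits the server has that the
> >       > client lacks: 0x104a042aca & ~0x4a042aca = 0x1000000000, i.e. bit 36.
> >       > A quick shell check:
> >       >
> >       >   printf '%x\n' $(( 0x104a042aca & ~0x4a042aca ))
> >       >
> >       > which prints 1000000000.)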
> >       >
> >       > Maybe I didn't update as well as I thought I did.  I did hit every
> >       > mon, but I remember I couldn't upgrade to the new 'ceph' package
> >       > because it conflicted with 'python-ceph', so I uninstalled it
> >       > (python-ceph) and then upgraded to .80.1-2.  Maybe there's a
> >       > subcomponent I missed?
> >       >
> >       >
> >       > Here's rpm -qa from the client:
> >       >
> >       >
> >       > [root@chefwks01 ~]# rpm -qa|grep ceph
> >       >
> >       > ceph-deploy-1.5.2-0.noarch
> >       >
> >       > ceph-release-1-0.el6.noarch
> >       >
> >       > ceph-0.80.1-2.el6.x86_64
> >       >
> >       > libcephfs1-0.80.1-0.el6.x86_64
> >       >
> >       >
> >       > Here's rpm -qa from the mons:
> >       >
> >       >
> >       > [root@ceph-mon01 ~]# rpm -qa|grep ceph
> >       >
> >       > ceph-0.80.1-2.el6.x86_64
> >       >
> >       > ceph-release-1-0.el6.noarch
> >       >
> >       > libcephfs1-0.80.1-0.el6.x86_64
> >       >
> >       > [root@ceph-mon01 ~]#
> >       >
> >       >
> >       > [root@ceph-mon02 ~]# rpm -qa|grep ceph
> >       >
> >       > libcephfs1-0.80.1-0.el6.x86_64
> >       >
> >       > ceph-0.80.1-2.el6.x86_64
> >       >
> >       > ceph-release-1-0.el6.noarch
> >       >
> >       > [root@ceph-mon02 ~]#
> >       >
> >       >
> >       > [root@ceph-mon03 ~]# rpm -qa|grep ceph
> >       >
> >       > libcephfs1-0.80.1-0.el6.x86_64
> >       >
> >       > ceph-0.80.1-2.el6.x86_64
> >       >
> >       > ceph-release-1-0.el6.noarch
> >       >
> >       > [root@ceph-mon03 ~]#
> >       >
> >       >
> >       > Joshua
> >       >
> >       >
> >       >
> >       > On Thu, Jul 10, 2014 at 6:04 PM, Sage Weil <sweil at redhat.com>
> >       wrote:
> >       >       Have you made any other changes after the upgrade?  (Like
> >       >       adjusting tunables, or creating EC pools?)
> >       >
> >       >       See if there is anything in 'dmesg' output.
> >       >
> >       >       sage
> >       >
> >       >       On Thu, 10 Jul 2014, Joshua McClintock wrote:
> >       >
> >       >       > I upgraded my cluster to .80.1-2 (CentOS).  My mount command
> >       >       > just freezes and outputs an error:
> >       >       >
> >       >       > mount.ceph 192.168.0.14,192.168.0.15,192.168.0.16:/ /us-west01 \
> >       >       >     -o name=chefwks01,secret=`ceph-authtool -p -n client.admin /etc/ceph/us-west01.client.admin.keyring`
> >       >       >
> >       >       > mount error 5 = Input/output error
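> >       >       >
> >       >       > As an aside, the same mount can read the secret from a file instead
> >       >       > of taking it on the command line, via the secretfile option; this
> >       >       > assumes the bare key has been saved to
> >       >       > /etc/ceph/us-west01.client.admin.secret:
> >       >       >
> >       >       > mount.ceph 192.168.0.14,192.168.0.15,192.168.0.16:/ /us-west01 \
> >       >       >     -o name=chefwks01,secretfile=/etc/ceph/us-west01.client.admin.secret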
> >       >       >
> >       >       >
> >       >       > Here's the output from 'ceph -s'
> >       >       >
> >       >       >
> >       >       >     cluster xxxxxxxxxxxxxxxxxxxxxx
> >       >       >
> >       >       >      health HEALTH_OK
> >       >       >
> >       >       >      monmap e1: 3 mons at {ceph-mon01=192.168.0.14:6789/0,ceph-mon02=192.168.0.15:6789/0,ceph-mon03=192.168.0.16:6789/0}, election epoch 88, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
> >       >       >
> >       >       >      mdsmap e26: 1/1/1 up {0=0=up:active}
> >       >       >
> >       >       >      osdmap e1371: 5 osds: 5 up, 5 in
> >       >       >
> >       >       >       pgmap v49431: 192 pgs, 3 pools, 135 GB data, 34733 objects
> >       >       >
> >       >       >             406 GB used, 1874 GB / 2281 GB avail
> >       >       >
> >       >       >                  192 active+clean
> >       >       >
> >       >       >
> >       >       > I can see some packets being exchanged between the client and the
> >       >       > mon, but it's a pretty short exchange.
> >       >       >
> >       >       > Any ideas where to look next?
> >       >       >
> >       >       > Joshua
> >       >       >
> >       >
> >
>