Thanks Sage!  What had happened prior to me upgrading was that I added an
erasure-coded pool, but all my OSDs began to crash.  The EC profile didn't
seem to cause the crash, so I left it, but once I removed the pool, the
crashes stopped.

Do you guys want any of the core dumps, or is anything short of a debug
build with symbols and gdb on my system going to help?

Joshua


On Fri, Jul 11, 2014 at 7:02 AM, Sage Weil <sweil at redhat.com> wrote:

> On Thu, 10 Jul 2014, Joshua McClintock wrote:
> > { "rule_id": 1,
> >   "rule_name": "erasure-code",
> >   "ruleset": 1,
> >   "type": 3,
>
> The presence of the erasure code CRUSH rules is what is preventing the
> kernel client from mounting.  Upgrade to a newer kernel (3.14 I think?),
> or remove the EC pools and these CRUSH rules to allow the old client to
> mount.
>
> >   "tunables": { "choose_local_tries": 0,
> >     "choose_local_fallback_tries": 0,
> >     "choose_total_tries": 50,
> >     "chooseleaf_descend_once": 1,
> >     "profile": "bobtail",
> >     "optimal_tunables": 0,
> >     "legacy_tunables": 0,
> >     "require_feature_tunables": 1,
> >     "require_feature_tunables2": 1}}
>
> Looks like fields for tunables3 and the v2 and v3 rules are missing from
> the dump; I'll fix that.
>
> Thanks!
> sage
>
>
> > On Thu, Jul 10, 2014 at 8:16 PM, Sage Weil <sweil at redhat.com> wrote:
> >
> > That is CEPH_FEATURE_CRUSH_V2.  Can you attach the output of
> >
> >     ceph osd crush dump
> >
> > Thanks!
> > sage
> >
> > On Thu, 10 Jul 2014, Joshua McClintock wrote:
> >
> > > Yes, I changed some of the mount options on my osds (xfs mount
> > > options), but I think this may be the answer from dmesg, sorta looks
> > > like a version mismatch:
> > >
> > > libceph: loaded (mon/osd proto 15/24)
> > > ceph: loaded (mds proto 32)
> > > libceph: mon0 192.168.0.14:6789 feature set mismatch, my 4a042aca <
> > >   server's 104a042aca, missing 1000000000
> > > libceph: mon0 192.168.0.14:6789 socket error on read
> > > libceph: mon2 192.168.0.16:6789 feature set mismatch, my 4a042aca <
> > >   server's 104a042aca, missing 1000000000
> > > libceph: mon2 192.168.0.16:6789 socket error on read
> > > libceph: mon1 192.168.0.15:6789 feature set mismatch, my 4a042aca <
> > >   server's 104a042aca, missing 1000000000
> > > libceph: mon1 192.168.0.15:6789 socket error on read
> > > libceph: mon0 192.168.0.14:6789 feature set mismatch, my 4a042aca <
> > >   server's 104a042aca, missing 1000000000
> > > libceph: mon0 192.168.0.14:6789 socket error on read
> > > libceph: mon2 192.168.0.16:6789 feature set mismatch, my 4a042aca <
> > >   server's 104a042aca, missing 1000000000
> > > libceph: mon2 192.168.0.16:6789 socket error on read
> > > libceph: mon1 192.168.0.15:6789 feature set mismatch, my 4a042aca <
> > >   server's 104a042aca, missing 1000000000
> > > libceph: mon1 192.168.0.15:6789 socket error on read
> > >
> > > Maybe I didn't update as well as I thought I did.  I did hit every
> > > mon, but I remember I couldn't upgrade to the new 'ceph' package
> > > because it conflicted with 'python-ceph', so I uninstalled it
> > > (python-ceph), and then upgraded to .80.1-2.  Maybe there's a
> > > subcomponent I missed?
> > >
> > > Here's rpm -qa from the client:
> > >
> > > [root@chefwks01 ~]# rpm -qa|grep ceph
> > > ceph-deploy-1.5.2-0.noarch
> > > ceph-release-1-0.el6.noarch
> > > ceph-0.80.1-2.el6.x86_64
> > > libcephfs1-0.80.1-0.el6.x86_64
> > >
> > > Here's rpm -qa from the mons:
> > >
> > > [root@ceph-mon01 ~]# rpm -qa|grep ceph
> > > ceph-0.80.1-2.el6.x86_64
> > > ceph-release-1-0.el6.noarch
> > > libcephfs1-0.80.1-0.el6.x86_64
> > > [root@ceph-mon01 ~]#
> > >
> > > [root@ceph-mon02 ~]# rpm -qa|grep ceph
> > > libcephfs1-0.80.1-0.el6.x86_64
> > > ceph-0.80.1-2.el6.x86_64
> > > ceph-release-1-0.el6.noarch
> > > [root@ceph-mon02 ~]#
> > >
> > > [root@ceph-mon03 ~]# rpm -qa|grep ceph
> > > libcephfs1-0.80.1-0.el6.x86_64
> > > ceph-0.80.1-2.el6.x86_64
> > > ceph-release-1-0.el6.noarch
> > > [root@ceph-mon03 ~]#
> > >
> > > Joshua
> > >
> > > On Thu, Jul 10, 2014 at 6:04 PM, Sage Weil <sweil at redhat.com> wrote:
> > >
> > > Have you made any other changes after the upgrade?  (Like adjusting
> > > tunables, or creating EC pools?)
> > >
> > > See if there is anything in 'dmesg' output.
> > >
> > > sage
> > >
> > > On Thu, 10 Jul 2014, Joshua McClintock wrote:
> > >
> > > > I upgraded my cluster to .80.1-2 (CentOS).  My mount command just
> > > > freezes and outputs an error:
> > > >
> > > > mount.ceph 192.168.0.14,192.168.0.15,192.168.0.16:/ /us-west01 -o
> > > > name=chefwks01,secret=`ceph-authtool -p -n client.admin
> > > > /etc/ceph/us-west01.client.admin.keyring`
> > > >
> > > > mount error 5 = Input/output error
> > > >
> > > > Here's the output from 'ceph -s':
> > > >
> > > >     cluster xxxxxxxxxxxxxxxxxxxxxx
> > > >      health HEALTH_OK
> > > >      monmap e1: 3 mons at {ceph-mon01=192.168.0.14:6789/0,ceph-mon02=192.168.0.15:6789/0,ceph-mon03=192.168.0.16:6789/0},
> > > >        election epoch 88, quorum 0,1,2 ceph-mon01,ceph-mon02,ceph-mon03
> > > >      mdsmap e26: 1/1/1 up {0=0=up:active}
> > > >      osdmap e1371: 5 osds: 5 up, 5 in
> > > >       pgmap v49431: 192 pgs, 3 pools, 135 GB data, 34733 objects
> > > >             406 GB used, 1874 GB / 2281 GB avail
> > > >                  192 active+clean
> > > >
> > > > I can see some packets being exchanged between the client and the
> > > > mon, but it's a pretty short exchange.
> > > >
> > > > Any ideas where to look next?
> > > >
> > > > Joshua
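
For anyone hitting the same symptoms, here is a minimal sketch of the checks and
cleanup Sage describes above, assuming a bash shell on a Firefly (0.80.x) admin
node; the pool name 'ecpool' is a placeholder, since the actual EC pool name
never appears in this thread:

    # Decode the dmesg mismatch: which server feature bits the client lacks
    # (plain bash arithmetic on the two hex masks from the log above).
    printf '%x\n' $(( 0x104a042aca & ~0x4a042aca ))
    # -> 1000000000, i.e. bit 36, which is CEPH_FEATURE_CRUSH_V2

    # Option 1: drop the EC pool and its CRUSH rule so the old kernel client
    # can mount again ('ecpool' is a stand-in name, not from this thread).
    ceph osd pool delete ecpool ecpool --yes-i-really-really-mean-it
    ceph osd crush rule ls              # confirm the remaining rule names
    ceph osd crush rule rm erasure-code # rule name taken from the crush dump above

    # Option 2: keep the EC pool and upgrade the client to a kernel whose
    # libceph advertises CRUSH_V2 (3.14 or newer, per Sage's note above).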