Re: cephfs kernel client instability

We are experiencing the same issues on clients with CephFS mounted
using the kernel client and 4.x kernels.

The problem shows up when we add new OSDs, on reboots after
installing patches, and when changing OSD weights.

Here are the logs from a misbehaving client:

[6242967.890611] libceph: mon4 10.8.55.203:6789 session established
[6242968.010242] libceph: osd534 10.7.55.23:6814 io error
[6242968.259616] libceph: mon1 10.7.55.202:6789 io error
[6242968.259658] libceph: mon1 10.7.55.202:6789 session lost, hunting
for new mon
[6242968.359031] libceph: mon4 10.8.55.203:6789 session established
[6242968.622692] libceph: osd534 10.7.55.23:6814 io error
[6242968.692274] libceph: mon4 10.8.55.203:6789 io error
[6242968.692337] libceph: mon4 10.8.55.203:6789 session lost, hunting
for new mon
[6242968.694216] libceph: mon0 10.7.55.201:6789 session established
[6242969.099862] libceph: mon0 10.7.55.201:6789 io error
[6242969.099888] libceph: mon0 10.7.55.201:6789 session lost, hunting
for new mon
[6242969.224565] libceph: osd534 10.7.55.23:6814 io error

In addition to the MON io errors, we also get some OSD io errors.

Moreover, when the error occurs, several clients trigger an
"MDS_CLIENT_LATE_RELEASE" warning on the MDS server.

We are currently running Luminous 12.2.10 with around 580 OSDs
and 5 monitor nodes. The cluster runs on CentOS 7.6.

The ‘osd_map_message_max’ setting is at its default value of 40,
but we are still getting these errors.
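For what it's worth, here is the back-of-envelope arithmetic we used to
sanity-check the setting against the kernel client's 16 MiB front-segment
limit (CEPH_MSG_MAX_FRONT). The per-map size below is an assumption for
illustration only, not a value measured on our cluster:

```python
# Rough estimate of a MOSDMap message's front size vs. the kernel
# client's 16 MiB front-segment limit (CEPH_MSG_MAX_FRONT).
# NOTE: the full-map size used below is an ASSUMPTION for
# illustration, not a measurement from our cluster.

FRONT_LIMIT = 16 * 1024 * 1024  # CEPH_MSG_MAX_FRONT, in bytes


def max_maps_per_message(full_map_bytes: int,
                         limit: int = FRONT_LIMIT) -> int:
    """Largest map count per MOSDMap message that stays under `limit`,
    assuming every map in the message is a full map of the given size."""
    return limit // full_map_bytes


# Assume each full OSDMap for a cluster of a few hundred OSDs is
# on the order of 400 KiB (assumed, not measured).
assumed_map_size = 400 * 1024
print(max_maps_per_message(assumed_map_size))  # -> 40
```

So with maps anywhere near that size, a setting much above 40 can push a
single message over the limit, which matches what we see.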

Best,
Martin


On Wed, Jan 16, 2019 at 7:46 PM Ilya Dryomov <idryomov@xxxxxxxxx> wrote:
>
> On Wed, Jan 16, 2019 at 7:12 PM Andras Pataki
> <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> >
> > Hi Ilya/Kjetil,
> >
> > I've done some debugging and tcpdump-ing to see what the interaction
> > between the kernel client and the mon looks like.  Indeed -
> > CEPH_MSG_MAX_FRONT defined as 16Mb seems low for the default mon
> > messages for our cluster (with osd_mon_messages_max at 100).  We have
> > about 3500 osd's, and the kernel advertises itself as older than
>
> This is too big, especially for a fairly large cluster such as yours.
> The default was reduced to 40 in luminous.  Given about 3500 OSDs, you
> might want to set it to 20 or even 10.
>
> > Luminous, so it gets full map updates.  The FRONT message size on the
> > wire I saw was over 24Mb.  I'll try setting osd_mon_messages_max to 30
> > and do some more testing, but from the debugging it definitely seems
> > like the issue.
> >
> > Is the kernel driver really not up to date to be considered at least a
> > Luminous client by the mon (i.e. it has some feature really missing)?  I
> > looked at the bits, and the MON seems to want is bit 59 in ceph features
> > shared by FS_BTIME, FS_CHANGE_ATTR, MSG_ADDR2.  Can the kernel client be
> > used when setting require-min-compat to luminous (either with the 4.19.x
> > kernel or the Redhat/Centos 7.6 kernel)?  Some background here would be
> > helpful.
>
> Yes, the kernel client is missing support for that feature bit, however
> 4.13+ and RHEL 7.5+ _can_ be used with require-min-compat-client set to
> luminous.  See
>
> http://lists.ceph.com/pipermail/ceph-users-ceph.com/2018-May/027002.html
>
> Thanks,
>
>                 Ilya
> _______________________________________________
> ceph-users mailing list
> ceph-users@xxxxxxxxxxxxxx
> http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com



