On Wed, Jun 1, 2016 at 2:49 PM, Sage Weil <sage@xxxxxxxxxxxx> wrote:
> On Wed, 1 Jun 2016, Yan, Zheng wrote:
>> On Wed, Jun 1, 2016 at 6:15 AM, James Webb <jamesw@xxxxxxxxxxx> wrote:
>> > Dear ceph-users...
>> >
>> > My team runs an internal buildfarm using ceph as a backend storage platform. We’ve recently upgraded to Jewel and are having reliability issues that we need some help with.
>> >
>> > Our infrastructure is the following:
>> > - We use CEPH/CEPHFS (10.2.1)
>> > - We have 3 mons and 6 storage servers with a total of 36 OSDs (~4160 PGs).
>> > - We use enterprise SSDs for everything, including journals
>> > - We have one main mds and one standby mds.
>> > - We are using the ceph kernel client to mount cephfs.
>> > - We have upgraded to Ubuntu 16.04 (4.4.0-22-generic kernel)
>> > - We are using kernel NFS to serve NFS clients from a ceph mount (~32 nfs threads, 0 swappiness)
>> > - These are physical machines with 8 cores & 32GB memory
>> >
>> > On a regular basis, we lose all IO via ceph FS. We’re still trying to isolate the issue, but it surfaces as an issue between the MDS and the ceph client.
>> > We can’t tell if our NFS server is overwhelming the MDS or if this is some unrelated issue. Tuning the NFS server has not solved our issues.
>> > So far our only recovery has been to fail the MDS and then restart our NFS. Any help or advice will be appreciated on the CEPH side of things.
>> > I’m pretty sure we’re running with default tuning of the CEPH MDS configuration parameters.
>> >
>> > Here are the relevant log entries.
>> >
>> > From my primary MDS server, I start seeing these entries pile up:
>> >
>> > 2016-05-31 14:34:07.091117 7f9f2eb87700 0 log_channel(cluster) log [WRN] : client.4283066 isn't responding to mclientcaps(revoke), ino 10000004491 pending pAsLsXsFsxcrwb issued pAsxLsXsxFsxcrwb, sent 63.877480 seconds ago
>> > 2016-05-31 14:34:07.091129 7f9f2eb87700 0 log_channel(cluster) log [WRN] : client.4283066 isn't responding to mclientcaps(revoke), ino 10000005ddf pending pAsLsXsFsxcrwb issued pAsxLsXsxFsxcrwb, sent 63.877382 seconds ago
>> > 2016-05-31 14:34:07.091133 7f9f2eb87700 0 log_channel(cluster) log [WRN] : client.4283066 isn't responding to mclientcaps(revoke), ino 10000000a2a pending pAsLsXsFsxcrwb issued pAsxLsXsxFsxcrwb, sent 63.877356 seconds ago
>> >
>> > From my NFS server, I see these dmesg entries also start piling up:
>> > [Tue May 31 14:33:09 2016] libceph: skipping mds0 X.X.X.195:6800 seq 0 expected 4294967296
>> > [Tue May 31 14:33:09 2016] libceph: skipping mds0 X.X.X.195:6800 seq 1 expected 4294967296
>> > [Tue May 31 14:33:09 2016] libceph: skipping mds0 X.X.X.195:6800 seq 2 expected 4294967296
>> >
>>
>> 4294967296 is 0x100000000; this looks like a sequence-number overflow.
>>
>> In src/msg/Message.h:
>>
>> class Message {
>>   ...
>>   unsigned get_seq() const { return header.seq; }
>>   void set_seq(unsigned s) { header.seq = s; }
>>   ...
>> };
>>
>> In src/msg/simple/Pipe.cc:
>>
>> class Pipe {
>>   ...
>>   __u32 get_out_seq() { return out_seq; }
>>   ...
>> };
>>
>> Is this a bug or intentional?
>
> That's a bug. The seq values are intended to be 32 bits.
>
> (We should also be using the ceph_cmp_seq (IIRC) helper for any inequality
> checks, which does a sloppy comparison so that a 31-bit signed difference
> is used to determine > or <. It sounds like in this case we're just
> failing an equality check, though.)

ceph_seq_cmp(). I'll patch the kernel client then (if Zheng doesn't get to it first).
Thanks,

Ilya
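
[Editor's note] To make the failure mode above easier to see, here is a small, self-contained C++ sketch. It is not the actual Ceph or kernel source; seq_cmp() below only mimics the idea behind the kernel's ceph_seq_cmp() helper. It shows (1) how a sender whose 32-bit out_seq has wrapped back to 0 can never satisfy a receiver tracking the expected seq in 64 bits, which matches the "seq 0 expected 4294967296" skips in the dmesg output, and (2) how a signed-difference comparison over 32-bit values keeps ordering correct across the wrap, as long as the two sides are less than 2^31 apart.

    // Standalone sketch (assumed names, not Ceph code) of the seq-wrap issue.
    #include <cstdint>
    #include <iostream>

    // Wraparound-tolerant comparison in the spirit of ceph_seq_cmp():
    // negative if a is behind b, zero if equal, positive if a is ahead (mod 2^32).
    static int seq_cmp(uint32_t a, uint32_t b) {
      return static_cast<int32_t>(a - b);
    }

    int main() {
      uint64_t receiver_expected = 4294967296ULL; // 2^32, tracked in 64 bits
      uint32_t sender_out_seq    = 0;             // 32-bit counter has wrapped

      // Strict mixed-width comparison: 0 != 2^32, so every message is skipped.
      std::cout << "strict check skips message: "
                << (sender_out_seq != receiver_expected) << "\n"; // prints 1

      // Same values, both reduced to 32 bits and compared with the sloppy
      // helper: they compare equal, so delivery continues across the wrap.
      std::cout << "seq_cmp across the wrap: "
                << seq_cmp(sender_out_seq,
                           static_cast<uint32_t>(receiver_expected)) << "\n"; // 0

      // Ordering also survives near the wrap point: 1 is "ahead of" 0xffffffff.
      std::cout << "seq_cmp(1, 0xffffffff): "
                << seq_cmp(1, 0xffffffffu) << "\n"; // positive (2)
      return 0;
    }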