Re: After Luminous upgrade: ceph-fuse clients failing to respond to cache pressure

On Wed, Jan 17, 2018 at 3:36 PM, Andras Pataki
<apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
> Hi John,
>
> All our hosts are CentOS 7 hosts; the majority are 7.4 with kernel
> 3.10.0-693.5.2.el7.x86_64 and fuse 2.9.2-8.el7.  We have some hosts with
> slight variations in kernel versions; the oldest are a handful of
> CentOS 7.3 hosts with kernel 3.10.0-514.21.1.el7.x86_64 and fuse
> 2.9.2-7.el7.  I know Red Hat has been backporting lots of stuff, so
> perhaps these kernels fall into the category you are describing?

Quite possibly -- this issue was originally noticed on RHEL, so maybe
the relevant bits made it back to CentOS recently.

However, it looks like the fixes for that issue[1,2] are already in
12.2.2, so maybe this is something completely unrelated :-/
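
(It might be worth double-checking that the clients really are running
12.2.2 binaries -- as a quick, purely illustrative sanity check on a
couple of the affected hosts, something like:

    ceph-fuse --version

should print the exact client version.)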

The ceph-fuse executable does create an admin command socket in
/var/run/ceph (named something like ceph-client...) that you can drive
with "ceph daemon <socket> dump_cache", but the output is extremely
verbose and low-level, and may not be informative.
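
For example (the socket name below is just illustrative -- it normally
includes the client name and a pid, so check what actually exists in
/var/run/ceph on the host):

    ls /var/run/ceph/
    ceph daemon /var/run/ceph/ceph-client.admin.<pid>.asok dump_cache

For the "which hosts" part of your question, the MDS side is probably
quicker: if I remember right, "ceph daemon mds.cephmon00 session ls" on
the active MDS host lists all client sessions, including each client's
address and the number of caps it holds, which should point at the
clients the MDS is unhappy with.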

John

1. http://tracker.ceph.com/issues/21423
2. http://tracker.ceph.com/issues/22269

>
> When the cache pressure problem happens, is there an easy way to know
> exactly which hosts are involved, and what items are in their caches?
>
> Andras
>
>
>
> On 01/17/2018 06:09 AM, John Spray wrote:
>>
>> On Tue, Jan 16, 2018 at 8:50 PM, Andras Pataki
>> <apataki@xxxxxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Dear Cephers,
>>>
>>> We've upgraded the back end of our cluster from Jewel (10.2.10) to
>>> Luminous (12.2.2).  The upgrade went smoothly for the most part, except
>>> we seem to be hitting an issue with cephfs.  After about a day or two of
>>> use, the MDS starts complaining about clients failing to respond to
>>> cache pressure:
>>
>> What's the OS, kernel version and fuse version on the hosts where the
>> clients are running?
>>
>> There have been some issues with ceph-fuse losing the ability to
>> properly invalidate cached items when certain updated OS packages were
>> installed.
>>
>> Specifically, ceph-fuse checks the kernel version against 3.18.0 to
>> decide which invalidation method to use, and if your OS has backported
>> new behaviour to a low-version-numbered kernel, that can confuse it.
>>
>> John
>>
>>> [root@cephmon00 ~]# ceph -s
>>>    cluster:
>>>      id:     d7b33135-0940-4e48-8aa6-1d2026597c2f
>>>      health: HEALTH_WARN
>>>              1 MDSs have many clients failing to respond to cache pressure
>>>              noout flag(s) set
>>>              1 osds down
>>>
>>>    services:
>>>      mon: 3 daemons, quorum cephmon00,cephmon01,cephmon02
>>>      mgr: cephmon00(active), standbys: cephmon01, cephmon02
>>>      mds: cephfs-1/1/1 up  {0=cephmon00=up:active}, 2 up:standby
>>>      osd: 2208 osds: 2207 up, 2208 in
>>>           flags noout
>>>
>>>    data:
>>>      pools:   6 pools, 42496 pgs
>>>      objects: 919M objects, 3062 TB
>>>      usage:   9203 TB used, 4618 TB / 13822 TB avail
>>>      pgs:     42470 active+clean
>>>               22    active+clean+scrubbing+deep
>>>               4     active+clean+scrubbing
>>>
>>>    io:
>>>      client:   56122 kB/s rd, 18397 kB/s wr, 84 op/s rd, 101 op/s wr
>>>
>>> [root@cephmon00 ~]# ceph health detail
>>> HEALTH_WARN 1 MDSs have many clients failing to respond to cache pressure; noout flag(s) set; 1 osds down
>>> MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to cache pressure
>>>      mdscephmon00(mds.0): Many clients (103) failing to respond to cache pressureclient_count: 103
>>> OSDMAP_FLAGS noout flag(s) set
>>> OSD_DOWN 1 osds down
>>>      osd.1296 (root=root-disk,pod=pod0-disk,host=cephosd008-disk) is down
>>>
>>>
>>> We are using the 12.2.2 fuse client exclusively on about 350 nodes or so
>>> (of which it seems about 100 are not responding to cache pressure in the
>>> log above).  When this happens, clients also appear pretty sluggish
>>> (listing directories, etc.).  After bouncing the MDS, everything returns
>>> to normal after the failover for a while.  Ignore the message about 1 OSD
>>> down; that corresponds to a failed drive and all data has been
>>> re-replicated since.
>>>
>>> We were also using the 12.2.2 fuse client with the Jewel back end before
>>> the upgrade, and have not seen this issue.
>>>
>>> We are running with a larger MDS cache than usual, with mds_cache_size
>>> set to 4 million.  All other MDS configs are the defaults.
>>>
>>> Is this a known issue?  If not, any hints on how to further diagnose the
>>> problem?
>>>
>>> Andras
>>>
>>>
>
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


