On Tue, Dec 5, 2017 at 8:07 AM, Reed Dier <reed.dier@xxxxxxxxxxx> wrote:
> Been trying to do a fairly large rsync onto a 3x replicated, filestore HDD
> backed CephFS pool.
>
> Luminous 12.2.1 for all daemons, kernel CephFS driver, Ubuntu 16.04 running
> mix of 4.8 and 4.10 kernels, 2x10GbE networking between all daemons and
> clients.

You should try a newer kernel client if possible, since the MDS is having
trouble trimming its cache.

> HEALTH_ERR 1 MDSs report oversized cache; 1 MDSs have many clients failing
> to respond to cache pressure; 1 MDSs behind on trimming; noout,nodeep-scrub
> flag(s) set; application not enabled on 1 pool(s); 242 slow requests are
> blocked > 32 sec; 769378 stuck requests are blocked > 4096 sec
> MDS_CACHE_OVERSIZED 1 MDSs report oversized cache
>     mdsdb(mds.0): MDS cache is too large (23GB/8GB); 1018 inodes in use by
> clients, 1 stray files
> MDS_CLIENT_RECALL_MANY 1 MDSs have many clients failing to respond to cache
> pressure
>     mdsdb(mds.0): Many clients (37) failing to respond to cache pressure
> client_count: 37
> MDS_TRIM 1 MDSs behind on trimming
>     mdsdb(mds.0): Behind on trimming (36252/30) max_segments: 30,
> num_segments: 36252

See also: http://tracker.ceph.com/issues/21975

You can try doubling (several times if necessary) the MDS configs
`mds_log_max_segments` and `mds_log_max_expiring` to make the MDS trim its
journal more aggressively. (That may not help, since your OSD requests are
slow.)
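For example, something like this (a sketch, assuming you run it from a node
with an admin keyring and using the active MDS name "mdsdb" from your health
output; defaults are 30 and 20 in Luminous, IIRC):

    # check the current values via the MDS admin socket (run on the MDS host)
    ceph daemon mds.mdsdb config get mds_log_max_segments
    ceph daemon mds.mdsdb config get mds_log_max_expiring

    # double both at runtime; repeat with larger values if the journal
    # still isn't trimming
    ceph tell mds.mdsdb injectargs '--mds_log_max_segments=60 --mds_log_max_expiring=40'

injectargs changes are runtime-only, so if they help, persist them in the
[mds] section of ceph.conf.

--
Patrick Donnelly
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com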