Re: [Ceph-users] Re: MDS failing under load with large cache sizes

On Mon, Aug 5, 2019 at 12:21 AM Janek Bevendorff
<janek.bevendorff@xxxxxxxxxxxxx> wrote:
>
> Hi,
>
> > You can also try increasing the aggressiveness of the MDS recall but
> > I'm surprised it's still a problem with the settings I gave you:
> >
> > ceph config set mds mds_recall_max_caps 15000
> > ceph config set mds mds_recall_max_decay_rate 0.75
>
> I finally had the chance to try the more aggressive recall settings, but
> they did not change anything. As soon as the client starts copying files
> again, the numbers go up and I get a health message that the client is
> failing to respond to cache pressure.
>
> After this week of idle time, the dns/inos numbers (what does dns stand
> for anyway?) settled at around 8000k. That's basically that "idle"
> number that it goes back to when the client stops copying files. Though,
> for some weird reason, this number gets (quite) a bit higher every time
> (last time it was around 960k). Of course, I wouldn't expect it to go
> back all the way to zero, because that would mean dropping the entire
> cache for no reason, but it's still quite high and the same after
> restarting the MDS and all clients, which doesn't make a lot of sense to
> me. After resuming the copy job, the number went up to 20M in just the
> time it takes to write this email. There must be a bug somewhere.
>
> > Can you share two captures of `ceph daemon mds.X perf dump` about 1
> > second apart.
>
> I attached the requested perf dumps.
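
For reference, a minimal way to capture two such dumps about a second
apart (assuming the admin socket is reachable on the host running the
MDS; mds.X and the output file names below are only placeholders):

ceph daemon mds.X perf dump > perf-dump-1.json
sleep 1
ceph daemon mds.X perf dump > perf-dump-2.json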

Thanks, that helps. Looks like the problem is that the MDS is not
automatically trimming its cache fast enough. Please try bumping
mds_cache_trim_threshold:

ceph config set mds mds_cache_trim_threshold 512K

Increase it further if it's not aggressive enough. Please let us know
if that helps.
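
As a rough sketch of how that could look end to end (assuming a release
where `ceph config dump` and the MDS admin socket command `cache status`
are available; mds.X again stands in for your actual daemon name):

ceph config set mds mds_cache_trim_threshold 512K   # raise further if trimming still lags
ceph config dump | grep mds_cache_trim_threshold    # confirm the override is stored
ceph daemon mds.X cache status                      # watch cache memory use while the copy job runs

The cache status output is a quick way to see whether the cache
actually stops growing or shrinks once the trim threshold is raised.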

It shouldn't be necessary to do this, so I'll make a tracker ticket
once we confirm that's the issue.

-- 
Patrick Donnelly, Ph.D.
He / Him / His
Senior Software Engineer
Red Hat Sunnyvale, CA
GPG: 19F28A586F808C2402351B93C3301A3E258DD79D
_______________________________________________
ceph-users mailing list
ceph-users@xxxxxxxxxxxxxx
http://lists.ceph.com/listinfo.cgi/ceph-users-ceph.com


