On 25/02/2021 11:19, Dylan McCulloch wrote:
> Simon Oosthoek wrote:
>> On 24/02/2021 22:28, Patrick Donnelly wrote:
>>> Hello Simon,
>>>
>>> On Wed, Feb 24, 2021 at 7:43 AM Simon Oosthoek <s.oosthoek(a)science.ru.nl> wrote:
>>>
>>> On 24/02/2021 12:40, Simon Oosthoek wrote:
>>> Hi
>>>
>>> we've been running our Ceph cluster for nearly 2 years now (Nautilus)
>>> and recently, due to a temporary situation, the cluster is at 80% full.
>>>
>>> We are only using CephFS on the cluster.
>>>
>>> Normally, I realize we should be adding OSD nodes, but this is a
>>> temporary situation, and I expect the cluster to go to <60% full quite soon.
>>>
>>> Anyway, we are noticing some really problematic slowdowns. There are
>>> some things that could be related, but we are unsure...
>>>
>>> - Our 2 MDS nodes (1 active, 1 standby) are configured with 128GB RAM,
>>> but are not using more than 2GB; this looks either very inefficient or
>>> wrong ;-)
>>>
>>> After looking at our monitoring history, it seems the mds cache is
>>> actually used more fully, but most of our servers are getting a weekly
>>> reboot by default. This clears the mds cache, obviously. I wonder if
>>> that's a smart idea for an MDS node...? ;-)
>>>
>>> No, it's not. Can you also check that you do not have mds_cache_size
>>> configured, perhaps in the MDS's local ceph.conf?
>>>
>> Hi Patrick,
>>
>> I've already changed the reboot period to 1 month.
>>
>> The mds_cache_size is not configured locally in the /etc/ceph/ceph.conf
>> file, so I guess it was just the weekly reboot that cleared the cached
>> data from memory...
>>
>> I'm starting to think that the nearly full cluster is probably the only
>> cause of the performance problems, though I don't know why that would be.
>
> Did the performance issue only arise when OSDs in the cluster reached
> 80% usage? What is your osd nearfull_ratio?
>

$ ceph osd dump | grep ratio
full_ratio 0.95
backfillfull_ratio 0.9
nearfull_ratio 0.85

> Is the cluster in HEALTH_WARN with nearfull OSDs?

]# ceph -s
  cluster:
    id:     b489547c-ba50-4745-a914-23eb78e0e5dc
    health: HEALTH_WARN
            2 pgs not deep-scrubbed in time
            957 pgs not scrubbed in time

  services:
    mon: 3 daemons, quorum cephmon3,cephmon1,cephmon2 (age 7d)
    mgr: cephmon3(active, since 2M), standbys: cephmon1, cephmon2
    mds: cephfs:1 {0=cephmds2=up:active} 1 up:standby
    osd: 168 osds: 168 up (since 11w), 168 in (since 9M); 43 remapped pgs

  task status:
    scrub status:
        mds.cephmds2: idle

  data:
    pools:   10 pools, 5280 pgs
    objects: 587.71M objects, 804 TiB
    usage:   1.4 PiB used, 396 TiB / 1.8 PiB avail
    pgs:     9634168/5101965463 objects misplaced (0.189%)
             5232 active+clean
             29   active+remapped+backfill_wait
             14   active+remapped+backfilling
             5    active+clean+scrubbing+deep+repair

  io:
    client:   136 MiB/s rd, 600 MiB/s wr, 386 op/s rd, 359 op/s wr
    recovery: 328 MiB/s, 169 objects/s

> We noticed recently, when one of our clusters had nearfull OSDs, that
> cephfs client performance was heavily impacted.
> Our cluster is nautilus 14.2.15. Clients are kernel 4.19.154.
> We determined that it was most likely due to the ceph client forcing
> sync file writes when the nearfull flag is present.
> https://github.com/ceph/ceph-client/commit/7614209736fbc4927584d4387faade4f31444fce
> Increasing and decreasing the nearfull ratio confirmed that performance
> was only impacted while the nearfull flag was present.
> Not sure if that's relevant for your case.

I think this could be very similar in our cluster, thanks for sharing
your insights!
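
For the archives, this is what I intend to check on our side, based on your
observation (untested on our cluster yet, so take it as a sketch rather than
a recipe): first, whether the nearfull flag is actually set on the osdmap,
and then whether temporarily raising the nearfull ratio makes the client
slowdown go away. Raising the ratio eats into the safety margin before
backfillfull/full, so I'd only do it briefly while we clean up data:

$ ceph osd dump | grep flags            # "nearfull" should appear here if the flag is set
$ ceph health detail | grep -i nearfull # and any nearfull OSDs show up here

$ ceph osd set-nearfull-ratio 0.88      # temporarily raise it (example value) and watch client IO
$ ceph osd set-nearfull-ratio 0.85      # restore the previous value once the test is done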
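
Regarding the MDS cache: now that the weekly reboots are gone, I also want to
confirm the cache is actually allowed to grow. A quick sketch of what I plan
to run on the active MDS host (assuming nothing overrides the defaults; if
mds_cache_memory_limit is still at its Nautilus default of 1 GiB, that alone
could explain why the daemon never uses much more than ~2 GB of RAM):

$ ceph daemon mds.cephmds2 config get mds_cache_memory_limit
$ ceph daemon mds.cephmds2 cache status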
Cheers

/Simon