Re: [RFC] memcg rstat flushing optimization

Hello.

On Tue, Oct 04, 2022 at 06:17:40PM -0700, Yosry Ahmed <yosryahmed@xxxxxxxxxx> wrote:
> Sorry for the long email :)

(I'll get to the other parts later. Sorry for my latency :)

> We have recently ran into a hard lockup on a machine with hundreds of
> CPUs and thousands of memcgs during an rstat flush.
> [...]

I'll respond only with a few remarks on this particular case.


> As you can imagine, with a sufficiently large number of
> memcgs and cpus, a call to mem_cgroup_flush_stats() might be slow, or
> in an extreme case like the one we ran into, cause a hard lockup
> (despite periodically flushing every 4 seconds).

Is the 4 seconds your own modification of the upstream value of
FLUSH_TIME (which is 2 s)?

In the mail thread, you also mention >10 s for hard lockups. That sounds
scary (even with a flush once per 4 seconds), since with a large enough
update tree (and enough update activity) the periodic flush couldn't keep
up.
It also looks like a bad feedback loop: the longer a (periodic) flush
takes, the less frequently flushes run and the more updates may
accumulate. I.e. one spike in update activity can push the system into
a spiral of long flushes from which it won't recover unless the activity
drops substantially.
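
To make the feedback loop concrete, here is a toy model (not kernel
code). It assumes that the cost of a flush is linear in the number of
updates accumulated since the previous flush, and that the next flush
starts FLUSH_TIME after the previous one finished. Both are
simplifications, and simulate_flush_durations(), update_rate and
cost_per_update are made-up names and parameters for illustration only.

```python
FLUSH_TIME = 2.0  # seconds of idle time between two flushes (mirrors the 2 s above)


def simulate_flush_durations(update_rate, cost_per_update, rounds,
                             initial_backlog=0.0):
    """Return the duration (in seconds) of each flush in a toy linear model.

    update_rate      -- hypothetical: updates arriving per second
    cost_per_update  -- hypothetical: seconds of flush work per pending update
    initial_backlog  -- pending updates before the first flush (e.g. a spike)
    """
    durations = []
    backlog = initial_backlog
    for _ in range(rounds):
        # Assumption: flush time grows linearly with the pending updates.
        duration = cost_per_update * backlog
        durations.append(duration)
        # Updates keep arriving during the flush and during the wait
        # until the next flush, so a longer flush means a bigger backlog.
        backlog = update_rate * (duration + FLUSH_TIME)
    return durations


if __name__ == "__main__":
    # update_rate * cost_per_update < 1: a spike decays back to a steady state.
    print(simulate_flush_durations(1000.0, 0.0005, 8, initial_backlog=100000.0))
    # update_rate * cost_per_update >= 1: every flush is longer than the last.
    print(simulate_flush_durations(1000.0, 0.0012, 8))
```

In this model the flush duration settles only while
update_rate * cost_per_update stays below 1; at or above that, each
flush leaves behind more work than it cleared, which is the spiral
described above.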

(The 2nd point should have been about some memcg_check_events()
optimization or THRESHOLDS_EVENTS_TARGET justifying a delayed flush, but
I've found none to be applicable. Just noting that v2 fortunately doesn't
have the threshold notifications.)

Regards,
Michal


