On Mon, Sep 14, 2020 at 11:17 PM Michal Hocko <mhocko@xxxxxxxx> wrote: > > On Mon 14-09-20 23:02:15, Chunxin Zang wrote: > > On Mon, Sep 14, 2020 at 9:47 PM Michal Hocko <mhocko@xxxxxxxx> wrote: > > > > > On Mon 14-09-20 21:25:59, Chunxin Zang wrote: > > > > On Mon, Sep 14, 2020 at 5:30 PM Michal Hocko <mhocko@xxxxxxxx> wrote: > > > > > > > > > The subject is misleading because this patch doesn't fix an infinite > > > > > loop, right? It just allows the userspace to interrupt the operation. > > > > > > > > > > > > > > Yes, so we are making a separate patch follow Vlastimil's > > > recommendations. > > > > Use double of threshold to end the loop. > > > > > > That still means the changelog needs an update > > > > > > > The patch is already merged in Linux-next branch. Can I update the > > changelog now? > > Yes. Andrew will refresh it. He doesn't have a git tree which would > prevent rewriting the patch. > > > This is my first patch, please forgive me :) > > No worries. The mm patch workflow is rather different from others. > > > > > On Thu, Sep 10, 2020 at 1:59 AM Vlastimil Babka <vbabka@xxxxxxx> wrote: > > > > > > From: Chunxin Zang <zangchunxin@xxxxxxxxxxxxx> > > > > > > > > > > > ... > > > > > - IMHO it's still worth to bail out in your scenario even without a > > > > > signal, e.g. > > > > > by the doubling of threshold. But it can be a separate patch. > > > > > Thanks! > > > > > ... > > > > > > > > > > > > > > > > On Wed 09-09-20 23:20:47, zangchunxin@xxxxxxxxxxxxx wrote: > > > > > > From: Chunxin Zang <zangchunxin@xxxxxxxxxxxxx> > > > > > > > > > > > > On our server, there are about 10k memcg in one machine. They use > > > memory > > > > > > very frequently. When I tigger drop caches,the process will infinite > > > loop > > > > > > in drop_slab_node. > > > > > > > > > > Is this really an infinite loop, or it just takes a lot of time to > > > > > process all the metadata in that setup? If this is really an infinite > > > > > loop then we should look at it. My current understanding is that the > > > > > operation would finish at some time it just takes painfully long to get > > > > > there. > > > > > > > > > > > > > Yes, it's really an infinite loop. Every loop spends a lot of time. In > > > > this time, > > > > memcg will alloc/free memory, so the next loop, the total of 'freed' > > > > always bigger than 10. > > > > > > I am still not sure I follow. Do you mean that there is somebody > > > constantly generating more objects to reclaim? > > > > > > > Yes, this is my meaning. :) > > > > > > > Maybe we are just not agreeing on the definition of an infinite loop but > > > in my book that means that the final condition can never be met. While a > > > busy adding new object might indeed cause drop caches to loop for a long > > > time this is to be expected from that interface as it is supposed to > > > drop all the cache and that can grow during the operation. > > > -- > > > > > > > Because I have 10k memcg , all of them are heavy users of memory. > > During each loop, there are always more than 10 reclaimable objects > > generating, so the > > condition is never met. > > 10k or any number of memcgs shouldn't really make much of a difference. > Except for the time the scan adds. Fundamentally we are talking about > freed objects and whether they are on the global or per memcg lists > should result in a similar behavior. > > > The drop cache process has no chance to exit the > > loop. > > Although the purpose of the 'drop cache' interface is to release all > > caches, we still need a > > way to terminate it, e.g. in this case, the process took too long to run . > > Yes, this is perfectly understandable. Having a bail out on fatal signal > is a completely reasonable thing to do. I am mostly confused by your > infinite loop claims and what the relation of this patch to it. I would > propose this wording instead > > " > We have observed that drop_caches can take a considerable amount of > time (<put data here>). Especially when there are many memcgs involved > because they are adding an additional overhead. > > It is quite unfortunate that the operation cannot be interrupted by a > signal currently. Add a check for fatal signals into the main loop > so that userspace can control early bailout. > " I understand what you mean. Yes, you are right, The patch only provides a way for users to terminate the process, not to reduce time consumption, not improve the situation about loops. I will update the description of the patch in v3. Thanks so much :) > or something along those lines. > > > > > root 357956 ... R Aug25 21119854:55 echo 3 > /proc/sys/vm/drop_caches > > -- > Michal Hocko > SUSE Labs Best wishes Chunxin