Hi Minchan

On Thu, Jan 26, 2017 at 4:57 AM, Minchan Kim <minchan@xxxxxxxxxx> wrote:
> Hello Vinayak,
>
> On Wed, Jan 25, 2017 at 05:08:38PM +0530, Vinayak Menon wrote:
>> It is noticed that during a global reclaim the memory
>> reclaimed via shrinking the slabs can sometimes result
>> in reclaimed pages being greater than the scanned pages
>> in shrink_node. When this is passed to vmpressure, the
>
> I don't know you are saying zsmalloc. Anyway, it's one of those which
> free larger pages than requested. I should fix that but was not sent
> yet, unfortunately.

As I understand it, the problem is not related to a particular shrinker.
In shrink_node, when the subtree's reclaim efficiency is passed to
vmpressure, the 4th parameter (sc->nr_scanned - nr_scanned) includes
only the LRU scanned pages, but the 5th parameter
(sc->nr_reclaimed - nr_reclaimed) also includes the reclaimed slab
pages, since "reclaimed_slab" is added to it in the previous step.
i.e. the slab pages scanned are not included in the scanned count
passed to vmpressure. This lets reclaimed go higher than scanned in
vmpressure, resulting in false events.

>
>> unsigned arithmetic results in the pressure value to be
>> huge, thus resulting in a critical event being sent to
>> root cgroup. Fix this by not passing the reclaimed slab
>> count to vmpressure, with the assumption that vmpressure
>> should show the actual pressure on LRU which is now
>> diluted by adding reclaimed slab without a corresponding
>> scanned value.
>
> I can't guess justfication of your assumption from the description.
> Why do we consider only LRU pages for vmpressure? Could you elaborate
> a bit?
>

When we encountered the false events from vmpressure, we first thought
the problem could be that slab scanned is not included in
sc->nr_scanned, the way it is for reclaimed. But later we thought that
vmpressure works only on the scanned and reclaimed counts from the LRU.
I can explain what I understand; let me know if this is incorrect.

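To make the arithmetic concrete, here is a small userspace sketch (not
the kernel code itself) modelled on my reading of the pressure
calculation in mm/vmpressure.c. The names calc_pressure and level_of,
the enum, and the thresholds 60 and 95 are assumptions for illustration;
uint64_t stands in for the kernel's unsigned long on a 64-bit machine.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical level names for this sketch. */
enum level { LEVEL_LOW, LEVEL_MEDIUM, LEVEL_CRITICAL };

/*
 * Pressure in percent, following the shape of the vmpressure
 * calculation as I read it. All arithmetic is unsigned, so when
 * reclaimed > scanned the subtraction wraps around instead of
 * going negative.
 */
uint64_t calc_pressure(uint64_t scanned, uint64_t reclaimed)
{
	uint64_t scale = scanned + reclaimed;
	uint64_t pressure;

	assert(scanned > 0);	/* the sketch does not handle an empty window */

	pressure = scale - (reclaimed * scale / scanned);
	return pressure * 100 / scale;
}

enum level level_of(uint64_t scanned, uint64_t reclaimed)
{
	uint64_t pressure = calc_pressure(scanned, reclaimed);

	if (pressure >= 95)	/* assumed critical threshold */
		return LEVEL_CRITICAL;
	if (pressure >= 60)	/* assumed medium threshold */
		return LEVEL_MEDIUM;
	return LEVEL_LOW;
}
```

With scanned = 100 and reclaimed = 50, calc_pressure() gives 50, which
is low. With scanned = 100 and reclaimed = 150, as can happen once
reclaimed_slab is added without a matching scanned count, the
subtraction wraps to a huge value and level_of() reports critical even
though reclaim was very efficient.
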
vmpressure is an index which tells the pressure on the LRUs, and is
thus an indicator of thrashing. In shrink_node, when we come out of the
inner do-while loop after shrinking the lruvec, the scanned and
reclaimed counts correspond to the pressure felt on the LRUs, which in
turn indicates the pressure on the VM. The moment we add the slab
reclaimed pages to reclaimed, we dilute the actual pressure felt on the
LRUs. When slab scanned/reclaimed is not included in vmpressure, the
values indicate the actual pressure, and if a lot of slab pages were
reclaimed, that will result in less pressure on the LRUs in the next
run, which will again be indicated by vmpressure. i.e. the pressure on
the LRUs indicates the actual pressure on the VM even if slab reclaimed
is not included.

Moreover, what I understand from the code is that reclaimed_slab
includes only the inode steals and the pages freed by the slab
allocator, and does not include the pages reclaimed by other shrinkers
like lowmemorykiller, zsmalloc etc. That means even now we are passing
only a subset of the reclaimed pages to vmpressure.

Also, consider the case of a userspace lowmemorykiller which works on
vmpressure on the root cgroup. If slab reclaimed is included in
vmpressure, the lowmemorykiller will wait till most of the slab is
shrunk before kicking in to kill a task. No?

Thanks,
Vinayak