Re: [PATCH v2] zsmalloc: fix zs_can_compact() integer overflow

Minchan Kim <minchan@xxxxxxxxxx> · Tue, 10 May 2016 08:42:05 +0900

On Mon, May 09, 2016 at 11:00:52PM +0900, Sergey Senozhatsky wrote:
> zs_can_compact() has two race conditions in its core calculation:
> 
> unsigned long obj_wasted = zs_stat_get(class, OBJ_ALLOCATED) -
> 				zs_stat_get(class, OBJ_USED);
> 
> 1) classes are not locked, so the numbers of allocated and used
>    objects can change by the concurrent ops happening on other CPUs
> 2) shrinker invokes it from preemptible context
> 
> Depending on the circumstances, thus, OBJ_ALLOCATED can become
> less than OBJ_USED, which can result in either very high or
> negative `total_scan' value calculated later in do_shrink_slab().
> 
> do_shrink_slab() has some logic to prevent those cases:
> 
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-64
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
>  vmscan: shrink_slab: zs_shrinker_scan+0x0/0x28 [zsmalloc] negative objects to delete nr=-62
> 
> However, due to the way `total_scan' is calculated, not every
> shrinker->count_objects() overflow can be spotted and handled.
> To demonstrate the latter, I added some debugging code to do_shrink_slab()
> (x86_64) and the results were:
> 
>  vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
>  vmscan: but total_scan > 0: 92679974445502
>  vmscan: resulting total_scan: 92679974445502
> [..]
>  vmscan: OVERFLOW: shrinker->count_objects() == -1 [18446744073709551615]
>  vmscan: but total_scan > 0: 22634041808232578
>  vmscan: resulting total_scan: 22634041808232578
> 
> Even though shrinker->count_objects() has returned an overflowed value,
> the resulting `total_scan' is positive, and, what is more worrisome, it
> is insanely huge. This value is getting used later on in
> shrinker->scan_objects() loop:
> 
>         while (total_scan >= batch_size ||
>                total_scan >= freeable) {
>                 unsigned long ret;
>                 unsigned long nr_to_scan = min(batch_size, total_scan);
> 
>                 shrinkctl->nr_to_scan = nr_to_scan;
>                 ret = shrinker->scan_objects(shrinker, shrinkctl);
>                 if (ret == SHRINK_STOP)
>                         break;
>                 freed += ret;
> 
>                 count_vm_events(SLABS_SCANNED, nr_to_scan);
>                 total_scan -= nr_to_scan;
> 
>                 cond_resched();
>         }
> 
> `total_scan >= batch_size' is true for a very-very long time and
> 'total_scan >= freeable' is also true for quite some time, because
> `freeable < 0' and `total_scan' is large enough, for example,
> 22634041808232578. The only break condition, in the given scheme of
> things, is shrinker->scan_objects() == SHRINK_STOP test, which is a
> bit too weak to rely on, especially in heavy zsmalloc-usage scenarios.
> 
> To fix the issue, take a pool stat snapshot and use it instead of
> racy zs_stat_get() calls.
> 
> Signed-off-by: Sergey Senozhatsky <sergey.senozhatsky@xxxxxxxxx>
> Cc: Minchan Kim <minchan@xxxxxxxxxx>
> Cc: <stable@xxxxxxxxxxxxxxx>        [4.3+]
Acked-by: Minchan Kim <minchan@xxxxxxxxxx>

Thanks!

--
To unsubscribe, send a message with 'unsubscribe linux-mm' in
the body to majordomo@xxxxxxxxx.  For more info on Linux MM,
see: http://www.linux-mm.org/ .
Don't email: <a href=mailto:"dont@xxxxxxxxx";> email@xxxxxxxxx </a>