Re: Repeated lvcreate&lvremove quickly increases page cache size

On 02. 05. 24 at 7:47, Ryotaro Banno (伴野 良太郎) wrote:
Hi all,

When I repeatedly run `lvm lvcreate` and `lvm lvremove`, I have noticed that
the page cache size gradually increases. For example, when I ran the following
commands:

```
while :; do
   sudo lvm lvcreate -n 7ee0d142-bffe-459f-87cb-56a355d24d2c -L 104857600b -W y -y ubuntu-vg;
   sudo lvm lvremove -f ubuntu-vg/7ee0d142-bffe-459f-87cb-56a355d24d2c;
   sleep 0.1;
done
```

free(1)'s buff/cache increased by about 50KiB/s on average.
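
For example, the growth rate can be observed with something like this (a rough
sketch; the one-second interval is an arbitrary choice):

```
# Sample free(1) once a second and print the buff/cache column (in KiB)
# to observe how fast it grows.
while :; do
    printf '%s ' "$(date +%T)"
    free -k | awk '/^Mem:/ {print "buff/cache:", $6, "KiB"}'
    sleep 1
done
```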

This phenomenon would not be a problem if I were running LVM directly on the
host.  However, I actually run the same commands in a Docker container whose
RAM is limited to a relatively small amount by cgroup, and the container is
then OOM-killed some time after startup, presumably because of the increased
page cache.  This behavior can be reproduced as follows:

```
# Assume a VG named "ubuntu-vg" is already set up on the host.
docker run --privileged --pid=host --memory=15m -it ubuntu:22.04 bash -c '
while :; do
   /usr/bin/nsenter -m -u -i -n -p -t 1 /sbin/lvm lvcreate -n 7ee0d142-bffe-459f-87cb-56a355d24d2c -L 104857600b -W y -y ubuntu-vg
   /usr/bin/nsenter -m -u -i -n -p -t 1 /sbin/lvm lvremove -f ubuntu-vg/7ee0d142-bffe-459f-87cb-56a355d24d2c
   sleep 0.1
done'

# After a while, the lvm command gets OOM killed, e.g.:
#   bash: line 6: 4099016 Killed                  /usr/bin/nsenter -m -u -i -n -p -t 1 /sbin/lvm lvremove -f ubuntu-vg/7ee0d142-bffe-459f-87cb-56a355d24d2c
```
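
To confirm that the growth is page cache charged to the container rather than
process memory, the memory controller statistics can be inspected from inside
the container (a sketch; the path below assumes cgroup v1, on cgroup v2 the
file is /sys/fs/cgroup/memory.stat and the relevant counter is `file`):

```
# cgroup v1: the "cache"/"total_cache" fields show the charged page cache.
grep -E '^(total_)?cache ' /sys/fs/cgroup/memory/memory.stat

# cgroup v2: the equivalent counter is the "file" field:
# grep '^file ' /sys/fs/cgroup/memory.stat
```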

The same thing happens in a Kubernetes container.  For some unknown reason, the
page cache grows faster on Kubernetes than in the other setups (host and
Docker); it only increases, never decreases.  This can be reproduced using
Minikube with the KVM2 driver as follows:

```
# Set up Minikube and LVM
minikube start --driver=kvm2
minikube ssh -- sudo truncate --size=10G backing_store
minikube ssh -- sudo losetup -f backing_store
minikube ssh -- sudo vgcreate vg1 $(minikube ssh -n minikube -- sudo losetup -j backing_store | cut -d':' -f1)

# Create a pod that repeatedly runs lvcreate & lvremove
cat <<EOS | minikube kubectl -- apply -f -
apiVersion: v1
kind: Pod
metadata:
   name: test-pod
spec:
   hostPID: true
   containers:
   - command:
     - bash
     - -cex
     - |
       # Run lvcreate&lvremove repeatedly
       while :; do
           /usr/bin/nsenter -m -u -i -n -p -t 1 /sbin/lvm lvcreate -n 7ee0d142-bffe-459f-87cb-56a355d24d2c -L 104857600b -W y -y vg1
           /usr/bin/nsenter -m -u -i -n -p -t 1 /sbin/lvm lvremove -f vg1/7ee0d142-bffe-459f-87cb-56a355d24d2c
           sleep 0.1
       done
     image: ubuntu:22.04
     name: lvm
     securityContext:
       privileged: true

     # Limit memory usage to 15Mi
     resources:
       limits:
         memory: 15Mi
EOS

# Watch the pod status. It will get OOM killed continuously.
minikube kubectl -- get pod test-pod -w

# Check the memory usage statistics exposed by cgroup. The field `total_cache` should be huge.
minikube ssh -- cat /sys/fs/cgroup/memory/kubepods/burstable/pod$(minikube kubectl -- get pod test-pod -o jsonpath='{.metadata.uid}')/memory.stat

# Clean up
minikube delete --all --purge
```

Does anyone know how to work around this problem?

My environment is as follows:

- OS: Ubuntu 22.04.4 LTS
   - `uname -a`: Linux _ 5.15.0-78-generic #85-Ubuntu SMP Fri Jul 7 15:25:09 UTC 2023 x86_64 x86_64 x86_64 GNU/Linux
- Docker version: 26.0.1
- minikube version: v1.32.0 (commit: 8220a6eb95f0a4d75f7f2d7b14cef975f050512d)


Hello

Well, you could try running a kernel with the memory leak detector (kmemleak) enabled to see whether there is some kernel-side leak.
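
For example, with a kernel built with CONFIG_DEBUG_KMEMLEAK (and debugfs mounted), something along these lines:

```
# Trigger an immediate kmemleak scan and dump any suspected kernel leaks.
echo scan > /sys/kernel/debug/kmemleak
cat /sys/kernel/debug/kmemleak
```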

There can't be a leak in the lvm2 code itself - since the binary 'exits', all of its memory is released - so the kernel is responsible for any leak here.
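
A quick way to see what kind of memory is actually growing is to watch the kernel counters in /proc/meminfo while your loop runs, e.g.:

```
# Growing Slab/SUnreclaim would point at kernel memory; growing Cached at page cache.
watch -n1 'grep -E "^(Cached|Buffers|Slab|SReclaimable|SUnreclaim):" /proc/meminfo'
```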

However, I'd first try reproducing your issue with a newer kernel (6.8, 6.9-rcX),
as there is nothing more boring than chasing old, possibly already fixed kernel bugs.

You should also probably flush the page cache between your memory size samples with: 'echo 3 > /proc/sys/vm/drop_caches'
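
I.e. something along these lines between two samples (as root; sync first so that dirty pages become clean and can be dropped as well):

```
# Write out dirty data, drop clean page cache, dentries and inodes, then measure.
sync
echo 3 > /proc/sys/vm/drop_caches
free -k
```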

And a final note: lvm2 is NOT designed for and should NOT be used within Docker (or any other container) software - it is tightly integrated with your kernel's API, while containers are more or less 'virtual machines' for user-space apps.  So problems caused by a misbehaving udev (which does not exist/run inside your Docker container) are quite normal, and it requires some level of expertise to know how to deal with those issues...



Regards

Zdenek




