On 19.11.2015 15:00, Piotr Rybicki wrote:
> On 2015-11-19 at 14:36, Piotr Rybicki wrote:
>> On 2015-11-19 at 11:07, Michal Privoznik wrote:
>>> On 18.11.2015 15:33, Piotr Rybicki wrote:
>>>> Hi.
>>>>
>>>> There is a memory leak in libvirt when doing an external snapshot (for
>>>> backup purposes). My KVM domain uses raw storage images via libgfapi.
>>>> I'm using the latest libvirt, 1.2.21 (although previous versions act
>>>> the same).
>>>>
>>>> My bash backup script runs a series of shell commands (virsh connects
>>>> to a remote libvirt host):
>>>>
>>>> * virsh domblklist KVM
>>>> * qemu-img create -f qcow2 -o backing_file=gluster(...) - precreate
>>>>   the backing file
>>>> * virsh snapshot-create KVM SNAP.xml (...) - create a snapshot from
>>>>   the precreated snapshot XML file
>>>> * cp the main img file
>>>> * virsh blockcommit KVM disk (...)
>>>>
>>>> The backup script works fine, however the libvirtd process gets bigger
>>>> and bigger each time I run it.
>>>>
>>>> Some evidence of the leak:
>>>>
>>>> 32017 - libvirtd pid
>>>>
>>>> When libvirtd is started:
>>>> # ps p 32017 o vsz,rss
>>>>    VSZ   RSS
>>>> 585736 15220
>>>>
>>>> After I start the domain via virsh start KVM:
>>>> # ps p 32017 o vsz,rss
>>>>     VSZ    RSS
>>>> 1327968 125956
>>>>
>>>> When I start the backup script, after the snapshot is created (lots of
>>>> memory allocated):
>>>> # ps p 32017 o vsz,rss
>>>>     VSZ    RSS
>>>> 3264544 537632
>>>>
>>>> After the backup script finished:
>>>> # ps p 32017 o vsz,rss
>>>>     VSZ    RSS
>>>> 3715920 644940
>>>>
>>>> When I start the backup script a second time, after the snapshot is
>>>> created:
>>>> # ps p 32017 o vsz,rss
>>>>     VSZ     RSS
>>>> 5521424 1056352
>>>>
>>>> And so on, until libvirtd reports 'Out of memory' on connect, having
>>>> become a really huge process.
>>>>
>>>> Now I would like to diagnose this further and provide detailed
>>>> information about the leak. I tried to use valgrind, but unfortunately
>>>> I'm on an Opteron 6380 platform and valgrind doesn't support XOP
>>>> instructions, quitting with SIGILL.
>>>>
>>>> If someone could tell me how to get some useful debug info about this
>>>> leak, I'll be more than happy to do it and share the results here.
>>>
>>> You can run libvirtd under valgrind (be aware that it will be slow as a
>>> snail), then run the reproducer and then just terminate the daemon
>>> (CTRL+C). Valgrind will then report all the leaks. When doing this I
>>> usually use:
>>>
>>> # valgrind --leak-check=full --show-reachable=yes \
>>>   --child-silent-after-fork=yes libvirtd
>>>
>>> Remember to terminate the system-wide daemon first, as the one started
>>> under valgrind will die early otherwise; you can only have one daemon
>>> running at a time.
>>>
>>> If you are unfamiliar with the output, share it somewhere and I will
>>> take a look.
>>>
>>
>> Thank you, Michal.
>>
>> I finally managed to get valgrind running on the Opteron 6380. I
>> recompiled glibc, libvirt and the other relevant libs with -mno-xop
>> (just for others looking for a way to run valgrind on an Opteron).
>>
>> Gluster is at 3.5.4.
>>
>> The procedure is:
>> start libvirtd
>> start the KVM domain
>> run the backup script (with the external snapshot)
>> stop the KVM domain
>> stop libvirtd
>>
>> Valgrind output:
>
> Sorry, here is a better valgrind output, showing the problem:
>
> valgrind --leak-check=full --show-reachable=yes
> --child-silent-after-fork=yes /usr/sbin/libvirtd --listen 2> valgrind.log
>
> http://wikisend.com/download/314166/valgrind.log

Interesting.
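For background: every time libvirt touches a gluster-backed image through
libgfapi it creates a gfapi context, and each of those contexts has to be
torn down explicitly, otherwise its memory pools stay allocated for good.
The lifecycle looks roughly like this (just a minimal standalone sketch,
not libvirt code; the volume name, host and build line are made up for
illustration):

/* Minimal sketch of the libgfapi context lifecycle (volume/host names are
 * hypothetical).  Build e.g. with:
 *   gcc glfs-lifecycle.c -o glfs-lifecycle \
 *       $(pkg-config --cflags --libs glusterfs-api)
 */
#include <stdio.h>
#include <glusterfs/api/glfs.h>

int main(void)
{
    glfs_t *fs = glfs_new("myvol");     /* allocates the ctx and its mem pools */
    if (!fs)
        return 1;

    if (glfs_set_volfile_server(fs, "tcp", "gluster-host", 24007) < 0 ||
        glfs_init(fs) < 0) {            /* connect to the volume */
        glfs_fini(fs);
        return 1;
    }

    /* ... inspect the image here, as libvirt does when probing a chain ... */

    glfs_fini(fs);                      /* without this, the pools allocated
                                           by glfs_new() are never released */
    return 0;
}

In libvirt, that teardown is what virStorageFileDeinit() is supposed to
trigger for gluster-backed sources.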
I'm going to paste a couple of the offending records here so that they don't get lost in the meantime:

==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,444 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0x115AE41A: qemuDomainStorageFileInit (qemu_domain.c:2929)
==2650==    by 0x1163DE5A: qemuDomainSnapshotCreateSingleDiskActive (qemu_driver.c:14201)
==2650==    by 0x1163E604: qemuDomainSnapshotCreateDiskActive (qemu_driver.c:14371)
==2650==    by 0x1163ED27: qemuDomainSnapshotCreateActiveExternal (qemu_driver.c:14559)
==2650==
==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,445 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0xF8F4B0A: virStorageFileGetMetadataRecurse (storage_driver.c:2996)
==2650==    by 0xF8F4F66: virStorageFileGetMetadata (storage_driver.c:3119)
==2650==    by 0x115AE629: qemuDomainDetermineDiskChain (qemu_domain.c:2980)
==2650==    by 0x1163E843: qemuDomainSnapshotCreateDiskActive (qemu_driver.c:14421)
==2650==
==2650== 7,692,288 bytes in 2 blocks are still reachable in loss record 1,446 of 1,452
==2650==    at 0x4C2BFC8: calloc (vg_replace_malloc.c:711)
==2650==    by 0x1061335C: __gf_default_calloc (mem-pool.h:75)
==2650==    by 0x106137D2: __gf_calloc (mem-pool.c:104)
==2650==    by 0x1061419D: mem_pool_new_fn (mem-pool.c:316)
==2650==    by 0xFD69DDA: glusterfs_ctx_defaults_init (glfs.c:110)
==2650==    by 0xFD6AC31: glfs_new@@GFAPI_3.4.0 (glfs.c:558)
==2650==    by 0xF90321E: virStorageFileBackendGlusterInit (storage_backend_gluster.c:611)
==2650==    by 0xF8F43AF: virStorageFileInitAs (storage_driver.c:2736)
==2650==    by 0xF8F4B0A: virStorageFileGetMetadataRecurse (storage_driver.c:2996)
==2650==    by 0xF8F4DC5: virStorageFileGetMetadataRecurse (storage_driver.c:3054)
==2650==    by 0xF8F4F66: virStorageFileGetMetadata (storage_driver.c:3119)
==2650==    by 0x115AE629: qemuDomainDetermineDiskChain (qemu_domain.c:2980)

So I think we are missing a few virStorageFileDeinit() calls somewhere.
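To make the imbalance concrete, here is a tiny standalone model of the
pattern those traces suggest (just an illustration with made-up helpers,
not the actual libvirt code): a per-disk context gets (re)initialized on
every snapshot, but the previous one is simply dropped instead of being
deinitialized first.

/* Toy model of the suspected init/deinit imbalance.  file_init() and
 * file_deinit() are made-up stand-ins for virStorageFileInit() and
 * virStorageFileDeinit(). */
#include <stdio.h>
#include <stdlib.h>

struct file_ctx {
    void *pools;                     /* stands in for the gfapi mem pools */
};

static struct file_ctx *file_init(void)
{
    struct file_ctx *ctx = calloc(1, sizeof(*ctx));
    if (ctx)
        ctx->pools = malloc(4 * 1024 * 1024);
    return ctx;
}

static void file_deinit(struct file_ctx *ctx)
{
    if (!ctx)
        return;
    free(ctx->pools);
    free(ctx);
}

int main(void)
{
    struct file_ctx *disk_ctx = NULL;
    int snap;

    for (snap = 0; snap < 3; snap++) {
        /* The leaky pattern: the old context is overwritten on each
         * re-probe.  Uncommenting the next line models the fix below. */
        /* file_deinit(disk_ctx); */
        disk_ctx = file_init();
        printf("snapshot %d: context %p\n", snap, (void *)disk_ctx);
    }

    file_deinit(disk_ctx);           /* only the last context gets freed */
    return 0;
}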
This is a very basic scratch patch:

diff --git a/src/qemu/qemu_domain.c b/src/qemu/qemu_domain.c
index f0ce78b..bdb511f 100644
--- a/src/qemu/qemu_domain.c
+++ b/src/qemu/qemu_domain.c
@@ -2970,9 +2970,10 @@ qemuDomainDetermineDiskChain(virQEMUDriverPtr driver,
         goto cleanup;
 
     if (disk->src->backingStore) {
-        if (force_probe)
+        if (force_probe) {
+            virStorageFileDeinit(disk->src);
             virStorageSourceBackingStoreClear(disk->src);
-        else
+        } else
             goto cleanup;
     }
 
diff --git a/src/qemu/qemu_process.c b/src/qemu/qemu_process.c
index 2192ad8..dd9a89a 100644
--- a/src/qemu/qemu_process.c
+++ b/src/qemu/qemu_process.c
@@ -5256,6 +5256,7 @@ void qemuProcessStop(virQEMUDriverPtr driver,
         dev.type = VIR_DOMAIN_DEVICE_DISK;
         dev.data.disk = disk;
         ignore_value(qemuRemoveSharedDevice(driver, &dev, vm->def->name));
+        virStorageFileDeinit(disk->src);
     }
 
     /* Clear out dynamically assigned labels */

Can you apply it, build libvirt and give it a try? valgrind should then report far fewer leaks.

Michal

--
libvir-list mailing list
libvir-list@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/libvir-list