On Wed, Apr 02, 2014 at 02:18:40PM -0400, Jeff Moyer wrote:
> "Richard W.M. Jones" <rjones@xxxxxxxxxx> writes:
>
> Hi, Richard,
>
> > virt-sparsify is a tool for trimming free space in virtual disk
> > images.  The new implementation uses vfs/kernel/qemu discard support.
> > Essentially it does:
> >
>
> Presumably there's a "start guest" step here that's missing?

Yup, it starts up a small appliance to do these operations.

> >   for each filesystem:
> >     mount -o discard $fs /mnt
>
> What is $fs?  Do you pass in a list of devices?

Yes and no.  We examine the partitions, logical volumes and so on in
order to get a list of mountable filesystems, and then the list is
iterated over in this loop.

The precise code for finding the filesystems is here:

https://github.com/libguestfs/libguestfs/blob/master/src/listfs.c#L45

^ That code is running on the host side.  It issues various calls to
the appliance side which are executed by code in multiple files here:

https://github.com/libguestfs/libguestfs/tree/master/daemon

> Also, you don't need to mount with -o discard in order to use fstrim.
> In fact, I'd recommend against doing that.
>
> >     sync
>
> Interesting.  Have you seen mount dirty inodes or something?

The sync is actually not material here.  However I included it for
completeness because it is an effective workaround for another
unreliability case where you delete some files before doing the
fstrim, and ext4 is slow enough that the files you remove don't return
space to the host.  The relevant code is:

https://github.com/libguestfs/libguestfs/blob/master/daemon/fstrim.c#L53

> >     fstrim /mnt
> >     umount /mnt
> >   sync
> >   # qemu is killed after sync returns
> >
> > Although typing these commands by hand works fine, when you run them
> > from a program the fstrim doesn't happen all the way down the stack
> > reliably.  Mostly it works, but sometimes it only trims some space
> > from the host file.
>
> What is in the stack?  Are you using qcow2 images, plain files, device
> mapper, anything else?

In the test case it is recent kernel -> virtio-scsi -> qemu -> raw
format local file stored on host filesystem (ext4 on the test
machine).

> Which file systems are you testing, and are they
> used in the host, the guest or both?

ext4 guest and host in this case.

> How are you checking for success?

We measure the file size (stat.st_blocks) on the host during the test.
There are various thresholds which count as success (see test script
linked below).  In the case where it is failing it's hardly discarding
any blocks, although it does discard some.

> Do you have a golden image you start with so that your test case is
> repeatable?

We create images on the fly, but yes I'm confident that the test is
repeatable (although that doesn't mean it is failing on every run --
it's a race condition of some sort).

The test code is here:

https://github.com/libguestfs/libguestfs/blob/master/tests/discard/test-fstrim.pl

> > It appears that when the host is slow / under load, the problem
> > happens more frequently.  Also it may happen more frequently on i686
> > than on x86-64 (possibly also due to speed of host).
>
> I don't know of any reason that any of the variables you listed would
> affect the reliability at all.  As far as I can tell, fstrim is a
> synchronous ioctl.  I believe the only reason space wouldn't be freed is
> if the fs is fragmented in such a way as to not meet the minimum trim
> granularity of the underlying device.

It's a freshly created filesystem so I guess it's not likely to be
fragmented.

I suspect it's something to do with how we kill qemu.  Requests are in
flight somewhere.  Just not sure how we sync "enough" to make sure
everything is on the host.

FWIW here is the elaborate sync dance we currently do to work around
bugs present and past:

https://github.com/libguestfs/libguestfs/blob/master/daemon/sync.c#L54

Thanks,

Rich.

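For illustration, the success check described above (comparing
stat.st_blocks on the host before and after the trim) can be sketched
in Python as below.  This is a minimal sketch, not the code in
test-fstrim.pl: the function names and the single "minimum bytes
freed" threshold are assumptions made for the example.  It relies on
st_blocks being counted in 512-byte units, as it is on Linux.

```python
import os


def allocated_bytes(path):
    """Return the space actually allocated to `path` on the host.

    Assumes st_blocks is counted in 512-byte units (true on Linux).
    For a sparse raw image this is normally smaller than st_size.
    """
    return os.stat(path).st_blocks * 512


def trim_freed_enough(before, after, min_freed):
    """Return True if at least `min_freed` bytes went back to the host.

    `before` and `after` are allocated_bytes() readings taken around
    the fstrim run; `min_freed` is a hypothetical threshold.
    """
    return before - after >= min_freed
```

A caller would take one reading of the image file before starting the
appliance and one after qemu has exited, then compare the two.  Taking
the second reading only after qemu is gone matters here, since the
suspicion is that requests are still in flight when qemu is killed.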
-- 
Richard Jones, Virtualization Group, Red Hat http://people.redhat.com/~rjones
Read my programming blog: http://rwmj.wordpress.com
Fedora now supports 80 OCaml packages (the OPEN alternative to F#)