On Jul 27, 2013, at 10:56 AM, Lennart Poettering <mzerqung@xxxxxxxxxxx> wrote:

>
> Well, I am pretty sure the burden must be on the file systems to report
> a useful estimate free blocks value in statfs()/statvfs().

tl;dr

4 VMs, each using one thinp LV. Each LV has a virtualsize of 1TB. The VG
backing those LVs is 1TB. If each LV actually is using only 150GB, the
real free space in the VG is 400GB. But how do you propose informing each
VM of the real free space? Are they all informed there's 400GB of free
space? Or do you just do a simple scaling and tell them 400GB/4 is free?
OK, well, what if 2 of those VMs actively make use of snapshotting? The
scaling approach quickly isn't going to work out for any of the VMs.

I think the burden is on the virtual storage layer designer/implementer.
He shouldn't make 1TB virtualsize LVs when only 150GB is needed. The idea
isn't to use thinp to totally eliminate the need to ever grow an LV and
the underlying fs, but to reduce the need (perhaps significantly).

> Note that btrfs RAID is broken in a similar way: it will return the
> amount of actual free blocks to the user. Since if RAID is enabled each
> file however requires twice (or some other factor) the number of blocks
> the value is completely bogus. The btrfs RAID userspace API is simply
> broken.

It's a problem. I'm unconvinced it's broken. As I mentioned earlier, a
btrfs volume as a whole doesn't have a raid profile set. It's the
subvolume (or possibly a file) that has one. Because the work isn't done
to enable per-subvolume or per-file raid profiles, the profile is set at
mkfs time. But this actually only sets the profile for the default
subvolume, not the whole file system. It just seems that way for now.

So you could argue that in the meantime, btrfs devs should punt, and
report free space similar to md and lvm raid.

Long term fix seems to require the application making a more qualified
inquiry.
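
A rough sketch of what a more qualified inquiry could look like, in
Python. The helper names are made up for illustration; os.statvfs() is
the real call. Whether the number it returns reflects a quota or a raid
profile on that path is up to the file system, so treat that part as an
assumption rather than a guarantee.

```python
import os


def free_bytes_for_path(path):
    """Bytes available to an unprivileged writer at `path`.

    Hypothetical helper: it walks up to the nearest existing ancestor,
    so the question can be asked about a file that does not exist yet.
    """
    probe = os.path.abspath(path)
    while not os.path.exists(probe):
        probe = os.path.dirname(probe)
    st = os.statvfs(probe)
    # f_bavail: blocks available to non-root users.
    # f_frsize: the unit those block counts are expressed in.
    return st.f_bavail * st.f_frsize


def probably_fits(path, needed_bytes):
    """Best-effort check only; the answer may be stale by write time."""
    return free_bytes_for_path(path) >= needed_bytes
```

The point is only that the program asks about the place it intends to
write to, and asks about the amount it intends to write. It still has to
handle ENOSPC, since another writer can use the space in the meantime.
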
An application asking for free space on a whole volume that it may not
even have write permission for seems unreasonable. It should instead ask
for free space for a particular path. The actual write location might be
a directory with a quota that must be honored, or a subvolume with a
raid1 data profile set. A program asking for volume free space is making
a totally ambiguous inquiry.

> The accepted way to get an estimate how much disk space is still
> available is statfs()/statvfs(), applications and admins rely on the
> values it returns. You cannot just break that and think you can get away
> with it.

Sorry, this is a half empty vs half full problem. A solution won't be
found by disregarding the other perspective; as a consequence of calling
it broken, you're saying that to not break it we can't have per-subvolume
or per-file raid. And that's less acceptable than the original problem,
which really is that some programs are making unacceptably vague and
grandiose inquiries about free space availability.

>
> The thin provisioning folks need to find a way to make this work, not
> userspace programmers.

99.9% of userspace programs are writing out pretty small files, at a rate
that's fairly knowable. They are thus well behaved. A handful of
applications want to know how much free space there is, as if the answer
entitles them to use all or most of that free space, compared to some
other program that asks at the same time?

I think the expectation that programs can get ballpark free space
information for a volume was probably always unreasonable; it's just that
thin provisioning is making this more clear. Most of the burden is on the
user/implementer who creates virtualsize LVs to not make them too big.
And then I think there is some burden on programs to make more qualified
inquiries for free space available.

Chris Murphy
--
devel mailing list
devel@xxxxxxxxxxxxxxxxxxxxxxx
https://admin.fedoraproject.org/mailman/listinfo/devel
Fedora Code of Conduct: http://fedoraproject.org/code-of-conduct
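
For reference, the arithmetic behind the 4 VM thin provisioning example
can be sketched as below. Treating 1TB as 1000GB, and having two VMs
consume another 150GB each through snapshots, are assumptions picked for
illustration; the function names are hypothetical as well.

```python
def real_free_gb(vg_size_gb, used_per_lv_gb):
    """Free space actually left in the VG backing the thin LVs."""
    return vg_size_gb - sum(used_per_lv_gb)


def equal_share_gb(vg_size_gb, used_per_lv_gb):
    """The "simple scaling" answer: split the real free space evenly."""
    return real_free_gb(vg_size_gb, used_per_lv_gb) // len(used_per_lv_gb)


vg = 1000                      # 1TB VG, in GB (assumed 1TB = 1000GB)
used = [150, 150, 150, 150]    # four thin LVs, 150GB used each

print(real_free_gb(vg, used))    # 400
print(equal_share_gb(vg, used))  # 100

# Assumption: two VMs snapshot and diverge, using 150GB more each.
used_later = [300, 300, 150, 150]

print(real_free_gb(vg, used_later))    # 100
print(equal_share_gb(vg, used_later))  # 25
```

The two idle VMs wrote nothing, yet the figure reported to them drops
from 100GB to 25GB. Neither number was ever something a VM could count
on, which is why sizing the LVs sensibly matters more than the reporting.
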