On 08/06/14 18:36, Eric Blake wrote:
> Adam Litke has been asking if I can expose watermark information from

<bikeshedding>
I'd be glad if we stopped calling this watermark. The wiki
disambiguation article states:

<citation>
A watermark is a recognizable image or pattern in paper used to identify
authenticity.

Watermark or watermarking can also refer to:

In digital watermarks and digital security
  Watermark (data file), a method for ensuring data integrity which
    combines aspects of data hashing and digital watermarking
  Watermark (data synchronization), directory synchronization related
    programming terminology
  High-water mark (computer security), network security terminology
  Audio watermark, techniques for detecting hidden information from
    watermarked signal
  Digital watermarking, a technique to embed data in digital audio,
    images or video
  Watermarking attack, an attack on disk encryption methods
</citation>

As this usage is none of those, I always have to translate it to
something more sane when discussing this topic. I actually like how the
subject of this mail refers to what's discussed here. I'm not sure
though if we can come up with a shorter name that will not be ambiguous
with something else.
</bikeshedding>

> qemu when doing block commit. Qemu still doesn't expose that
> information when doing 'virsh blockcopy' (QMP drive-mirror), but DOES
> expose it for regular and active 'virsh blockcommit'. The idea is that
> when you are writing to more than one file at a time, management needs
> to know if the file is nearing a watermark for usage that necessitates
> growing the storage volume before hitting an ENOSPC error. In
> particular, Adam's use is running qcow2 format on top of block devices,
> where it is easy to enlarge the block device.
>
> The current libvirt API virDomainBlockInfo() can only get watermark
> information for the active image in a disk chain.
> It shows three numbers:
> capacity: the disk size seen by the guest (can be grown via
> virt-resize) - usually larger than the host block device if the guest
> has not used the complete disk, but can also be smaller than the host
> block device due to overhead of qcow2 and the disk is mostly in use
> allocation: the known usage of the host file/block device, should never
> be larger than the physical size (other than rounding up to file sector
> sizing). For sparse files, this number is smaller than total size based
> by the amount of holes in the file. For block devices with qcow2 format,
> this number is reported by qemu as the maximum offset in use by the
> qcow2 file (without regards to whether earlier offsets are holes that
> could be reused). Compare this to what 'du' would report.
> physical: the total size of the host file/block device. Compare this to
> what 'ls' would report.
>
> Also, the libvirt API virStorageVolGetXMLDesc reports two of those
> numbers for a top-level image: <capacity> and <allocation> are listed as
> siblings of <target>. But it is not present for a <backingStore>; you
> have to use the API twice.
>
> Now that we have a common virStorageSourcePtr type in the C code, we
> could do a better job of exposing full information for the entire chain
> in a single API call.
>
> I've got a couple ideas of where we can extend existing APIs (and the
> extensions do not involve bumping the .so versioning, so it can also be
> backported, although it gets MUCH harder to backport without
> virStorageSourcePtr).
>
> First, I think the virStorageVolGetXMLDesc should show all three
> numbers, by adding a <physical unit='bytes'>...</physical> element
> alongside the existing <capacity> and <allocation> elements.
> Also, I think it might be nice if we could enhance the API to do a full
> chain recursion (probably requires an explicit flag to turn on) where it
> shows details on the full backing chain, rather than just partial
> details on the immediate backing file; in doing that, the <backingStore>
> element would gain recursive <backingStore> (similar to what we recently
> did in <domain> XML). In that mode, each layer of <backingStore> would
> also report <capacity>, <allocation>, and <physical>. Something like:

While this is certainly an improvement to the storage volume API, it
will not help Adam much as oVirt isn't actually using the storage
driver.

>
> # virsh vol-dumpxml --pool default f20.snap2
> <volume type='file'>

...

>
> Also, the current storage volume API is rather hard-coded to assume that
> backing elements are in the same storage pool, which is not always true.
> It may be time to introduce <backingStore type='file'> or <backingStore
> type='network'> to allow better details of cross-pool backing elements,
> while leaving plain <backingStore> as a back-compat synonym for
> <backingStore type='volume'> for the current hard-coded layout that
> assumes the backing element is in the same storage pool.

That would certainly improve the usability, but as said it would not
help oVirt that much.

>
> The other idea I've had is to expand the <domain> XML to expose more
> information about backing chains, including to make it expose details
> that are redundant with virDomainBlockInfo() for the top level, or maybe
> even what virDomainBlockStatsFlags() reports. Here, we have a bit of a
> choice - storage volume XML was inconsistent on which attributes were
> siblings to <target> (such as <capacity>) vs. children (such as
> <timestamps>); it might be nicer to stick all per-file elements at the
> same level in <disk> XML (probably as siblings to <source>).
> On the other hand, I strongly feel that <compat> is a feature of the
> <format>, so it should have been a child rather than a sibling. So, as
> an example of what the XML might look like:
>
> <disk type='file' device='disk'>
>   <driver name='qemu' type='qcow2'>
>     <compat>1.1</compat>
>     <features/>
>   </driver>
>   <source file='/tmp/snap2.img'/>
>   <capacity unit='bytes'>12884901888</capacity>
>   <allocation unit='bytes'>2503548928</allocation>
>   <physical unit='bytes'>2503548928</physical>
>   <permissions>
>     <mode>0600</mode>
>     <owner>107</owner>
>     <group>107</group>
>     <label>system_u:object_r:virt_content_t:s0</label>
>   </permissions>
>   <timestamps>
>     <atime>1407295598.623411816</atime>
>     <mtime>1402005765.810488875</mtime>
>     <ctime>1404318523.313955796</ctime>
>   </timestamps>

Neither <permissions> nor <timestamps> is particularly useful
information at runtime.

>   <backingStore type='file' index='1'>
>     <format type='qcow2'>
>       <compat>1.1</compat>
>       <features/>
>     </format>
>     <source file='/tmp/snap1.img'/>
>     <capacity unit='bytes'>12884901888</capacity>
>     <allocation unit='bytes'>2503548928</allocation>
>     <physical unit='bytes'>2503548928</physical>
>     <permissions>
>       <mode>0600</mode>
>       <owner>0</owner>
>       <group>0</group>
>       <label>system_u:object_r:virt_image_t:s0</label>
>     </permissions>
>     <timestamps>
>       <atime>1407295598.583411967</atime>
>       <mtime>1403064822.622766566</mtime>
>       <ctime>1404318525.899951254</ctime>
>     </timestamps>
>     <backingStore type='file' index='2'>
>       <format type='raw'/>
>       <capacity unit='bytes'>10737418240</capacity>
>       <allocation unit='bytes'>2503548928</allocation>
>       <physical unit='bytes'>10737418240</physical>
>       <source file='/tmp/base.img'/>
>       <permissions>
>         <mode>0600</mode>
>         <owner>107</owner>
>         <group>107</group>
>         <label>system_u:object_r:virt_content_t:s0</label>
>       </permissions>
>       <timestamps>
>         <atime>1407295598.623411816</atime>
>         <mtime>1402005765.810488875</mtime>
>         <ctime>1404318523.313955796</ctime>
>       </timestamps>
>       <backingStore/>
>     </backingStore>
>   </backingStore>
>   <target dev='vda' bus='virtio'/>
>   <alias name='virtio-disk0'/>
>   <address type='pci' domain='0x0000' bus='0x00' slot='0x03'
>     function='0x0'/>
> </disk>
>
> Again, this is a lot of new information, so it may be wise to add a new
> flag that must be turned on to request the information. But adding this

This definitely needs a flag. We are already polluting the XML enough
with the backing chain as it is.

> information would allow watermark tracking for a blockcommit operation -
> when collapsing 'base <- snap1 <- snap2' into 'base <- snap2' by
> committing snap1 into base, the <allocation> subelement of the
> appropriate <backingStore> level will do live tracking of the qemu
> values as more data is being written into base, and thus be usable to
> determine if the block device behind base needs to be externally
> expanded before hitting an ENOSPC situation.
>

Peter
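
For illustration only, here is a minimal Python sketch of how a
management application might consume the per-layer numbers if the <disk>
XML proposed in this thread were implemented. The element layout is the
proposal quoted above, not an existing libvirt API. The function names
(walk_chain, needs_extension, layers_to_extend) and the 512 MiB headroom
are assumptions made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical <disk> fragment following the layout proposed in this
# thread (trimmed to the size elements). This is NOT current domain XML.
DISK_XML = """
<disk type='file' device='disk'>
  <source file='/tmp/snap2.img'/>
  <capacity unit='bytes'>12884901888</capacity>
  <allocation unit='bytes'>2503548928</allocation>
  <physical unit='bytes'>2503548928</physical>
  <backingStore type='file' index='1'>
    <source file='/tmp/snap1.img'/>
    <capacity unit='bytes'>12884901888</capacity>
    <allocation unit='bytes'>2503548928</allocation>
    <physical unit='bytes'>2503548928</physical>
    <backingStore type='file' index='2'>
      <source file='/tmp/base.img'/>
      <capacity unit='bytes'>10737418240</capacity>
      <allocation unit='bytes'>2503548928</allocation>
      <physical unit='bytes'>10737418240</physical>
      <backingStore/>
    </backingStore>
  </backingStore>
</disk>
"""


def walk_chain(node):
    """Yield (file, allocation, physical) per layer, top image first.

    Stops at the empty <backingStore/> terminator, which has no <source>.
    """
    while node is not None and node.find("source") is not None:
        yield (
            node.find("source").get("file"),
            int(node.findtext("allocation")),
            int(node.findtext("physical")),
        )
        node = node.find("backingStore")


def needs_extension(allocation, physical, headroom):
    """True when fewer than `headroom` bytes remain below physical."""
    return physical - allocation < headroom


def layers_to_extend(xml_text, headroom):
    """Return the files of all layers that are short on headroom."""
    root = ET.fromstring(xml_text)
    return [
        path
        for path, allocation, physical in walk_chain(root)
        if needs_extension(allocation, physical, headroom)
    ]


if __name__ == "__main__":
    # 512 MiB headroom is an arbitrary value chosen for the example.
    for path in layers_to_extend(DISK_XML, 512 * 1024 * 1024):
        print("would extend:", path)
```

With the sample numbers from the thread, the two qcow2 snapshots are
flagged because their allocation equals their physical size. That is
normal for regular files, which grow on demand; the check is only
meaningful when the layer sits on a block device, which is the qcow2 on
block device case described for Adam's use.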