On Thu, Feb 23, 2012 at 12:01 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote: > On Thu, Feb 23, 2012 at 01:44:35AM +0200, Zeeshan Ali (Khattak) wrote: >> On Thu, Feb 23, 2012 at 1:25 AM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote: >> > On Thu, Feb 23, 2012 at 12:44:15AM +0200, Zeeshan Ali (Khattak) wrote: >> >> On Tue, Feb 21, 2012 at 7:46 PM, Daniel P. Berrange <berrange@xxxxxxxxxx> wrote: >> >> > From: "Daniel P. Berrange" <berrange@xxxxxxxxxx> >> >> >> >> I really should have paid more attention before ACKing some of the >> >> patches. :) Here is the problem introduced by this one: >> >> >> >> > --- >> >> > data/oses/ubuntu.xml | 731 ++++++++++++++++++++++++++++++++++++++++++++++++-- >> >> > 1 files changed, 711 insertions(+), 20 deletions(-) >> >> > >> >> > diff --git a/data/oses/ubuntu.xml b/data/oses/ubuntu.xml >> >> > index 78239f6..fb331a3 100644 >> >> > --- a/data/oses/ubuntu.xml >> >> > +++ b/data/oses/ubuntu.xml >> >> > @@ -6,6 +6,35 @@ >> >> > <version>4.10</version> >> >> > <vendor>Canonical</vendor> >> >> > <family>Linux</family> >> >> > + >> >> > + <media arch="i386"> >> >> > + <url>http://old-releases.ubuntu.com/releases/warty/warty-release-install-i386.iso</url> >> >> > + <iso> >> >> > + <volume-id>Ubuntu 4.10 i386</volume-id> >> >> > + <system-id>LINUX</system-id> >> >> > + </iso> >> >> > + <kernel>casper/vmlinuz</kernel> >> >> > + <initrd>casper/initrd.img</initrd> >> >> > + </media> >> >> .. >> >> > + >> >> > + <media arch="i386" live="true" installer="false"> >> >> > + <url>http://old-releases.ubuntu.com/releases/warty/warty-release-live-i386.iso</url> >> >> > + <iso> >> >> > + <volume-id>Ubuntu 4.10 i386</volume-id> >> >> > + <system-id>LINUX</system-id> >> >> > + </iso> >> >> > + <kernel>casper/vmlinuz</kernel> >> >> > + <initrd>casper/initrd.img</initrd> >> >> > + </media> >> >> >> >> With volume and system ID being identical for both medias, only the >> >> first media will be matched against both types of medias. Same goes >> >> for other medias you added in this patch. Wonder what could be a >> >> solution here though. Perhaps detection code should check if there is >> >> multiple matches and it there are, match basename of known URL against >> >> that of media's patch? >> > >> > Yep, the name is about the only difference between them - all the ISO >> > metadata is identical. The other thing I considered is to add an MD5 >> > checksum of the ISO file, which would trivially distinguish them, >> > but I fear that would make it too unique. >> >> That might be a very good idea actually (Christophe was meaning to do >> that any ways?) but calculation of md5 takes a very long time: >> >> $ time md5sum ~/ISOs/Fedora-16-x86_64-DVD.iso >> bb38ea1fe4b2fc69e7a6e15cf1c69c91 /home/zeenix/ISOs/Fedora-16-x86_64-DVD.iso >> >> real 0m23.348s >> user 0m9.276s >> sys 0m1.654s >> >> so I'll recommend to restrict it to first N bytes, where we could >> think/discuss the value of N but 1MiB should be good enough and takes >> very little time: >> >> $ time dd if=~/ISOs/Fedora-16-x86_64-DVD.iso bs=1M count=1 2> /dev/null |md5sum >> e25ea147176f24239d38a46f501bd25e - >> >> real 0m0.011s >> user 0m0.004s >> sys 0m0.008s >> >> Actually if we start using md5sum for detection, we can pretty much >> remove all other detection logic/extraction. > > Actually I really wouldn't want to remove the other logic - I think > that a checksum could well be too strict, eliminating matches that > should otherwise be allowed. I'd prefer just to use it to resolve > ambiguity when we have multiple matches. A 1 MB checksum would very > likely be sufficient here. > > I think we should also have a full checksum in the database to, so > that if an app downloads a remote ISO based on an osinfo URL, it can > verify the download is correct Believe it or not, I had the same exact thought yesterday after sending my previous mail but I didn't bother to get up from bed to reply again thinking that Daniel will probably come to the same conclusion. :) > Some remote download sites actually provide checksum files, so it > might be desirable to have either > > <checksum url="http:/...."/> > <checksum value="XXXXXXXXXXXXXX"/> > > And once we do this kind of thing, we might also consider having a > GPG signature URL too, for sites that publish those. Didn't think of these but yeah, sounds good except that I think we could do with just one checksum element per ISO: <checksum url="http:/...."/>XXXXXX</checksum> where "XXXXXX" code could just be "" if url is provided. Then we can also add an attribute for md5 of first 1MiB: <checksum url="http:/...." first-5-mib="YYYYY">XXXXXX</checksum> Open to suggestion about better name for 'first-5-mib' here. -- Regards, Zeeshan Ali (Khattak) FSF member#5124