Colin,
This is with all sincerity and not at all meant to be dismissive: If
you have the time to automate this I think we'll be glad to listen.
There are a lot of components to automation of this kind of thing; in fact people have made entire companies and product lines around (as far as I can tell) essentially this problem: http://www.blackducksoftware.com/
However, I was fairly sure there had to already be something open source out there to use as a start. My initial googling wasn't too successful (a lot of things called licenses), but then I had the bright idea to add "Debian" to my search. Turns out there's a license analyzing script in one of their packages:
http://packages.debian.org/unstable/devel/devscripts
There is also http://www.murrayc.com/blog/permalink/2006/10/04/debian-repository-analyzer-for-license-compliance/ which looks kind of frightening but may be useful.
The Debian script supports far fewer licenses than the Fedora wiki page on this topic lists; however, it would probably still be pretty useful to run it over the whole source tree as a start. I bet you'd find a number of cases where packages today are specified just as GPL but also contain files under other licenses.
A more advanced step from there would be to associate each entry in the wiki's license list with a set of fuzzy text segments combined with regular expressions, and then match source files against those.
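
To make the fuzzy-segments-plus-regular-expressions idea a bit more concrete, here is a rough Python sketch using only the standard library. Everything in it is an assumption for illustration: the LICENSE_RULES table, the two sample licenses, the function names, and the 0.85 threshold are all made up, and this is not how the Debian script works. A real table would be generated from the wiki license list.

```python
import re
from difflib import SequenceMatcher

# Hypothetical rule table: each license gets regexes for exact-ish names
# and short text segments to be matched approximately.
LICENSE_RULES = {
    "GPL": {
        "patterns": [re.compile(r"GNU\s+General\s+Public\s+License", re.I)],
        "segments": ["This program is free software; you can redistribute "
                     "it and/or modify it"],
    },
    "MIT": {
        "patterns": [re.compile(r"\bMIT\s+License\b", re.I)],
        "segments": ["Permission is hereby granted, free of charge, to any "
                     "person obtaining a copy"],
    },
}


def normalize(text):
    """Drop common comment decoration, collapse whitespace, lowercase."""
    text = re.sub(r"[#*/;]+", " ", text)
    return re.sub(r"\s+", " ", text).strip().lower()


def fuzzy_contains(haystack, segment, threshold=0.85):
    """True if some word window of haystack is close enough to segment."""
    words = haystack.split()
    size = len(segment.split())
    # Slide a window of the segment's word count across the text.
    for start in range(max(1, len(words) - size + 1)):
        window = " ".join(words[start:start + size])
        if SequenceMatcher(None, window, segment).ratio() >= threshold:
            return True
    return False


def identify_licenses(text, threshold=0.85):
    """Return the sorted names of all licenses whose rules match text."""
    norm = normalize(text)
    found = set()
    for name, rule in LICENSE_RULES.items():
        if any(p.search(norm) for p in rule["patterns"]):
            found.add(name)
        elif any(fuzzy_contains(norm, normalize(s), threshold)
                 for s in rule["segments"]):
            found.add(name)
    return sorted(found)


if __name__ == "__main__":
    header = ("/* This program is free sofware; you can redistribute it\n"
              " * and/or modify it under the terms of ... */")
    print(identify_licenses(header))
```

The regexes catch headers that name the license outright, while the fuzzy pass is meant for headers that were reworded, rewrapped, or mistyped (like "sofware" above). The sliding window is quadratic-ish and would want a smarter prefilter before being run over a whole source tree.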
-- Fedora-maintainers mailing list Fedora-maintainers@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/fedora-maintainers