Arjan van de Ven wrote:
On Mon, May 02, 2005 at 03:19:10PM -0400, Peter Jones wrote:
On Mon, 2005-05-02 at 14:52 +0200, Hans de Goede wrote:
Hmm,
Lots of .pyo/.pyc duplicates; this should somehow be fixed in Python.
Can RPM handle hardlinks, i.e. can an RPM contain a file plus a
hardlink to that file instead of two copies of it?
If RPM can handle hardlinks then this should be fixable; preferably,
Python should just create a hardlink when the .pyc and .pyo are the same.
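A minimal sketch of the "hardlink when identical" idea, written in Python since the files in question are Python bytecode. It assumes foo.pyc and foo.pyo sit side by side in the same directory. The function name link_identical_bytecode is hypothetical, and this is not how Python or RPM actually behave. Note that it only checks whether the two files are identical at the moment it runs.

```python
import filecmp
import os


def link_identical_bytecode(root):
    """Hypothetical helper: replace each .pyo under root with a hardlink
    to its sibling .pyc when the two files are byte-for-byte identical.

    Returns the number of .pyo files that were relinked.
    """
    relinked = 0
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            if not name.endswith(".pyc"):
                continue
            pyc = os.path.join(dirpath, name)
            pyo = pyc[:-1] + "o"  # foo.pyc -> foo.pyo
            if not os.path.isfile(pyo) or os.path.islink(pyo):
                continue
            if os.path.samefile(pyc, pyo):
                continue  # already hardlinked to each other
            # shallow=False forces a full content comparison, not just stat()
            if not filecmp.cmp(pyc, pyo, shallow=False):
                continue
            # Link under a temporary name first, then rename over the .pyo,
            # so the .pyo never goes missing partway through.
            tmp = pyo + ".tmp-link"
            os.link(pyc, tmp)
            os.replace(tmp, pyo)
            relinked += 1
    return relinked
```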
You could make rpm run "hardlinks" on a directory, but the results are
pretty painful -- you wind up running it per-package, and that means
everything takes forever.
or run it from cron.weekly on /usr/share/doc ;)
God please no. We already have too many cron jobs that turn a
machine into a slug.
I think it is 100% wrong to mark files as duplicates because
they are the same "now". There is no guarantee they will be
the same in a future update. Excluding a GPL license COPYING
file from one package and linking it to another central copy
fails the second someone decides to use GPLv3 for that package.
Or if they add text to the top of the file or something. It
is bad practice to be doing this with license files, for almost
zero gain.
A few months ago Warren scanned the entire OS to see what would
be gained if we were to do what is being proposed here. The
results were negligible.
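The thread does not say how Warren's scan was done. As a hypothetical illustration of that kind of measurement, this Python sketch estimates how many bytes would be freed if identical files under a directory were collapsed into hardlinks. It groups regular files by size, hashes only the sizes that collide, and counts every extra copy as saveable. The function names are made up for this example, and it assumes the tree lives on a single filesystem, since hardlinks cannot cross filesystems.

```python
import hashlib
import os
import stat
from collections import defaultdict


def file_digest(path, chunk_size=1 << 16):
    """SHA-256 of a file's contents, read in chunks."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def duplicate_savings(root):
    """Hypothetical helper: bytes saved if every set of identical regular
    files under root were replaced by hardlinks to a single copy."""
    inodes_by_size = defaultdict(dict)  # size -> {(dev, ino): one path}
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            st = os.lstat(path)  # do not follow symlinks
            if stat.S_ISREG(st.st_mode) and st.st_size > 0:
                # setdefault keeps one path per inode, so files that are
                # already hardlinked together count as a single copy
                inodes_by_size[st.st_size].setdefault((st.st_dev, st.st_ino), path)
    saved = 0
    for size, inodes in inodes_by_size.items():
        if len(inodes) < 2:
            continue  # a file with a unique size cannot have a duplicate
        copies_by_digest = defaultdict(int)
        for path in inodes.values():
            copies_by_digest[file_digest(path)] += 1
        for count in copies_by_digest.values():
            saved += (count - 1) * size  # keep one copy, drop the rest
    return saved
```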
Let's take 10 steps back and try to see the forest for a minute,
ignoring the trees for the time being. If the goal is to make
the distribution take up less space, let's focus on analyzing what
exactly takes up the most space on the distribution after
install. Find the top 10 space consumers, and begin analyzing
how we might reduce the space they're using.
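One possible first pass at "find the top 10 space consumers", sketched in Python: rank the immediate subdirectories of a tree such as /usr/share by the apparent size of the files beneath them. The helper names are hypothetical. The sketch uses st_size rather than allocated blocks, does not follow symlinks, and counts a hardlinked file only once within each subdirectory. On an RPM-based system, "rpm -qa --queryformat '%{SIZE} %{NAME}\n' | sort -n" gives a comparable per-package ranking.

```python
import heapq
import os
import stat


def tree_size(path):
    """Hypothetical helper: total apparent size (st_size) of regular files
    under path. Symlinks are not followed, and a file with several
    hardlinks inside path is counted only once."""
    seen = set()
    total = 0
    for dirpath, _dirnames, filenames in os.walk(path):
        for name in filenames:
            st = os.lstat(os.path.join(dirpath, name))
            if not stat.S_ISREG(st.st_mode):
                continue
            key = (st.st_dev, st.st_ino)
            if key in seen:
                continue  # another hardlink to a file already counted
            seen.add(key)
            total += st.st_size
    return total


def top_space_consumers(root, n=10):
    """Hypothetical helper: the n largest immediate subdirectories of
    root as (size_in_bytes, path) pairs, largest first."""
    sizes = []
    for entry in os.scandir(root):
        if entry.is_dir(follow_symlinks=False):
            sizes.append((tree_size(entry.path), entry.path))
    return heapq.nlargest(n, sizes)
```

For example, top_space_consumers("/usr/share") lists the ten largest directories there, largest first, which is a starting point for deciding where to dig deeper.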
I suspect solving that high-level problem will result in a disk
space savings 10-20 times any savings we might gain from hardlinking
GPL "COPYING" files.
Let's focus on the real problem rather than coming up with solutions
and then looking for problems we can solve with them.