Re: Duplicated files in the pristine FC4t2 installation


 



> But I think the whole problem is silly as well, FWIW.

When Warren brought this up on IRC a while back, I wrote the following
script and ran it on a rawhide everything install.  It fails to take
into account files that are already hardlinked, so its results might
well be significantly inflated.  (Someone who cares could hack it further
to check whether the installed names of a duplicate file are the same inode.)

Total 408578931 bytes in 43107 inodes

That's a max of < 400M on an install that is something like 8.5-9G.
So the issue is worth at most on the order of 5% of disk space,
and that is probably a very high estimate.
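
A quick sanity check of that percentage, as a sketch only.  It assumes the
"M" and "G" above are binary units (2^20 and 2^30 bytes), and dup_percent
is a made-up helper name, not part of the script in this message.

```shell
# Percentage of an install taken up by duplicate bytes.
# $1 = duplicate bytes, $2 = total install size in bytes.
dup_percent() {
  awk -v dup="$1" -v total="$2" 'BEGIN { printf "%.1f\n", 100 * dup / total }'
}

# Both ends of the 8.5-9G estimate, taking G as 2^30 bytes (an assumption).
dup_percent 408578931 $((17 * 1024 * 1024 * 1024 / 2))   # 8.5G, prints 4.5
dup_percent 408578931 $((9 * 1024 * 1024 * 1024))        # 9G, prints 4.2
```

So the upper bound lands a little under 5% either way.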


rpm -qa --qf '[%{FILEMD5S}  %{FILENAMES} %{FILESIZES} %{SOURCERPM}\n]' |
awk '
NF < 4 { next } # directory
{
  md5_name[$1] = $2;
  md5_srpm[$1] = $4;
  info = $2 " " $4;
  if ($1 in sizes) {
    # Same checksum but a different size: report it against the first file seen.
    if ($3 != sizes[$1]) print "!!!", $1 ":", info, "VS", md5[$1]
  } else {
    sizes[$1] = $3;
  }
  if ($1 in md5) {
    if (info == md5[$1]) next;
    # Skip entries already recorded; <= so the most recent dup is checked too.
    for (i = 1; i <= dups[$1]; ++i)
      if (dupinfo[$1 "," i] == info)
        next;
    dups[$1]++;
    dupinfo[$1 "," dups[$1]] = info;
  } else {
    md5[$1] = info;
  }
}
END {
  dupsize = dupcount = 0;
  for (sum in dups) {
    n = dups[sum];
    dupcount += n;
    dupsize += n * sizes[sum];
    print n, "dups:", sum, " ==> ", (n * sizes[sum]);
    print "\t" md5[sum];
    for (i = 1; i <= n; ++i)
      print "\t" dupinfo[sum "," i];
  }
  print "Total", dupsize, "bytes in", dupcount, "inodes";
}
'
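
The hardlink check mentioned above could start from something like the
following.  This is only a sketch: distinct_inodes is a hypothetical helper,
it assumes GNU stat from coreutils, and it is not wired into the awk script.

```shell
# Print how many distinct inodes back the given paths.
# Paths that are hardlinks to one another count once.
# Assumes GNU stat (coreutils); missing or unreadable paths are skipped.
distinct_inodes() {
  for f in "$@"; do
    # %d:%i is device number plus inode number, which identifies the file.
    stat -c '%d:%i' -- "$f" 2>/dev/null
  done | sort -u | awk 'END { print NR + 0 }'
}
```

Given the installed names listed for one checksum, the space actually
wasted would then be (distinct inodes - 1) * size rather than n * size.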

