Re: Eliminating duplicate photos

On Mon, Sep 29, 2008 at 02:09:05PM -0430, Patrick O'Callaghan wrote:
> On Mon, 2008-09-29 at 14:00 -0400, Trapper wrote:
> > Itamar - IspBrasil wrote:
> > > create a list of md5 of all files,
> > >
> > > with md5 you will find duplicated files.
> > >
> > > On 9/29/2008 9:04 AM, Timothy Murphy wrote:
> > >> What is the best way of eliminating duplicate photos
> > >> on a number of machines, all running Fedora or CentOS?
> > >>
> > >> I suppose one could ask the same question about files generally;
> > >> how to tag or delete duplicates.
> > >>
> > >>    
> > I have a problem similar to Timothy's. If I run "md5sum *" on a folder, 
> > in a terminal,  it lists all the sums. My problem is that I have several 
> > thousand files. Is there some way I can output the results to a text 
> > file? Can't copy and paste unless there's some way for me to adjust the 
> > terminal to allow the last several thousand lines to display. Then I'm 
> > also going to have to sort all those lines into some alphabetical order 
> > to reasonably detect duplicate sums. Any ideas?
> 
> You're using Linux here. Anything that outputs text to a terminal can
> send it to a file or to another program. You need to read up on Shell
> redirection and filters, e.g.:
> 
> md5sum * > sums
> 
> or
> 
> md5sum * | sort > sorted_sums
> 
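As a worked example of the md5-and-sort approach (a sketch assuming GNU coreutils; the file names and contents are made up), this builds a scratch directory with one duplicated photo. Sorting puts matching sums on adjacent lines, and `uniq -w 32 -D` then prints only the lines whose hash repeats:

```shell
#!/bin/bash
# Scratch directory with one duplicated "photo" (made-up names and contents).
demo=$(mktemp -d)
sums=$(mktemp)
printf 'sunset\n' > "$demo/IMG_0001.jpg"
printf 'sunset\n' > "$demo/copy of IMG_0001.jpg"
printf 'beach\n'  > "$demo/IMG_0002.jpg"

# Checksum every file and sort, so identical hashes land on adjacent lines.
# The output file lives outside $demo so it is not checksummed itself.
(cd "$demo" && md5sum *) | sort > "$sums"

# -w 32 compares only the hash; -D prints every line of each repeated group.
dups=$(uniq -w 32 -D "$sums")
printf '%s\n' "$dups"

rm -r "$demo" "$sums"
```

Note that `md5sum *` only covers the current directory; to include subdirectories, use a `find ... -print0 | xargs -0 md5sum` pipeline like the script below.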

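The script below is not very general, but it can be edited to suit
your needs. The SIZER value makes it easy to restrict the search to
big files (the commented-out +10240k means over 10 MiB), which helps
find lumpy things like duplicate ISO images. The odd-looking md5sum
value d41d8cd98f00b204e9800998ecf8427e is the MD5 of an empty file;
it pops up often because every zero-length file has it, so it is
excluded.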
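Both points are easy to check, assuming GNU coreutils and findutils: the excluded hash is simply the MD5 of zero bytes, and find's `-size +0` (more than zero 512-byte blocks, rounded up) already skips zero-length files. A small sketch:

```shell
#!/bin/bash
# The hash the script filters out is the MD5 of zero bytes of input.
empty_hash=$(printf '' | md5sum | cut -c1-32)
echo "$empty_hash"    # -> d41d8cd98f00b204e9800998ecf8427e

# Every empty file shares that hash; "-size +0" leaves them out of find's list.
demo=$(mktemp -d)
touch "$demo/empty1" "$demo/empty2"
printf 'data\n' > "$demo/notempty"
nonempty=$(find "$demo" -type f -size +0 | wc -l)
echo "$nonempty"      # -> 1
rm -r "$demo"
```

So with the default SIZER the grep on that hash acts as a safety net; it only matters if SIZER is emptied so that find also returns empty files.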

============================================================
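#!/bin/bash
# Copyright (C) 1985-2008 by Tom Mitchell 
#
# This program is free software, licensed under the GNU GPL, >=2.0. http://www.gnu.org/.
# This software comes with absolutely NO WARRANTY. Use at your own risk!
#
# Size filter handed to find; pick one.
#SIZER=' -size +10240k'    # only files over 10 MiB, e.g. ISO images
SIZER=' -size +0'           # any non-empty file
#
# Directories to search, separated by spaces.
DIRLIST=". "
OUT=/tmp/looking4duplicates
# $DIRLIST and $SIZER are left unquoted on purpose so they split into words.
# -print0 / xargs -0 keep file names containing spaces intact.
# Lines with d41d8cd98f00b204e9800998ecf8427e (the MD5 of an empty file)
# or matching LemonGrassWigs are dropped; add your own patterns there.
find $DIRLIST -type f $SIZER -print0 | xargs -0 md5sum |
	grep -E -v "d41d8cd98f00b204e9800998ecf8427e|LemonGrassWigs" |
	sort > "$OUT"
# Ring the terminal bell three times so the end of a long scan is noticed.
for i in 1 2 3; do tput bel; sleep 2; done
# Compare only the first 32 characters (the hash) and show every line of
# each repeated group, with a blank line before each group.
uniq --check-chars=32 --all-repeated=prepend "$OUT" | less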
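Timothy also asked how to tag or delete duplicates. One possible next step, sketched under the assumption of GNU tools and md5sum's default output format (hash, then two spaces or space-asterisk, then the name), is a small hypothetical helper, `extra_copies`, that reads a sorted list such as /tmp/looking4duplicates and prints every file except the first in each group of identical hashes. It deletes nothing; the demo at the end runs it on a scratch directory.

```shell
#!/bin/bash
# Hypothetical helper: print all but the first file of each run of
# identical hashes in sorted md5sum output. Nothing is deleted.
extra_copies() {
	awk '{
		hash = substr($0, 1, 32)      # the MD5 is the first 32 characters
		name = substr($0, 35)         # the file name starts at column 35
		if (hash == prev) print name  # same hash as the previous line: a duplicate
		prev = hash
	}' "$1"
}

# Demo on a scratch directory: three identical files and one unique file.
demo=$(mktemp -d)
list=$(mktemp)
printf 'x\n' > "$demo/a.jpg"
printf 'x\n' > "$demo/b.jpg"
printf 'x\n' > "$demo/c d.jpg"
printf 'y\n' > "$demo/e.jpg"
find "$demo" -type f -print0 | xargs -0 md5sum | sort > "$list"
extras=$(extra_copies "$list")
printf '%s\n' "$extras"       # two of the three identical files
rm -r "$demo" "$list"
```

A cautious way to use it is `extra_copies /tmp/looking4duplicates > extras`, review that file, and only then move or delete the listed files. md5sum escapes names containing a backslash or newline (such lines start with `\`), and this helper would misread them.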


-- 
	T o m  M i t c h e l l 
	Found me a new hat, now what?

-- 
fedora-list mailing list
fedora-list@xxxxxxxxxx
To unsubscribe: https://www.redhat.com/mailman/listinfo/fedora-list
Guidelines: http://fedoraproject.org/wiki/Communicate/MailingListGuidelines