Re: Which hashing algorithm is best to check file duplicity?

Martin Zvarík wrote:
I want to store the file's hash to the database, so I can check next time to see if that file was already uploaded (even if it was renamed).

What would be the best (= fastest + small chance of collision) algorithm in this case?

"Fastest" depends mostly on the size of the file, not the algorithm used. A 2 GB file will take a while with md5, just as it will with sha1.

md5 will be somewhat quicker than sha1, since it does less work per block, but it also generates a shorter hash (128 bits against 160), so the trade-off is up to you.
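To put a rough number on the collision side of that trade-off, here is a back-of-the-envelope sketch using the birthday-bound approximation. It assumes the hash behaves like a uniform random function and that nobody is crafting files to collide on purpose; the function name and the one-billion-file figure are only illustrative.

```python
from fractions import Fraction

def collision_probability(num_files: int, hash_bits: int) -> float:
    """Birthday-bound estimate: number of file pairs divided by 2^bits.

    This is an upper-bound approximation that is only meaningful while
    the result is small; it says nothing about deliberate attacks.
    """
    pairs = num_files * (num_files - 1) // 2  # n*(n-1) is always even
    return float(Fraction(pairs, 2 ** hash_bits))

# Illustrative figure: one billion stored files.
for name, bits in (("md5", 128), ("sha1", 160)):
    print(name, collision_probability(10**9, bits))
```

Under those assumptions even md5 comes out at something on the order of 1e-21 for a billion files, so for accidental duplicates either hash length is far more than enough.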

$ ls -lh file.gz
724M 2008-07-28 10:02 file.gz

$ time sha1sum file.gz
4ae7bd1e79088a3e3849e17c7be989d4a7c97450  file.gz

real	0m3.398s
user	0m3.056s
sys	0m0.336s

$ time md5sum file.gz
16cff7b95bcb5971daf1cabee6ca4edd  file.gz

real	0m2.091s
user	0m1.744s
sys	0m0.328s

$ time sha1sum file.gz
4ae7bd1e79088a3e3849e17c7be989d4a7c97450  file.gz

real	0m3.332s
user	0m2.988s
sys	0m0.344s

$ time md5sum file.gz
16cff7b95bcb5971daf1cabee6ca4edd  file.gz

real	0m2.136s
user	0m1.776s
sys	0m0.348s
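For the original question (store the hash, then check it on the next upload), a minimal sketch might look like the following. It is written in Python only for illustration; in PHP the built-in md5_file(), sha1_file() and hash_file() do the hashing part. The names file_digest and is_duplicate are hypothetical, and the plain set stands in for the database table, which would normally have a unique index on the hash column.

```python
import hashlib

def file_digest(path, algorithm="sha1", chunk_size=1024 * 1024):
    """Hash a file in fixed-size chunks so a large file is never loaded whole."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        # read() returns b"" at end of file, which stops the iteration
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()

def is_duplicate(path, known_digests, algorithm="sha1"):
    """Return True if the file's digest is already known; otherwise record it.

    known_digests is a set here, standing in for a database lookup.
    """
    digest = file_digest(path, algorithm)
    if digest in known_digests:
        return True
    known_digests.add(digest)
    return False
```

Because only the contents are hashed, a renamed copy of the same file produces the same digest and is reported as a duplicate.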

--
Postgresql & php tutorials
http://www.designmagick.com/


--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php

