Re: Duplicate Detection in an arbitrary set

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



Tyler Earman wrote:
> Does anyone know of a semi-efficient way of detecting duplicates in a
> given set of data (specifically duplicate integers) other than just
> sorting the data into an array and then stepping through each element in
> the array?
> 
> Preferably a solution wouldn't involve something implementation specific
> (like loading the array into a different data type and measuring the
> differences between the two, saw someone doing that with Java ArrayLists
> and HashSets) but rather some sort of mathematically correct method of
> doing it.

I don't believe there's any way of doing this that is more efficient than
a hash table.  An informal proof of this is that you have to store each
element somewhere, and you only check each element once for duplicates,
given a hash table sufficiently large to make collisions improbable.
The time to insert one element into a hash table is ~ O(1).  So, the runtime
for detecting duplicates in a bag of N elements is ~ O(n).

Andrew.

[Index of Archives]     [Linux C Programming]     [Linux Kernel]     [eCos]     [Fedora Development]     [Fedora Announce]     [Autoconf]     [The DWARVES Debugging Tools]     [Yosemite Campsites]     [Yosemite News]     [Linux GCC]

  Powered by Linux