Angus Mann wrote:
----- Original Message -----
From: Ashley Sheridan
To: Angus Mann
Cc: php-general@xxxxxxxxxxxxx
Sent: Friday, November 13, 2009 8:31 AM
Subject: Re: uniqid() and repetition of numbers generated
On Fri, 2009-11-13 at 08:22 +1000, Angus Mann wrote:
Hi all. I'm sure I can't be the first person to ask this question but a search of the net leaves me confused.
I need a unique identifier in an SQL table and for complicated reasons I don't want to use auto-increment.
So I thought I would use a pseudo-random method instead. I am NOT scared of people guessing the unique identifier, it just has to be unique in order for the database to work properly.
So I looked at the uniqid() function and see it is based on the "current time in microseconds" and when I test it out I see that it increments (very quickly) when run repeatedly.
If it is based on JUST the time, then it should repeat every 24 hours, thus making "collisions" possible, which I don't want.
If it is based on the time AND day, then that's fine....I can use it.
So here's the problem....
When I calculate the number of microseconds since 1970 I get a 16 digit number.
But uniqid() only gives a 13 digit number.
Calculating the number of microseconds in a day gives 11 digits.
So it seems to me that the numbering sequence will repeat every 100 days, which risks collisions also.
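The digit counts above can be checked with a few lines of PHP. This is only a sketch of the arithmetic. It assumes 64-bit integers, and it treats the 13 characters as a decimal count of microseconds, which is the assumption being questioned here. The variable names are made up for illustration.

```php
<?php
// Microseconds since 1970, built from whole seconds (assumes 64-bit ints).
$microsSince1970 = time() * 1000000;

// Microseconds in one day: 86400 seconds * 1,000,000.
$microsPerDay = 86400 * 1000000;

// How many whole days a 13-digit DECIMAL microsecond counter would
// last before wrapping around.
$daysUntilRepeat = intdiv(10 ** 13, $microsPerDay);

echo strlen((string) $microsSince1970), " digits since 1970\n";
echo strlen((string) $microsPerDay), " digits per day\n";
echo $daysUntilRepeat, " days before a 13-digit decimal counter repeats\n";
```

Under those assumptions the wrap-around comes out at about 115 days, close to the rough figure of 100.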
Can someone explain how uniqid() is really calculated, so I can make a proper judgement about how to use it?
Please don't suggest using a hash of a number generated by uniqid(). Hashing a small number into a longer one does not add entropy; it just transforms the input number. The risk of collisions is unchanged, so there is no net advantage.
I had a thought to just append the current date to the uniqid() result but I'm interested to know if anyone has a more elegant solution.
Thanks in advance.
Angus
Auto increment fields are designed to avoid collisions. I can't think of any sensible reason for not using them. If you're worried that users of the system will think a number like '65' is a 'silly' value for an id, why not pad it up with leading zeros, and maybe add in some text from their name or something. To me, one unique number is the same as another, whether it has 11 digits or 2. Also, without having numbers with many leading zeros in your 11-digit unique number, the value range will be dramatically reduced, thereby increasing the chance of you running out of unique values.
Thanks,
Ash
http://www.ashleysheridan.co.uk
Thanks Ashley. To clarify, the reason I don't want to use auto-increment: different users with their own populated databases may wish to merge some or all of their data. The unique identifier needs to be carried along with the rest of the data, so it must be unique not only in the database it currently resides in; it still needs to be unique if it gets copied into another person's database, and auto-increment will not meet that requirement. I thought that using microtime (hence uniqid()) would solve the problem, and the only chance of a collision is the unlikely event that records are added to 2 different people's databases at EXACTLY the same time, to within a millionth of a second. Possible I realize, but very unlikely, given that each user will probably add fewer than 100 entries per day.
On balance I think I will generate an identifier consisting of a few things...uniqid() plus a few letters from the person's name plus a (pseudo)random 3 digit number. Probably there's enough entropy in that for my purpose.
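A minimal sketch of such a composite identifier might look like the following. The function name make_record_id(), the choice of three letters, and the 'x' padding for short names are all assumptions for illustration, not anything fixed in this thread.

```php
<?php
// Hypothetical helper: uniqid() + 3 letters of the name + 3 random digits.
function make_record_id(string $ownerName): string
{
    // Keep only the letters a-z of the lowercased name.
    $letters = preg_replace('/[^a-z]/', '', strtolower($ownerName));

    // Take the first three letters; pad with 'x' if the name is too short.
    $initials = str_pad(substr($letters, 0, 3), 3, 'x');

    // Pseudo-random number 000-999, zero-padded to three digits.
    $random = str_pad((string) mt_rand(0, 999), 3, '0', STR_PAD_LEFT);

    return uniqid() . $initials . $random;
}

$recordId = make_record_id('Angus Mann');
echo $recordId, "\n";
```

The result is a 19-character string that fits a CHAR(19) or VARCHAR column. The name letters and random digits only reduce the chance of a collision between two databases; they do not guarantee uniqueness.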
But the question still remains....what exactly is being returned by uniqid()? It is obviously not random, and not a hash function because it increments predictably. It's too short to be the number of microseconds since 1970 and too long to be the number of microseconds since midnight. Since it has a fixed length, and it increments, it will eventually get to the last possible number. When will that be, and what will happen: will an extra digit appear, will it go back to zero, or will the generating algorithm crash?
If it's anything similar to the unix timestamp then we're all in trouble on January 19, 2038!
Here's part of the confusion:
If you were to express the number of microseconds since 1970 in a
decimal number, it would indeed take 16 digits.
But uniqid() returns a /13 character string/, not a 13 digit number. The
string is actually a hexadecimal number (and thus can express a greater
range of values than a decimal number within those 13 characters).
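To see this, the string can be split and converted back from hex. The sketch below assumes the commonly described layout of a prefix-less uniqid(): 8 hex digits of Unix seconds followed by 5 hex digits of microseconds. That layout is an assumption here, not something established in this thread, and decode_uniqid() is a made-up helper name.

```php
<?php
// Hypothetical helper: split a prefix-less uniqid() into its two parts,
// ASSUMING 8 hex digits of seconds followed by 5 hex digits of microseconds.
function decode_uniqid(string $id): array
{
    return [
        'seconds'      => (int) hexdec(substr($id, 0, 8)),
        'microseconds' => (int) hexdec(substr($id, 8, 5)),
    ];
}

$id    = uniqid();
$parts = decode_uniqid($id);

echo $id, "\n";
echo gmdate('Y-m-d H:i:s', $parts['seconds']), ' UTC + ',
     $parts['microseconds'], " microseconds\n";

// Largest value that fits in 8 hex digits of seconds (needs 64-bit ints).
$lastSecond = 0xffffffff;
echo gmdate('Y-m-d H:i:s', $lastSecond), " UTC\n";
```

If that layout holds, the seconds part carries the date as well as the time of day, so the value does not repeat daily or every 100 days. The 8 hex digits would run out in February 2106; what uniqid() does after that depends on the PHP implementation.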
-John