Re: Re: using UID in DB

Nathan Rixham <nrixham@xxxxxxxxx> · Wed, 31 Mar 2010 19:01:56 +0100

Tommy Pham wrote:
> On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham <nrixham@xxxxxxxxx> wrote:
>> Tommy Pham wrote:
>>> As for spatial data types, I've never find much use for non scientific
>>> related.  (example) If using point as a PK, if MySQL stores it the
>>> same way as PostgreSQL which is 16 bytes, how is that any different -
>>> performance wise - than using UUID and storing it as binary(16) for
>>> MySQL or uniqueidentifier (16 bytes) for PostgreSQL?
>>>
>> it's all about the indexing (R-Tree)
>>
>> http://en.wikipedia.org/wiki/R-tree
>>
> 
> I can see where the performance would be between B-Tree vs R-Tree for
> the same field size but I've yet to see real life application of it.
> Case in point, if using point for GPS data coordinates, then wouldn't
> it still be a lot better to use Lon (float), Lat (float) which is 2
> fields of 4 bytes each vs 1 field of 16 bytes?  The index would still
> be faster (8 bytes less) in B-Tree right?  Not to mention smaller row
> & DB size.

wish I still had an application to point you at; and I can't assert in
any other way just how much faster it is; especially over large datasets
- regardless of the amount of rows the style of index doesn't slow
(spatial) queries down.

if index byte size is of importance then negate spatial indexes (which
are rather a lot bigger).

regardless though, it was just an idea I was throwing at you, not
suggesting it's the best approach, but worth consideration depending on
your priorities.

  As for calculating the distance between 2 'points', it
> would be simpler for querying purposes to use 'point'.  ATM, I don't
> have need to do any such calculations.  If I do use 'point' for PK,
> I'd run into problems with scaling and migration later.

the points don't matter tbh; or should i say specifying an x and a y
doesn't matter, because you can simply get by a single id, or get a
range of ids very quickly; the other value comes in to play when you
want temporal queries.

again though, i reiterate only useful if you want speed and don't care
about index bytesize - and certainly not forcing it down your neck.

re scaling.. unsure theoretically you have 15 digit precision which is a
mysql DOUBLE/REAL, and that's for each value in the pair, so an almost
incomprehensible number of rows are possible before a collision.

all in all: there are many good reasons why I've only used this on
occassion (specifically when dealing with 1 million+ rows in an RDBMS
table); and many more good reasons why i stick to non-rdbms databases
and use http uri's + eav model data only nowadays.

The latter i would advocate, the former not so unless you are stuck w/
mysql postgres - in which case you can't get faster. [1]

[1] get some numbers to prove when i have time.

Nathan

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php