Tommy Pham wrote:
> On Wed, Mar 31, 2010 at 11:01 AM, Nathan Rixham <nrixham@xxxxxxxxx> wrote:
>> Tommy Pham wrote:
>>> On Wed, Mar 31, 2010 at 4:06 AM, Nathan Rixham <nrixham@xxxxxxxxx> wrote:
>>>> Tommy Pham wrote:
>>>>> As for spatial data types, I've never found much use for them outside
>>>>> of scientific-related work. For example: if using point as a PK, and
>>>>> if MySQL stores it the same way as PostgreSQL does (16 bytes), how is
>>>>> that any different - performance wise - than using a UUID and storing
>>>>> it as binary(16) in MySQL or uniqueidentifier (16 bytes) in PostgreSQL?
>>>>>
>>>> it's all about the indexing (R-Tree)
>>>>
>>>> http://en.wikipedia.org/wiki/R-tree
>>>>
>>> I can see where the performance difference between a B-Tree and an
>>> R-Tree would be for the same field size, but I've yet to see a
>>> real-life application of it. Case in point: if using point for GPS
>>> coordinates, wouldn't it still be a lot better to use Lon (float) and
>>> Lat (float), which is 2 fields of 4 bytes each, vs 1 field of 16 bytes?
>>> The index would still be faster (8 bytes less) as a B-Tree, right? Not
>>> to mention the smaller row & DB size.
>> wish I still had an application to point you at; and I can't assert in
>> any other way just how much faster it is, especially over large
>> datasets - regardless of the number of rows, that style of index
>> doesn't slow (spatial) queries down.
>
> If I could see a PoC w/o having to see the actual setup, then I would
> probably understand your point better.
>
>> if index byte size is important then rule out spatial indexes (which
>> are rather a lot bigger).
>>
>
> Since spatial indexes are bigger, wouldn't that work against them, due
> to disk access having to read more?

depends on how seriously you take a solution tbh; if you just wanna run
a huge dataset on a sata drive w/ 2GB of RAM then you're screwed whatever
the approach (unless you turn the dataset into many fine-grained
resources, heavily cached and only accessed on a need-to-have basis), but
even then there are many limitations.

one approach is to throw more RAM at the situation; you are definitely
better off loading a huge dataset into one box with massive amounts of
RAM than splitting it over multiple boxes. [1]

[1] http://rickonrails.wordpress.com/2009/03/30/big-ole-mysql-spatial-table-optimization-tricks/

if you're really serious then you want to be on a box with a tonne of RAM
and PCI Express solid-state storage devices, which give you throughput of
700MB/s and higher; prices aren't that bad comparatively, and one machine
w/ an $800 card added works much better than 2-3 servers (although you
still need to cover replication and high availability).

http://www.fusionio.com/ioxtreme/

wandering off course a bit maybe, but these are situations we need to
consider. I don't know how big you think big is, but realistically AWS
EC2 instances coupled with the various "big data" / NoSQL (non-RDBMS)
approaches are the way to go (see the many, many recent discussions on
the subject around the net).
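re the PoC you asked about: below is a rough sketch of the two approaches
being compared, nothing more - the table and column names are hypothetical,
the MySQL 5.x syntax is from memory (spatial indexes need MyISAM there) and
I haven't run it, so treat it as an illustration rather than a benchmark.

<?php
// rough sketch only - hypothetical table/column names and connection
// details; MySQL 5.x syntax from memory, untested
$pdo = new PDO('mysql:host=localhost;dbname=test', 'user', 'pass');

// approach 1: POINT column + R-tree (SPATIAL) index
$pdo->exec("CREATE TABLE places (
    id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    loc POINT NOT NULL,
    SPATIAL INDEX (loc)
) ENGINE=MyISAM");

$pdo->exec("INSERT INTO places (loc)
            VALUES (GeomFromText('POINT(-0.12 51.50)'))");

// bounding-box lookup; the R-tree prunes on both dimensions at once
$ids = $pdo->query("SELECT id FROM places
                    WHERE MBRContains(
                        GeomFromText('POLYGON((-1 50, 1 50, 1 52, -1 52, -1 50))'),
                        loc)")->fetchAll(PDO::FETCH_COLUMN);

// approach 2: two floats + composite B-tree index
$pdo->exec("CREATE TABLE places_bt (
    id  INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
    lon FLOAT NOT NULL,
    lat FLOAT NOT NULL,
    KEY lon_lat (lon, lat)
) ENGINE=InnoDB");

// the B-tree only narrows on lon here; lat is filtered within that range
$stmt = $pdo->prepare("SELECT id FROM places_bt
                       WHERE lon BETWEEN ? AND ? AND lat BETWEEN ? AND ?");
$stmt->execute(array(-1, 1, 50, 52));
$ids_bt = $stmt->fetchAll(PDO::FETCH_COLUMN);

the point of the spatial version is that the R-tree index prunes on both
dimensions at once, whereas the composite B-tree can only narrow on its
leading column and has to filter the other within that range; that's where
the gap shows up once the table gets genuinely large.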
>> regardless though, it was just an idea I was throwing at you, not
>> suggesting it's the best approach, but worth consideration depending on
>> your priorities.
>>
>
> Thanks for your idea. I appreciate it very much. I'm told that I
> over-analyze things at times ... lol ... so please don't take it the
> wrong way :)

ty for clarifying, and deal - I'm the same :)

>>> As for calculating the distance between 2 'points', it would be
>>> simpler for querying purposes to use 'point'. ATM, I don't have a need
>>> to do any such calculations. If I do use 'point' for a PK, I'd run
>>> into problems with scaling and migration later.
>> the points don't matter tbh; or should i say specifying an x and a y
>> doesn't matter, because you can simply get by a single id, or get a
>> range of ids, very quickly; the other value comes into play when you
>> want temporal queries.
>>
>> again though, i reiterate: only useful if you want speed and don't care
>> about index byte size - and certainly not forcing it down your neck.
>>
>> re scaling: unsure; theoretically you have 15-digit precision, which is
>> a mysql DOUBLE/REAL, and that's for each value in the pair, so an
>> almost incomprehensible number of rows is possible before a collision.
>>
>> all in all: there are many good reasons why I've only used this on
>> occasion (specifically when dealing with 1 million+ rows in an RDBMS
>> table); and many more good reasons why i stick to non-rdbms databases
>> and use http uri's + eav model data only nowadays.
>>
>> The latter i would advocate, the former not so much, unless you are
>> stuck w/ mysql / postgres - in which case you can't get faster. [1]
>>
>> [1] will get some numbers to prove it when i have time.
>>
>> Nathan
>>
>
> Using HTTP URI's + EAV model is something I'll definitely look into.
>

good good; i can assure you it's the future; all the major corps and
governments are moving to this (age-old) model under the banners of
linked data, odata and, partially, gdata - and meanwhile the development
community is slowly getting there, with the focus moving to kv databases,
which are just one step away from the full-on eav / triple approach.

Many regards,

Nathan

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php