On Wed, Mar 31, 2010 at 2:57 AM, Nathan Rixham <nrixham@xxxxxxxxx> wrote:
> Tommy Pham wrote:
>> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham <nrixham@xxxxxxxxx> wrote:
>>
>>> nope, never been able to find any significant advantage; and thus ended
>>> up using HTTP URIs in my own domain space(s), which are always
>>> guaranteed to be unique as I'm issuing them. Bonus is that they can be
>>> dereferenced and serve as both a universal (resource) identifier and a
>>> locator.
>>>
>>> ps: resource = anything that can be named.
>>>
>>
>> Hi Nathan,
>>
>> I'm interested in hearing your technique of generating your own UUIDs
>> if you don't mind sharing :). I'm building a project to test my idea
>> about a search engine and to test different RDBMSes. So naturally,
>> it (the app) would crawl the net and I'd have over 1 billion rows.
>>
>> Thanks,
>> Tommy
>
> Hi Tommy,
>
> Always good to see somebody experimenting and questioning things :)
>
> With regard to generating UUIDs which are HTTP-scheme URIs: this is
> something I got hung up on early on, but with practice I realised
> much of the world's data already has globally known and used
> HTTP-scheme identifiers. For instance, if I'm talking about a web page
> then it's the URL for it; a user may as well be http://twitter.com/webr3;
> a country could be http://dbpedia.org/resource/United_Kingdom. In the
> rare occurrence where I actually need to create an identifier, anything
> works, from a Freebase-style GUID (http://example.org/GUID-HERE),
> through to a generated meaningful URI such as
> http://mydomain.com/user/username/project/project-name, or even just one
> strapped to a class + microtime:
> $uri = 'http://mydomain.com/' . __CLASS__ . '/' . microtime(true);
>
> There are millions of approaches, but it's worth noting that with each
> you get extra functionality from the identifier/locator duality of
> HTTP-scheme URIs (thanks to the Domain Name System).
>
Very interesting approach.
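Since the question was how to generate identifiers, here is a minimal PHP sketch of the two routes discussed above: minting an HTTP URI from a random version-4 UUID, and the class + microtime variant. It assumes PHP 7+ for `random_bytes()`; `mydomain.com` is the placeholder domain from the thread, while `uuid_v4()`, `uuid_to_binary()`, `mint_uri()` and the `User` class are hypothetical names for illustration. One caveat on the microtime route: two identifiers minted close enough together in time can come out identical, so the UUID route is the safer default when uniqueness matters.

```php
<?php
// Sketch only: function and class names are hypothetical, and mydomain.com
// is the placeholder domain from the thread. Needs PHP 7+ (random_bytes()).

// Random (version 4) UUID, e.g. "1b4e28ba-2fa1-41d2-883f-0016d3cca427".
function uuid_v4(): string
{
    $bytes = random_bytes(16);
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40); // version nibble = 4
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80); // RFC 4122 variant bits = 10
    return vsprintf('%s%s-%s-%s-%s-%s%s%s', str_split(bin2hex($bytes), 4));
}

// The 16 raw bytes to store in a BINARY(16) column.
function uuid_to_binary(string $uuid): string
{
    return hex2bin(str_replace('-', '', $uuid));
}

// An HTTP-scheme identifier in your own domain, which can also be dereferenced.
function mint_uri(string $base, string $uuid): string
{
    return rtrim($base, '/') . '/' . $uuid;
}

class User
{
    // The class + microtime variant: __CLASS__ must sit outside the quotes,
    // otherwise it is the literal text "__CLASS__".
    public static function mintUri(): string
    {
        return 'http://mydomain.com/' . __CLASS__ . '/' . microtime(true);
    }
}

$id = uuid_v4();
echo mint_uri('http://mydomain.com/user', $id), "\n";
echo strlen(uuid_to_binary($id)), "\n"; // 16
echo User::mintUri(), "\n";
```

The binary form is what you would keep in the database if the UUID is the primary key; the URI form is what you would publish.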
I'll have to think about it and research it more.

> With regard to what you are doing, if I may suggest a few things that
> you could try:
>
> You can create identifiers that are spatial POINT()s and store them in
> MySQL/Postgres using either the MySQL spatial extension or PostGIS
> respectively. You can create identifiers using something like
> POINT(timestamp, float-id), which again serves a dual purpose:
> timestamping each record and identifying it. Moreover, you'll be
> shocked at the speed gains from spatial indexing (seriously amazing).
>
> The spatial indexing lets you leverage your information in some pretty
> cool ways, at phenomenal speed. Because your data is essentially now
> points in a virtual world where X is time and Y is identity, you can
> pull information out by drawing MBRs (minimum bounding rectangles)
> around the data, selecting, say, all records between timestampA and
> timestampB with identities in the range 0-1832.1234 (we use floats
> rather than ints: far more scope, and they lend themselves to spatial
> optimisation/boxing). Further, you aren't limited to basic geometries;
> you can create chains of data using LineStrings, test intersections on
> time, disjunctions and much more, again all with shocking speed over
> even the biggest of data sets (many billions in under 0.001s).
>
> You may also want to test out some non-relational databases, as
> typically with large datasets you have to remove all the relational
> parts of the database (foreign keys etc.) just to be able to run the
> thing efficiently. There are many key-value DBs and NoSQL solutions,
> and my personal favourites are quad/triple stores for EAV-modelled
> data.
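To make the quad/triple-store suggestion above concrete, here is a small in-memory PHP sketch of EAV-modelled data: every fact is one (entity, attribute, value) row in a single flat list, and a query is a pattern in which `null` acts as a wildcard. Two patterns are chained by feeding the entity found by the first into the second. It assumes PHP 7.4+; `TripleStore`, the `livesIn`/`name` attributes and the example.org URIs are hypothetical, and a real triple store would index each position rather than scan every row.

```php
<?php
// Sketch only: an in-memory list of (entity, attribute, value) triples.
// TripleStore, the attribute names and the example.org URIs are hypothetical;
// needs PHP 7.4+ (typed properties, arrow functions).

final class TripleStore
{
    /** @var array<int, array{0: string, 1: string, 2: string}> */
    private array $triples = [];

    public function add(string $entity, string $attribute, string $value): void
    {
        $this->triples[] = [$entity, $attribute, $value];
    }

    // Return every triple matching the pattern; null matches anything.
    // A linear scan here; a real triple store indexes each position.
    public function match(?string $entity, ?string $attribute, ?string $value): array
    {
        return array_values(array_filter(
            $this->triples,
            fn(array $t): bool => ($entity === null || $t[0] === $entity)
                && ($attribute === null || $t[1] === $attribute)
                && ($value === null || $t[2] === $value)
        ));
    }
}

$uk = 'http://dbpedia.org/resource/United_Kingdom';
$store = new TripleStore();
$store->add('http://example.org/alice', 'livesIn', $uk);
$store->add('http://example.org/alice', 'name', 'Alice');
$store->add('http://example.org/bob', 'livesIn', 'http://example.org/France');
$store->add('http://example.org/bob', 'name', 'Bob');

// Chain two patterns: entities that live in the UK, then their names.
$names = [];
foreach ($store->match(null, 'livesIn', $uk) as [$person]) {
    foreach ($store->match($person, 'name', null) as [, , $name]) {
        $names[] = $name;
    }
}
echo implode(', ', $names), "\n"; // Alice
```

Because every fact has the same three-column shape, adding a new kind of attribute needs no schema change, which is the flexibility the flat "table" design trades against relational constraints.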
>
> Working with and storing all data as EAV triples is by far the fastest
> and most efficient way to make both small and large sites: everything
> is stored in a single flat "table", and you can query across all your
> data with great speed and chain queries together, linking up IDs, to
> access your data in ways you can't even imagine ;) Personally, I'm
> running triple stores with 3-4 billion rows on many machines, even on
> my desktop!
>
> I'll leave it there, but that's something to get you started.
>
> Regards!
>

As for spatial data types, I've never found much use for them outside
scientific applications (example). If I use a point as a PK, and MySQL
stores it the same way as PostgreSQL, which is 16 bytes, how is that any
different, performance-wise, from using a UUID stored as BINARY(16) in
MySQL or as uuid (16 bytes) in PostgreSQL?

Thanks,
Tommy

-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php
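The size question and the MBR queries can both be checked with a short PHP sketch. It packs a `(timestamp, float-id)` pair the way PostgreSQL's `point` type stores it (two 8-byte floats, 16 bytes), the way MySQL's internal geometry format does (assumed from the MySQL documentation: a 4-byte SRID followed by WKB, 25 bytes for a point), and a UUID as BINARY(16). It also filters points with a bounding-box test, which is the question a spatial index answers, and builds the matching WKT rectangle one could pass to MySQL's `MBRContains(ST_GeomFromText(...), col)`. All function names are hypothetical and it assumes PHP 7.4+. The byte counts suggest key size alone is comparable; any speed difference would come from the spatial index supporting two-dimensional range queries such as a time window, which a B-tree over random UUIDs cannot serve.

```php
<?php
// Sketch only: function names are hypothetical; needs PHP 7.4+
// (arrow functions, pack() 'e' code for little-endian doubles).

// PostgreSQL's point type: two 8-byte floats = 16 bytes.
function pg_point_bytes(float $x, float $y): string
{
    return pack('ee', $x, $y);
}

// MySQL's internal geometry layout (assumed from its docs): 4-byte SRID,
// then WKB: 1-byte order flag (1 = little-endian), 4-byte type (1 = Point), X, Y.
function mysql_point_bytes(float $x, float $y, int $srid = 0): string
{
    return pack('VCVee', $srid, 1, 1, $x, $y);
}

// A UUID as stored in a BINARY(16) column.
function uuid_bytes(string $uuid): string
{
    return hex2bin(str_replace('-', '', $uuid));
}

// Keep the (timestamp, id) points inside [tMin, tMax] x [idMin, idMax]:
// the bounding-box question a spatial index answers without a full scan.
function in_mbr(array $points, float $tMin, float $tMax, float $idMin, float $idMax): array
{
    return array_values(array_filter(
        $points,
        fn(array $p): bool => $p[0] >= $tMin && $p[0] <= $tMax
            && $p[1] >= $idMin && $p[1] <= $idMax
    ));
}

// WKT for the same rectangle (closed ring of five corners), e.g. for
// MBRContains(ST_GeomFromText(...), col) in MySQL.
function mbr_wkt(float $tMin, float $tMax, float $idMin, float $idMax): string
{
    $ring = [[$tMin, $idMin], [$tMax, $idMin], [$tMax, $idMax], [$tMin, $idMax], [$tMin, $idMin]];
    $coords = array_map(fn(array $c): string => $c[0] . ' ' . $c[1], $ring);
    return 'POLYGON((' . implode(', ', $coords) . '))';
}

$points = [[100.0, 5.5], [150.0, 2000.0], [250.0, 10.0], [180.0, 1832.1234]];
$hits = in_mbr($points, 100.0, 200.0, 0.0, 1832.1234); // [100.0, 5.5] and [180.0, 1832.1234]

echo strlen(pg_point_bytes(1270000000.0, 42.5)), "\n";    // 16
echo strlen(mysql_point_bytes(1270000000.0, 42.5)), "\n"; // 25
echo strlen(uuid_bytes('1b4e28ba-2fa1-41d2-883f-0016d3cca427')), "\n"; // 16
echo mbr_wkt(100.0, 200.0, 0.0, 1832.1234), "\n";
```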