Tommy Pham wrote:
> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham <nrixham@xxxxxxxxx> wrote:
>
>> nope never been able to find any significant advantage; and thus ended
>> up using http uri's in my own domain space(s) which are always
>> guaranteed to be unique as I'm issuing them. bonus is that they can be
>> dereferenced and serve as both a universal (resource) identifier and a
>> locator.
>>
>> ps: resource = anything that can be named.
>>
>
> Hi Nathan,
>
> I'm interested in hearing your technique of generating your own uuid
> if you don't mind sharing :). I'm building a project to test my idea
> about search engine and testing of different RDBMSes. So naturally,
> it (the app) would crawl the net and I'd have over 1 billion rows.
>
> Thanks,
> Tommy

Hi Tommy,

Always good to see somebody experimenting and questioning things :)

With regard to generating UUIDs that are http scheme URIs: this is something I got hung up on early on, but with practice I realised that much of the world's data already has globally known and used http scheme identifiers. For instance, if I'm talking about a web page then its identifier is simply the page's URL; a user may as well be http://twitter.com/webr3, and a country could be http://dbpedia.org/resource/United_Kingdom.

On the rare occasions where I actually need to create an identifier, anything will do, from a Freebase-style GUID (http://example.org/GUID-HERE), through a generated meaningful URI such as http://mydomain.com/user/username or http://mydomain.com/project/project-name, to simply the class name plus microtime:

    $uri = 'http://mydomain.com/' . __CLASS__ . '/' . microtime(true);

There are millions of approaches, but it's worth noting that each of them gains extra functionality from the identifier/locator duality of http scheme URIs (thanks to the domain name system): the same string that names a resource can also be dereferenced to fetch it.
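To make the three identifier styles concrete, here is a minimal PHP sketch. The function names and base URIs (http://mydomain.com, http://example.org) are hypothetical placeholders, and the GUID part uses a random RFC 4122 version-4 layout, which is my assumption rather than Freebase's actual GUID format.

```php
<?php
// Sketch only: base URIs and function names are hypothetical placeholders.

/** Random GUID-style URI, e.g. http://example.org/<guid> (RFC 4122 v4 layout). */
function mintGuidUri(string $base): string
{
    $bytes = random_bytes(16);
    // Set the version nibble to 4 and the variant bits to 10xx.
    $bytes[6] = chr((ord($bytes[6]) & 0x0f) | 0x40);
    $bytes[8] = chr((ord($bytes[8]) & 0x3f) | 0x80);
    $hex = bin2hex($bytes);
    return $base . '/' . substr($hex, 0, 8) . '-' . substr($hex, 8, 4) . '-'
        . substr($hex, 12, 4) . '-' . substr($hex, 16, 4) . '-' . substr($hex, 20, 12);
}

/** Meaningful URI such as http://mydomain.com/user/username. */
function mintMeaningfulUri(string $base, string $type, string $name): string
{
    // rawurlencode() keeps each path segment valid (a space becomes %20).
    return $base . '/' . rawurlencode($type) . '/' . rawurlencode($name);
}

/** Class name plus microtime, e.g. http://mydomain.com/User/1269985000.123456. */
function mintClassTimeUri(string $base, string $class): string
{
    // Fixed six decimals so the float always renders the same way.
    // Two calls within the same microsecond would produce the same URI.
    return $base . '/' . rawurlencode($class) . '/' . sprintf('%.6F', microtime(true));
}

echo mintGuidUri('http://example.org'), PHP_EOL;
echo mintMeaningfulUri('http://mydomain.com', 'user', 'webr3'), PHP_EOL; // http://mydomain.com/user/webr3
echo mintClassTimeUri('http://mydomain.com', 'User'), PHP_EOL;
```

The class-plus-microtime style is the simplest, but it can collide when two records are minted in the same microsecond or on two machines at once, so the random GUID is the safer default when you can't guarantee a single issuer.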
As for what you are doing, if I may suggest a few things you could try:

You can create identifiers that are spatial POINT()s and store them in MySQL or Postgres, using the MySQL spatial extensions or PostGIS respectively. For example, an identifier such as POINT( timestamp, float-id ) does double duty: it both timestamps each record and identifies it.

Moreover, you'll be shocked at the speed gains from spatial indexing (seriously amazing), and it lets you leverage your information in some pretty cool ways at phenomenal speed. Because your data is now essentially points in a virtual world where X is time and Y is identity, you can pull information out by drawing MBRs (minimum bounding rectangles) around the data, for instance selecting all records between timestampA and timestampB with identities in the range 0-1832.1234. (We use floats rather than ints: far more scope, and they lend themselves to spatial optimisation and boxing.)

Further, you aren't limited to basic geometries: you can create chains of data using LINESTRINGs, test intersections on time, disjunctions and much more, again with shocking speed even over the biggest data sets (many billions of rows in under 0.001s).

You may also want to test some non-relational databases, as with large datasets you typically have to strip out all the relational parts of the database (foreign keys etc.) just to run the thing efficiently. There are many key-value databases and NoSQL solutions, plus my personal favourites: quad/triple stores for EAV (entity-attribute-value) modelled data.
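A rough PHP sketch of the MBR idea, assuming a hypothetical MySQL table `records` with a NOT NULL POINT column `pt` that carries a SPATIAL INDEX (X = Unix timestamp, Y = float identity). The helpers only build the WKT strings you would bind into such a query, plus an in-memory filter that mimics the bounding-box test; nothing here connects to a database, and all names are illustrative.

```php
<?php
// Sketch only. Assumes a hypothetical MySQL table `records` with a NOT NULL
// POINT column `pt` carrying a SPATIAL INDEX, where X = Unix timestamp and
// Y = float identity. Nothing here connects to a database.

// Query you might prepare with PDO, binding the WKT from mbrWkt() to `?`.
const MBR_QUERY = 'SELECT ST_AsText(pt) FROM records WHERE MBRContains(ST_GeomFromText(?), pt)';

/** WKT for one record identifier: X is time, Y is identity. */
function pointWkt(float $timestamp, float $identity): string
{
    return sprintf('POINT(%.6F %.6F)', $timestamp, $identity);
}

/** WKT for the minimum bounding rectangle spanning a time range and an identity range. */
function mbrWkt(float $tsFrom, float $tsTo, float $idFrom, float $idTo): string
{
    // Closed ring: the four corners, then the first corner repeated.
    return sprintf(
        'POLYGON((%1$.6F %3$.6F, %2$.6F %3$.6F, %2$.6F %4$.6F, %1$.6F %4$.6F, %1$.6F %3$.6F))',
        $tsFrom, $tsTo, $idFrom, $idTo
    );
}

/**
 * In-memory stand-in for the bounding-box test, for illustration only.
 * Bounds are inclusive here; check your database's boundary semantics.
 * @param array<int, array{0: float, 1: float}> $points [timestamp, identity] pairs
 */
function selectInMbr(array $points, float $tsFrom, float $tsTo, float $idFrom, float $idTo): array
{
    return array_values(array_filter(
        $points,
        fn (array $p): bool => $p[0] >= $tsFrom && $p[0] <= $tsTo
            && $p[1] >= $idFrom && $p[1] <= $idTo
    ));
}

$points = [[1269955620.0, 12.5], [1269955700.0, 1832.1234], [1269959999.0, 900.0]];
$hits = selectInMbr($points, 1269955600.0, 1269956000.0, 0.0, 1832.1234);
echo count($hits), PHP_EOL; // 2: the third point falls outside the time range
echo mbrWkt(1269955600.0, 1269956000.0, 0.0, 1832.1234), PHP_EOL;
```

On PostGIS the equivalent index-assisted box test would be along the lines of `pt && ST_MakeEnvelope(tsFrom, idFrom, tsTo, idTo)`.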
Working with and storing all data as EAV triples is, in my experience, by far the fastest and most efficient way to build both small and large sites: everything is stored in a single flat "table", you can query across all your data with great speed, and you can chain queries together, linking up IDs to access your data in ways you can't even imagine ;) Personally, I'm running triple stores with 3-4 billion rows on many machines, even on my desktop!

I'll leave it there, but that's something to get you started.

Regards!

--
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php