Re: Re: using UID in DB

Tommy Pham <tommyhp2@xxxxxxxxx> · Wed, 31 Mar 2010 04:01:25 -0700

On Wed, Mar 31, 2010 at 2:57 AM, Nathan Rixham <nrixham@xxxxxxxxx> wrote:
> Tommy Pham wrote:
>> On Tue, Mar 30, 2010 at 2:27 PM, Nathan Rixham <nrixham@xxxxxxxxx> wrote:
>>
>>> Nope, I've never been able to find any significant advantage, and thus
>>> ended up using HTTP URIs in my own domain space(s), which are always
>>> guaranteed to be unique as I'm issuing them. A bonus is that they can be
>>> dereferenced and serve as both a universal (resource) identifier and a
>>> locator.
>>>
>>> ps: resource = anything that can be named.
>>>
>>
>> Hi Nathan,
>>
>> I'm interested in hearing your technique for generating your own UUIDs,
>> if you don't mind sharing :).  I'm building a project to test my idea
>> about a search engine and to test different RDBMSes.  So naturally,
>> it (the app) would crawl the net and I'd have over 1 billion rows.
>>
>> Thanks,
>> Tommy
>
> Hi Tommy,
>
> Always good to see somebody experimenting and questioning things :)
>
> With regards to generating UUIDs which are http scheme URIs: this is
> something I got hung up on early on, but with practice I realised much
> of the world's data already has globally known and used http scheme
> identifiers. For instance, if I'm talking about a web page then it's the
> URL for it; a user may as well be http://twitter.com/webr3; a country
> could be http://dbpedia.org/resource/United_Kingdom. In the rare
> occurrence where I actually need to create an identifier, anything
> works, from a Freebase-style GUID (http://example.org/GUID-HERE) through
> to a generated meaningful URI such as
> http://mydomain.com/user/username/project/project-name
> or even just strapped to a class + microtime:
> $uri = 'http://mydomain.com/' . __CLASS__ . '/' . microtime(true);
>
> There are millions of approaches, but it's worth noting that with each
> you can have extra functionality due to the identifier and locator
> duality of http scheme URIs (thanks to the domain name system).
>

Very interesting approach.  I'll have to think and research more into it.
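
A minimal sketch of the URI-minting idea quoted above, so I have
something concrete to test. The base domain is the placeholder from the
quoted mail, and the helper names (mintUri, mintTimestampUri) are my own
invention, not anything Nathan described:

```php
<?php
// Placeholder domain taken from the quoted example; swap in your own.
const BASE_URI = 'http://mydomain.com';

// Mint a meaningful URI from path segments, percent-encoding each one.
function mintUri(string ...$segments): string
{
    return BASE_URI . '/' . implode('/', array_map('rawurlencode', $segments));
}

// Mint a URI from a class name plus a microsecond timestamp.
// Assumption: this is only unique if no two identifiers are minted
// within the same microsecond.
function mintTimestampUri(string $class): string
{
    return mintUri($class, sprintf('%.6F', microtime(true)));
}

echo mintUri('user', 'username', 'project', 'project-name'), "\n";
echo mintTimestampUri('Article'), "\n";
```

The first call prints
http://mydomain.com/user/username/project/project-name; the second
varies with the clock.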

> With regards what you are doing, if I may suggest a few things that you
> could try:
>
> You can create Identifiers that are spatial POINT()s and store them in
> mysql/postgres using either the MySQL spatial extension or PostGIS
> respectively. You can create identifiers using something like POINT(
> timestamp, float-id ) which again serves a duality of timestamping each
> record and identifying it. Moreover you'll be shocked at the speed gains
> from spatial indexing (seriously amazing), and further it allows you to
> do some pretty cool functionality with amazing speed.
>
> The spatial indexing lets you leverage your information in some pretty
> cool ways, at phenomenal speed. Because your data is essentially now
> points in a virtual world where X is time and Y is identity, you can
> pull information out by drawing MBRs (minimum bounding rectangles)
> around the data, thus selecting, say, all records between timestampA
> and timestampB with identities in the
> range 0-1832.1234 (we use floats rather than ints, far more scope and
> lends to great spatial optimisation / boxing). Further you aren't
> limited to basic geometries; you can create chains of data using
> linestring, test intersections on time, disjunctions and much more;
> again all with shocking speed over even the biggest of data sets (many
> billions in under 0.001s).
>
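
As a rough sketch of the bounding-box selection described above: the
rectangle can be written as a WKT polygon and passed to a spatial
predicate. The table name (records) and column name (id_point) are
hypothetical, and the query assumes MySQL's MBRContains() and
ST_GeomFromText() (older MySQL versions spell the latter GeomFromText());
nothing here is executed against a database:

```php
<?php
// Build the WKT for a rectangle covering [tMin, tMax] x [idMin, idMax].
// X is the timestamp and Y is the float identity, as in the quoted text.
function mbrPolygonWkt(float $tMin, float $tMax, float $idMin, float $idMax): string
{
    // A polygon ring must be closed, so the first point is repeated last.
    return sprintf(
        'POLYGON((%1$F %3$F, %2$F %3$F, %2$F %4$F, %1$F %4$F, %1$F %3$F))',
        $tMin, $tMax, $idMin, $idMax
    );
}

// Hypothetical table and column names; the WKT is bound as the parameter.
$sql = 'SELECT * FROM records WHERE MBRContains(ST_GeomFromText(?), id_point)';
$wkt = mbrPolygonWkt(1269993600.0, 1270080000.0, 0.0, 1832.1234);

echo $sql, "\n", $wkt, "\n";
```

With PDO, the statement would be prepared and $wkt bound to the single
placeholder; whether the spatial index is actually used depends on the
storage engine and column definition.
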
> You may also want to test out some non relational databases; as
> typically with large datasets you have to remove all the relational
> parts of the database (foreign keys etc) just to be able to run the
> thing efficiently. There are many key-value DBs, NoSQL solutions, and my
> personal favourites, which are quad/triple stores for EAV-modelled data.
>
> Taking a data-changing approach and working with + storing all data as
> EAV triples is by far the fastest and most efficient way to make both
> small and large sites; everything is stored in a single flat "table" and
> you can query across all your data with great speed and chain queries
> together, linking up IDs to access your data in ways you can't even
> imagine ;) Personally I'm running triple stores with 3-4 billion rows on
> many machines, even on my desktop!
>
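
To make the single flat "table" of EAV triples concrete, here is a toy
in-memory sketch. The example URIs, the predicate names and the
matchTriples() helper are all made up for illustration; a real triple
store would index the data rather than scan an array:

```php
<?php
// Each row is [entity, attribute, value]; URIs double as identifiers.
$triples = [
    ['http://example.org/user/alice', 'name', 'Alice'],
    ['http://example.org/user/alice', 'knows', 'http://example.org/user/bob'],
    ['http://example.org/user/bob', 'name', 'Bob'],
];

// Return the triples matching a pattern; null acts as a wildcard.
function matchTriples(array $triples, ?string $s, ?string $p, ?string $o): array
{
    return array_values(array_filter($triples, function (array $t) use ($s, $p, $o) {
        return ($s === null || $t[0] === $s)
            && ($p === null || $t[1] === $p)
            && ($o === null || $t[2] === $o);
    }));
}

// Chain two lookups by linking IDs: whom does alice know, and what
// are their names?
$names = [];
foreach (matchTriples($triples, 'http://example.org/user/alice', 'knows', null) as $t) {
    foreach (matchTriples($triples, $t[2], 'name', null) as $n) {
        $names[] = $n[2];
    }
}

print_r($names);
```

The value of the first match (bob's URI) becomes the entity of the
second lookup, which is the "chaining" in miniature.
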
> I'll leave it there, but something to get you started..
>
> Regards!
>

As for spatial data types, I've never found much use for them outside
scientific work.  For example, if a point is used as a PK, and MySQL
stores it the same way as PostgreSQL (16 bytes), how is that any
different, performance-wise, from using a UUID stored as binary(16) in
MySQL or as uuid (16 bytes) in PostgreSQL?
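
To illustrate the size comparison, a small sketch in PHP: a UUID with
its dashes stripped and two packed doubles both come to 16 bytes of raw
payload. The UUID and coordinate values are arbitrary examples, and this
says nothing about how either database lays the value out on disk or
indexes it:

```php
<?php
// Arbitrary example UUID, converted from 32 hex digits to raw bytes.
$uuid = '550e8400-e29b-41d4-a716-446655440000';
$uuidBytes = hex2bin(str_replace('-', '', $uuid));

// A point as two machine doubles (timestamp, float id); assumes the
// usual 8-byte double.
$pointBytes = pack('d2', 1270033285.0, 1832.1234);

echo strlen($uuidBytes), ' ', strlen($pointBytes), "\n";
```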

Thanks,
Tommy

-- 
PHP General Mailing List (http://www.php.net/)