On Sun, Nov 20, 2011 at 9:33 PM, Phoenix Kiula <phoenix.kiula@xxxxxxxxx> wrote: > Hi. > > Want to start another thread, loosely related to the performance > problems thread I have going. > > Need some DB design guidance from the gurus here. > > My big table now has about 70 million rows, with the following columns: > > > alias | character varying(35) > url | text > modify_date | timestamp without time zone > ip | bigint > > > For each IP address (user of my application) I want to have a unique > URL. So I used to have a UNIQUE constraint on IP, URL. But the index > based on this became huge, as some URLs are gigantic. so I introduced > an md5 of the URL: > > > url_md5 | varchar(32) > > > I now have two scenarios: > > 1. To have an index (unique?) on "(ip, url_md5)" > > 2. To not have an index on just the "ip". This way a query that tries > to match "...WHERE ip = 999 AND url_md5 = '<md5 here>'..." will > still look only at the ip bit of the index, then refine it with the > url_md5. > > The good thing about #2 is the size of index remains very small with > only a bigint field (ip) being indexed. > > The bad thing about #2 is that each query of "...WHERE ip = 999 AND > url_md5 = '<md5 here>'..." will have to refine the indexed IP. If one > IP address has put in a lot of URLs, then this becomes a bit slow. As > is now happening, where I have users who have over 1 million URLs > each! > > Questions: > > 1. Instead of md5, is there any integer hashing algorithm that will > allow me to have a bigint column and save a lot hopefully in both > storage space and speed? (Some very useful points mentioned here: > http://stackoverflow.com/questions/1422725/represent-md5-hash-as-an-integer > ) > > 2. If I do go with the above scenario #1 of a joint index, is there > any way I can save space and maintain speed? Partitioning etc are out > of the question. > > With a growing web database, I am sure many people face this > situation. Are nosql type databases the only sane solution to such > massive volumes and throughput expectations (e.g., CouchDb's MemBase)? > > Many thanks for any ideas or pointers! > I thought of adding a bigserial (serial8) column instead of varchar(32) for the md5. But postgresql tells me that: -- ERROR: type "bigserial" does not exist -- Why is this? Why can't I create a column with this "type"? Whats the current syntax? Thanks. -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general