Thanks for the replies. I was tempted to accept Radosław Smogura's proposal. There will be about 100 websites to capture data from on a daily basis. Each website adds about 2 articles per day on average.
Thomas mentioned the NoSQL possibility. Which do you think would be better? I have no experience with NoSQL, and that could be a weakness.
Best Regards,
André
On Mon, Jan 3, 2011 at 11:58 AM, Thomas Schmidt <postgres@xxxxxxxxxxxxxxxxxxxx> wrote:
Hello,
On 03.01.11 12:46, Radosław Smogura wrote: (...)
I can propose you something like this:
website(id int, url varchar);
attr_def (id int, name varchar);
attr_val (id int, def_id int references attr_def.id, website_id int references website.id, value varchar);
If all of the attributes of a website are single-valued, you can remove id from attr_val and use (website_id, def_id) as the primary key.
Depending on your needs, create one or more of the following indexes:
attr_val(value) - search for attributes with value;
IMHO it's hard (if not impossible) to recommend a specific database schema (incl. indexes) without knowing the application behind it.
Probably you will use 2nd or 3rd index.
Example of search on website
select d.name, v.value from attr_def d join attr_val v on (v.def_id = d.id) join website w on (v.website_id = w.id)
where d.name = 'xxxx' and w.url = 'http://somtehing'
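For anyone who wants to try the schema and query quoted above, here is a minimal sketch. It makes several assumptions that are not in the original mails: it uses Python's sqlite3 module with an in-memory database as a stand-in for PostgreSQL, it takes the single-valued variant with (website_id, def_id) as the primary key, and the index name, the sample URL and the 'title' attribute are made up for illustration.

```python
import sqlite3

# Assumption: in-memory SQLite stands in for PostgreSQL so the sketch runs anywhere.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")

conn.executescript("""
CREATE TABLE website  (id INTEGER PRIMARY KEY, url TEXT NOT NULL UNIQUE);
CREATE TABLE attr_def (id INTEGER PRIMARY KEY, name TEXT NOT NULL UNIQUE);
-- Single-valued attributes: no surrogate id, composite primary key instead.
CREATE TABLE attr_val (
    website_id INTEGER NOT NULL REFERENCES website(id),
    def_id     INTEGER NOT NULL REFERENCES attr_def(id),
    value      TEXT,
    PRIMARY KEY (website_id, def_id)
);
-- Hypothetical index name; supports searching for attributes by value.
CREATE INDEX attr_val_value_idx ON attr_val (value);
""")

# Made-up sample data for illustration only.
conn.execute("INSERT INTO website VALUES (1, 'http://example.org')")
conn.execute("INSERT INTO attr_def VALUES (1, 'title')")
conn.execute("INSERT INTO attr_val VALUES (1, 1, 'Hello world')")

# Same join as in the quoted query, with bound parameters instead of literals.
rows = conn.execute(
    """
    SELECT d.name, v.value
    FROM attr_def d
    JOIN attr_val v ON v.def_id = d.id
    JOIN website  w ON v.website_id = w.id
    WHERE d.name = ? AND w.url = ?
    """,
    ("title", "http://example.org"),
).fetchall()
print(rows)
```

With the composite primary key, a second value for the same website and attribute is rejected, which is the point of dropping the separate id column when attributes are single-valued.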
Your schema is nice for specific querying, but might blow up if lots of data is stored in the database (joins, index-building might be time consuming).
On the other hand, Google put some effort into their "BigTable" http://en.wikipedia.org/wiki/BigTable for storing tons of data...
Thus - it all depends on the usage :-)
Thomas
--
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general