Greg Stark wrote:
Phil Endecott wrote:
Just to give a bit of background, in case it is useful: this is my family tree
website, treefic.com. I have a schema for each user, each with about a dozen
tables. In most cases the tables are small, i.e. tens of entries, but the
users I care about are the ones with tens of thousands of people in their
trees. The schemas are independent of each other. Example web page:
I would strongly suggest you reconsider this design altogether. A normal (and
normalized) design would have a users table that assigns a sequential id to
each user. Then every other table would combine everybody's data but have a
user id column to indicate which user that row belonged to.
If you don't believe there's anything wrong with your current system, consider
what it would look like to query your existing schema to find out the answer
to the question "how many users have > 1000 people in their tree". Or "how
many users have updated their tree in the last 7 days".
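Greg's suggested layout can be sketched as follows. This is a minimal illustration using SQLite via Python (table and column names are my own invention, not Phil's actual schema): a shared users table plus a people table carrying a user_id column, so cross-user questions like "how many users have > N people" become a single aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# One shared schema: a users table plus a people table keyed by user_id,
# instead of a separate schema (with its own tables) per user.
cur.executescript("""
CREATE TABLE users (
    id   INTEGER PRIMARY KEY,
    name TEXT NOT NULL
);
CREATE TABLE people (
    id      INTEGER PRIMARY KEY,
    user_id INTEGER NOT NULL REFERENCES users(id),
    name    TEXT NOT NULL
);
CREATE INDEX people_user_idx ON people(user_id);
""")

# Toy data: user 1 has three people in their tree, user 2 has one.
cur.execute("INSERT INTO users VALUES (1, 'alice'), (2, 'bob')")
cur.executemany("INSERT INTO people (user_id, name) VALUES (?, ?)",
                [(1, 'a'), (1, 'b'), (1, 'c'), (2, 'x')])

# "How many users have more than N people in their tree?" is one
# aggregate over the shared table (N lowered to 2 for the toy data).
cur.execute("""
    SELECT count(*) FROM (
        SELECT user_id FROM people GROUP BY user_id HAVING count(*) > 2
    )
""")
n_big = cur.fetchone()[0]
print(n_big)  # 1 -- only alice has more than 2 people
```

With one schema per user, the same question would require iterating over every schema and unioning the results by hand.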
Those aren't questions that I need to answer often. The sort of
question I do need to answer is this: starting from individual X, find
all the ancestors and descendants for n generations. This involves n
iterations of a loop, joining the relatives found so far with the next
generation. If there are p people in the tree, this has something like
O(n log p) complexity. On the other hand, if I stored all users' data in
the same tables and I had u users, this operation would have
O(n log(u*p)) complexity. My guess is that it would be about an order of
magnitude slower.
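The generation-by-generation loop Phil describes might look roughly like this (again a SQLite/Python sketch; the links table, its columns, and the helper are guesses for illustration, not Phil's real code). Each iteration joins the frontier found so far against an indexed parent-link table, which is where the per-lookup log p cost comes from:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Guessed-at edge table: one row per child -> parent link.
cur.executescript("""
CREATE TABLE links (
    child  INTEGER NOT NULL,
    parent INTEGER NOT NULL
);
CREATE INDEX links_child_idx ON links(child);
""")
# Tiny tree: person 1's parents are 2 and 3; person 2's parent is 4.
cur.executemany("INSERT INTO links VALUES (?, ?)",
                [(1, 2), (1, 3), (2, 4)])

def ancestors(start, generations):
    """n iterations of a loop, each joining the relatives found so far
    with the next generation via an index probe on links(child)."""
    found = set()
    frontier = {start}
    for _ in range(generations):
        if not frontier:
            break
        placeholders = ",".join("?" * len(frontier))
        cur.execute(
            f"SELECT parent FROM links WHERE child IN ({placeholders})",
            tuple(frontier))
        frontier = {row[0] for row in cur.fetchall()} - found
        found |= frontier
    return found

print(sorted(ancestors(1, 2)))  # [2, 3, 4]
```

In the shared-table design the links table would also carry a user_id column, so each index probe searches an index over all u users' edges rather than one user's, which is the log(u*p) versus log p distinction above.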
The individual users' sites are entirely disjoint - there are no queries
that overlap them.
--Phil.