On 2021-12-08 14:44:47 -0500, David Gauthier wrote:
> So far, the tables I have in my DB have relatively low numbers of records
> (most are < 10K, all are < 10M). Things have been running great in terms
> of performance. But a project is being brainstormed which may require some
> tables to contain a couple billion records.
[...]
> What else should I be worried about ?
>
> I suspect that part of why things are running really well so far is that
> the relatively small amounts of data in these tables ends up in the DB
> cache and disk I/O is kept at a minimum. Will that no longer be the case
> once queries start running on these big tables ?

Depends a lot on how good the locality of your queries is. If most of them
read only the same parts of the same indexes, those parts will still be in
the cache. If they are all over the place, or if you have queries which need
to read large parts of your tables, then yes, cache misses will make your
performance a lot less predictable. That is also hard to test, because if
you run a query twice in a row, the second run will likely hit the cache and
be quite fast.

But in my experience the biggest problem with large tables is unstable
execution plans: for most parameter values the optimizer will choose to use
an index, but for some it will erroneously think that a full table scan is
faster. That can lead to a situation where a query normally takes less than
a second, but sometimes (seemingly at random) it takes several minutes -
users will understandably be upset about such behaviour.

It is in any case a good idea to monitor execution times to find such
problems (ideally before users complain), but each problem needs to be
treated on an individual basis, and sometimes there seems to be no good
solution.
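To make the cache point a bit more concrete: EXPLAIN with the BUFFERS option
shows how much of a query was served from PostgreSQL's buffer cache and how
much had to be read from disk. This is only a sketch - the table and column
names below are made up:

    -- "shared hit" blocks came from the buffer cache, "read" blocks had to
    -- be fetched from the OS/disk. Running the same statement twice in a
    -- row will usually show the read count drop sharply, which is exactly
    -- why ad-hoc timings against a warm cache can be misleading.
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT *
    FROM   measurements                          -- hypothetical big table
    WHERE  device_id = 42
      AND  recorded_at >= now() - interval '1 day';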
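The plan-flipping problem can often be reproduced by comparing the estimated
plans for a "rare" and a "common" parameter value. Again just a sketch with
made-up names and values:

    -- For a selective value the planner should pick an index scan ...
    EXPLAIN SELECT count(*) FROM measurements WHERE device_id = 42;

    -- ... but for a value which it estimates to match a large fraction of
    -- the table (or whose statistics are simply off), it may switch to a
    -- sequential scan - on a table with a couple billion rows that is the
    -- difference between milliseconds and minutes.
    EXPLAIN SELECT count(*) FROM measurements WHERE device_id = 1;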
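For the monitoring, one option (among several) is the pg_stat_statements
extension, which keeps per-statement timing statistics; a large gap between
mean and maximum execution time is a good hint that a statement's plan is
unstable. The column names below are those of PostgreSQL 13 and later
(older releases call them mean_time, max_time, ...):

    -- Requires pg_stat_statements in shared_preload_libraries and
    -- CREATE EXTENSION pg_stat_statements; in the database.
    SELECT query,
           calls,
           round(mean_exec_time::numeric, 1)   AS mean_ms,
           round(max_exec_time::numeric, 1)    AS max_ms,
           round(stddev_exec_time::numeric, 1) AS stddev_ms
    FROM   pg_stat_statements
    ORDER  BY max_exec_time - mean_exec_time DESC
    LIMIT  20;

auto_explain (with auto_explain.log_min_duration set) can additionally log
the actual plan of the slow outliers, which makes it much easier to see
which plan the optimizer picked when things went wrong.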
        hp

-- 
   _  | Peter J. Holzer    | Story must make more sense than reality.
|_|_) |                    |
| |   | hjp@xxxxxx         |    -- Charles Stross, "Creative writing
__/   | http://www.hjp.at/ |       challenge!"