On Thu, 10 Jul 2014 08:40:59 -0700 (PDT)
AlexK <alkuzo@xxxxxxxxx> wrote:

> Bill,
>
> Regarding "SELECT performance improves nearly linearly with the number of
> partitions" - can you elaborate why? If I split my table into several
> partitions, even the index depth may stay the same, because the PK is
> narrow; it only consists of two 4-byte integers.

That statement is based on experimentation. About a year ago, I did a research project for a former employer to determine the best way to store a large amount of data. Like you, we were dealing with narrow rows, but we were looking at a total row count exceeding 10 billion. Also different: we didn't expect the number of unique "ParentID"s to ever exceed 100,000.

I managed to borrow some beefy hardware from another project that wouldn't need it for a few weeks and did some experimenting with different partition configurations, all compared against a baseline of an unpartitioned table. I don't remember exactly, but I believe I was populating the test databases with 1 billion rows.

That's where I came up with the "linear" determination. A query that averaged 100ms on a single table averaged 10ms on 10 partitions and 1ms on 100 partitions (etc). Every test I concocted seemed to support the conclusion that the improvement was linear.

Another point that I experimented with, and that is worth noting: not all queries are able to benefit from the partitioning, but the test queries I ran that could not benefit saw only a few percent of a performance hit. Since our app design would take advantage of the partitioning for 99% of its queries, it was a no-brainer.

As a result, I can only theorize, but my theory is that since each partition is actually a table, with indexes independent of the other tables/partitions, both index searches and table scans required less work on the smaller tables. That theory doesn't 100% make sense, since index performance should not degrade linearly with index size, but that was the behavior I observed.
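To make the mechanism concrete, here is a minimal Python sketch of the idea (not the actual schema or test harness from that project; the key layout, partition count, and hash-routing scheme are all assumptions for illustration). Each partition stands in for an independent table with its own index, and a lookup only ever searches the one partition its key routes to:

```python
import bisect

NUM_PARTITIONS = 10  # hypothetical; the real tests varied this

def partition_for(parent_id):
    """Route a key to one partition, as a hash-partitioning scheme would."""
    return parent_id % NUM_PARTITIONS

# Each partition is a sorted list standing in for a per-partition index.
partitions = [[] for _ in range(NUM_PARTITIONS)]
for parent_id in range(1000):
    for child_id in range(10):
        partitions[partition_for(parent_id)].append((parent_id, child_id))
for p in partitions:
    p.sort()

def lookup(parent_id, child_id):
    """Binary-search only the single relevant partition's 'index'."""
    p = partitions[partition_for(parent_id)]
    i = bisect.bisect_left(p, (parent_id, child_id))
    return i < len(p) and p[i] == (parent_id, child_id)
```

Note that each partition holds roughly total/P keys, so an index search should cost O(log(N/P)), not O(N/P) -- which is exactly why the observed linear speedup is surprising, as discussed above.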
It's likely that, because of the significant size of the hardware, the performance curve was simply so close to flat that it looked linear at the sizes I was working with.

> At this time we would rather not introduce bugs related to cache
> invalidation. Although we do need to read data fast, we do not want stale
> data, and we cannot afford any bugs in this application.

I can't make your decisions for you, but I'll make an attempt here to help you avoid the same mistake I made. On that same project, we made a similar decision: caches are sources of bugs, so we'll just make sure the DB is tuned well enough that we don't need a cache. That decision led to a lot of late-night work under duress to get caching reliably implemented at the last minute.

There are a lot of things that partitioning doesn't speed up: parsing and planning the queries, the time it takes the network to move data back and forth, and the time it takes the application to demarshal the results returned by the SQL server. Caching data in RAM completely removes all of that overhead.

Of course, only you and your developers can determine whether you need it. I'm just recommending that you don't fail to consider it simply because it's difficult to do reliably. If you use the "can't risk bugs" argument, you might benefit more from improving your QA process than from anything else.

--
Bill Moran <wmoran@xxxxxxxxxxxxxxxxx>
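P.S. For what it's worth, the caching pattern doesn't have to be exotic. Here's a minimal Python sketch of a read-through cache with explicit invalidation (the class name, loader callback, and locking choice are all my own illustration, not anything from that project). The stale-data risk lives entirely in the invalidate() calls: every write path must issue one, which is why keeping write paths few and well-audited matters more than the cache code itself:

```python
import threading

class InvalidatingCache:
    """Read-through cache: misses fall through to a loader (e.g. a DB query).

    Writers must call invalidate() for every key they touch; a missed
    invalidation is exactly the stale-data bug being discussed.
    """

    def __init__(self, loader):
        self._loader = loader          # called on a miss, e.g. runs a SELECT
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            if key in self._data:
                return self._data[key]  # hit: no parse/plan/network cost
        value = self._loader(key)       # miss: pay the full DB round trip
        with self._lock:
            self._data[key] = value
        return value

    def invalidate(self, key):
        with self._lock:
            self._data.pop(key, None)
```

A repeated get() for the same key never reaches the loader until someone invalidates that key, which is where the overhead savings described above come from.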