Hello PostgreSQL fans,

I would like to introduce myself and the TPC-V benchmark to the PostgreSQL community. I would then like to ask the community to help us make the TPC-V reference benchmarking kit a success, and establish PostgreSQL as a common DBMS used in measuring the performance of enterprise servers.

I am VMware’s rep to the TPC, and chair the TPC’s virtualization benchmark development subcommittee. For those of you who don’t know the TPC, it is an industry standards consortium, and its benchmarks are the main performance tests for enterprise-class database servers. For external (marketing) use, these benchmarks are the gold standard for comparing different servers, processors, databases, etc. For internal use, they are typically the biggest hammers an organization can use for performance stress testing of its products. TPC benchmarks are one of the workloads (if not the main workload) that processor vendors use to design their products. So the benchmarks are in much heavier use inside companies than the number of official disclosures would suggest.

TPC-V is a new benchmark under development for virtualized databases. A TPC-V configuration has:
- multiple virtual machines running a mix of DSS, OLTP, and business logic apps
- VMs running with throughputs ranging from 10% to 40% of the total system
- load elasticity emulating cloud characteristics: the benchmark maintains a constant overall tpsV load level, but the proportion directed to each VM changes every 10 minutes

A paper in the TPC Technical Conference track of VLDB 2010 described the initial motivation and architecture of TPC-V. A paper that has been accepted to the TPC TC track of VLDB 2012 describes in detail the current status of the benchmark.

All TPC results up to now have been on commercial databases. The majority of active results are on Oracle or Microsoft SQL Server, followed by DB2, Sybase, and other players. Again, keep in mind that these benchmarks aren’t meant to only compare DBMS products.
In fact, the majority of results are “sponsored” by server hardware companies. The server hardware, processor, storage, OS, etc. all contribute to the performance. But you can’t have a database server benchmark result without a good DBMS! And that’s where PostgreSQL comes in.

The TPC-V development subcommittee followed the usual path of TPC benchmarks by writing a functional specification, and looking to TPC members to develop benchmarking kits to implement the spec. TPC-V uses the schema and transactions of TPC-E, but the transaction mixes and the way the benchmark is run are totally new and virtualization-specific. We chose to start from TPC-E to accelerate the benchmark development phase: the specification would be easier to write, and DBMS vendors could create TPC-V kits starting from their existing TPC-E kits.

Until now, benchmarking kits for the various TPC benchmarks have typically been developed by DBMS vendors, and offered to their partners for internal testing or disclosures. So our expectation was that one or more DBMS companies that owned existing TPC-E benchmarking kits would allocate resources to modify their kits to execute the TPC-V transactions, and supply kits to subcommittee members for prototyping. This did not happen (let’s not get into the internal politics of the TPC!!), so the subcommittee moved forward with developing its own reference kit.

The reference kit has been developed to run on PostgreSQL, and we are focusing our development efforts and testing on PostgreSQL. The reference kit will be a first for the TPC, which until now has only published paper functional specifications. This kit will be publicly available to anyone who wants to run TPC-V, whether for internal testing, academic studies, or official publications. Commercial DBMS vendors are allowed to develop their own kits and publish with them.
Even if commercial DBMS vendors decide later on to develop TPC-V kits, we expect official TPC-V publications with this reference kit using PostgreSQL, and of course a lot of academic use of the kit. I think this will be a boost for the PostgreSQL community (correct me if I am wrong!!). The most frequent question to the TPC is “do you offer a kit to run one of your benchmarks?” There will finally be such a kit, and it will run on PostgreSQL.

But TPC benchmarks are where the big boys play. If we want the reference kit to be credible, it has to have good performance. We don’t expect it to beat the commercial databases, but it has to be in the ballpark. We have started our work by running the kit in a simple, single-VM, TPC-E type configuration, since TPC-E is a known animal with official publications available. We have compared our performance to Microsoft SQL Server results published on a similar platform. After waving our hands through a number of small differences between the platforms, we have calculated a CPU cost of around 3.2ms/transaction for the published MS SQL results, versus a measurement of 8.6ms/transaction for PostgreSQL. (TPC benchmarks are typically pushed to full CPU utilization. One removes all bottlenecks in storage, networking, etc., to achieve 100% CPU usage. So CPU cost per transaction is the final decider of performance.)

So we need to cut the CPU cost of transactions in half to make publications with PostgreSQL comparable to commercial databases. It is OK to be slower than MS SQL or Oracle. The benchmark running PostgreSQL can still be used to compare the performance of servers, processors, and especially hypervisors under a demanding database workload. But the slower we are, the less credible we are.
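
For anyone who wants to reproduce the arithmetic, here is a rough sketch of how a CPU cost per transaction can be derived from throughput and CPU usage, and how far apart the two figures quoted above are. The function name and the sample inputs (16 CPUs, 5000 transactions per second) are made up for illustration and are not taken from any published result; only the 3.2ms and 8.6ms figures come from this post.

```python
def cpu_ms_per_transaction(logical_cpus, cpu_utilization, transactions_per_second):
    """Return CPU milliseconds consumed per transaction.

    logical_cpus            -- number of CPUs available to the database server
    cpu_utilization         -- fraction of CPU in use, 0.0 to 1.0 (1.0 = saturated)
    transactions_per_second -- measured throughput
    """
    if transactions_per_second <= 0:
        raise ValueError("throughput must be positive")
    if not 0.0 <= cpu_utilization <= 1.0:
        raise ValueError("cpu_utilization must be between 0.0 and 1.0")
    # CPU-seconds burned during each second of wall-clock time.
    cpu_seconds_per_second = logical_cpus * cpu_utilization
    return 1000.0 * cpu_seconds_per_second / transactions_per_second


# Hypothetical inputs, chosen only to show the formula at work.
example_cost = cpu_ms_per_transaction(
    logical_cpus=16, cpu_utilization=1.0, transactions_per_second=5000
)

# Figures quoted in this post.
MSSQL_MS_PER_TXN = 3.2
POSTGRES_MS_PER_TXN = 8.6

# How many times more CPU PostgreSQL currently uses per transaction.
current_ratio = POSTGRES_MS_PER_TXN / MSSQL_MS_PER_TXN

# The stated goal: cut the PostgreSQL cost in half.
target_ms_per_txn = POSTGRES_MS_PER_TXN / 2
ratio_after_halving = target_ms_per_txn / MSSQL_MS_PER_TXN

if __name__ == "__main__":
    print(f"example cost:        {example_cost:.2f} ms/txn")
    print(f"current ratio:       {current_ratio:.2f}x")
    print(f"target cost:         {target_ms_per_txn:.2f} ms/txn")
    print(f"ratio after halving: {ratio_after_halving:.2f}x")
```

With the CPU saturated, throughput is simply the available CPU time divided by the cost per transaction, which is why the cost per transaction ends up deciding the result.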