On Tue, May 24, 2016 at 9:43 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Yeah. I wonder what would happen if we used the same rule for index > insertions. It would likely make insertions more expensive, but maybe > not by much. The existing "randomization" rule for where to insert new > items in a run of identical index entries would go away, because the > insertion point would become deterministic. I am not sure if that's > good or bad for insertion performance, but it would likely help for > scan performance. I think that if somebody tacked on a tie-breaker in the same way as in tuplesort.c's B-Tree IndexTuple, there'd be significant negative consequences. The equal-to-high-key case gets a choice of which page to put the new IndexTuple on, and I imagine that that's quite useful when it comes up. I'd also have concerns about the key space in the index. I think that it would seriously mess with the long term utility of values in internal pages, which currently can reasonably have little to do with the values currently stored in leaf pages. They're functionally only separators of the key space that guide index scans, so it doesn't matter if the actual values are completely absent from the leaf pages/the table itself (perhaps some of the values in the internal pages were inserted years ago, and have long since been deleted and vacuumed away). Provided the distribution of values at the leaf level is still well characterized at higher levels (e.g. many string values that start with vowels, very few that start with the letters 'x' or 'z'), there should be no significant bloat. That's very valuable. Unique indexes are another problem for this naive approach. Maybe somebody could do better with a more sophisticated approach, but it's probably better to focus on duplicate storage or even leaf page compression, as Stephen mentioned. -- Peter Geoghegan -- Sent via pgsql-performance mailing list (pgsql-performance@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance