On Tue, Aug 1, 2017 at 9:24 AM, Dmitry Lazurkin <dilaz03@xxxxxxxxx> wrote:
Ah. that would be another way to approach it. How many types have btree ordering functions without hashing functions, or the reverse?On 08/01/2017 07:13 PM, Jeff Janes wrote:
I think that HashSet is a Java-specific term. It is just a hash table in which there is no data to store, just the key itself (and probably a cash of the hashcode of that key), correct?
Yes. And in Java HashSet implemented on top of HashMap (:
I think a more general solution would be to get the planner and executor to run the in-list query using the Hash Join, the same way it runs the in-VALUES one.
Have additional plan nodes big overhead?
I don't think a new plan node will have meaningfully more overhead than new code of equal generality and robustness which has just been bolted on to the side of an existing node. I don't know how to test that, though. If the existing hash code is slow, it would probably be better to spend time improving it for everyone than coming up with yet another implementation. I know the existing simplehash.h code as specialized for tuples in src/backend/executor/execGrouping.c is horribly inefficient with memory usage when the tuples are skinny, but that shouldn't be a problem for the number of values likely to be present in a hard-coded in-list.
I was impressed at how well the JSON and hstore worked, you might want to look at how they do it. It is must be using an internal hash table of some sort.
JSONB and HSTORE keep sorted pairs and use binary search.
Cheers,
Jeff