Hi,
I am interested in trying the hash partitioning method now available in 11 (trying the beta 4). However, I have the data already hashed at the application level across multiple postgres instances. If possible, I would like to keep these two hashing methods same. This would enable me to move a single partition (considering we have attach/detach methods available) from one instance to another and have queries work seamlessly.
The application can control what data/query is sent to each instance - the only thing making this setup impossible is (a) the built-in HASH function not available/accessible to me outside of pg context, say, as a C library. and (b) I don't know how to sub a known hash function (say, murmur, xxhash, siphash) instead of the builtin hash function. I am not particularly insisting on any particular hash function, except for it to available outside of postgres (say as a C or Go library).
Based on a discussion in the IRC channel, I was told I could try creating a custom hash function (postgres extension) and use it in RANGE or LIST partitioning with that _expression_.
I have the following code installed as a postgres extension http://dpaste.com/2594KWM, takes an implementation of xxhash.c and sticks it in as a postgres extension †
~/tmp/pge$ gcc -fPIC -I/usr/include/postgresql/11/server -c pge.c && gcc -shared -o pge.so pge.o
Problem is that with this setup, partitioning of the writes/inserts work fine, but I don't see pg excluding the unnecessary partitions on read/queries
This setup based on _expression_ also has other issues - I can't mark k as a primary key or have a unique key on that column. If it is declared as a hash partition, I can have those.
This question may be related to this thread as well
† surprisingly, this function works even if I don't compile in xxhash.c/o into the .so - that is yet another side mystery to figure out. It is like the symbol XXH64 is already available dynamically. I did have plpython installed at some point. but this is a separate issue.
--
Harry