Re: Postgres as key/value store

Jonathan Vanasco <postgres@xxxxxxxx> · Mon, 29 Sep 2014 19:20:08 -0400

On Sep 27, 2014, at 7:48 PM, snacktime wrote:

> The schema is that a key is a string, and the value is a string or binary.  I am actually storing protocol buffer messages, but the library gives me the ability to serialize to native protobuf or to json.  Json is useful at times especially for debugging.

I don't know if this will apply to you, but i received significant speed improvements on Postgres key searches by using substring indexes.

If your keys are just random hashes, this would probably work well for you.  
if your keys are person-readable, it's probably not going to work as well as the distribution of prefix characters will probably be too uniform.

But the general idea is twofold:

	1. create an additional partial index on the key field -- CREATE INDEX _entities_id__subst_7 ON entities(substr(id,1,7));
	2. update your SELECTS to search for both the full string AND the substring

		- WHERE id = :id
		+ WHERE (id = :id) AND (substr(id,1,7) = substr(:id, 1, 7))

By adding in the substring query, the planner will (usually) optimize the select by doing a first pass on the substring index.  then it searches that limited set for the rest of matching criteria.

on a table with 4MM+ records , introducing a substring index/query improved my searches by a few orders of magnitude.    

before trying this indexing strategy, we were actively looking to migrate this particular query service off of postgres -- it was such a bottleneck and was not scalable.  
now there is no reason to leave in the foreseeable future.

On Sep 27, 2014, at 8:33 PM, Gavin Flower wrote:

>>  This works well because keys are left prefixed with a scope, a delimiter, and then the actual key for the data.  
> Then I noticed that your id is actually a compound key, and probably would be better modelled as:

if you're able to standardize the scope out, an index of "(scope, substring(key,1,7))" might work well.

i only used 1,7 as my key arguments, because that was an optimal speed/space mix on my dataset.  depending on yours, a shorter or longer index might be more appropriate

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general