On Thu, May 29, 2014 at 12:58 PM, Sébastien Lorion <sl@xxxxxxxxxxxxxxxxxxxxx> wrote:
I have a master database sharded by user_id, with globally unique IDs for everything, except shared configuration data stored in global tables (resources strings, system parameters, etc).What would be the best (ie both fast and reliable, simple to maintain as a bonus) to merge all shards into a single read-only slave that will then be replicated and used for read queries ? I took a look at Londiste and repmgr, and can see some ways to accomplish that, but would appreciate the advice of people here.Thank you,Sébastien
Answering myself, please correct me if my findings are wrong.
I cannot find a way to accomplish the above without using statement level replication. That kind of defeat the point since if my DB is sharded, it's to avoid having to vertically scale to sustain the write charge, but by using statement level replication, I will now have to vertically scale the slave, bringing me back to square one.
So my conclusion is that for now, the best way to scale read-only queries for a sharded master is to implement map-reduce at the application level. Fortunately, most of the time, read queries scope can be limited to a single shard, but nonetheless, it would have been nice to avoid the additional complexity if it had been possible to merge sharded tables on a binary level (which should be much faster than statement level), given that their records will never overlap (i.e. the same record is never present in many shards).
Sébastien