On Thu, Feb 28, 2019 at 6:24 AM Chris Travers <chris.travers@xxxxxxxxx> wrote:
>
> On Thu, Feb 28, 2019 at 1:09 PM Pavel Stehule <pavel.stehule@xxxxxxxxx> wrote:
>>
>> Hi
>>
>> On Thu, Feb 28, 2019 at 12:47 PM Thomas Güttler <guettliml@xxxxxxxxxxxxxxxxxx> wrote:
>>>
>>> Hi experts,
>>>
>>> where would you suggest someone to **not** use PostgreSQL?
>
> Hard question. There are a lot of general places where PostgreSQL is not by itself the best fit, and where a careful weighing of pros and cons would need to be made before deciding to use it.
>
> Having used PostgreSQL in place of ElasticSearch for PB-scale deployments, as a high-throughput queue system, and for near-real-time OLAP workloads, I am fairly aware of how hard it can be pushed.
>
> So the answers here are not "don't use PostgreSQL here" but "think about it first and consider alternatives."
>
>>> Why would you do this?
>
> Replacing with "What would you consider to be the tradeoffs?"
>
>>> What alternative would you suggest instead?
>
> So a few possibilities:
>
> 1. a) TB-scale full text search systems.
>    b) PostgreSQL's full text search is quite capable, but not so powerful that it can completely replace Lucene-based systems, so you have to weigh complexity against functionality if you are tying it in with other data that is already in PostgreSQL. Note further that my experience with at least ElasticSearch is that it is easier to scale something built on multiple PostgreSQL instances into the PB range than it is to scale ElasticSearch into the PB range.
>    c) Solr or ElasticSearch

In my company we had to swap out Solr for Postgres. The main challenge was that Solr's limited query language could not deal with the complex authorization use cases that came in after the original project was deployed. Our only solution was to heavily denormalize the documents, which meant that when simple changes happened on the OLTP side we had to push large amounts of data into Solr. In addition to being slow, Solr (or, I guess, the underlying Lucene) started to develop weird failure modes: unpredictable swings in memory and/or disk usage, and exhaustion of underlying system resources (especially file descriptors). The whole thing felt unstable; we had tested heavily against the original requirements, but the tech did not evolve with the project.

The solution was to junk the whole thing and replace it with an API-compatible version of Solr implemented inside the database. To index a document we build a special search string with upper-case keys and lower-case values in a tab-delimited text string; pg_trgm/GIN does the rest of the lifting (a rough sketch of the idea is below). It can't compete with Solr on best-case behavior, but it gives much better worst-case behavior, and, since we don't have to denormalize, the system fits within memory, making scaling a snap.

The moral of the story here is "just use Postgres". This is not zealotry; if I were a Microsoft-inclined person, I might be advising use of SQL Server. If you are not totally and completely aware of the limits of the system, you are probably operating within them. The database is evolving rapidly, and insanely powerful servers (supercomputers, even, from the perspective of ten years ago) can be had on the cheap with a mouse click.

There are of course a few use cases where Postgres is not the optimal tech: highly unstructured data, super-high-transaction-rate master-master loss-tolerant data archiving, warm log storage, etc. These problems show up quite rarely in the world of data, which is generally directed towards the systems support of business applications.
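For concreteness, here is a minimal sketch of the pg_trgm/GIN arrangement described above. The table, columns, and keys (docs, payload, search, TITLE/AUTHOR) are made up for illustration; the real search-string builder and query shapes are more involved:

CREATE EXTENSION IF NOT EXISTS pg_trgm;

CREATE TABLE docs (
  id      bigserial PRIMARY KEY,
  payload jsonb,
  search  text   -- upper-case keys, lower-case values, tab-delimited
);

-- build the search string, e.g. 'TITLE<tab>postgres at scale<tab>AUTHOR<tab>merlin'
UPDATE docs
   SET search = concat_ws(E'\t',
                          'TITLE',  lower(payload->>'title'),
                          'AUTHOR', lower(payload->>'author'));

-- trigram index; makes LIKE/ILIKE over the search string cheap
CREATE INDEX docs_search_trgm_idx ON docs USING gin (search gin_trgm_ops);

-- find documents whose AUTHOR field contains 'merlin'
SELECT id
  FROM docs
 WHERE search LIKE E'%AUTHOR\tmerlin%';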
If you haven't mastered the database first, you probably shouldn't be building out complex systems on non-database technology, since you don't know what you don't know (for example, the immense value that transactions bring to the table).

merlin
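(A trivial, made-up illustration of that transactional point, using a hypothetical accounts table: either both updates below are applied or neither is, with no half-applied state ever visible to concurrent readers.)

BEGIN;
UPDATE accounts SET balance = balance - 100 WHERE id = 1;
UPDATE accounts SET balance = balance + 100 WHERE id = 2;
-- an error anywhere above aborts the whole transaction; nothing is half-applied
COMMIT;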