Responses in-line:

On Tue, 10 Feb 2015 19:52:41 -0500
Mathieu Basille <basille.web@xxxxxxxxxxxxxxxx> wrote:

>
> I am posting here a question that I initially asked on the PostGIS list
> [1], where I was advised to try here too (I will keep both lists updated
> about the developments on this issue).
>
> I am currently planning to set up a PostgreSQL + PostGIS instance for my
> lab. Turns out I believe this would be useful for the whole center, so that
> I'm now considering setting up the server for everyone - if interest is
> shared of course. At the moment, I am however struggling with what would be
> required in terms of hardware, and of course, the cost will depend on
> that - at the end of the day, it's really a matter of money well spent. I
> have then a series of questions/remarks, and I would welcome any feedback
> from people with existing experience on setting up a multi-user PostGIS
> server. I'm insisting on the PostGIS aspect, since the most heavy requests
> will be GIS requests (intersections, spatial queries, etc.). However,
> people with similar PostgreSQL setup may have very relevant comments about
> their own configuration.
>
> * My own experience about servers is rather limited: I used PostGIS quite a
> bit, but only on a desktop, with only 2 users. The desktop was quite good
> (quad-core Xeon, 12 GB RAM, 500 GB hd), running Debian, and we never had
> any performance issue (although some queries were rather long, but still
> acceptable).
>
> * The use case I'm envisioning would be (at least in the foreseeable future):
>    - About 10 faculty users (which means potentially a little bit more
> students using it); I would have hard time considering more than 4
> concurrent users;
>    - Data would primarily involve a lot (hundreds/thousands) of high
> resolution (spatial and temporal) raster and vector maps, possibly over
> large areas (Florida / USA / continental), as well as potentially millions
> of GPS records (animals individually monitored);
>    - Queries will primarily involve retrieving points/maps over given
> areas/time, as well as intersecting points over environmental layers [from
> what I understand, a lot of I/O, with many intermediary tables involved];
> other use cases will involve working with steps, i.e. the straight line
> segment connecting two successive locations, and intersecting them with
> environmental layers;
>
> * I couldn't find comprehensive or detailed guidelines on-line about
> hardware, but from what I could see, it seems that memory wouldn't be the
> main issue, but the number of cores would be (one core per database
> connection if I'm not mistaken). At the same time, we want to make sure
> that the experience is smooth for everyone... I was advised on the PostGIS
> list to give a look at pgpool (however, UNIX only).

# of cores helps in parallel processing. But 4 simultaneous users doesn't
particularly mean 4 simultaneous queries. How much time do your users
spend running queries vs. idling?

If you don't expect more than 4 concurrent users, I would think you'll be
fine with a single quad-core CPU. I would get the fastest CPU available,
though, as it will make number crunching go faster.

I can't see any reason why you'd want/need pgpool. pgpool is generally
useful when you have a LOT of simultaneous connections, and you're only
estimating 4. Additionally, pgpool is fairly easy to add on later if you
need it ... so my recommendation would be not to worry about it just yet.
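
To put rough numbers on the "users vs. queries" point, here is a minimal
Python sketch. It assumes each connected user independently spends a fixed
fraction of their time actually running a query; that independence, the 20%
figure, and the function names are illustrative assumptions, not
measurements from any real workload.

```python
from math import comb


def prob_at_least(k: int, n: int, busy_fraction: float) -> float:
    """Probability that at least k of n users are running a query at the
    same instant, assuming each is busy independently with the given
    fraction of their time (a simple binomial model)."""
    return sum(
        comb(n, i) * busy_fraction**i * (1 - busy_fraction) ** (n - i)
        for i in range(k, n + 1)
    )


def expected_busy(n: int, busy_fraction: float) -> float:
    """Average number of queries in flight under the same model."""
    return n * busy_fraction


if __name__ == "__main__":
    users = 4             # concurrent users estimated in the question
    busy_fraction = 0.2   # ASSUMPTION: share of time spent running queries
    for k in range(1, users + 1):
        p = prob_at_least(k, users, busy_fraction)
        print(f"P(at least {k} queries at once) = {p:.4f}")
    print(f"Average queries in flight = {expected_busy(users, busy_fraction):.2f}")
```

Under those assumed numbers, all four users would be querying at the same
instant only about 0.16% of the time (0.2^4), with less than one query in
flight on average. Measuring the real busy fraction on the existing desktop
setup would be a better input than any guess.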

> * Does anyone have worked with a server running the DB engine, while the DB
> itself was stored on another box/server? That would likely be the case here
> since we already have a dedicated box for file storage. Along these lines,
> does the system of the file storage box matter (Linux vs. MS)?

Yes. If you have a lot of data that will need to be crunched, I would
consider getting SSDs directly attached to the computer running Postgres.
Anything you put between RAM and your disks that slows down transfers is
going to hurt performance.

However, since you haven't made an estimate of the physical size of the
data, I can't comment on whether sufficient SSD storage is cost effective
or not.

If you can't get DAS storage, you can make up for some of the performance
hit by getting lots of RAM. Part of the effectiveness of the RAM is
dependent on the OS and its storage drivers, though, and I have no
experience with how well Windows does that ... and since you didn't mention
which file storage technology you're using, I can't comment on that either.
SAN and NAS storage vary wildly from brand to brand in their performance
characteristics, so it's difficult to say unless you can find someone who
has tried the exact hardware you're liable to be using.

If performance is important, I highly recommend DAS, and furthermore SSDs
if you can afford them.

> * We may also use the server as a workstation to streamline PostGIS
> processing with further R analyses/modeling (or even use R from within the
> database using PL/R). Again, does anyone have experience doing it? Is a
> single workstation the recommended way to work with such workflow? Or would
> it be better (but more costly) to have one server dedicated to PostGIS and
> another one, with different specs, dedicated to analyses (R)?

I know nothing about R. But the question isn't really dependent on R.
Whether it works will depend on how memory and CPU intensive the code
you're running in R is, and whether that's enough CPU/memory usage to
interfere with what Postgres needs to do its portion of the work.

Usually, you'll get better performance by running your non-Postgres
processes on another machine, thus increasing the total # of cores and
amount of RAM available to the process, but sometimes, when the transfer
of data from the database to the other code is the bottleneck, the
opposite is true.

Sorry that I'm saying "it depends" so many times, but hopefully the details
on how it depends will help you make decisions, or at least tell you what
to investigate to decide.

-- 
Bill Moran

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general