Thanks to everyone who contributed to this thread, either on the PostGIS
[1] or the PostgreSQL [2] mailing lists. I will try to summarize everything
in this message, which I will actually post on both lists to give an update
to everyone. I hope it can be useful to other interested people. Please feel
free to add more advice and share other experiences; that is always useful!
Performance
===========
* CPU
A good CPU is required for fast processing. The number of cores helps with
parallel processing, but note that the number of users != the number of
concurrent queries (example: with no more than 4 concurrent users, a single
quad-core CPU should be fine).
* Memory
Reported setups range from 8 GB to more than 32 GB of RAM.
* Disks
Geoprocessing involves a lot of I/O and thus requires fast disks: SSDs are
best, otherwise 10k/15k RPM drives. An alternative is to store indexes on
faster disks and data on slower disks (this requires tuning PostgreSQL; see
the tablespace sketch at the end of this section).
Better to have direct-attached storage (DAS), i.e. disks on the server
directly (direct transfer between RAM and disks); external storage requires a
good network (and additional RAM increases performance).
* Massive multi-user environments (lots of simultaneous connections):
consider pgpool [3] (Linux/UNIX only). pgpool can be added later on, so there
is no need to worry about it at the start.
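
As an illustration of the indexes-on-faster-disks idea above, here is a
minimal, hypothetical sketch using tablespaces (the mount point /mnt/ssd and
all object names are assumptions, not recommendations):

    -- Create a tablespace on the fast (SSD) disk; the directory must exist
    -- and be owned by the postgres system user.
    CREATE TABLESPACE fastspace LOCATION '/mnt/ssd/pg_indexes';

    -- Build a spatial index there, while the table stays on the slower disk.
    CREATE INDEX gps_records_geom_idx ON gps_records
        USING gist (geom) TABLESPACE fastspace;

    -- Or move an existing index:
    ALTER INDEX gps_records_geom_idx SET TABLESPACE fastspace;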
Platform
========
Linux is the platform of choice:
* Easier administration (installation/configuration/upgrades), which is also
true for add-ons/dependencies (starting with PostGIS, but also GEOS, GDAL, PL/R);
* Better performance [4];
* More tuning options (limited on MS systems; see the tuning sketch below);
Running Linux in a virtual machine (e.g. VirtualBox) on an MS server remains
a possibility.
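
To give an idea of what such tuning looks like, here is a hypothetical sketch
of common memory-related settings (the values are illustrative for a dedicated
machine with 32 GB RAM, not recommendations; ALTER SYSTEM requires PostgreSQL
9.4+, otherwise edit postgresql.conf directly):

    -- Illustrative values for a dedicated 32 GB server.
    ALTER SYSTEM SET shared_buffers = '8GB';         -- needs a server restart
    ALTER SYSTEM SET effective_cache_size = '24GB';  -- planner hint only
    ALTER SYSTEM SET work_mem = '256MB';             -- per sort/hash operation
    ALTER SYSTEM SET maintenance_work_mem = '1GB';   -- index builds, VACUUM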
Other considerations
====================
* Backup: integrate a script that runs pg_dump daily to export the DB and
upload it to the storage box (which is part of the backup system).
* Integration with R: a dedicated R server brings more flexibility /
extensions (e.g. Shiny) / performance (more cores and memory left available
for PostGIS), except if data transfer is the bottleneck. Use PL/R for small
functions, or when it fits naturally into the PostgreSQL workflow (see the
PL/R sketch below); otherwise, work in R with a PostgreSQL connector.
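
For the PL/R side, here is a minimal, hypothetical sketch of such a small
function (it assumes the plr extension is installed in the database, e.g. via
CREATE EXTENSION plr; the function itself is a toy example):

    -- Toy PL/R function: median of a float8 array, computed by R.
    -- In PL/R, unnamed arguments are exposed to R as arg1, arg2, ...
    CREATE OR REPLACE FUNCTION r_median(float8[]) RETURNS float8 AS '
        median(arg1)
    ' LANGUAGE plr;

    -- Usage:
    SELECT r_median(ARRAY[1.0, 3.0, 2.0]::float8[]);  -- returns 2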
Example setups
==============
* Dell Precision, 2×6 cores, 20 GB RAM, SSD for indexes, 7,200 RPM HDD for
big tables [Rémi Cura]:
Various usages, from visualization (few users) to complex queries with a lot
of reading/writing (several users).
* Bare-metal machine with ESXi; PostgreSQL machine with 8 GB RAM; 2 quad-core
processors; PostgreSQL tuned for fast reads, with a large cache; pgpool;
disks: two 7,200 RPM disks in RAID 1 [George Silva]:
12 concurrent QGIS users, editing around 50,000 km² of land use coverage at
1:5,000 scale with a lot of detail (in two separate DBs). More editing than
processing; some heavy queries (e.g. the "complete feature" tool from QGIS)
can take some time.
* Nominatim (OpenStreetMap data) [5]: >1 GB RAM necessary, >32 GB
recommended; 700 GB HDD, SSD recommended; example machine: 12 cores with
32 GB RAM and standard SATA disks, with I/O as the limiting factor.
Thanks again for the good feedback! This gives me very useful information
to get started (I think this is still going to be a long process).
Mathieu Basille.
[1] http://lists.osgeo.org/pipermail/postgis-users/2015-February/040120.html
[2] http://www.postgresql.org/message-id/54DAA7D9.8020908@xxxxxxxxxxxxxxxx
[3] http://www.pgpool.net/
[4] https://stackoverflow.com/questions/8368924/postgresql-performance-on-windows-using-latest-versions
[5] http://wiki.openstreetmap.org/wiki/Nominatim/Installation
On 10/02/2015 19:52, Mathieu Basille wrote:
Dear PostgreSQL users,
I am posting here a question that I initially asked on the PostGIS list
[1], where I was advised to try here too (I will keep both lists updated
about the developments on this issue).
I am currently planning to set up a PostgreSQL + PostGIS instance for my
lab. It turns out I believe this would be useful for the whole center, so that
I'm now considering setting up the server for everyone—if interest is
shared, of course. At the moment, however, I am struggling with what would be
required in terms of hardware, and of course, the cost will depend on
that—at the end of the day, it's really a matter of money well spent. I
thus have a series of questions/remarks, and I would welcome any feedback
from people with existing experience in setting up a multi-user PostGIS
server. I insist on the PostGIS aspect, since the heaviest requests will be
GIS requests (intersections, spatial queries, etc.). However, people with
similar PostgreSQL setups may have very relevant comments about their own
configuration.
* My own experience with servers is rather limited: I have used PostGIS quite
a bit, but only on a desktop, with only 2 users. The desktop was quite good
(quad-core Xeon, 12 GB RAM, 500 GB HDD), running Debian, and we never had
any performance issue (although some queries took rather long, it was still
acceptable).
* The use case I'm envisioning would be (at least in the foreseeable future):
- About 10 faculty users (which potentially means a few more students using
it); I would have a hard time imagining more than 4 concurrent users;
- Data would primarily involve a lot (hundreds/thousands) of high-resolution
(spatial and temporal) raster and vector maps, possibly over large areas
(Florida / USA / continental), as well as potentially millions of GPS records
(individually monitored animals);
- Queries will primarily involve retrieving points/maps over given
areas/times, as well as intersecting points with environmental layers [from
what I understand, a lot of I/O, with many intermediary tables involved];
other use cases will involve working with steps, i.e. the straight-line
segments connecting successive locations, and intersecting them with
environmental layers (see the sketch right after this item);
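
To make the step idea concrete, here is a minimal, hypothetical sketch (all
table and column names are made up: gps_fix, env_layer, fix_time, habitat):

    -- Build steps (straight-line segments between successive fixes of each
    -- animal) and intersect them with an environmental layer.
    WITH steps AS (
        SELECT animal_id,
               ST_MakeLine(geom, lead(geom) OVER w) AS step
        FROM gps_fix
        WINDOW w AS (PARTITION BY animal_id ORDER BY fix_time)
    )
    SELECT s.animal_id, e.habitat
    FROM steps s
    JOIN env_layer e ON ST_Intersects(s.step, e.geom)
    WHERE s.step IS NOT NULL;  -- the last fix of each animal has no next point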
* I couldn't find comprehensive or detailed guidelines online about
hardware, but from what I could see, it seems that memory wouldn't be the
main issue, but the number of cores would be (one core per database
connection, if I'm not mistaken). At the same time, we want to make sure
that the experience is smooth for everyone... I was advised on the PostGIS
list to take a look at pgpool (which is, however, UNIX only).
* Is there a difference in terms of possibilities, performance, and
usability between a Linux-based and an MS-based server (from the user
perspective)? My center is unfortunately MS-centered, and existing
equipment runs on MS systems... It would thus be easier for them to set
up an MS-based server. Does it change anything for the user? (I insist on
the user perspective, since I and others will not administer the system, but
only use it.)
* Has anyone worked with a server running the DB engine while the DB itself
was stored on another box/server? That would likely be the case here, since
we already have a dedicated box for file storage. Along these lines, does the
operating system of the file storage box matter (Linux vs. MS)?
* We may also use the server as a workstation to streamline PostGIS
processing with further R analyses/modeling (or even use R from within the
database using PL/R). Again, does anyone have experience doing this? Is a
single workstation the recommended way to work with such a workflow? Or would
it be better (but more costly) to have one server dedicated to PostGIS and
another one, with different specs, dedicated to analyses (R)?
I realize my questions and comments may be a bit confusing, likely because of
my lack of experience with these issues. I really welcome any feedback from
people working with PostgreSQL servers (+ PostGIS ideally!) in a small unit,
or in any similar setting that could be informative!
In advance, thank you very much!
Sincerely,
Mathieu Basille.
[1] Start of the thread here:
http://lists.osgeo.org/pipermail/postgis-users/2015-February/040120.html
--
~$ whoami
Mathieu Basille
http://ase-research.org/basille
~$ locate --details
University of Florida \\
Fort Lauderdale Research and Education Center
(+1) 954-577-6314
~$ fortune
« The whole point is to say everything, and I lack the words,
And I lack the time, and I lack the audacity. »
-- Paul Éluard