Hi Tim, As arranged I am cc-ing the pgsql-general list in the hope they will assist. Your posts to the list may be delayed for moderation, I can't say. It could be helpful if you subscribed to the list, but it is relatively high traffic and I know you have extremely limited and expensive bandwidth. For this reason I'm not sure I can recommend subscribing. On Thu, 7 Apr 2016 12:22:26 +0200 Tim Vink <timvink@xxxxxxxxx> wrote: > Many thanks for your kind offer to give us some advice. I am Tim Vink, > Research Techician at the Kalahari Research Trust and currently > Database and Networks manager of the project. Our Project leaders, > Chris and Laura at the Meerkat project mainly involved with the main > Meerkat database that is currently in access, where Chris is helping > me develop the additional databases (that are currently loose db > and/or files (read up to 80000 csv or mapsource files) and make a > coherent structure for these. > > We are in the starting position and started off with MySQL using the > Percona Server variant for more advanced / easy replication and > databackups, would you recommend to move to PostGreSQL, what would be > our main advantages over MySQL? I no longer keep up with the MySQL feature set, and can't be considered an expert in the overall status of databases in the FOSS world. But I am not entirely disconnected either so should be of some help. I can't give you a point-by-point comparison of MySQL and Postgres. Overall, the difference is in approach. The goal of Postgres is to be SQL standards compliant, to be well designed, to be ACID compliant, and to be reliable. The attention to design is most significant. MySQL was written as glue, layering an SQL interface on top of a number of underlying database engines. And they were looking for speed. ACID compliance was something that came later. As a result, in MySQL there are myriad corner cases and rough edges. Postgres does not have these. This really starts to matter for people who don't spend all their time living cuddled up to their database and getting to know all it's quirks. Postgres takes the time to ensure each new feature is "clean". Although the emphasis is not on performance, this also includes performance. Again reliability is paramount. Postgres goes to great lengths to ensure that the data in the database is never corrupted. I already mentioned that it won't restore a database dump unless the resulting referential integrity is intact. A few minor examples: PG distinguishes between an empty string and NULL, the indeterminate data value. Most other dbs represent the empty string as a string containing a single space, and there is no distinction between an empty string and a single space. You can, with care, construct views in PG that act in every way like regular tables, they can be updated, inserted into, deleted from, etc. This is a great boon when users, such as in your case, work directly with the db. The PG SQL variant is a "clean", "typical" variant -- generally tracking Oracle. (The SQL standard is huge and awful and allows just about anything that any major vendor wanted to cram in.) There are no wierdo ways to write SQL that supply hints as to how to optimize the query, the PG query optimizer does the right thing for you. PG has nothing like the variation in SQL case sensitivity depending on underlying OS platform like you find in MySQL. In PG transactions apply everywhere, even to meta-data like table creation. This is unusual, and very useful when making test alterations to a test database as it eliminates error-prone cleanup on failure and consequential lack of synchronization between test and production databases. The PG "window functions" (see the PG docs) are also incredibly powerful for data analysis. As is the ability to embed R (r-project.org) into Postgres. (Although embedding R is dangerous from a security standpoint since it's then reasonably impossible to prevent a PG user from writing arbitrary files to disk. This breaks the barrier between db access and OS access.) You may also be interested in PostGIS for geospatial integration. (postgis.org) It is "way cool". Regards Percona Server, it's FAQ says it tracks Oracle's MySQL. I see the open source community moving away from Oracle's MySQL to MariaDB and this could be a long-term concern. > I take great example in your babase and ambaselli baboon project work. > Where we have started (still very very much under development) a > meerkat wiki. (meerkat.kalahariresearch.org) You may also want to look at gombemi.ccas.gwu.edu for some work I did for some of the Jane Goodall folk. It has some more advanced, from the standpoint of program internals, web-based tools. We use 2 idioms extensively. The first is batch upload from csv files. This makes converting data to electronic form, usually via MS Excel, a low-skill task. We upload with custom programs, and with generic uploads to tables and views. We have a generic upload program that ensures you get error messages for each line uploaded. We also have a wwwdiff program for paranoid data checking. (Have 2 people enter the data, independently, and then compare the result before upload to find typos.) Since our users interact with the database directly our data integrity is checked via an extensive set of triggers, ensuring that common data errors never get in the db. (Of course there can always be typos in entered data.) Our users use a data maintenance idiom for SQL manipulation that looks like: begin transaction, make changes, run some queries to test changes, rollback transaction. The SQL is submitted in a single "hunk" to the db. This is repeated until the data looks right and then the final rollback is changed to a commit. This idiom presents problems with some GUI front-ends. We had to modify phpPgAdmin to support it. You might want to look at adminer (www.adminer.org), which does support the idiom. (Coming from MS Access you may also find the adminer graphical GUI query builder attractive. As a rule I find these sorts of things more dangerous than not -- users tend to write queries they don't understand -- but there's surely good potential.) In general, there's no comprehensive web based solution for PG that "does everything". E.g. Most tools will want to stop after the first error on data upload. Some will refuse to upload into views. Etc. You can try using the Babase tools if you like. I have aspirations to build something better for generic data upload/download but they remain in the planning stage for the forseeable future. (See also: https://wiki.postgresql.org/wiki/Community_Guide_to_PostgreSQL_GUI_Tools ) You are of course welcome to our hacked phpPgAdmin. If you get a chance (and use PG) you should probably utilize the Gombe-MI/Babase warning system. (I should really package this separately, somehow.) It's a batch oriented data integrity checking tool based on user-supplied SQL queries that look for trouble. See the on-line docs. As far as a web stack goes I gravitate toward the following: Linux, Nginx, PostgreSQL, Python, Pyramid, SQLAlchemy, Mako, Bootstrap, Bootswatch. I used to work in PHP, but that was more because there used to be few alternatives regards scripting for the web. As an FYI, I find Debian to be more stable than Ubuntu. It also integrates very nicely with PG. The official PG repos are one of the very very few non-Debian repos which are recommended for inclusion in Debian. Since you're Internet-limited you may also find rsync interesting. (Hardlinked backups rock.) I have an rsync based backup script you are welcome to, but it serves my specific needs. It is often better to go with a more generic even if less serviceable tool like rsnapshot. > I would be happy to have a skype conversation or continue our > conversation over email / skype chat. FYI to the list, we will take our chat to #postgresql if and when we chat. Tim, if you do decide to try PG go to #postgresql for help with the initial configuration. You'll want to configure PG so that 40% of RAM (up to 4G?) is shared memory and used by PG. And adjust work_mem accordingly. (And if the box does nothing but PG it's sometimes useful to configure as much ram as possible as shared memory.) It can also be confusing to setup permissions for database access the first time you do it. (We give each user his own login, and his own schema for personal work. Some users get 2 logins, one for regular work and one "superuser" login for special purposes like creating new users. Each login is in one of 2 groups. There's a read-only group and a read-write group. These groups apply to the master, production, schema. The read-write group does not get to alter db structure, just alter table content. People can do whatever they want in their own schemas.) To end, since you're coming off of MS Access, you may not have a lot of experience in the FLOSS world. One key to success is picking the right components. Unlike in the proprietary world, and excepting distros, there are few one-stop shopping projects. Individual projects are typically tailored to doing one thing and doing it well. You add more projects into your installed mix to add additional functionality. Ask the community to find components. Regards, Karl <kop@xxxxxxxx> Free Software: "You don't pay back, you pay forward." -- Robert A. Heinlein -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general