Thomas Kellerer wrote: > Albe Laurenz schrieb am 17.12.2014 um 11:07: >> and the performance will be worse than reading files from the file system. > > There is a Microsoft research [1] (from 2006) which tested this "myth" using SQL Server. > It showed that the database might actually be faster than the file system. > > As this topic comes up at my workplace every now and then as well, I created a little web application > (Java/JDBC) to test this on Postgres and possibly other DBMS. > > Turns out the Postgres as well isn't really slower at this than the file system. > > For small files around 50k both perform similar: the average time to read the blob from a bytea column > was around 2ms whereas the average time to read the blob from the filesystem was around 1ms. The test > uses 50 threads to read the blobs using the PK of the table. > > "Reading from the filesystem" means looking up the path for the file in the database table and then > reading the file from the filesystem. > > For larger files around 250k Postgres was actually faster in my tests: 130ms reading the bytea column > vs. 260ms reading the file from disk. > > The tests were done locally on my Windows laptop. > I didn't have time yet to do this on a Linux server. I expect the filesystem to have some impact on > the figures and NTFS is not known for being blazingly fast. So maybe those figures will change. That must be some strangeness of the web application, que no? PostgreSQL must do everything that a direct file access does, right? Plus some extra processing (load the data into shared_buffers, ...). Given that, do you have any explanation for what you observed? > My tests however do not take into account the actual time it takes to send the binary data from the > server to the client (=browser). It might well be possible that serving the file through an Apache Web > Server directly is faster than serving the file through a JEE Servlet. My intention was to measure the > raw read speed of the binary data from the medium where it is stored. Why not compare 'SELECT ...' with psql (or libpq) with 'cat ...' to measure the difference? >> The downside is that you might end up with a huge database >> that you will have to backup and maintain > > I don't really buy the argument with the backup: the amount of data to be backed up is essentially the > same. > With both solutions you can have incremental backups. The amount to back up will stay roughly the same, granted. But isn't backup/restore of a large database more cumbersome than backup/restore of a file system? And a major upgrade of a large database is more painful, right? > Another downside you didn't mentioned is the fact that you have to distribute the files in the > filesystem properly. > Having thousands or even millions of files in a single directory is not going to be maintenance > friendly either. That's right, you have to spend some thought on how to store the files as well. Yours, Laurenz Albe -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general