Hi,
I'm kind of struggling with storing binary files inside the database.
I'd like to store them in BYTEA columns, but the problem is the files
are quite large (a few megabytes, usually), so the PHP memory_limit is
reached when escaping the data.
It does not matter whether I use old-fashioned pg_* functions or the new
PDO extension - with BYTEA columns both do behave the same. For example
with a 16MB file (called input.data) and 8MB memory_limit, this throws
an 'Allowed memory exhausted' exception due to the fact that all the
data (read from the file) have to be escaped at once:
------------------------------------------------------------------------
$pdo = new PDO('pgsql: ... ');
// data_table (id INTEGER, data BYTEA)
$stmt = $pdo->prepare('INSERT INTO data_table VALUES (:id, :data)');
$id = 1;
$input = fopen('/tmp/input.data', 'rb');
$stmt->bindParam(':id', $id, PDO::PARAM_INT);
$stmt->bindValue(':data', $input, PDO::PARAM_LOB);
$stmt->execute();
------------------------------------------------------------------------
Till now I've found two solutions, but none of them meets all my
requirements:
1) using LOBs - with LOBs a 'streaming of data' is possible, i.e. you
can do this:
---------------------------------------------------------------------
$pdo->beginTransaction();
$input = fopen('/tmp/input.data', 'rb');
$oid = $pdo->pgsqlLOBCreate();
$stream = $pdo->pgsqlLOBOpen($oid, 'w');
stream_copy_to_stream($input, $stream);
$pdo->commit();
---------------------------------------------------------------------
This is nice because it saves memory, but a serious disadvantage (for
me) is a lack of refferential integrity - you can delete a LOB even
if it's referenced from some table (i.e. an OID is stored in it). So
it's basically the same as storing the data directly in a filesystem,
and storing just a path to it.
2) using BYTEA columns and do the 'streaming' on my own - just store the
data as usual, but upload them 'by pieces' (say 100kB). This is
achieved by an initial INSERT (with say 100kB of data), followed by a
number of updates with 'data = data || '... new data ...' appending
the data. When reading the data, you have to do the same - read them
piece by piece.
Yes, it's kind of dirty (especially for large files - this may cause
a lot of queries), but preserves all the nice BYTEA colunm features
(no dangling / missing LOBs, etc.).
Is there any other way to solve storing of large files in PostgreSQL?
These are the most important requirements of the solution:
- I do want to store the files inside the database, and I don't want to
store just the paths. It's quite difficult to combine transactional
(database) and non-transactional (filesystem) resources properly.
- I want to preserve as much 'nice' PostgreSQL features as possible (for
example referential integrity is a nice feature).
- Storing / retrieving of the files has to work with a quite small
memory_limit PHP option (say 8MB). I can't (and don't want to) modify
this option (it's a shared server).
- Optimization is a serious criterion, as is reliability.
Thanks for all your recommendations.
regards
Tomas