Karsten Hilbert wrote: > On Mon, Oct 05, 2015 at 03:27:26PM +0000, Kevin Grittner wrote: > > > Karsten Hilbert <Karsten.Hilbert@xxxxxxx> wrote: > > > > > I am dealing with radiology studies aka DICOM data) one would > > > want an md5 function which streams in parts of a large object > > > piece by piece using md5_update and m5_finalize or some such. > > It would certainly be possible to write a lo_md5(oid) function to do > > this, but as far as I'm aware nobody has yet done so. How are your > > C skills? > > Next to none, unfortunately :-/ It would be nice to have this API exposed in pgcrypto, but as a workaround, you could use plperl instead, or a similarly "easy" PL that provides it. Here's an example with plperlu, using the core Digest module. It works with other hash types, not just MD5. See http://perldoc.perl.org/Digest.html for the exact names (warning: 'SHA-1' instead of pgcrypto's 'SHA1'). CREATE OR REPLACE FUNCTION lo_digest(text, oid, int) RETURNS bytea AS $$ use Digest; use strict; use Data::Dumper; my $ctxt = Digest->new($_[0]); my $sz=$_[2]; elog(ERROR, "Invalid chunk size: $sz") if ($sz<=0); my $sth = spi_query("SELECT lo_open($_[1], 262144) as fd"); my $row = spi_fetchrow($sth); spi_cursor_close($sth); if ($row) { my $fd = $row->{fd}; my $bytes; my $plan = spi_prepare("SELECT loread($fd, $sz) as chunk"); do { $sth = spi_query_prepared($plan); $row = spi_fetchrow($sth); $bytes = decode_bytea($row->{chunk}); $ctxt->add($bytes); spi_cursor_close($sth); } while (length($bytes)>0); spi_exec_query("select lo_close($fd)"); spi_freeplan($plan); } return encode_bytea($ctxt->digest); $$ LANGUAGE plperlu; Example of use: # select lo_digest('MD5', 2557608, 2048*256); lo_digest ------------------------------------ \xa8447e145d0f8d9ca7fe7df1bbf06d75 2557608 is the oid and 2048*256 represents 512KB. The lo blocksize is typically 2048, so a multiple of 2048 is ideal to optimize reads. I expect it to work on very large contents while requesting only modest amounts of memory. But unfortunately is seems pretty slow. On my desktop core i5-3470, it takes 2.5 seconds to digest a 100 MB audio file already in cache. An independant Perl script doing the equiavlent processing on the same data takes 0.4s . It's not because of Digest::MD5 or loread(), it seems that it's the pipe in-between with the text->bytea decoding that eats most of the CPU cycles, that step being necessary because plperl lacks the ability to consume bytea directly. Sigh... Best regards, -- Daniel Vérité PostgreSQL-powered mailer: http://www.manitou-mail.org Twitter: @DanielVerite -- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general