Search Postgresql Archives

Re: How to get an md5/sha256 hash of a really large object in psql?

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



> On 29/07/2023 08:42 CEST Alex Shan <3341018@xxxxxxxxx> wrote:
>
> In my DB I have a large object over 4GB in size.
> I need to get its MD5 or SHA256 from within psql query, i.e. without
> exporting it to FS first.
>
> “SELECT md5(lo_get(loid));” doesnt work — “large object is too large”.
>
> Is there any other way to do it?

Is plpython3u [1] an option for you?  In that case you can use Python's hashlib
in a custom function and feed every page from pg_largeobject to a selected hash
function.

For example:

	BEGIN;

	CREATE EXTENSION plpython3u;

	CREATE OR REPLACE FUNCTION lo_hash(loid oid, name text)
	    RETURNS bytea
	    LANGUAGE plpython3u
	AS $$
	    import hashlib
	
	    hash = hashlib.new(name)
	
	    # Check if large object exists.
	    plan = plpy.prepare("""
	        SELECT
	        FROM pg_largeobject_metadata
	        WHERE oid = $1
	    """, ['oid'])
	    rv = plpy.execute(plan, [loid])
	
	    if rv.nrows() == 0:
	        raise ValueError(f"large object {loid} does not exist")
	
	    # Get all pages (possibly zero).
	    plan = plpy.prepare("""
	        SELECT data
	        FROM pg_largeobject
	        WHERE loid = $1
	        ORDER BY pageno
	    """, ['oid'])
	    pages = plpy.cursor(plan, [loid])
	
	    for page in pages:
	        hash.update(page['data'])
	
	    return hash.digest()
	$$;

	COMMIT;

Testing with 65 KiB null bytes:

	BEGIN;

	SELECT
	  lo_from_bytea(0, decode(repeat('00', 1 << 16), 'hex')) AS test_loid
	\gset
	
	SELECT
	  loid,
	  count(*) AS n_pages,
	  sum(length(data)) AS n_bytes
	FROM pg_largeobject
	WHERE loid = :test_loid
	GROUP BY loid;

	  loid  | n_pages | n_bytes
	--------+---------+---------
	 365958 |      32 |   65536
	(1 row)

	SELECT
	  :test_loid AS loid,
	  lo_hash(:test_loid, 'md5') AS md5;

	  loid  |                md5
	--------+------------------------------------
	 365958 | \xfcd6bcb56c1689fcef28b57c22475bad
	(1 row)

	SELECT
	  :test_loid AS loid,
	  lo_hash(:test_loid, 'sha256') AS sha256;

	  loid  |                               sha256
	--------+--------------------------------------------------------------------
	 365958 | \xde2f256064a0af797747c2b97505dc0b9f3df0de4f489eac731c23ae9ca9cc31
	(1 row)

	ROLLBACK;

Verifying the hashes:

	$ head -c65536 /dev/zero | md5sum
	fcd6bcb56c1689fcef28b57c22475bad  -
	$ head -c65536 /dev/zero | sha256sum
	de2f256064a0af797747c2b97505dc0b9f3df0de4f489eac731c23ae9ca9cc31  -

[1] https://www.postgresql.org/docs/15/plpython.html

--
Erik






[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]
[Index of Archives]     [Postgresql Jobs]     [Postgresql Admin]     [Postgresql Performance]     [Linux Clusters]     [PHP Home]     [PHP on Windows]     [Kernel Newbies]     [PHP Classes]     [PHP Databases]     [Postgresql & PHP]     [Yosemite]

  Powered by Linux