Re: query multiple schemas

Norbert Sándor <sandor.norbert@xxxxxxxxxxx> · Tue, 23 Apr 2024 11:08:26 +0200



    > using a dynamic number of schemas

    
    Although there are historical reasons behind our "schema per tenant"
    architecture, it provides very good logical separation of data, and
    is very convenient that we don't need to include the tenant ID in
    each query (I'm sure that it would cause lots of bugs and trouble).

    Besides, we use Hibernate and it
      has great support for this architecture.

    
    > In any case, I'm interested in what works well for you.

    
    I went with the dynamic solution I proposed in my original email
    (which performs a UNION of the queried table from all tenants).

    Performance is currently not a priority in our use-cases, otherwise
    I would have probably chosen a more static solution (which wouldn't
    be easy to maintain continuously, I'm sure of).

    
    I further simplified its usage, so finally it is fairly comfortable
    to use either directly:

    select tenantId, (record).*

        from tenant_query(null::mytable)

        where (record).type=2

        order by tenantId, (record).name;

    
    or by using a temporary view:

    create temp view
        all_tenant_mytable as

        select tenantId, (record).* from tenant_query(null::mytable);

        
        select *

        from all_tenant_mytable

        where type=2

        order by tenantId, name;

    
    > In my case, the revision/version of the schema could be
      different as well

    
    This complicates things very much, it is probably not possible to
    implement a solution as comfortable as in my case.

    
    Although I think the JSON-based solutions proposed in this thread
    would work in your case as well.

    In my current solution I also use JSON as intermediate
    representation - although I feel it is because of my lack
    of deeper knowledge of Postgresql's type system.

    The difference is that you would need to use JSON as the final
    representation, and reference the JSON fields using Postgres's
      JSON operators.

    
    > And if/when I get back to this issue myself, I'll do the
      same.

    
    My current solution is not much different than the one I posted in
    my original question.

    My main difficulty was the relatively static nature of Postgresql's
    type system, so this solution is a result of lots of trial-and-error
    rounds :)

    Take a look at it, and you (and maybe others) may have
    recommendations, e.g. how I could get rid of the usage of the
    intermediate JSON "layer".

    (Although I have to admit: it is amazing that it is possible to
    implement this at all ;) )

    
        CREATE OR REPLACE FUNCTION tenant_query_json(tbl anyelement)
RETURNS setof json AS $func$
declare
	_select text;
begin
	_select := (select
	    string_agg(
	    format('select t.*, %L tenantId from %I.%I t', schema_name, schema_name, pg_typeof(tbl)),
	    E'\n' || ' union all ' || E'\n')
	    from (
		SELECT schema_name
		FROM information_schema.schemata
		where schema_name not in ('information_schema') and schema_name not like 'pg_%'
	  ) tenants
	  );

	return query execute 'select row_to_json(r) from (' || _select || ') as r';
END;
$func$ LANGUAGE plpgsql;


CREATE OR REPLACE FUNCTION tenant_query(tbl anyelement)
RETURNS table(tenantId text, record anyelement) AS $func$
begin
	return query
	select t.tenantId, t.rec
		from (
			select
				jr->>'tenantid' tenantId,
				json_populate_record(tbl, jr) rec
			from tenant_query_json(tbl) jr
		) t;
END;
$func$ LANGUAGE plpgsql;


    As you can see in my examples above, I use the tenant_query() function but in your case
    (if your schemas are different) something similar to tenant_query_json() may work better.

    
    --

    Norbi

    
    On 2024. 04. 23. 9:33, Dominique
      Devienne wrote:

    
        On Sun, Apr 21, 2024 at 11:12 PM Tom Lane <tgl@xxxxxxxxxxxxx>
          wrote:

        
          Steve
            Baldwin <steve.baldwin@xxxxxxxxx>
            writes:

            > If the number of tenant schemas is reasonably static,
            you could write a

            > plpgsql function to create a set of UNION ALL views

            
            Another idea is to build a partitioned table

          
          Hi Norbert. I asked a [similar question][1] a while back,
          and unfortunately didn't get any actionable input,
            perhaps
          because I already mentioned in my message the options
          proposed here so far. Seems like people like us, using a
          dynamic number of schemas, are outliers in database-land.
          

          In my case, the revision/version of the schema could be
          different as well, which would complicate the
            partitioning idea.
          

          In any case, I'm interested in what works well for you.
          And if/when I get back to this issue myself, I'll do the
            same.
          

          Thanks, --DD
          

          [1]: https://postgrespro.com/list/thread-id/2673517