Re: Row level security

Andrew Okhmat <andy@xxxxxxxxxxxxx> · Thu, 27 Jun 2024 12:45:04 +0200



    Hello Jim,
    

    Your approach of using table
      inheritance in PostgreSQL for implementing row-level security
      (RLS) has some interesting aspects, but there are potential
      pitfalls and alternatives that you should consider. Below, I'll
      outline some key points to 

      
      Table Inheritance and Performance Concerns

      
      * RLS and Inheritance -  In PostgreSQL, RLS policies are applied
      per table. If you use inheritance, RLS policies defined on the
      parent table won’t automatically apply to the child tables. You’ll
      have to set up RLS policies on each child table separately.

      
      * Growing Base Table  - The base table, getting a new row for
      every row inserted in the child tables, will grow really fast.
      Managing a table with hundreds of millions of rows per year could
      become a serious performance problem.

      
      * Partitioning - Partitioning can help manage big tables by
      breaking them into smaller parts. But if your base table becomes a
      bottleneck, partitioning the child tables alone might not solve
      the problem.

      
      Alternative Approach: Use Partitioned Tables Directly with RLS

      
      Given your needs, here's a different approach that leverages
      PostgreSQL's partitioning and indexing features along with RLS:

      
      * Directly Partitioned Tables - Instead of inheritance, create
      partitioned tables directly for each type of data. Partition these
      tables based on a logical key (like time, site ID, or customer ID)
      so each partition stays manageable:

      Example:

           CREATE TABLE data (

                 id SERIAL PRIMARY KEY,

                 site_id INT,

                 customer_id INT,

                 division_id INT,

                 department_id INT,

                 data_payload JSONB,

                 created_at TIMESTAMPTZ

             ) PARTITION BY RANGE (created_at);

      
      * RLS Policies on Partitions - Set up RLS policies on each
      partition. Since partitions are smaller, RLS policy checks should
      be more efficient.

      Example:

           CREATE POLICY rls_policy ON data

             USING (site_id =
        current_setting('app.current_site_id')::INT);

             ENABLE ROW LEVEL SECURITY;

      
      * Session Variables - Using PostgreSQL session variables to store
      user-specific info (like app.current_site_id) is
      convenient, but has potential security risks. If a client can set
      these variables, they could manipulate them to gain unauthorized
      access. To mitigate this, ensure that only trusted parts of your
      application can set these variables. Consider using server-side
      functions or application logic to securely set these variables
      based on the authenticated user's information.

      Example:

           SET app.current_site_id = '123';

      
      * Indexing  - Make sure you index columns used in RLS policies and
      queries, like site_id and customer_id.

           CREATE INDEX idx_site_id ON data
        (site_id);

             CREATE INDEX idx_customer_id ON data (customer_id);

             CREATE INDEX idx_site_customer ON data (site_id,
        customer_id);

             CREATE INDEX idx_created_at ON data (created_at);

      
      * Using Stored Procedures - using stored procedures can centralize
      security logic, but it can also add complexity. Here's a brief
      look:

      
      Advantages:

         - Centralized security logic.

         - Additional layer of security as logic is hidden from
      end-users.

         - Can include data validation and business logic.

      
      2Disadvantages:

         - Increased complexity in development and maintenance.

         - Potential performance overhead for complex procedures.

         - Less flexibility for ad-hoc queries.

      
      Example:

         CREATE OR REPLACE FUNCTION insert_data(

               p_site_id INT,

               p_customer_id INT,

               p_division_id INT,

               p_department_id INT,

               p_data_payload JSONB,

               p_created_at TIMESTAMPTZ

           ) RETURNS VOID AS $$

           BEGIN

               IF current_setting('app.current_site_id')::INT =
        p_site_id THEN

                   INSERT INTO data (site_id, customer_id, division_id,
        department_id, data_payload, created_at)

                   VALUES (p_site_id, p_customer_id, p_division_id,
        p_department_id, p_data_payload, p_created_at);

               ELSE

                   RAISE EXCEPTION 'Access Denied';

               END IF;

           END;

           $$ LANGUAGE plpgsql;

      
    * Final Thoughts

      
      Using direct partitioning and applying RLS policies to each
      partition should help with performance issues linked to a growing
      base table. This approach also keeps things flexible for future
      expansions and avoids the hassle of managing inheritance
      hierarchies. Proper indexing with RLS policies in mind can greatly
      improve query performance in large tables. Just make sure to
      handle session variables securely to avoid potential security
      issues.

      
      If you have more questions or need further advice on
      implementation, just let me know!

      
      Cheers,  

      Andy
    

    On 25-Jun-24 00:28, Thomas Simpson
      wrote:

    
      Hi,
      I'm trying to implement a system which requires row level
        security on some key data tables (most do not require RLS).  The
        data tables will grow substantially (rows likely > +100M/year
        - the system is > 80% data insert plus < 20% updates and
        by design, no deletes).
      Some queries are likely to brush past many rows before being
        eliminated by the RLS policy, so I'm trying to find the most
        efficient way that does not compromise query times.  I also want
        to have a unified approach across all the RLS data to make the
        policy implementation as straightforward as possible too because
        I know there will be future expansion of the RLS rules.
      My thought currently is that table inheritance could possibly
        be one way forward.  Specifically the base table holding just
        the RLS attributes, such as site group, site ID, customer group,
        customer ID as some initial examples (I expect company division,
        department may be future needs too).
      With the RLS attributes on the base table, I can add future
        needs to that table and they automatically propagate to the
        child tables holding the RLS data.  Policies on the child tables
        can enforce row visibility based on session tokens assigned at
        login (a future problem avoided just now for simplicity).
      I have a small prototype working, with the policy function
        comparing the columns (from the base table) to the user tokens
        to permit/deny row access.  This allows this to be as in-memory
        and hopefully as fast as possible as it avoids needing to do any
        lookups to other tables or anything more expensive than some
        'permissionColumn IN listOfTokensHeldByTheSession' checks.
      My concern is the base table will grow substantially faster
        than the child data tables as that receives a new row for every
        row inserted in any of the child tables, so could easily be
        +300M rows/year and this could become some performance fence. 
        Some of the child tables have a clear partition key available so
        inherited & partitioned is also appealing but could possibly
        amplify any performance issue further.
      Does this approach sound viable or are there pitfalls or a
        different more recommended approach?
      Thanks
      Jim