query patterns for multipass aggregating

Rob Nikander <rob.nikander@xxxxxxxxx> · Sun, 11 Nov 2018 11:20:23 -0500

Hi,

I’ve run into this pattern a few times, and I usually get a little confused. I’m wondering if there are some common solutions or techniques.

Simplified example:

I have tables `items`, `colors`, and `images`. Items have many colors, and many images.

I want a query to list items, with their colors and images. Each result row is an item, so the colors and images must be aggregated into arrays or json. 

If there were only one other table, it’s a simple join and group…

    select items.*, array_agg(color_name)
    from items join colors on items.id = colors.item_id
    group by items.id    

Now to get images too, my first try uses a CTE… 

    // Tables:
    // items (id, name, foo)
    // colors (item_id, color_name, color_foo)
    // images (item_id, image_file, image_foo)

    with items_with_colors as (
      // This is the same query as above 
      select items.*, array_agg(colors.color_name) as color_names
      from items
      join colors on items.id = colors.item_id
      group by items.id
    )
    // Same idea repeated but now joining to images table
    select items.*, array_agg(images.image_file) as image_files
    from items_with_colors items
    join images on items.id = images.item_id
    group by items.id, items.name, items.foo, items.color_names;  // mmm :(

One first problem is on the last line. I have to list out all the columns. In my real situation there are many more. Listing them is only an inconvenience, but I’m more worried that it is internally comparing all the columns, when really it could compare only `items.id`.

So… are there some other techniques that usually (or might) work better? It seems like the kind of thing that could run in parallel, if I wrote it right. 

Rob