Dear PostgreSQL experts, I have a database that records the EXIF data for a collection of photos. (EXIF is a method for embedding arbitary name-value data in a JPEG, and digital cameras typically use it to record things like exposure information.) My exif table looks something like this: photo_id | tag | value ----------+--------------------------+-------------------------- 1001 | DateTime | 2006:10:26 11:19:29 1001 | Orientation | top - left 1001 | PixelXDimension | 3888 1001 | PixelYDimension | 2592 1002 | DateTimeOriginal | 2006:10:26 13:34:06 1002 | Orientation | left - bottom 1002 | PixelXDimension | 3888 1002 | PixelYDimension | 2592 photo_id and tag together form the primary key. The data also includes many tags that I'm not currently interested in. From this I create a view containing only the tags of interest: photo_id | orientation | datetime | xsize | ysize ----------+---------------+---------------------+-------+------- 1001 | top - left | 2006:10:30 11:19:29 | 3888 | 2592 1002 | left - bottom | 2006:10:26 13:34:06 | 3888 | 2592 My first attempt did this in the obvious way by joining the exif table with itself once per column: create view photo_info_v as select photo_id, e1.value as orientation, e2.value as datetime, e3.value as xsize, e4.value as ysize from exif e1 join exif e2 using(photo_id) join exif e3 using(photo_id) join exif e4 using(photo_id) where e1.tag='Orientation' and e2.tag='DateTime' and e3.tag='PixelXDimension' and e4.tag='PixelYDimension'; This works well for one important query, where I find one photo's information from the view: photos=> explain analyse select * from photo_info_v where photo_id=1201; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..12.09 rows=1 width=60) (actual time=1.198..1.290 rows=1 loops=1) -> Nested Loop (cost=0.00..9.07 rows=1 width=46) (actual time=0.953..1.009 rows=1 loops=1) -> Nested Loop (cost=0.00..6.04 rows=1 width=32) (actual time=0.693..0.731 rows=1 loops=1) -> Index Scan using exif_pkey on exif e1 (cost=0.00..3.02 rows=1 width=18) (actual time=0.384..0.394 rows=1 loops=1) Index Cond: ((photo_id = 1201) AND (tag = 'Orientation'::text)) -> Index Scan using exif_pkey on exif e4 (cost=0.00..3.02 rows=1 width=18) (actual time=0.189..0.205 rows=1 loops=1) Index Cond: ((1201 = photo_id) AND (tag = 'PixelYDimension'::text)) -> Index Scan using exif_pkey on exif e3 (cost=0.00..3.02 rows=1 width=18) (actual time=0.186..0.194 rows=1 loops=1) Index Cond: ((1201 = photo_id) AND (tag = 'PixelXDimension'::text)) -> Index Scan using exif_pkey on exif e2 (cost=0.00..3.02 rows=1 width=18) (actual time=0.171..0.195 rows=1 loops=1) Index Cond: ((1201 = photo_id) AND (tag = 'DateTime'::text)) Total runtime: 3.064 ms However, I might just want one column from the view: photos=> explain analyse select orientation from photo_info_v where photo_id=1201; QUERY PLAN -------------------------------------------------------------------------------------------------------------------------------------- Nested Loop (cost=0.00..12.09 rows=1 width=14) (actual time=1.266..1.353 rows=1 loops=1) -> Nested Loop (cost=0.00..9.07 rows=1 width=18) (actual time=1.009..1.062 rows=1 loops=1) -> Nested Loop (cost=0.00..6.04 rows=1 width=18) (actual time=0.752..0.787 rows=1 loops=1) -> Index Scan using exif_pkey on exif e1 (cost=0.00..3.02 rows=1 width=18) (actual time=0.410..0.418 rows=1 loops=1) Index Cond: ((photo_id = 1201) AND (tag = 'Orientation'::text)) -> Index Scan using exif_pkey on exif e4 (cost=0.00..3.02 rows=1 width=4) (actual time=0.183..0.199 rows=1 loops=1) Index Cond: ((1201 = photo_id) AND (tag = 'PixelYDimension'::text)) -> Index Scan using exif_pkey on exif e3 (cost=0.00..3.02 rows=1 width=4) (actual time=0.168..0.176 rows=1 loops=1) Index Cond: ((1201 = photo_id) AND (tag = 'PixelXDimension'::text)) -> Index Scan using exif_pkey on exif e2 (cost=0.00..3.02 rows=1 width=4) (actual time=0.168..0.191 rows=1 loops=1) Index Cond: ((1201 = photo_id) AND (tag = 'DateTime'::text)) Total runtime: 3.123 ms I only wanted the orientation information, which could be found using this faster query: photos=> explain analyse select value from exif where photo_id=1201 and tag='Orientation'; QUERY PLAN ----------------------------------------------------------------------------------------------------------------- Index Scan using exif_pkey on exif (cost=0.00..3.02 rows=1 width=14) (actual time=0.504..0.536 rows=1 loops=1) Index Cond: ((photo_id = 1201) AND (tag = 'Orientation'::text)) Total runtime: 1.298 ms Instead it builds up all four columns of the view and discards three of them. It has to do this, because if data for any one of the four columns did not exist then the whole row would not exist in the view. I know that all of the data is always present, but PostgreSQL doesn't know that, and I don't think there is any way that I can tell it that it does. To try and avoid this, I tried using a full outer join in the view definition. In this case the row would always be present in the view, even if the data for the other columns were not present. I hoped that the query would then be optimised to look up only the orientation information: create view photo_info_v as select photo_id, e1.value as orientation, e2.value as datetime, e3.value as xsize, e4.value as ysize from exif e1 full outer join exif e2 using(photo_id) full outer join exif e3 using(photo_id) full outer join exif e4 using(photo_id) where e1.tag='Orientation' and e2.tag='DateTime' and e3.tag='PixelXDimension' and e4.tag='PixelYDimension'; photos=> explain analyse select orientation from photo_info_v where photo_id=1201; QUERY PLAN ------------------------------------------------------------------------------------------------------------------------------------------------------------ Merge Join (cost=2431.73..3198.49 rows=7 width=14) (actual time=2256.194..2345.563 rows=1 loops=1) Merge Cond: ("outer".photo_id = "inner"."?column5?") Join Filter: (COALESCE("inner"."?column5?", "outer".photo_id) = 1201) -> Index Scan using exif_pkey on exif e4 (cost=0.00..740.62 rows=1229 width=4) (actual time=1.937..462.014 rows=1229 loops=1) Index Cond: (tag = 'PixelYDimension'::text) -> Sort (cost=2431.73..2434.99 rows=1302 width=26) (actual time=1766.523..1802.629 rows=1229 loops=1) Sort Key: COALESCE(COALESCE(e1.photo_id, e2.photo_id), e3.photo_id) -> Merge Join (cost=1601.15..2364.38 rows=1302 width=26) (actual time=931.551..1604.342 rows=1229 loops=1) Merge Cond: ("outer".photo_id = "inner"."?column4?") -> Index Scan using exif_pkey on exif e3 (cost=0.00..740.62 rows=1229 width=4) (actual time=0.240..519.953 rows=1229 loops=1) Index Cond: (tag = 'PixelXDimension'::text) -> Sort (cost=1601.15..1604.37 rows=1286 width=22) (actual time=931.092..971.617 rows=1229 loops=1) Sort Key: COALESCE(e1.photo_id, e2.photo_id) -> Merge Join (cost=0.00..1534.74 rows=1286 width=22) (actual time=0.467..797.332 rows=1229 loops=1) Merge Cond: ("outer".photo_id = "inner".photo_id) -> Index Scan using exif_pkey on exif e1 (cost=0.00..740.62 rows=1229 width=18) (actual time=0.193..271.369 rows=1229 loops=1) Index Cond: (tag = 'Orientation'::text) -> Index Scan using exif_pkey on exif e2 (cost=0.00..740.62 rows=1229 width=4) (actual time=0.118..395.892 rows=1229 loops=1) Index Cond: (tag = 'DateTime'::text) Total runtime: 2350.601 ms Ooops! It has got worse not better. The runtime has increased by three orders of magnitude, because it seems to compute most of the view before filtering. It boils down to this query: photos=> select e1.value as orientation photos-> from exif e1 photos-> full outer join exif e2 using(photo_id) photos-> full outer join exif e3 using(photo_id) photos-> full outer join exif e4 using(photo_id) photos-> where e1.tag='Orientation' photos-> and e2.tag='DateTime' photos-> and e3.tag='PixelXDimension' photos-> and e4.tag='PixelYDimension' photos-> and photo_id=1201; I would like it to optimise away the unused joins, leaving: photos=> explain analyse select e1.value as orientation photos-> from exif e1 photos-> where e1.tag='Orientation' photos-> and photo_id=1201; Is there a good reason why it cannot do this? Remeber that photo_id and tag are the primary key for exif, so the view cannot have more than one row per photo_id. This is not the first problem I have had with non-trivial views, where PostgreSQL has failed to simplify a query on the view as I had hoped it would, either because the semantics of SQL mean that it is unable to (with no way of describing the additional constraints that apply to that data and could make the simplification possible), or because the query optimiser doesn't detect a possible optimisation. Back in 7.4 days I resorted to materialised views updated using triggers, which involved enormously more work than a normal view. Has anything been done, or is anything planned, to make this easier? I imagine that the basic problem here, projecting data from multiple rows into a single row with multiple columns, is a fairly common one. Are there any tricks to doing this that anyone can share? I'm using PostgreSQL 8.1. Many thanks, Phil. (You are welcome to CC: me in any replies.)