=?UTF-8?B?UMOzbCBVYSBMYW/DrW5lY2jDoWlu?= <linehanp@xxxxxx> writes: > The SQL: > WITH RECURSIVE rand (num, md, a_2_s) AS > ( > SELECT > 1, > MD5(RANDOM()::TEXT), > ARRAY_TO_STRING(ARRAY(SELECT chr((65 + round(random() * 25)) :: integer) > FROM GENERATE_SERIES(1, 5)), '') > UNION > SELECT num + 1, > MD5(RANDOM()::TEXT), > ARRAY_TO_STRING(ARRAY(SELECT chr((65 + round(random() * 25)) :: integer) > FROM GENERATE_SERIES(1, 5)), '') > FROM rand > WHERE num < 5 > ) > SELECT * FROM rand; > A typical result is shown below: > 1 974ee059a1902e5ca1ec73c91275984b GYXYS > 2 6cf5a974d5859eae23cdb9c310e3a3bf YFDPT > 3 fa6be95eb720fe6f80c7c8fb6ba11171 YFDPT > 4 fa54913b0bb43de0025b153fd71a5334 YFDPT > 5 523fab9bdc6c4c51a89e0d901273fb69 YFDPT > The fact that the last 4 are identical is not a coincidence. If I put > 100 in the GENERATE_SERIES, I still get the same result, the first and > second records are different, but ALL subsequent instances of the > ARRAY_TO_STRING are identical! Yeah, it's weird. A look at EXPLAIN VERBOSE offers some insight: QUERY PLAN ------------------------------------------------------------------------------------------------------------------------ CTE Scan on rand (cost=4.75..5.37 rows=31 width=68) Output: rand.num, rand.md, rand.a_2_s CTE rand -> Recursive Union (cost=0.13..4.75 rows=31 width=68) -> Result (cost=0.13..0.15 rows=1 width=68) Output: 1, md5((random())::text), array_to_string($1, ''::text) InitPlan 1 (returns $1) -> Function Scan on pg_catalog.generate_series (cost=0.00..0.13 rows=5 width=32) Output: chr((('65'::double precision + round((random() * '25'::double precision))))::integer) Function Call: generate_series(1, 5) -> WorkTable Scan on rand rand_1 (cost=0.13..0.40 rows=3 width=68) Output: (rand_1.num + 1), md5((random())::text), array_to_string($2, ''::text) Filter: (rand_1.num < 5) InitPlan 2 (returns $2) -> Function Scan on pg_catalog.generate_series generate_series_1 (cost=0.00..0.13 rows=5 width=32) Output: chr((('65'::double precision + round((random() * '25'::double precision))))::integer) Function Call: generate_series(1, 5) (17 rows) The ARRAY sub-selects are being done as initplans, not subplans, which means they're only done once not once per row. This is correct so far as the planner is concerned because those sub-selects are "uncorrelated", ie they use no values from the outer query. There is room to argue that because the sub-selects contain volatile functions, they ought not be optimized into initplans. We have traditionally not considered that, however, and I'm afraid that a lot of people have written queries that depend on it. For example, there's lore out there that replacing WHERE mycol < random() with WHERE mycol < (SELECT random()) will freeze the random() result as a single value rather than computing a new value for each row, which sometimes is what you need. These days, better practice would be to put the random() call in a CTE, but there's still a lot of code out there that does it as above. For your immediate problem, since you don't care that much (I suppose) about exactly how the strings are generated, you could fix the issue by making the sub-selects depend on "num" somehow. Or possibly there's a way to do it without a sub-select. On the whole this looks like a mighty expensive way to generate a random string, so I'd be inclined to look for other implementations. regards, tom lane