Simon Riggs <simon@xxxxxxxxxxxxxxx> writes:
> On 18 October 2016 at 19:34, Tom Lane <tgl@xxxxxxxxxxxxx> wrote:
>> If you don't want to have an implicit bias towards earlier blocks,
>> I don't think that either standard tablesample method is really what
>> you want.
>>
>> The contrib/tsm_system_rows tablesample method is a lot closer, in
>> that it will start at a randomly chosen block, but if you just do
>> "tablesample system_rows(1)" then you will always get the first row
>> in whichever block it lands on, so it's still not exactly unbiased.

> Is there a reason why we can't fix the behaviours of the three methods
> mentioned above by making them all start at a random block and a
> random item between min and max?

The standard tablesample methods are constrained by other requirements,
such as repeatability.  I am not sure that loading this one on top of
that is a good idea.  The bias I referred to above is *not* the fault
of the sample methods, rather it's the fault of using "LIMIT 1".

It does seem like maybe it'd be nice for tsm_system_rows to start at a
randomly chosen entry in the first block it visits, rather than always
dumping that entire block.  Then "tablesample system_rows(1)" would
actually give you a pretty random row, and I think we aren't giving up
any useful properties it has now.

			regards, tom lane

-- 
Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general
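
To illustrate the point about "tablesample system_rows(1)", here is a rough
simulation sketch. It is not PostgreSQL code: the table layout, the function
names (make_table, sample_first_row, sample_random_start, distinct_rows) and
the block and row counts are all made up for illustration. It only models the
two behaviours discussed above: picking a random block and returning its first
row, versus picking a random block and a random entry within it.

```python
import random


def make_table(n_blocks, rows_per_block):
    """Hypothetical table: a list of blocks, each a list of (block, item) ids."""
    return [[(b, i) for i in range(rows_per_block)] for b in range(n_blocks)]


def sample_first_row(table, rng):
    """Model of the behaviour described above: random block, then its first row."""
    block = rng.choice(table)
    return block[0]


def sample_random_start(table, rng):
    """Model of the suggested change: random block, then a random entry in it."""
    block = rng.choice(table)
    return block[rng.randrange(len(block))]


def distinct_rows(sampler, table, trials, seed=0):
    """Collect the set of distinct rows a sampler returns over many trials."""
    rng = random.Random(seed)
    return {sampler(table, rng) for _ in range(trials)}


table = make_table(n_blocks=10, rows_per_block=20)
current = distinct_rows(sample_first_row, table, trials=5000)
proposed = distinct_rows(sample_random_start, table, trials=5000)

print(len(current), "distinct rows seen with first-row sampling")
print(len(proposed), "distinct rows seen with random-start sampling")
```

In this model the first sampler can only ever return item 0 of each block, no
matter how many trials are run, while the second can reach every row. With
blocks of unequal size the second sampler would still not be exactly uniform
over rows, which fits "pretty random" rather than "unbiased".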