On 2019-03-22 13:40:28 +0100, Francisco Olarte wrote:
> On Fri, Mar 22, 2019 at 11:22 AM Thomas Güttler
> <guettliml@xxxxxxxxxxxxxxxxxx> wrote:
> > Thank you for asking several times for a benchmark.
> > I wrote it now and it is visible: inserting random bytes into bytea
> > is much slower, if you use the psycopg2 defaults.
> > Here is the chart:
> > https://github.com/guettli/misc/blob/master/bench-bytea-inserts-postrgres.png
> > And here is the script which creates the chart:
> > https://github.com/guettli/misc/blob/master/bench-bytea-inserts-postrgres.py
>
> I'm not too sure, but I read ( in the code ) you are measuring a
> nearly not compressible urandom data against a highly compressible (
> 'x'*i ) data,

Yes, that seems to be the main difference. My "ascii" test creates
random data in the range [32, 126], which is not very compressible, and
I didn't see much of a difference from the full range (10th percentile
and median were the same, 90th percentile was noticeably better).

If I create "random" data in the range [120, 120], I also get a large
speedup (about 3.5 times). Interestingly, the difference vanishes for
large (> 10 MB) blobs.

> are you sure the difference is not due to data being compressed and
> generating much less disk usage in toast-tables/wal?

Yes, I think that's it: He is basically measuring how fast his CPU can
compress a stream of constant bytes. Not very meaningful.

Another difference I noticed between our benchmarks is that I used a
plain bytes object while he used a psycopg2.Binary object. Those might
be serialized differently, but since the speed difference is adequately
explained by the (lack of) randomness, I am not going to investigate
this.

        hp

-- 
   _  | Peter J. Holzer    | we build much bigger, better disasters now
|_|_) |                    | because we have much more sophisticated
| |   | hjp@xxxxxx         | management tools.
__/   | http://www.hjp.at/ | -- Ross Anderson <https://www.edge.org/>
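
The compressibility argument above can be checked without a database.
The following is only a minimal sketch: it uses Python's zlib as a
stand-in for PostgreSQL's own TOAST compression (which is a different
algorithm, so absolute numbers will differ), and the payload size, the
seed and the helper name compression_ratio are assumptions made for
illustration. It compares the three kinds of test data discussed in
this thread.

```python
import os
import random
import zlib


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (smaller = more compressible)."""
    return len(zlib.compress(data)) / len(data)


size = 200_000          # assumed payload size, chosen only to keep the run short
rng = random.Random(0)  # fixed seed so the "ascii" payload is reproducible

payloads = {
    # full byte range, as produced by os.urandom
    "urandom": os.urandom(size),
    # random printable ASCII, i.e. byte values in [32, 126]
    "ascii": bytes(rng.choices(range(32, 127), k=size)),
    # constant bytes, equivalent to 'x' * i in the benchmark script
    "constant": b"x" * size,
}

ratios = {name: compression_ratio(data) for name, data in payloads.items()}

for name, ratio in ratios.items():
    print(f"{name:10s} {ratio:.3f}")
```

If the constant payload shrinks to a tiny fraction of its size while
the two random payloads barely shrink at all, that is consistent with
the explanation that the benchmark mostly measures compression of
constant bytes rather than anything specific to bytea or psycopg2.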