Hi,

On 2017-12-07 20:21:45 -0800, Alex Tokarev wrote:
> I have a set of tables with a fairly large number of columns, mostly int
> with a few bigints and short char/varchar columns. I've noticed that
> Postgres is pretty slow at inserting data into such a table. I tried to
> tune every possible setting: using unlogged tables, increasing
> shared_buffers, etc; I even placed the db cluster on ramfs and turned
> fsync off. The results are pretty much the same, with the exception of
> using unlogged tables, which improves performance just a little bit.
>
> I have made a minimal reproducible test case consisting of a table with
> 848 columns, inserting a partial dataset of 100,000 rows with 240 columns.
> On my dev VM the COPY FROM operation takes just shy of 3 seconds to
> complete, which is entirely unexpected for such a small dataset.

I don't find this to be absurdly slow. On my laptop, loading this with a
development checkout takes 1223.950 ms. That's roughly 20 million fields
parsed per second, and about 69 million fields per second inserted (the
table has 848 columns, while the input only fills 240 of them). Removing
the TRUNCATE and running the COPYs concurrently scales well to a few
clients, and only stops scaling because my laptop's SSD can't keep up.

That said, I do think there are a few places that could stand some
improvement. Locally the profile shows up as:

+   15.38%  postgres  libc-2.25.so  [.] __GI_____strtoll_l_internal
+   11.79%  postgres  postgres      [.] heap_fill_tuple
+    8.00%  postgres  postgres      [.] CopyFrom
+    7.40%  postgres  postgres      [.] CopyReadLine
+    6.79%  postgres  postgres      [.] ExecConstraints
+    6.68%  postgres  postgres      [.] NextCopyFromRawFields
+    6.36%  postgres  postgres      [.] heap_compute_data_size
+    6.02%  postgres  postgres      [.] pg_atoi

The strtoll() is libc functionality triggered by pg_atoi(), something
I've seen show up in numerous profiles. I think it's probably time to
have our own optimized version of it rather than relying on libc's (an
illustrative sketch of such a routine follows at the end of this
message).

That heap_fill_tuple(), which basically builds a tuple from the parsed
datums, takes time roughly proportional to the number of columns in the
table seems hard to avoid, especially because this isn't something we
want to optimize at the price of making the more common workloads with
fewer columns slower. But there seems to be quite some potential for
micro-optimization.

That ExecConstraints() shows up seems unsurprising: it has to walk
through all the table's columns checking for constraints. We could
optimize this by keeping a separate data structure listing only the
columns that have constraints, but that would likely be slower in the
very common case of tables with a more reasonable number of columns.

The COPY implementation deserves some optimization too...

> Here's a tarball with the test schema and data:
> http://nohuhu.org/copy_perf.tar.bz2; it's 338k compressed but expands to
> ~50mb. Here's the result of a profiling session with perf:
> https://pastebin.com/pjv7JqxD

Thanks!

Greetings,

Andres Freund
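
For illustration, here is a minimal sketch of what such a specialized
base-10 int64 parser could look like. The function name
parse_int64_decimal() and its interface are hypothetical and not taken
from any actual patch; the point is only that dropping libc's locale
handling and base detection leaves a very tight digit-accumulation loop.

/*
 * Illustrative sketch only (hypothetical helper, not actual PostgreSQL
 * code): parse a signed base-10 64-bit integer without the locale and
 * base-detection overhead of libc's strtoll().
 *
 * Returns true and stores the value in *result on success, false on
 * empty input, trailing garbage or overflow.
 */
#include <stdbool.h>
#include <stdint.h>
#include <ctype.h>

static bool
parse_int64_decimal(const char *str, int64_t *result)
{
    const char *p = str;
    uint64_t    acc = 0;
    bool        neg = false;

    /* skip leading whitespace, accept an optional sign */
    while (isspace((unsigned char) *p))
        p++;
    if (*p == '-')
    {
        neg = true;
        p++;
    }
    else if (*p == '+')
        p++;

    if (*p < '0' || *p > '9')
        return false;           /* no digits at all */

    /* accumulate digits, guarding against unsigned overflow */
    while (*p >= '0' && *p <= '9')
    {
        uint64_t    digit = (uint64_t) (*p - '0');

        if (acc > (UINT64_MAX - digit) / 10)
            return false;       /* overflow */
        acc = acc * 10 + digit;
        p++;
    }

    /* allow trailing whitespace, reject any other trailing input */
    while (isspace((unsigned char) *p))
        p++;
    if (*p != '\0')
        return false;

    /* map the unsigned accumulator onto the signed int64 range */
    if (neg)
    {
        if (acc > (uint64_t) INT64_MAX + 1)
            return false;
        if (acc == (uint64_t) INT64_MAX + 1)
            *result = INT64_MIN;    /* avoid signed negation overflow */
        else
            *result = -(int64_t) acc;
    }
    else
    {
        if (acc > (uint64_t) INT64_MAX)
            return false;
        *result = (int64_t) acc;
    }
    return true;
}

Such a routine skips the per-call locale lookups and base-detection
branches that a general-purpose strtoll() has to perform, at the cost of
only accepting plain decimal input with optional sign and surrounding
whitespace, which is typically all that integer input parsing here needs.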