Will Hartung <willhartung@xxxxxxxxx> writes:

> I am trying to load data into a table with a jsonb field that is indexed
> as gin (field jsonb_path_ops).
>
> It's a large table, and I'm loading it 100K rows at a time. Eventually, it
> would be 30M+ rows in the table.
>
> Originally I simply loaded the table and then tried to create the index,
> but it never finished.
>
> So, I'm trying to load it incrementally.
>
> I have 2.8M rows in the table so far. The jsonb field size is, on average,
> 1600 bytes, with the largest (of the 2.8M loaded) at 1930 bytes. Simply
> put, these are not large structures.
>
> The first batches took varying times to load. Most of them took under a
> minute; some took half an hour.
>
> The current file is "stuck", pushing past 20 hrs so far.
>
> The VM only has 4G of RAM. It is certainly "busy", but it is not swapping
> (not at the OS level).
>
> Here is a recent top:
>
> top - 11:34:01 up 1 day, 1:49, 2 users, load average: 5.84, 4.94, 4.52
> Tasks: 103 total, 1 running, 59 sleeping, 0 stopped, 0 zombie
> %Cpu(s): 0.0 us, 1.0 sy, 0.0 ni, 0.0 id, 95.3 wa, 0.0 hi, 3.7 si, 0.0 st
> KiB Mem : 4040212 total, 152336 free, 181792 used, 3706084 buff/cache
> KiB Swap: 4194300 total, 4189948 free, 4352 used. 3443628 avail Mem
>
> Postgres is pretty much at its default configuration; I have not tweaked
> any of the memory settings (such as work memory).
>
> My Mac OS host isn’t that busy either. The VM adds some load, but it's
> not thrashing.
>
> While I was loading the file in 100K row chunks, here are the times each
> respective chunk took to actually load:
>
> 0:46s
> 3:17s
> 8:12s
> 9:54s
> 14:09s
> 12:07s
> 18:50s
> 9:01s
> 25:28s
> 38:49s
> 25:24s
> 1:21s
> 0:47s
> 0:32s
> 0:39s
> 0:31s
> 0:31s
> 0:28s
> 0:29s
> 0:28s
> 0:21s
> 0:27s
> 0:36s
> 0:22s
> 0:27s
> 0:20s
> 0:21s
> 0:19s
> 2:16:21s <— the last to date, but that was yesterday; the current chunk is
> now past 20 hrs
>
> It stalled early, recovered, and then climbed straight back up to the
> cliff it is stuck at now.
>
> The only reason this is on a Linux VM is that I saw similar behavior
> running native Postgres 9.6 on Mac OS (the host). It didn’t make any sense
> that Mac OS would be causing this, but who knows; try it and see.
>
> Since the start of the load of the stalled piece, something has consumed
> over 800M of storage. I can’t say what; I did not check with any higher
> fidelity where the storage was going.
>
> I do not understand why this suddenly falls over a cliff. The JSONs are
> not large, so I don’t see how any individual one could crush the memory
> subsystem. I have to assume that PG is somehow internally thrashing or
> paging or something. I appreciate that the db is not tuned, but I would
> not expect it to struggle this hard to maintain this index, with these
> values and such small JSON payloads.
>
> Also, it’s not unique to the 29th piece. I’ve tried in the past to skip
> such pieces, and it still failed. This has been happening for some time
> (months), but I keep putting it aside.
>
> Any insight is helpful. My biggest fear is that, for whatever reason, we
> will not be able to reload this table during some future crisis, should it
> come to that.
>
> Thanks.

Which version of Postgres?

How are you loading the data (application, psql, pg_restore), using
INSERT or COPY?

--
Tim Cross
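
For anyone trying to reproduce this, here is a minimal sketch of the setup
described above. The table and column names (events, payload) are
assumptions; the post does not give the actual DDL:

    -- Hypothetical names, not from the post.
    CREATE TABLE events (
        id      bigserial PRIMARY KEY,
        payload jsonb NOT NULL
    );

    -- The index type the post describes: GIN over the jsonb column,
    -- using the jsonb_path_ops operator class.
    CREATE INDEX events_payload_gin
        ON events USING gin (payload jsonb_path_ops);

jsonb_path_ops yields a smaller and usually faster index than the default
jsonb_ops operator class, at the cost of supporting only containment
queries (@>).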
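
The 100K-row chunked load might look like the following sketch. The file
paths and the one-JSON-document-per-line format are assumptions, since the
post does not say how the data is fed in, which is exactly what the reply
asks:

    -- Server-side COPY; with the default text format, each line of the
    -- file is read as one jsonb value. Paths are hypothetical. This
    -- assumes the documents contain no literal tabs or backslashes,
    -- which the text format would treat as delimiters/escapes.
    COPY events (payload) FROM '/data/chunks/chunk_0001.json';
    COPY events (payload) FROM '/data/chunks/chunk_0002.json';

    -- Or client-side from psql, which reads the file on the client:
    -- \copy events (payload) from 'chunk_0001.json'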
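
Since the post notes the server is at default memory settings, here is a
sketch of the settings most relevant to GIN index maintenance. These are
real settings in 9.6, but the values shown are illustrative, not a tested
recommendation for this workload:

    -- Inspect the current (default) values.
    SHOW maintenance_work_mem;    -- used by CREATE INDEX; default 64MB
    SHOW work_mem;                -- default 4MB
    SHOW gin_pending_list_limit;  -- size the GIN fast-update pending
                                  -- list may reach before it is flushed
                                  -- into the main index; default 4MB

    -- Give an index build more memory for the current session.
    SET maintenance_work_mem = '512MB';

    -- Make each insert pay the GIN maintenance cost immediately instead
    -- of deferring it to periodic pending-list flushes (assumes the
    -- hypothetical index name from the sketch above).
    ALTER INDEX events_payload_gin SET (fastupdate = off);

One possible explanation for the timing pattern above, though nothing in
the post confirms it, is the GIN pending list: inserts are cheap while
entries accumulate in the list, and then the batch that triggers a flush
into the main index stalls.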