On Mon, Aug 29, 2011 at 5:42 PM, Tom Lane <tgl@xxxxxxxxxxxxx> wrote: > Lonni J Friedman <netllama@xxxxxxxxx> writes: >> I have several Linux-x68_64 based dedicated PostgreSQL servers where >> I'm experiencing significant swap usage growth over time. All of them >> have fairly substantial amounts of RAM (not including swap), yet the >> amount of swap that postgres is using ramps up over time and >> eventually hurts performance badly. In every case, simply restarting >> postgresql frees up all the swap in use (until it ramps up again >> later). > > If you're certain that it's restarting *postgres* that does it, and not > restarting your application or pgbouncer or some other code, then it > seems like you must have uncovered a memory leak someplace. We haven't > got nearly enough info here to diagnose it though. > > First thing I'd want to know is which process(es) exactly are bloating. > The top output you showed us is unhelpful for that since it just shows > them all as "postmaster" --- you'll need to match up the problem PIDs > with "ps auxww" output. Keep in mind also that top is pretty awful > about distinguishing a process's actual memory use (private memory) > from the portion of PG's shared memory that it happens to have touched. > What you need to pay attention to is RES minus SHR, not either number > alone. With shared buffers set as high as you've got it, you'll > probably not be able to be sure that a process is bloating until it's > eaten hundreds of megs of private space. In the past 18 hours, swap usage has nearly doubled on systemA: $ free -m total used free shared buffers cached Mem: 56481 56210 271 0 11 52470 -/+ buffers/cache: 3727 52753 Swap: 1099 35 1064 As a reminder, this is what it was yesterday afternoon (roughly 18 hours earlier): total used free shared buffers cached Mem: 56481 55486 995 0 15 53298 -/+ buffers/cache: 2172 54309 Swap: 1099 18 1081 Here's top sorted by memory usage: PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND 2250 postgres 20 0 13.5g 10g 10g S 0.0 19.0 4:50.17 postmaster 20427 postgres 20 0 13.6g 8.6g 8.5g D 3.6 15.5 6:15.53 postmaster 1861 postgres 20 0 13.5g 2.2g 2.2g D 7.2 3.9 0:44.34 postmaster 32275 postgres 20 0 13.5g 2.0g 2.0g S 0.0 3.7 0:32.52 postmaster 2444 postgres 20 0 13.5g 2.0g 2.0g S 0.0 3.6 0:21.22 postmaster 21632 postgres 20 0 13.5g 2.0g 1.9g S 0.0 3.6 1:48.76 postmaster 1870 postgres 20 0 13.5g 1.7g 1.7g S 0.0 3.1 0:18.70 postmaster 2383 postgres 20 0 13.5g 1.7g 1.7g S 0.0 3.1 0:13.23 postmaster 32280 postgres 20 0 13.5g 1.6g 1.6g S 0.0 2.9 0:20.97 postmaster 2378 postgres 20 0 13.5g 1.3g 1.3g S 0.0 2.4 0:15.36 postmaster 2367 postgres 20 0 13.5g 1.3g 1.3g S 0.0 2.4 0:15.71 postmaster 2396 postgres 20 0 13.5g 1.2g 1.2g S 0.0 2.3 0:09.33 postmaster 31794 postgres 20 0 13.5g 1.2g 1.2g S 0.0 2.2 0:14.68 postmaster 1891 postgres 20 0 13.5g 1.2g 1.2g S 0.0 2.1 0:06.49 postmaster 2435 postgres 20 0 13.5g 1.2g 1.1g S 0.0 2.1 0:16.38 postmaster 2370 postgres 20 0 13.5g 1.1g 1.1g S 0.0 2.0 0:08.51 postmaster 2411 postgres 20 0 13.5g 1.0g 1.0g S 0.0 1.9 0:09.18 postmaster 31782 postgres 20 0 13.5g 981m 967m S 0.0 1.7 0:12.33 postmaster 32314 postgres 20 0 13.5g 909m 891m S 0.0 1.6 0:09.07 postmaster 2395 postgres 20 0 13.5g 890m 876m S 0.0 1.6 0:07.16 postmaster 2381 postgres 20 0 13.5g 829m 815m S 0.0 1.5 0:05.30 postmaster 2213 postgres 20 0 13.5g 829m 817m S 0.0 1.5 0:04.40 postmaster 2421 postgres 20 0 13.5g 803m 790m S 0.0 1.4 0:05.11 postmaster 1866 postgres 20 0 13.5g 797m 784m S 0.0 1.4 0:08.10 postmaster 2371 postgres 20 0 13.5g 793m 781m S 0.0 1.4 0:05.88 postmaster 3460 postgres 20 0 13.5g 786m 773m S 0.0 1.4 0:06.17 postmaster 2418 postgres 20 0 13.5g 604m 594m S 0.0 1.1 0:01.69 postmaster 2425 postgres 20 0 13.5g 582m 570m S 0.0 1.0 0:02.67 postmaster 5863 postgres 20 0 13.5g 358m 347m D 0.0 0.6 0:01.90 postmaster 5865 postgres 20 0 13.5g 293m 283m S 0.0 0.5 0:01.70 postmaster 2243 postgres 20 0 13.5g 289m 288m S 0.0 0.5 0:07.01 postmaster 5862 postgres 20 0 13.5g 195m 189m S 0.0 0.3 0:00.48 postmaster 1890 postgres 20 0 13.5g 149m 145m S 0.0 0.3 0:06.02 postmaster 5852 postgres 20 0 13.5g 132m 124m S 0.0 0.2 0:01.04 postmaster 32316 postgres 20 0 13.5g 21m 19m S 0.0 0.0 0:00.60 postmaster 1876 postgres 20 0 13.5g 21m 19m S 0.0 0.0 0:00.16 postmaster 5866 postgres 20 0 13.5g 17m 15m S 0.0 0.0 0:00.01 postmaster 2251 postgres 20 0 13.5g 17m 16m S 0.0 0.0 0:57.19 postmaster 32328 postgres 20 0 13.5g 16m 13m S 0.0 0.0 0:00.20 postmaster 4893 postgres 20 0 13.5g 15m 13m S 0.0 0.0 0:00.15 postmaster 2426 postgres 20 0 13.5g 14m 12m S 0.0 0.0 0:00.15 postmaster 1878 postgres 20 0 13.5g 14m 12m S 0.0 0.0 0:00.12 postmaster 2408 postgres 20 0 13.5g 13m 10m S 0.0 0.0 0:00.06 postmaster 1896 postgres 20 0 13.5g 12m 10m S 0.0 0.0 0:00.13 postmaster 2427 postgres 20 0 13.5g 10m 8436 S 0.0 0.0 0:00.07 postmaster 2401 postgres 20 0 13.5g 8484 6412 S 0.0 0.0 0:00.03 postmaster 5864 postgres 20 0 13.5g 7740 5836 S 0.0 0.0 0:00.00 postmaster 5853 postgres 20 0 13.5g 7528 5652 S 0.0 0.0 0:00.00 postmaster I've attached the corresponding 'ps auxwww' output as well (since its rather wide, and it will likely get mangled inside the email body), but here it is too for those who don't want to deal with the attachment: ############## postgres 1861 2.1 4.1 14135092 2421300 ? Ds 08:37 0:45 postgres: lfriedman farm 127.0.0.1(34293) INSERT postgres 1866 0.3 1.4 14142616 816732 ? Ss 08:38 0:08 postgres: lfriedman nightly 127.0.0.1(56661) idle postgres 1870 0.9 3.1 14147660 1816576 ? Ss 08:38 0:18 postgres: lfriedman nightly 127.0.0.1(56665) idle postgres 1876 0.0 0.0 14134040 21892 ? Ss 08:38 0:00 postgres: lfriedman cbs 127.0.0.1(56670) idle postgres 1878 0.0 0.0 14134636 14880 ? Ss 08:38 0:00 postgres: lfriedman cbs 127.0.0.1(56672) idle postgres 1890 0.3 0.2 14134992 153224 ? Ss 08:40 0:06 postgres: lfriedman farm 127.0.0.1(56673) idle postgres 1891 0.3 2.1 14134852 1215668 ? Ss 08:40 0:06 postgres: lfriedman farm 127.0.0.1(56674) idle postgres 1896 0.0 0.0 14134612 13244 ? Ss 08:40 0:00 postgres: lfriedman cbs 127.0.0.1(56677) idle postgres 2213 0.2 1.4 14142356 849448 ? Ss 08:40 0:04 postgres: lfriedman nightly 127.0.0.1(56693) idle postgres 2243 0.0 0.5 14125816 296012 ? S Aug29 0:07 /usr/bin/postmaster -p 5432 -D /var/lib/pgsql/data postgres 2248 0.0 0.0 159892 908 ? Ss Aug29 0:00 postgres: logger process postgres 2250 0.3 19.0 14133028 11013112 ? Ss Aug29 4:50 postgres: writer process postgres 2251 0.0 0.0 14132476 17532 ? Ss Aug29 0:57 postgres: wal writer process postgres 2252 0.0 0.0 14133884 2428 ? Ss Aug29 0:09 postgres: autovacuum launcher process postgres 2253 0.0 0.0 161272 1872 ? Ss Aug29 0:33 postgres: stats collector process postgres 2367 0.8 2.3 14147948 1385192 ? Ss 08:41 0:15 postgres: lfriedman nightly 127.0.0.1(56695) idle postgres 2370 0.4 1.9 14147088 1151312 ? Ss 08:41 0:08 postgres: lfriedman nightly 127.0.0.1(56698) idle postgres 2371 0.3 1.4 14142216 814360 ? Ds 08:41 0:05 postgres: lfriedman nightly 127.0.0.1(56699) UPDATE postgres 2378 0.8 2.3 14142980 1387028 ? Ss 08:41 0:15 postgres: lfriedman nightly 127.0.0.1(56706) idle postgres 2381 0.2 1.4 14144484 849852 ? Ss 08:41 0:05 postgres: lfriedman nightly 127.0.0.1(56709) idle postgres 2383 0.7 3.1 14147196 1800300 ? Ss 08:41 0:13 postgres: lfriedman nightly 127.0.0.1(56711) idle postgres 2395 0.3 1.5 14142816 911604 ? Ss 08:41 0:07 postgres: lfriedman nightly 127.0.0.1(56722) idle postgres 2396 0.5 2.2 14144544 1302852 ? Ss 08:41 0:09 postgres: lfriedman nightly 127.0.0.1(56723) idle postgres 2401 0.0 0.0 14134040 8984 ? Ss 08:41 0:00 postgres: lfriedman cbs 127.0.0.1(56728) idle postgres 2408 0.0 0.0 14134824 13940 ? Ss 08:41 0:00 postgres: lfriedman cbs 127.0.0.1(56735) idle postgres 2411 0.4 1.8 14147688 1087016 ? Ss 08:41 0:09 postgres: lfriedman nightly 127.0.0.1(56738) idle postgres 2418 0.0 1.0 14141992 619484 ? Ss 08:41 0:01 postgres: lfriedman nightly 127.0.0.1(56745) idle postgres 2421 0.2 1.4 14142428 822400 ? Ss 08:41 0:05 postgres: lfriedman nightly 127.0.0.1(56748) idle postgres 2425 0.1 1.0 14142500 596420 ? Ss 08:41 0:02 postgres: lfriedman nightly 127.0.0.1(56752) idle postgres 2426 0.0 0.0 14134720 15896 ? Ss 08:41 0:00 postgres: lfriedman cbs 127.0.0.1(56753) idle postgres 2427 0.0 0.0 14134040 10536 ? Ss 08:41 0:00 postgres: lfriedman cbs 127.0.0.1(56754) idle postgres 2435 0.8 2.0 14147132 1210368 ? Ss 08:41 0:16 postgres: lfriedman nightly 127.0.0.1(56761) idle postgres 2444 1.1 3.5 14136592 2063556 ? Ss 08:42 0:21 postgres: lfriedman farm 127.0.0.1(56769) idle postgres 3460 0.5 1.3 14142640 805412 ? Ss 08:51 0:06 postgres: lfriedman nightly 127.0.0.1(37675) idle postgres 4893 0.0 0.0 14134520 16068 ? Ss 09:02 0:00 postgres: lfriedman inventory 127.0.0.1(50384) idle postgres 5852 0.6 0.2 14139168 135420 ? Ss 09:09 0:01 postgres: lfriedman nightly 127.0.0.1(39228) idle postgres 5853 0.0 0.0 14133968 7528 ? Ss 09:09 0:00 postgres: lfriedman nightly 127.0.0.1(39229) idle postgres 5862 0.3 0.3 14137720 200300 ? Ss 09:10 0:00 postgres: lfriedman nightly 127.0.0.1(39230) idle postgres 5863 1.5 0.6 14144120 367536 ? Ss 09:10 0:01 postgres: lfriedman nightly 127.0.0.1(39231) idle postgres 5864 0.0 0.0 14133968 7740 ? Ss 09:10 0:00 postgres: lfriedman nightly 127.0.0.1(39232) idle postgres 5865 1.7 0.5 14144176 338516 ? Ss 09:10 0:02 postgres: lfriedman nightly 127.0.0.1(39233) idle postgres 5866 0.0 0.0 14135036 18388 ? Ss 09:10 0:00 postgres: lfriedman nightly 127.0.0.1(39234) idle postgres 20427 0.6 15.5 14232764 8991924 ? Ss Aug29 6:16 postgres: autovacuum worker process nightly postgres 21632 0.1 3.5 14146368 2055856 ? Ss Aug29 1:48 postgres: lfriedman nightly 127.0.0.1(33664) idle postgres 31782 0.3 1.7 14142696 1005052 ? Ss 08:18 0:12 postgres: lfriedman nightly 127.0.0.1(39318) idle postgres 31794 0.4 2.1 14149300 1261472 ? Ss 08:18 0:14 postgres: lfriedman nightly 127.0.0.1(39330) idle postgres 32275 1.0 3.6 14147716 2125608 ? Ss 08:21 0:32 postgres: lfriedman nightly 127.0.0.1(39356) idle postgres 32280 0.6 2.9 14150556 1699160 ? Ss 08:21 0:20 postgres: lfriedman nightly 127.0.0.1(39361) idle postgres 32314 0.3 1.6 14146864 930840 ? Ss 08:22 0:09 postgres: lfriedman nightly 127.0.0.1(39395) idle postgres 32328 0.0 0.0 14134636 16396 ? Ss 08:22 0:00 postgres: lfriedman cbs 127.0.0.1(39409) idle ############## I'm not sure if there's much to work with at this point, since is only been 18 hours. Please let me know if I can get additional data, or if its better to wait several more days before capturing this data again. thanks!
Attachment:
ps-auxww.out
Description: Binary data
-- Sent via pgsql-general mailing list (pgsql-general@xxxxxxxxxxxxxx) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-general