Hello,

Over the last few days we have had some crashes on a Red Hat system that seem to be related to high I/O. I don't see an evident cause, so perhaps someone on the list can give some pointers on where to look.

We have several Sybase databases running, plus a database backup process. It seems there were heavy I/O operations and Sybase crashed. We have atop running, where we can see Sybase using RAM and then dying, and sar also shows heavy I/O. Essentially no swap was used (sar shows at most 244 kB). I searched the logs for OOM kills but didn't find any.

The machine was very unresponsive when the issue occurred. We have Red Hat Cluster running, and the cluster token from this node was lost during this time.

I'd like to know whether something killed the application, and how to fine-tune memory/disk access to prevent this.

Output from atop and sar follows. Thanks in advance.

28 Aug

12:00:01 AM   runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15
04:10:01 AM         1       656      6.36      7.08      4.59
04:20:01 AM         3       656      6.59      6.47      5.52
04:30:01 AM         3       658      6.03      6.37      5.96
04:40:01 AM         3       658      8.30      8.03      6.96
04:50:01 AM         3       665      9.74      8.11      7.42
05:00:01 AM         3       664      4.44      6.53      7.28
05:10:02 AM         2       664      6.45      6.85      6.99

12:00:01 AM kbmemfree kbmemused  %memused kbbuffers  kbcached kbswpfree kbswpused  %swpused  kbswpcad
03:50:01 AM   6874984  17786660     72.12    303028   6609768   8193016         0      0.00         0
04:00:01 AM   6867236  17794408     72.15    309484   6610364   8193016         0      0.00         0
04:10:01 AM   3970096  20691548     83.90    358904   9419084   8193016         0      0.00         0
04:20:01 AM   3074460  21587184     87.53    369916  10284048   8193016         0      0.00         0
04:30:01 AM   1690900  22970744     93.14    381072  11656316   8193016         0      0.00         0
04:40:01 AM    711644  23950000     97.11    389804  12620860   8193016         0      0.00         0
04:50:01 AM     88152  24573492     99.64    229812  13402336   8192772       244      0.00       244
05:00:01 AM     76592  24585052     99.69    178196  13474908   8192772       244      0.00         0
05:10:02 AM     80212  24581432     99.67    174196  13468696   8192772       244      0.00         0
05:20:01 AM   6772944  17888700     72.54     74380   7027120   8192772       244      0.00         0

12:00:01 AM       CPU     %user     %nice   %system   %iowait    %steal     %idle
03:50:01 AM       all      2.29      0.00      2.73      0.88      0.00     94.10
04:00:01 AM       all      2.31      0.00      2.77      0.93      0.00     93.99
04:10:01 AM       all      5.46      0.00      5.57     10.47      0.00     78.49
04:20:01 AM       all      7.18      0.00      6.33      6.76      0.00     79.73
04:30:01 AM       all      5.88      0.00      5.90      6.87      0.00     81.35
04:40:01 AM       all      5.16      0.00      5.77      8.09      0.00     80.98
04:50:01 AM       all      4.88      0.00      5.35      7.82      0.00     81.95
05:00:01 AM       all      4.59      0.00      5.29      7.95      0.00     82.17
05:10:02 AM       all      4.99      0.00      5.64      7.09      0.00     82.28
05:20:01 AM       all      4.36      0.00      4.50      6.52      0.00     84.62
05:30:01 AM       all      0.01      0.00      0.02      0.13      0.00     99.84

tps      Total number of transfers per second that were issued to physical devices. A transfer is an I/O request to a physical device.
rtps     Total number of read requests per second issued to physical devices.
wtps     Total number of write requests per second issued to physical devices.
bread/s  Total amount of data read from the devices in blocks per second. Blocks are 512 bytes.
bwrtn/s  Total amount of data written to devices in blocks per second.

12:00:01 AM         tps        rtps        wtps     bread/s     bwrtn/s
01:10:01 AM     7996.72      321.85     7674.87     9120.26   187734.23
01:20:01 AM     7260.18      239.13     7021.05     7151.23   129573.36
01:30:01 AM    10541.89      333.87    10208.02     3724.24   260710.61
01:40:01 AM     8298.28       59.59     8238.69      949.20   200802.65
01:50:01 AM    12443.27       30.84    12412.43      141.97   310002.60
02:00:01 AM    10878.28      134.28    10744.00     3120.10   252048.75
02:10:01 AM     7191.81      245.43     6946.39     4632.28   155588.73
03:40:01 AM     9971.54        2.17     9969.37        9.80   194855.04
03:50:01 AM     9615.50        3.80     9611.70       16.28   195641.46
04:00:01 AM    10615.95        2.11    10613.84        9.04   201471.62
04:10:01 AM    10579.62     3619.13     6960.49   130869.08   143336.99
04:20:01 AM    10598.33     2381.38     8216.94    98763.15   176191.20
04:30:01 AM    11677.21     3535.63     8141.58   233305.68   174777.83
04:40:01 AM    10890.26     3150.32     7739.95   112731.87   170365.89
04:50:01 AM    12778.49     4054.31     8724.18   137983.84   184082.99
05:00:01 AM    13507.21     3830.58     9676.63   104143.93   203367.17
05:10:02 AM    12670.16     2266.25    10403.91   103696.19   210170.15
05:20:01 AM  5380199.36  6489408.45  6061020.12  6428917.69  3338784.18
05:30:01 AM       33.70        0.11       33.59        1.55      373.81

ATOP - dc2-x6270-m 2011/08/28 05:15:02 --x--- 5m0s elapsed
PRC | sys 6m29s | user 5m58s | | #proc 525 | #trun 4 | #tslpi 649 | #tslpu 1 | #zombie 0 | clones 5264 | | #exit 3749 |
CPU | sys 128% | user 120% | irq 4% | | idle 1968% | wait 179% | | steal 0% | guest 0% | curf 1.71GHz | curscal 58% |
CPL | avg1 6.29 | avg5 7.04 | | avg15 7.07 | | | csw 1976989 | intr 1392566 | | | numcpu 24 |
MEM | tot 23.5G | free 78.8M | cache 12.8G | dirty 0.3M | buff 170.0M | | slab 242.7M | | | | |
SWP | tot 7.8G | free 7.8G | | | | | | | | vmcom 7.2G | vmlim 14.6G |
PAG | scan 974496 | | stall 0 | | | | | swin 0 | | | swout 0 |
LVM | b_yorick_dat | busy 98% | read 162041 | write 390 | KiB/r 17 | | KiB/w 3 | MBr/s 8.97 | MBw/s 0.00 | avq 6.69 | avio 1.81 ms |
LVM | -dc2-tier1-d | busy 90% | read 2 | write 315874 | KiB/r 4 | | KiB/w 16 | MBr/s 0.00 | MBw/s 16.65 | avq 10.59 | avio 0.85 ms |
LVM | c2-tier1-dp2 | busy 82% | read 2 | write 200145 | KiB/r 4 | | KiB/w 2 | MBr/s 0.00 | MBw/s 1.68 | avq 11.35 | avio 1.22 ms |
LVM | c1-tier1-bp2 | busy 56% | read 86300 | write 9191 | KiB/r 16 | | KiB/w 13 | MBr/s 4.56 | MBw/s 0.41 | avq 8.12 | avio 1.75 ms |
LVM | -dc1-tier1-b | busy 56% | read 86300 | write 9191 | KiB/r 16 | | KiB/w 13 | MBr/s 4.56 | MBw/s 0.41 | avq 8.12 | avio 1.75 ms |
MDD | md2 | busy 0% | read 16 | write 3115 | KiB/r 5 | | KiB/w 4 | MBr/s 0.00 | MBw/s 0.04 | avq 0.00 | avio 0.00 ms |
DSK | sdp | busy 45% | read 2 | write 150283 | KiB/r 4 | | KiB/w 16 | MBr/s 0.00 | MBw/s 8.26 | avq 9.92 | avio 0.90 ms |
DSK | sdf | busy 45% | read 0 | write 150323 | KiB/r 0 | | KiB/w 17 | MBr/s 0.00 | MBw/s 8.39 | avq 9.47 | avio 0.90 ms |
DSK | sds | busy 28% | read 37748 | write 1237 | KiB/r 19 | | KiB/w 42 | MBr/s 2.36 | MBw/s 0.17 | avq 5.44 | avio 2.18 ms |
NET | transport | tcpi 263485 | tcpo 109968 | udpi 1031 | udpo 679 | tcpao 13 | tcppo 2 | tcprs 10 | tcpie 0 | tcpor 0 | udpip 0 |
NET | network | ipi 269371 | ipo 110682 | ipfrw 0 | deliv 269302 | | | | | icmpi 48 | icmpo 25 |
NET | eth1 27% | pcki 257856 | pcko 681449 | si 460 Kbps | so 27 Mbps | coll 0 | mlti 4 | erri 0 | erro 0 | drpi 0 | drpo 0 |
NET | eth5 0% | pcki 15501 | pcko 0 | si 26 Kbps | so 0 Kbps | coll 0 | mlti 0 | erri 30598 | erro 0 | drpi 0 | drpo 0 |
NET | eth3 0% | pcki 69250 | pcko 29053 | si 1354 Kbps | so 42 Kbps | coll 0 | mlti 59 | erri 0 | erro 0 | drpi 0 | drpo 0 |

  PID RUID     EUID     THR SYSCPU USRCPU  VGROW  RGROW  RDDSK  WRDSK ST EXC S CPUNR  CPU CMD        1/95
 5362 sybase   sybase     7  3m25s 94.35s     0K     0K 924.9M 119.3M --   - R     8 100% dataserver
 6172 sybase   sybase     7  2m39s  2m09s     0K     0K   248K 474.2M --   - S     1  96% dataserver
13489 sybase   sybase     1  1.92s  1m45s     0K     0K     0K     0K --   - D    19  36% sybmultbuf
 6970 sybase   sybase     7 17.89s 27.88s     0K 23164K    26K 26340K --   - S    22  15% dataserver
 3215 root     root       1  2.36s  0.00s     0K     0K     0K     0K --   - S     9   1% kmirrord
13490 sybase   sybase     1  0.97s  0.43s -70.0M   -56K   1.7G     0K --   - S     7   0% sybmultbuf
 4861 sybase   sybase     7  0.81s  0.44s     0K     0K     0K    54K --   - S     4   0% dataserver
 3256 root     root       1  0.88s  0.00s     0K     0K     0K     0K --   - S     9   0% kmirrord
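
As a rough aid for reading the sar figures above, here is a minimal Python sketch. It assumes that kbmemused includes kbbuffers and kbcached, which fits the report because kbmemfree + kbmemused is the same total (24661644 kB) in every row. It also assumes the 512-byte block size quoted in the field descriptions. The function names and the choice of sample rows are only illustrative.

```python
# Two rows copied from the sar memory report:
# (time, kbmemused, kbbuffers, kbcached)
SAR_MEM_ROWS = [
    ("04:00:01", 17794408, 309484, 6610364),
    ("04:50:01", 24573492, 229812, 13402336),
]


def app_used_kb(kbmemused, kbbuffers, kbcached):
    """Memory in use excluding buffers and page cache, in kB.

    Assumption: kbmemused already contains kbbuffers and kbcached.
    """
    return kbmemused - kbbuffers - kbcached


def blocks_to_mib_per_s(blocks_per_s, block_size=512):
    """Convert a sar bread/s or bwrtn/s value to MiB/s (512-byte blocks assumed)."""
    return blocks_per_s * block_size / (1024 * 1024)


if __name__ == "__main__":
    for time, used, buffers, cached in SAR_MEM_ROWS:
        kb = app_used_kb(used, buffers, cached)
        print(f"{time}: {kb} kB in use without buffers/cache "
              f"({kb / 1024 ** 2:.2f} GiB)")
    # bwrtn/s at 04:50:01 from the sar I/O report
    print(f"04:50:01 writes: {blocks_to_mib_per_s(184082.99):.1f} MiB/s")
```

Under those assumptions, memory in use outside buffers and cache differs by only about 65 MiB between 04:00 and 04:50, while kbcached grows by roughly 6.5 GiB. That would suggest the page cache, rather than the dataserver processes, accounts for most of the drop in free memory.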