On Thu, Nov 29, 2012 at 11:00:59AM +0400, Vyacheslav Dubeyko wrote: > Thank you. Got this issue again (sincerely, I'm a bit lost and don't remember if there was a patch for this?) > I think that it needs to use such tools for the issue investigation: > 1. echo t > /proc/sysrq-trigger should tell us what the flusher is > doing. Attached. > 2. The /proc/meminfo and /proc/sys/vm/dirty_ratio output for the issue > time can be also helpful. MemTotal: 2074936 kB MemFree: 696664 kB Buffers: 940 kB Cached: 1036912 kB SwapCached: 0 kB Active: 758376 kB Inactive: 448196 kB Active(anon): 168788 kB Inactive(anon): 452 kB Active(file): 589588 kB Inactive(file): 447744 kB Unevictable: 0 kB Mlocked: 0 kB HighTotal: 1187784 kB HighFree: 535312 kB LowTotal: 887152 kB LowFree: 161352 kB SwapTotal: 0 kB SwapFree: 0 kB Dirty: 24 kB Writeback: 0 kB AnonPages: 168732 kB Mapped: 44004 kB Shmem: 520 kB Slab: 164796 kB SReclaimable: 143208 kB SUnreclaim: 21588 kB KernelStack: 912 kB PageTables: 1380 kB NFS_Unstable: 0 kB Bounce: 0 kB WritebackTmp: 0 kB CommitLimit: 1037468 kB Committed_AS: 273168 kB VmallocTotal: 122880 kB VmallocUsed: 1136 kB VmallocChunk: 92356 kB HugePages_Total: 0 HugePages_Free: 0 HugePages_Rsvd: 0 HugePages_Surp: 0 Hugepagesize: 4096 kB DirectMap4k: 12280 kB DirectMap4M: 897024 kB dirty_ratio is 20 > 3. It makes sense to use blktrace for getting I/O traces during > flushing during sync command execution and near before issue was > occurred. I/O traces can be useful for analysis but this output can > contain huge info amount. So, I think first of all it needs to get > info about what flusher doing during eating CPU time. This seems to be "not fixed by sync", so if you can elaborate about this I can test it. Extra info: wloczykij ~ (: grep -vE '^(#|$)' /etc/nilfs_cleanerd.conf protection_period 3600 min_clean_segments 12% max_clean_segments 30% clean_check_interval 10 selection_policy timestamp # timestamp in ascend order nsegments_per_clean 2 mc_nsegments_per_clean 4 cleaning_interval 10 mc_cleaning_interval 1 retry_interval 60 use_mmap log_priority info I'm using vanilla kernel 3.6.10. Current snapshot and checkpoint state: wloczykij ~ # lscp -s CNO DATE TIME MODE FLG NBLKINC ICNT wloczykij ~ # lscp | head CNO DATE TIME MODE FLG NBLKINC ICNT 140088 2013-03-18 11:38:03 cp - 119 48230 140089 2013-03-18 11:48:05 cp - 106 48230 140090 2013-03-18 11:58:07 cp - 85 48230 140091 2013-03-18 12:05:43 cp - 119 48231 140092 2013-03-18 12:05:45 cp - 40 48226 140093 2013-03-18 12:15:46 cp - 320 48325 140094 2013-03-18 12:17:11 cp - 139 48225 140095 2013-03-18 12:17:15 cp - 42 48225 140096 2013-03-18 12:17:17 cp - 21 48225 wloczykij ~ # lscp | tail 142148 2013-03-28 12:59:38 cp - 29 49204 142149 2013-03-28 12:59:44 cp - 29 49204 142150 2013-03-28 13:00:20 cp - 138 49208 142151 2013-03-28 13:01:50 cp - 40 49205 142152 2013-03-28 13:02:23 cp - 29 49205 142153 2013-03-28 13:02:23 cp - 29 49205 142154 2013-03-28 13:02:27 cp - 29 49205 142155 2013-03-28 13:02:32 cp - 29 49205 142156 2013-03-28 13:03:02 cp - 58 49205 142157 2013-03-28 13:03:02 cp - 28 49205 Piotr Szymaniak. -- I ten smród. Diabli wiedzą, co tam gniło, w tym mięsiwie, ale Redowi wydało się, że sto tysięcy rozbitych cuchnących jaj wylanych na sto tysięcy cuchnących rybich łbów i zdechłych kotów nie może śmierdzieć tak, jak śmierdziała ta maź. -- Arkadij i Borys Strugaccy, „Piknik na skraju drogi”
PID TTY STAT TIME COMMAND 1 ? Ss 0:31 init [3] 2 ? S 0:00 [kthreadd] 3 ? S 1:02 [ksoftirqd/0] 5 ? S< 0:00 [kworker/0:0H] 6 ? S 0:00 [kworker/u:0] 7 ? S< 0:00 [kworker/u:0H] 8 ? S< 0:00 [khelper] 9 ? S 0:00 [kdevtmpfs] 10 ? S 0:00 [kworker/u:1] 158 ? S 0:00 [bdi-default] 160 ? S< 0:00 [kblockd] 239 ? S< 0:00 [ata_sff] 246 ? S 0:00 [khubd] 350 ? S< 0:00 [rpciod] 365 ? S 0:32 [kswapd0] 366 ? S 0:00 [fsnotify_mark] 368 ? S< 0:00 [crypto] 471 ? S< 0:00 [deferwq] 474 ? S 0:00 [scsi_eh_0] 475 ? S 1:03 [usb-storage] 488 ? S< 0:24 [kworker/0:1H] 497 ? S 0:12 [segctord] 589 ? Ss 0:00 /sbin/udevd --daemon 647 ? S 0:00 [scsi_eh_1] 657 ? S 0:00 [scsi_eh_2] 937 ? S 49157:04 [flush-8:0] 1046 ? S< 0:00 [kdmflush] 1066 ? S< 0:00 [kcryptd_io] 1067 ? S< 0:00 [kcryptd] 1094 ? S< 0:00 [kdmflush] 1095 ? S< 0:00 [kcryptd_io] 1096 ? S< 0:00 [kcryptd] 1225 ? Ssl 2:42 /sbin/apcupsd -f /etc/apcupsd/apcupsd.conf -P /var/run/apcupsd.pid 1240 ? S 0:00 supervising syslog-ng 1241 ? Ssl 0:30 /usr/sbin/syslog-ng --cfgfile /etc/syslog-ng/syslog-ng.conf --pidfile /var/run/syslog-ng.pid 1255 ? Ss 0:11 /usr/sbin/crond 1342 ? Ss 0:00 nginx: master process /usr/sbin/nginx -c /etc/nginx/nginx.conf 1344 ? S 1:16 nginx: worker process 1345 ? S 2:18 nginx: worker process 1346 ? S 2:15 nginx: worker process 1347 ? S 2:10 nginx: worker process 1359 ? Ss 2:54 /usr/sbin/ntpd -p /var/run/ntpd.pid 1442 ? Ss 0:15 /usr/libexec/postfix/master -w 1445 ? S 0:02 qmgr -l -t unix -u 1460 ? Ss 0:00 /usr/bin/rsync --daemon 1468 ? S 0:00 tlsmgr -l -t unix -u 1520 ? Ss 0:25 /sbin/nilfs_cleanerd -c /etc/nilfs_cleanerd.conf /dev/disk/by-uuid/f18e80b1-f3c1-49ec-baa5-39c0edc4c0b9 1525 ? S 0:00 [btrfs-genwork-1] 1526 ? S 0:19 [btrfs-submit-1] 1527 ? S 0:00 [btrfs-delalloc-] 1528 ? S 0:00 [btrfs-fixup-1] 1531 ? S 0:00 [btrfs-endio-met] 1533 ? S 0:00 [btrfs-freespace] 1535 ? S 0:01 [btrfs-cache-1] 1536 ? S 0:00 [btrfs-readahead] 1537 ? S 0:00 [btrfs-cleaner] 1538 ? S 3:45 [btrfs-transacti] 1541 tty1 Ss+ 0:00 /sbin/agetty 38400 tty1 linux 1542 tty2 Ss+ 0:00 /sbin/agetty 38400 tty2 linux 1543 tty3 Ss+ 0:00 /sbin/agetty 38400 tty3 linux 1544 tty4 Ss+ 0:00 /sbin/agetty 38400 tty4 linux 1545 tty5 Ss+ 0:00 /sbin/agetty 38400 tty5 linux 1546 tty6 Ss+ 0:00 /sbin/agetty 38400 tty6 linux 1600 ? Sl 3227:31 /usr/bin/tor -f /etc/tor/torrc --runasdaemon 1 --PidFile /var/run/tor/tor.pid 1669 ? Ss 5:37 SCREEN 1670 pts/1 Ssl+ 22:23 irssi 1671 pts/2 Ss+ 35:30 ekg2 -m 1672 pts/3 Ss+ 10:51 mutt -y 1834 ? Ss 2:30 SCREEN 1835 pts/4 Ss 0:00 -/bin/bash 1839 pts/4 T 1:01 ekg2 -m 1843 pts/5 Ss 0:00 -/bin/bash 1847 pts/5 Sl+ 25:16 irssi 2086 pts/8 Ss 0:00 -/bin/bash 2094 pts/8 S 0:00 su - 2097 pts/8 S 0:01 -su 2130 ? Ss 0:00 /usr/bin/svnserve --foreground --daemon --root=/home/services/svn 5330 ? Ss 0:00 /usr/sbin/sshd 5360 ? Sl 1:49 /usr/bin/monit 8080 ? Ss 0:00 /usr/sbin/privoxy --pidfile /var/run/privoxy.pid --user privoxy.privoxy /etc/privoxy/config 12034 ? S 0:00 [btrfs-endio-met] 12624 ? Ss 0:19 /usr/sbin/unbound -c /etc/unbound/unbound.conf 14098 pts/4 S+ 26:16 ekg2 -m 14977 ? Ss 0:00 sshd: barbaz [priv] 14981 ? S 0:00 sshd: barbaz@pts/0 14982 pts/0 Ss 0:00 -bash 14986 pts/0 S+ 0:00 screen -r 15113 ? S 0:00 [btrfs-worker-2] 15292 ? S 0:00 [btrfs-endio-wri] 15754 ? S 0:00 pickup -l -t unix -u 15980 ? S 0:00 [kworker/0:2] 16006 ? S 0:00 [kworker/0:0] 16022 ? Ss 0:00 sshd: foobar [priv] 16025 ? S 0:00 sshd: foobar@pts/6 16026 pts/6 Ss 0:00 -bash 16034 pts/6 S+ 0:00 screen -r 16150 ? Ss 0:00 /sbin/nilfs_cleanerd -n /dev/sda3 / 16160 ? S 0:00 [kworker/0:1] 16164 ? S 0:00 [btrfs-endio-wri] 16166 ? S 0:00 smtpd -n smtp -t inet -u -o stress= -s 2 16168 ? S 0:00 proxymap -t unix -u 16169 ? S 0:00 trivial-rewrite -n rewrite -t unix -u 16170 ? S 0:00 cleanup -z -t unix -u 16171 ? S 0:00 smtp -t unix -u 16172 ? S 0:00 bounce -z -n defer -t unix -u 16181 pts/8 R+ 0:00 ps ax 19174 ? Ss 0:00 ssh-agent 19200 ? Ss 1:06 gpg-agent --daemon 21519 ? S 0:07 [flush-btrfs-1] 27995 ? S 0:07 [btrfs-endio-2] 29537 ? S 0:00 [btrfs-delayed-m]
Attachment:
sysrq-trigger.bz2
Description: BZip2 compressed data
Attachment:
signature.asc
Description: Digital signature