Re: Kernel panic in nilfs.

Hi Zahid,

On Tue, 2013-01-15 at 14:36 -0800, Zahid Chowdhury wrote:
> Hello,
>   I am running CentOS 5.5 (kernel 2.6.18-194.17.4.el5). I used the CentOS
> distribution with the nilfs kernel module 2.0.22 to build nilfs statically
> into the kernel (that's why I renamed 2.6.18-194.17.4.el5 to
> 2.6.18-194.17.4.el5SSI_NILFS). I have enabled netconsole because the box is
> mostly headless; the kernel panic messages below came in through netconsole.
> The garbage collection daemon is from nilfs-utils 2.1.0. The processor is an
> Intel(R) Atom(TM) CPU D510, dual core with 2 contexts. The SSD is an
> industrial-grade Apacer 16GB SLC.
>
> At the time the kernel panicked, there were many (> 100) soft real-time
> processes running at a nice level of -19 (nilfs_cleanerd runs at a nice level
> of +19, since we found that otherwise it disturbs the soft real-time
> processes). These soft real-time processes are also memory and CPU hogs (the
> CPUs are less than a few % idle even with all cores/contexts in use), so that
> less than a few KB of memory is available at any time (we will be fixing the
> apps, but nilfs still should not panic the kernel). We allow memory
> overcommit, and all processes have the default oom_adj value of 0 except for
> critical processes such as syslogd, klogd, sshd, crond, nilfs_cleanerd,
> ifplugd and dbus-daemon. Btw, we did a lot of testing and no kernel panics
> occurred for weeks, until I adjusted oom_adj for the critical processes just
> today.
> 
> 
> Has anybody seen the kernel panic shown below? Is there a fix for this in a
> CentOS 5.5 kernel? Would upgrading to a newer nilfs module clear up this
> panic? Would upgrading to a newer kernel? Upgrading nilfs_cleanerd? Any other
> suggestions or questions are very welcome. Thanks all.
> 
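For reference, on a 2.6.18-era kernel the per-process OOM bias mentioned above is set by writing to /proc/<pid>/oom_adj, which accepts -16..15 plus the special value -17 (OOM_DISABLE). The report does not say which values were given to the critical processes. A minimal sketch in Python, assuming a live system and root privileges; the helper names and the proc_root parameter are hypothetical and exist only so the code can be pointed at a scratch directory:

```python
import os

# Limits accepted by /proc/<pid>/oom_adj on 2.6.x kernels.
OOM_DISABLE = -17      # special value: the OOM killer never selects this task
OOM_ADJUST_MIN = -16
OOM_ADJUST_MAX = 15


def valid_oom_adj(value: int) -> bool:
    """Return True if the kernel would accept this oom_adj value."""
    return value == OOM_DISABLE or OOM_ADJUST_MIN <= value <= OOM_ADJUST_MAX


def set_oom_adj(pid: int, value: int, proc_root: str = "/proc") -> None:
    """Write oom_adj for one process (hypothetical helper).

    proc_root is an assumption added for testing; on a live system it is
    /proc, and changing another process's value normally needs root.
    """
    if not valid_oom_adj(value):
        raise ValueError(f"oom_adj out of range: {value}")
    with open(os.path.join(proc_root, str(pid), "oom_adj"), "w") as f:
        f.write(f"{value}\n")


def get_oom_adj(pid: int, proc_root: str = "/proc") -> int:
    """Read back the current oom_adj value of a process."""
    with open(os.path.join(proc_root, str(pid), "oom_adj")) as f:
        return int(f.read().strip())
```

For example, set_oom_adj(pid, OOM_DISABLE) would exempt a single process, where pid is whatever process ID the caller looks up.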

First of all, I think it makes sense to try upgrading the kernel and
nilfs-utils. We need to know whether your issue can be reproduced with
the current state of the NILFS2 code.

Secondly, what is the value of vm.min_free_kbytes on your system? Are
there any error messages about page allocation failures in your system
log?
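A minimal sketch for gathering both answers, assuming Python 3: vm.min_free_kbytes can be read from /proc/sys/vm/min_free_kbytes, and the page allocator of that kernel generation warns with lines of the form `<comm>: page allocation failure. order:N, mode:0xM`. The function names are hypothetical, and the matcher assumes unwrapped log lines (the reporter's syslogd wraps early, so a real log may need rejoining first):

```python
import re

# Warning printed by the page allocator when an allocation fails, e.g.
# "myapp: page allocation failure. order:0, mode:0x20" (illustrative line).
ALLOC_FAIL_RE = re.compile(
    r"(?P<comm>\S+): page allocation failure\. "
    r"order:(?P<order>\d+), mode:(?P<mode>0x[0-9a-fA-F]+)"
)


def read_min_free_kbytes(path: str = "/proc/sys/vm/min_free_kbytes") -> int:
    """Return the vm.min_free_kbytes sysctl, in KiB."""
    with open(path) as f:
        return int(f.read().strip())


def find_alloc_failures(lines):
    """Return (comm, order, gfp_mask) for every allocation-failure line."""
    failures = []
    for line in lines:
        m = ALLOC_FAIL_RE.search(line)
        if m:
            failures.append(
                (m.group("comm"), int(m.group("order")), int(m.group("mode"), 16))
            )
    return failures
```

On the reporter's box one would call read_min_free_kbytes() and pass find_alloc_failures() the lines of the syslog file (on CentOS usually /var/log/messages).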

Thirdly, it is not yet clear to me how to reproduce your issue. Could
you describe in more detail which filesystem operations were running
before the issue occurred? Are there any NILFS2-related error messages
in your system log before the kernel panic?
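A rough way to collect what is asked here is to keep only the NILFS-related lines logged before the oops report starts. A sketch, assuming the log is available as a list of strings; the function name, the marker strings and the case-insensitive "nilfs" match are heuristics chosen for illustration, not a documented message format:

```python
# Strings that mark the start of an oops/BUG report or a panic in the log.
OOPS_MARKERS = ("[ cut here ]", "kernel BUG at", "Oops:", "Kernel panic")


def nilfs_lines_before_oops(lines):
    """Return the NILFS-related lines logged before the first oops marker.

    "NILFS-related" is a heuristic: the line mentions "nilfs" in any case.
    Lines from the oops itself are excluded because scanning stops at the
    first marker.
    """
    related = []
    for line in lines:
        if any(marker in line for marker in OOPS_MARKERS):
            break
        if "nilfs" in line.lower():
            related.append(line)
    return related
```

It would be fed the same syslog lines, ideally after rejoining the wrapped netconsole output.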

Thanks,
Vyacheslav Dubeyko.

> 
> Zahid
> 
> P.S.: The panic output below came over netconsole into syslogd. Sorry for so many lines; alas, the Solaris syslogd seems to wrap early:
> 
> Jan 15 12:22:38  ------------[ cut here ]------------
> Jan 15 12:22:38  kernel BUG at fs/nilfs2/page.c:317!
> Jan 15 12:22:38  invalid opcode: 0000 [#1]
> Jan 15 12:22:38  SMP
> Jan 15 12:22:38
> Jan 15 12:22:38  last sysfs file: /devices/pci0000:00/0000:00:1c.0/0000:02:00.0/irq
> Jan 15 12:22:38  Modules linked in: netconsole autofs4 dme1737 hwmon_vid hidp l2cap bluetooth
>  sunrpc bridge ip_nat_ftp ip_conntrack_ftp ip_conntrack_netbios_ns iptable_mangle
>  iptable_filter ipt_MASQUERADE xt_tcpudp iptable_nat ip_nat ip_conntrack nfnetlink
>  ip_tables x_tables loop dm_mirror dm_multipath scsi_dh video backlight sbs power_meter
>  hwmon i2c_ec dell_wmi wmi button battery asus_acpi ac lp snd_hda_intel snd_seq_dummy sg
>  snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_mixer_oss snd_pcm
>  snd_timer snd_page_alloc parport_pc e1000e pcspkr snd_hwdep serio_raw parport i2c_i801
>  i2c_core snd soundcore dm_raid45 dm_message dm_region_hash dm_log dm_mod dm_mem_cache
>  usb_storage ata_piix libata sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
> Jan 15 12:22:38
> Jan 15 12:22:38  CPU:    0
> Jan 15 12:22:38  EIP:    0060:[<c04c078b>]    Not tainted VLI
> Jan 15 12:22:38  EFLAGS: 00010246   (2.6.18-194.17.4.el5SSI_NILFS #1)
> Jan 15 12:22:38  EIP is at nilfs_copy_page+0x29/0x198
> Jan 15 12:22:38  eax: 80010029   ebx: c1329100   ecx: 00000000   edx: c135de00
> Jan 15 12:22:38  esi: 00000000   edi: f6df3f30   ebp: f6df3cf4   esp: f7a14ca8
> Jan 15 12:22:38  ds: 007b   es: 007b   ss: 0068
> Jan 15 12:22:38  Process nilfs_cleanerd (pid: 1653, ti=f7a14000 task=f79c4000 task.ti=f7a14000)
> Jan 15 12:22:38
> Jan 15 12:22:38  Stack: ec2e8000 e0461000 c135de00 c1585d00 f6df3f30 c0458ba8 c135de00 c1329100
> Jan 15 12:22:38         f6df3f30 f6df3cf4 c04c0ff2 00001f8e 00000005 00001f7c 0000000e 00000000
> Jan 15 12:22:38         c1407240 c12b5ac0 c152afe0 c1462ae0 c1408c20 c135de00 c11fdda0 c1503320
> Jan 15 12:22:38  Call Trace:
> Jan 15 12:22:38   [<c0458ba8>] find_lock_page+0x1a/0x7e
> Jan 15 12:22:38   [<c04c0ff2>] nilfs_copy_back_pages+0xbb/0x1e7
> Jan 15 12:22:38   [<c04d2f3b>] nilfs_commit_gcdat_inode+0x83/0xa8
> Jan 15 12:22:38   [<c04cc0de>] nilfs_segctor_complete_write+0x1dd/0x301
> Jan 15 12:22:38   [<c04cd337>] nilfs_segctor_do_construct+0x1011/0x1384
> Jan 15 12:22:38   [<c045dbea>] __set_page_dirty_nobuffers+0xb0/0xd3
> Jan 15 12:22:38   [<c04c17f3>] nilfs_mdt_mark_block_dirty+0x41/0x47
> Jan 15 12:22:38   [<c04cd8c1>] nilfs_segctor_construct+0x82/0x261
> Jan 15 12:22:38   [<c04ceada>] nilfs_clean_segments+0xa9/0x1c4
> Jan 15 12:22:38   [<c04d26e2>] nilfs_ioctl+0x444/0x57d
> Jan 15 12:22:38   [<c0465900>] free_pgd_range+0x108/0x190
> Jan 15 12:22:38   [<c04d229e>] nilfs_ioctl+0x0/0x57d
> Jan 15 12:22:38   [<c048620d>] do_ioctl+0x1c/0x5d
> Jan 15 12:22:38   [<c04867a1>] vfs_ioctl+0x47b/0x4d3
> Jan 15 12:22:38   [<c041eef6>] enqueue_task+0x29/0x39
> Jan 15 12:22:38   [<c0486841>] sys_ioctl+0x48/0x5f
> Jan 15 12:22:38   [<c0404f17>] syscall_call+0x7/0xb
> Jan 15 12:22:38   =======================
> Jan 15 12:22:38  Code: 00 c3 55 57 56 89 ce 53 89 c3 83 ec 18 89 54 24 08 8b 00 f6 c4 10 74 08
>  0f 0b 3b 01 22 1b 66 c0 8b 54 24 08 8b 02 f6 c4 08 75 08 <0f> 0b 3d 01 22 1b 66 c0
>  8b 03 8b 7c 24 08 f6 c4 08 8b 6f 0c 75
> Jan 15 12:22:38
> Jan 15 12:22:38  EIP: [<c04c078b>] nilfs_copy_page+0x29/0x198 SS:ESP 0068:f7a14ca8
> Jan 15 12:22:38
> Jan 15 12:22:38  Kernel panic - not syncing: Fatal exception
> Jan 15 12:22:38
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nilfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
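Parts of the oops above can be read mechanically. As I understand the i386 BUG() macro of that kernel generation (with CONFIG_DEBUG_BUGVERBOSE), it compiles to ud2 (0f 0b) followed by a 16-bit line number and a 32-bit pointer to the file name. The bytes 0f 0b 3d 01 in the Code: dump therefore encode line 0x013d = 317, which agrees with "kernel BUG at fs/nilfs2/page.c:317!", and a second BUG() site at line 0x013b = 315 sits just before it. Call-trace entries have the form symbol+offset/size, so nilfs_copy_page+0x29/0x198 means 0x29 bytes into a 0x198-byte function. A small Python sketch under those assumptions; the function and constant names are hypothetical, and the faulting byte is written as <0f> in the constant:

```python
import re

# A call-trace entry such as "nilfs_copy_page+0x29/0x198":
# symbol + offset into the function / total size of the function (both hex).
TRACE_RE = re.compile(
    r"(?P<sym>[\w.]+)\+0x(?P<off>[0-9a-fA-F]+)/0x(?P<size>[0-9a-fA-F]+)"
)

# "Code:" bytes from this report (faulting byte in angle brackets).
REPORTED_CODE = (
    "00 c3 55 57 56 89 ce 53 89 c3 83 ec 18 89 54 24 08 8b 00 f6 c4 10 74 08 "
    "0f 0b 3b 01 22 1b 66 c0 8b 54 24 08 8b 02 f6 c4 08 75 08 "
    "<0f> 0b 3d 01 22 1b 66 c0 8b 03 8b 7c 24 08 f6 c4 08 8b 6f 0c 75"
)


def parse_trace_entry(text):
    """Split a trace entry into (symbol, offset, function_size)."""
    m = TRACE_RE.search(text)
    if m is None:
        raise ValueError(f"not a trace entry: {text!r}")
    return m.group("sym"), int(m.group("off"), 16), int(m.group("size"), 16)


def bug_line_numbers(code_bytes):
    """Return the __LINE__ of every BUG() encoded in a Code: byte dump.

    Assumes the i386 CONFIG_DEBUG_BUGVERBOSE layout: ud2 (0f 0b), then a
    little-endian 16-bit line number, then a 32-bit pointer to the file name.
    Angle brackets around the faulting byte are stripped before parsing.
    """
    raw = bytes(int(b.strip("<>"), 16) for b in code_bytes)
    lines = []
    i = raw.find(b"\x0f\x0b")
    while i != -1 and i + 4 <= len(raw):
        lines.append(int.from_bytes(raw[i + 2:i + 4], "little"))
        i = raw.find(b"\x0f\x0b", i + 2)
    return lines


if __name__ == "__main__":
    print(bug_line_numbers(REPORTED_CODE.split()))          # [315, 317]
    print(parse_trace_entry("nilfs_copy_page+0x29/0x198"))  # ('nilfs_copy_page', 41, 408)
```

The byte-pattern search is deliberately naive: a 0f 0b pair that is not a ud2 instruction would also be reported, so the result is only a cross-check against the "kernel BUG at" line.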



