Actually, the cluster checks are done via the private network, so loss of the eth0 network should not have crashed the server.

Do you see any logs in /var/crash? Is kdump/netdump set up?

Can you post the ocssd logs (they should be under the grid directory) for the 10-15 minutes before the crash? Also post /var/log/messages for the 10-15 minutes prior to the crash.

On Thu, Jun 21, 2012 at 1:04 AM, Georgios Magklaras <georgios@xxxxxxxxxxxxx> wrote:
> On 06/18/2012 08:44 AM, raj sourabh wrote:
>>
>> Jun 10 19:22:04 prddbs02 snmpd[5158]: Received SNMP packet(s) from UDP: [127.0.0.1]:17955
>> Jun 10 19:22:34 prddbs02 kernel: NETDEV WATCHDOG: eth0: transmit timed out
>> Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: link status definitely down for interface eth0, disabling it
>> Jun 10 19:22:34 prddbs02 kernel: bonding: bond0: making interface eth2 the new active one.
>> Jun 10 19:22:34 prddbs02 kernel: device eth2 entered promiscuous mode
>> Jun
>
> Before the soft lockup, what exactly caused the NETDEV WATCHDOG to lose
> eth0?
> For the __smp_call_function_many lockup, there were many fixes between 5.5
> and 5.6 in relation to multipath and other third-party drivers
> that caused similar lockups. (Why are you on 5.5 and not at least 5.6? Which
> kernel are you running?)
>
> Best regards,
>
> --
> --
> George Magklaras PhD
> RHCE no: 805008309135525
>
> Senior Systems Engineer/IT Manager
> Biotechnology Center of Oslo and
> the Norwegian Center for Molecular Medicine
> EMBnet TMPC Chair
>
> http://folk.uio.no/georgios
>
>
>> 10 19:22:46 prddbs02 kernel: BUG: soft lockup - CPU#2 stuck for 10s!
>> [multipathd:5060]
>> Jun 10 19:22:46 prddbs02 kernel: CPU 2:
>> Jun 10 19:22:46 prddbs02 kernel: Modules linked in: oracleacfs(PFU) oracleadvm(PFU) oracleoks(PU) autofs4 hidp smbus(U) ipmi_devintf ipmi_si ipmi_msghandler rfcomm l2cap bluetooth lockd sunrpc cpufreq_ondemand acpi_cpufreq freq_table bonding dm_round_robin dm_multipath scsi_dh video backlight sbs power_meter hwmon i2c_ec dell_wmi wmi button battery asus_acpi acpi_memhotplug ac ipv6 xfrm_nalgo crypto_api parport_pc lp parport joydev sr_mod cdrom i2c_i801 igb pcspkr i2c_core 8021q e1000e dca sg dm_raid45 dm_message dm_region_hash dm_mem_cache dm_snapshot dm_zero dm_mirror dm_log dm_mod lpfc(U) scsi_transport_fc ata_piix libata shpchp mptsas mptscsih mptbase scsi_transport_sas sd_mod scsi_mod ext3 jbd uhci_hcd ohci_hcd ehci_hcd
>> Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
>> Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
>> Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
>> Jun 10 19:22:46 prddbs02 kernel: Pid: 5060, comm: multipathd Tainted: PF M 2.6.18-194.el5 #1
>> Jun 10 19:22:46 prddbs02 kernel: RIP: 0010:[<ffffffff8007767a>] [<ffffffff8007767a>] __smp_call_function_many+0x9a/0xbc
>> Jun 10 19:22:46 prddbs02 kernel: RSP: 0018:ffff8108e79a5bf8 EFLAGS: 00000297
>> Jun 10 19:22:46 prddbs02 kernel: RAX: 0000000000000006 RBX: 0000000000000007 RCX: 0000000000000000
>> Jun 10 19:22:46 prddbs02 kernel: RDX: 00000000000000ff RSI: 00000000000000ff RDI: 00000000000000c0
>> Jun 10 19:22:46 prddbs02 kernel: RBP: 0000000000000000 R08: 0000000000000008 R09: 0000000000000038
>> Jun 10 19:22:46 prddbs02 kernel: R10: ffff8108e79a5b98 R11: 0000000000000000 R12: ffffffff80143e16
>> Jun 10 19:22:46 prddbs02 kernel: R13: 0000000000000003 R14: ffff810366ec2c58 R15: ffff81093da13340
>> Jun 10 19:22:46 prddbs02 kernel: FS: 000000004189d940(0063) GS:ffff81012071cec0(0000) knlGS:0000000000000000
>> Jun
>
> ...
>
>> Thanks for any help in advance :)
>>
>> Regards,
>> Raj
>
>
>
> --
> redhat-list mailing list
> unsubscribe mailto:redhat-list-request@xxxxxxxxxx?subject=unsubscribe
> https://www.redhat.com/mailman/listinfo/redhat-list
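
To cut /var/log/messages down to the 10-15 minutes before the crash, a small filter can save some scrolling. This is only a rough sketch in Python, assuming the classic syslog timestamp format seen in the quoted log ("Jun 10 19:22:34"), English month names, and that the year can be taken from the crash time you pass in (syslog stamps carry none). The function name lines_before is made up for illustration, and the script needs Python 3.6 or later, so it is probably easiest to run it against a copy of the log on another machine.

```python
from datetime import datetime, timedelta


def lines_before(lines, crash_time, minutes=15):
    """Return the syslog lines stamped within `minutes` before crash_time (inclusive)."""
    start = crash_time - timedelta(minutes=minutes)
    selected = []
    for line in lines:
        # Classic syslog stamps ("Jun 10 19:22:34") are 15 characters and have no year,
        # so the year is borrowed from crash_time. Assumption: the window does not
        # cross New Year, and month names are English.
        try:
            stamp = datetime.strptime(
                f"{crash_time.year} {line[:15]}", "%Y %b %d %H:%M:%S"
            )
        except ValueError:
            continue  # no leading timestamp (continuation line, garbage): skip it
        if start <= stamp <= crash_time:
            selected.append(line)
    return selected
```

For example, reading the file with open("/var/log/messages") and calling lines_before(f, datetime(2012, 6, 10, 19, 22, 46)) would keep only the entries from 19:07:46 up to the first soft lockup message in the quoted log; adjust the time to whenever the node actually went down.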