[Linux-cluster] Possible problem with different architectures


 



Hi,

With help from the guys on #linux-cluster (thanks guys :)), I've managed to get a 3-node cluster running.

Two of the nodes are x86 and the third is amd64. All are running identical Gentoo installs on kernel 2.6.7, with an up-to-date CVS checkout of /cluster.

I can successfully export a device from one x86 box to another, then format and mount a GFS filesystem on it on both x86 boxes. This works great.

However, I can't run gnbd_import on the amd64 box. I get:

gnbd_import: /dev/gnbd/netdisc is not in use. deleting
gnbd_import: created gnbd device netdisc2
gnbd_monitor: gnbd_monitor started. Monitoring device #0
<gnbd_import does not return, Ctrl-C at this point>
gnbd_import: ERROR gnbd_recvd failed

It looks like gnbd_recvd is failing to complete a handshake, i.e. hanging halfway through.
Any suggestions welcome.
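
For illustration only: one common way a mixed 32/64-bit setup can hang partway through a handshake is a header struct that contains a C `long`, which is 4 bytes on x86 and 8 bytes on amd64. The sketch below emulates that with Python's `struct` module. The header layout (a 32-bit magic followed by a `long` length) and all names are hypothetical. Whether gnbd's handshake actually does this is an assumption, not something confirmed from the gnbd source.

```python
import struct

# Hypothetical handshake header: a 32-bit magic followed by a C `long` length.
# These format strings emulate the native struct layouts on each architecture;
# they are NOT gnbd's real wire format.
LAYOUT_X86 = "<Il"      # long is 4 bytes on 32-bit x86: 4 + 4 = 8 bytes
LAYOUT_AMD64 = "<I4xq"  # long is 8 bytes, 8-byte aligned, on amd64: 4 + 4 pad + 8 = 16 bytes


def bytes_still_missing(sender_layout, receiver_layout):
    """Bytes the receiver is still waiting for after one sender header arrives."""
    sent = struct.calcsize(sender_layout)
    expected = struct.calcsize(receiver_layout)
    return max(expected - sent, 0)


if __name__ == "__main__":
    print("x86 header size:  ", struct.calcsize(LAYOUT_X86))
    print("amd64 header size:", struct.calcsize(LAYOUT_AMD64))
    print("amd64 receiver short by",
          bytes_still_missing(LAYOUT_X86, LAYOUT_AMD64), "bytes")
```

If something like this were happening, a blocking read on the amd64 side would wait indefinitely for bytes the x86 side never sends, which would look like the hang described above. Fixed-width types in the wire structs would avoid the mismatch.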

On another note, I've had a number of kernel crashes, and looking at the logs I'm wondering whether it's because I'm running a preemptible kernel?

Here are two sample crash dumps from syslog. Typically the processes involved go into D state and the machine won't shut down cleanly.

Crash #1 (x86 box):

Jul  3 22:44:48 rag CMAN: node squizzey.linux.co.uk is not responding - removing from the cluster
Jul  3 22:44:53 rag dlm: clvmd: recover event 2 (first)
Jul  3 22:44:53 rag dlm: clvmd: add nodes
Jul  3 22:44:53 rag Unable to handle kernel paging request at virtual address 0c000000
Jul  3 22:44:53 rag printing eip:
Jul  3 22:44:53 rag c013c2cb
Jul  3 22:44:53 rag *pde = 00000000
Jul  3 22:44:53 rag Oops: 0000 [#1]
Jul  3 22:44:53 rag PREEMPT
Jul  3 22:44:53 rag Modules linked in: gnbd gfs lock_dlm dlm cman lock_harness ohci_hcd e100 mii snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd uhci_hcd intel_agp agpgart st usb_storage scsi_mod ehci_hcd usbcore
Jul  3 22:44:53 rag CPU:    0
Jul  3 22:44:53 rag EIP:    0060:[<c013c2cb>]    Not tainted
Jul  3 22:44:53 rag EFLAGS: 00010292   (2.6.7)
Jul  3 22:44:53 rag EIP is at page_address+0xb/0xb0
Jul  3 22:44:53 rag eax: 0c000000   ebx: 0c000000   ecx: 00000000   edx: 18e0e600
Jul  3 22:44:53 rag esi: 18e0e600   edi: e0e600b8   ebp: e0e600e8   esp: e0e15e1c
Jul  3 22:44:53 rag ds: 007b   es: 007b   ss: 0068
Jul  3 22:44:53 rag Process dlm_recoverd (pid: 9579, threadinfo=e0e14000 task=e6542eb0)
Jul  3 22:44:53 rag Stack: 00000000 e0e60001 18e0e600 e0e600b8 e0e600e8 e85baee1 0c000000 e85c84b7
Jul  3 22:44:53 rag 18e0e600 18000000 00000018 e0e15ee0 00000002 00000002 e85bb3cf 00000002
Jul  3 22:44:53 rag 00000018 000000d0 e0e15e6c 00000000 00000000 00000018 e0e15ee0 00000002
Jul  3 22:44:53 rag Call Trace:
Jul  3 22:44:53 rag [<e85baee1>] lowcomms_get_buffer+0x81/0x150 [dlm]
Jul  3 22:44:53 rag [<e85bb3cf>] lowcomms_send_message+0x3f/0xf0 [dlm]
Jul  3 22:44:53 rag [<e85bccf4>] midcomms_send_message+0x44/0x70 [dlm]
Jul  3 22:44:53 rag [<e85c1621>] rcom_send_message+0xd1/0x210 [dlm]
Jul  3 22:44:53 rag [<e85c23f0>] gdlm_wait_status_low+0x60/0x90 [dlm]
Jul  3 22:44:53 rag [<e85bd07a>] nodes_reconfig_wait+0x2a/0x80 [dlm]
Jul  3 22:44:53 rag [<e85bd57f>] ls_nodes_init+0xbf/0x150 [dlm]
Jul  3 22:44:53 rag [<e85c31d2>] ls_first_start+0x62/0x160 [dlm]
Jul  3 22:44:53 rag [<e85c420d>] do_ls_recovery+0x1ed/0x430 [dlm]
Jul  3 22:44:53 rag [<e85c4593>] dlm_recoverd+0x143/0x180 [dlm]
Jul  3 22:44:53 rag [<c0114620>] default_wake_function+0x0/0x20
Jul  3 22:44:53 rag [<c0105c72>] ret_from_fork+0x6/0x14
Jul  3 22:44:53 rag [<c0114620>] default_wake_function+0x0/0x20
Jul  3 22:44:53 rag [<e85c4450>] dlm_recoverd+0x0/0x180 [dlm]
Jul  3 22:44:53 rag [<c0103f4d>] kernel_thread_helper+0x5/0x18
Jul  3 22:44:53 rag
Jul  3 22:44:53 rag Code: 8b 03 f6 c4 01 75 1e 8b 2d 8c 63 48 c0 29 eb c1 fb 05 c1 e3
Jul  3 22:44:53 rag ccsd[9560]: Error while processing get: No data available

Crash #2 (amd64 box):

Jul  3 21:42:28 squizzey dlm: clvmd: recover event 2 (first)
Jul  3 21:42:28 squizzey dlm: clvmd: add nodes
Jul  3 21:42:28 squizzey Unable to handle kernel NULL pointer dereference at 000000000000008a RIP:
Jul  3 21:42:28 squizzey <ffffffffa06b5dc6>{:dlm:send_to_sock+54}
Jul  3 21:42:28 squizzey PML4 3f7a9067 PGD b591067 PMD 0
Jul  3 21:42:28 squizzey Oops: 0000 [1] PREEMPT
Jul  3 21:42:28 squizzey CPU 0
Jul  3 21:42:28 squizzey Modules linked in: gnbd lock_dlm dlm cman gfs lock_harness dm_mod ipt_ttl ipt_limit ipt_state iptable_filter iptable_mangle ipt_LOG ipt_MASQUERADE ipt_TOS ipt_REDIRECT iptable_nat ipt_REJECT ip_tables ip_conntrack_irc ip_conntrack_ftp ip_conntrack nvidia usblp usbhid forcedeth ohci_hcd snd_intel8x0 snd_ac97_codec snd_mpu401_uart snd_rawmidi snd_seq_oss snd_seq_midi_event snd_seq snd_seq_device snd_pcm_oss snd_pcm snd_page_alloc snd_timer snd_mixer_oss snd usb_storage ehci_hcd usbcore
Jul  3 21:42:28 squizzey Pid: 31748, comm: dlm_sendd Tainted: P   2.6.7
Jul  3 21:42:28 squizzey RIP: 0010:[<ffffffffa06b5dc6>] <ffffffffa06b5dc6>{:dlm:send_to_sock+54}
Jul  3 21:42:28 squizzey RSP: 0018:00000100319b5ec8  EFLAGS: 00010202
Jul  3 21:42:28 squizzey RAX: 0000000000000002 RBX: ffffffffa06ca0f0 RCX: 00000100139c80c0
Jul  3 21:42:28 squizzey RDX: 0000000000000000 RSI: 00000000ffffffff RDI: 00000100139c80b8
Jul  3 21:42:28 squizzey RBP: 00000100139c80a8 R08: 00000100319b4000 R09: 0000000000000000
Jul  3 21:42:28 squizzey R10: 00000000ffffffff R11: 0000000000000000 R12: 0000010030d1d150
Jul  3 21:42:28 squizzey R13: 00000100139c80a8 R14: 0000000000000000 R15: 000000358cc16f78
Jul  3 21:42:28 squizzey FS:  000000358d80f640(0000) GS:ffffffff804f61c0(0000) knlGS:0000000000000000
Jul  3 21:42:28 squizzey CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Jul  3 21:42:28 squizzey CR2: 000000000000008a CR3: 0000000000101000 CR4: 00000000000006e0
Jul  3 21:42:28 squizzey Process dlm_sendd (pid: 31748, threadinfo 00000100319b4000, task 000001000676a000)
Jul  3 21:42:28 squizzey Stack: 0000007a319b5f08 00000100139c80b8 0000000000000a64 ffffffffa06ca0f0
Jul  3 21:42:28 squizzey 00000100139c80a8 0000010030d1d150 0000000000000005 00000100297df89c
Jul  3 21:42:28 squizzey 000000358cc16f78 ffffffffa06b637d
Jul  3 21:42:28 squizzey Call Trace:<ffffffffa06b637d>{:dlm:process_output_queue+157} <ffffffffa06b68b8>{:dlm:dlm_sendd+184}
Jul  3 21:42:28 squizzey <ffffffff8011126f>{child_rip+8} <ffffffffa06b6800>{:dlm:dlm_sendd+0}
Jul  3 21:42:28 squizzey <ffffffff80111267>{child_rip+0}
Jul  3 21:42:28 squizzey
Jul  3 21:42:28 squizzey Code: 48 8b 80 88 00 00 00 48 89 44 24 10 65 48 8b 04 25 18 00 00
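
As a rough cross-check of both dumps, the first instruction in each "Code:" line can be compared against the registers and the reported fault address. This assumes each Code dump starts at the faulting instruction, and the instruction decodings are a hand reading of the bytes, so treat them as a sketch rather than a definitive diagnosis. The helper name `effective_address` is just for illustration.

```python
# Register values and fault addresses are copied from the two oops reports above.
# Instruction decodings are a hand reading of the "Code:" bytes and assume the
# dump begins at the faulting instruction.

def effective_address(base, displacement=0):
    """Address touched by a `mov disp(base), reg` style load."""
    return (base + displacement) & 0xFFFFFFFFFFFFFFFF


# Crash #1 (x86): "8b 03" reads as mov (%ebx),%eax, with ebx = 0c000000
crash1_computed = effective_address(0x0C000000)
crash1_reported = 0x0C000000  # "kernel paging request at virtual address 0c000000"

# Crash #2 (amd64): "48 8b 80 88 00 00 00" reads as mov 0x88(%rax),%rax, with RAX = 2
crash2_computed = effective_address(0x2, 0x88)
crash2_reported = 0x8A  # CR2: 000000000000008a

if __name__ == "__main__":
    print("crash #1 matches:", crash1_computed == crash1_reported)
    print("crash #2 matches:", crash2_computed == crash2_reported)
```

Both addresses line up under that reading. In crash #1, page_address appears to have been handed 0x0c000000 as a page pointer. In crash #2, send_to_sock appears to have dereferenced a pointer whose value was 2. In both cases dlm seems to be following a bogus pointer, not merely a NULL one, though that alone does not say whether preemption is the cause.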

--
Gareth Bult <Gareth@xxxxxxxxxx>


