Hi everyone,

I'm using kernel 2.6.15-rc7 with the latest STABLE branch of GFS from CVS (checked out when 2.6.15-rc7 was the newest kernel). I started a GFS cluster with 4 nodes, but after about 4 days the cluster stopped working. I found the following in /var/log/messages:

<--
Mar 28 15:31:29 nd05 kernel: d 1 locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 update remastered resources
Mar 28 15:31:29 nd05 kernel: gfs-sda1 updated 0 resources
Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuild locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 rebuilt 0 locks
Mar 28 15:31:29 nd05 kernel: gfs-sda1 recover event 11 done
Mar 28 15:31:29 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 8,11,11
Mar 28 15:31:29 nd05 kernel: gfs-sda1 process held requests
Mar 28 15:31:29 nd05 kernel: gfs-sda1 processed 0 requests
Mar 28 15:31:29 nd05 kernel: gfs-sda1 resend marked requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 11 finished
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 1,0,0 ids 11,11,11
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,1,0 ids 11,14,11
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move use event 14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 add node 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 total nodes 4
Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuild resource directory
Mar 28 15:31:30 nd05 kernel: gfs-sda1 rebuilt 1552 resources
Mar 28 15:31:30 nd05 kernel: gfs-sda1 purge requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 purged 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 mark waiting requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 marked 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 done
Mar 28 15:31:30 nd05 kernel: gfs-sda1 move flags 0,0,1 ids 11,14,14
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process held requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 processed 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resend marked requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 resent 0 requests
Mar 28 15:31:30 nd05 kernel: gfs-sda1 recover event 14 finished
Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id 9190386 state 0
Mar 28 15:31:30 nd05 kernel: gfs-sda1 grant lock on lockqueue 2
Mar 28 15:31:30 nd05 kernel: gfs-sda1 process_lockqueue_reply id eab0065 state 0
Mar 28 15:31:30 nd05 kernel: gfs-sda1 unlock fb040350 no id
Mar 28 15:31:30 nd05 kernel: recovery_done jid 3 msg 309 a
Mar 28 15:31:30 nd05 kernel: 3961 recovery_done nodeid 4 flg 18
Mar 28 15:31:30 nd05 kernel: 3977 pr_start last_stop 3 last_start 4 last_finish 3
Mar 28 15:31:31 nd05 kernel: 3977 pr_start count 3 type 3 event 4 flags 21a
Mar 28 15:31:31 nd05 kernel: 3977 pr_start 4 done 1
Mar 28 15:31:31 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13415b4b id 163005c 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13425b42 id 180002f 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13435b39 id 1a00360 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13445b30 id 1760186 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13455b27 id 17a038b 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13465b1e id 15a01a8 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13475b15 id 1910380 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13485b0c id 1880309 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13495b03 id 17001e6 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134a5afa id 1940352 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134b5af1 id 1650349 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134c5ae8 id 167001d 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,134d5adf id 15c0083 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134e5ad6 id 1770155 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,134f5acd id 16400cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13505ac4 id 1680102 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13515abb id 1920051 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13525ab2 id 1850182 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13535aa9 id 17301cb 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13545aa0 id 17803ed 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13555a97 id 18a0111 3,0
Mar 28 15:31:31 nd05 kernel: 3977 rereq 3,13565a8e id 16d03c5 3,0
Mar 28 15:31:31 nd05 kernel: 3976 rereq 3,13575a85 id 1870026 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13585a7c id 185030b 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13595a73 id 15d0190 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135a5a6a id 14b03f1 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135b5a61 id 177025e 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135c5a58 id 198016f 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135d5a4f id 1640163 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,135e5a46 id 1730233 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,135f5a3d id 1880130 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13605a34 id 16f00aa 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13615a2b id 17400e1 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13625a22 id 16b03c1 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13635a19 id 16b03ad 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,13645a10 id 17e03d4 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,13655a07 id 18202c0 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136659fe id 170036c 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136759f5 id 155031c 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136859ec id 1660212 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136959e3 id 15c0114 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136a59da id 15a038f 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136b59d1 id 17600bb 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136c59c8 id 1a20336 3,0
Mar 28 15:31:32 nd05 kernel: 3976 rereq 3,136d59bf id 171003c 3,0
Mar 28 15:31:32 nd05 kernel: 3977 rereq 3,136e59b6 id 1500008 3,0
Mar 28 15:31:32 nd05 kernel: 3976 pr_start last_stop 4 last_start 9 last_finish 4
Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 4 type 2 event 9 flags 21a
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,136f59ad id 15e026f 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,137059a4 id 170017e 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1371599b id 16b01e3 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13725992 id 18000a2 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13735989 id 177017c 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13745980 id 16d035a 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13755977 id 18102d6 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,1376596e id 1740020 3,0
Mar 28 15:31:33 nd05 kernel: 3977 rereq 3,13775965 id 1780207 3,0
Mar 28 15:31:33 nd05 kernel: 3976 pr_start 9 done 1
Mar 28 15:31:33 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:33 nd05 kernel: 3976 pr_start last_stop 9 last_start 10 last_finish 9
Mar 28 15:31:33 nd05 kernel: 3976 pr_start count 3 type 3 event 10 flags 21a
Mar 28 15:31:33 nd05 kernel: 3976 pr_start 10 done 1
Mar 28 15:31:33 nd05 kernel: 3977 pr_finish flags 1a
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,370232 id 23a010e 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,380229 id 2630143 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,390220 id 29f0338 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3a0217 id 2850133 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3b020e id 268035b 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3c0205 id 2710344 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3d01fc id 27701f4 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3e01f3 id 28203f7 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,3f01ea id 236011f 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4001e1 id 25e0387 3,0
Mar 28 15:31:33 nd05 kernel: 3976 rereq 3,4101d8 id 2810157 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4201cf id 248035a 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4301c6 id 24d0297 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4401bd id 2920280 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4501b4 id 267000b 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4601ab id 263012c 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4701a2 id 2930281 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,480199 id 28e028d 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,490190 id 243031a 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4a0187 id 259000d 3,0
Mar 28 15:31:34 nd05 kernel: 3976 rereq 3,4b017e id 2650370 3,0
Mar 28 15:31:35 nd05 kernel: 3976 pr_start last_stop 10 last_start 15 last_finish 10
Mar 28 15:31:35 nd05 kernel: 3976 pr_start count 4 type 2 event 15 flags 21a
Mar 28 15:31:35 nd05 kernel: 3976 pr_start 15 done 1
Mar 28 15:31:35 nd05 kernel: 3976 pr_finish flags 1a
Mar 28 15:31:35 nd05 kernel:
Mar 28 15:31:35 nd05 kernel: lock_dlm: Assertion failed on line 357 of file /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c
Mar 28 15:31:35 nd05 kernel: lock_dlm: assertion: "!error"
Mar 28 15:31:35 nd05 kernel: lock_dlm: time = 79185725
Mar 28 15:31:35 nd05 kernel: gfs-sda1: error=-22 num=3,133b5b81 lkf=9 flags=84
Mar 28 15:31:35 nd05 kernel:
Mar 28 15:31:37 nd05 kernel: ------------[ cut here ]------------
Mar 28 15:31:37 nd05 kernel: kernel BUG at /home/sunjw/projects/cluster.STABLE/gfs-kernel/src/dlm/lock.c:357!
Mar 28 15:31:37 nd05 kernel: invalid operand: 0000 [#1]
Mar 28 15:31:37 nd05 kernel: SMP
Mar 28 15:31:37 nd05 kernel: Modules linked in: lock_dlm dlm cman gfs lock_harness ipmi_watchdog ipmi_si ipmi_poweroff ipmi_devintf ipmi_msghandler binfmt_misc dm_mirror dm_round_robin dm_multipath dm_mod video thermal processor fan button battery ac uhci_hcd usbcore hw_random shpchp pci_hotplug e1000 bonding qla2300 qla2xxx scsi_transport_fc sd_mod
Mar 28 15:31:37 nd05 kernel: CPU: 1
Mar 28 15:31:37 nd05 kernel: EIP: 0060:[<f89e9556>] Not tainted VLI
Mar 28 15:31:37 nd05 kernel: EFLAGS: 00010282 (2.6.15-rc7smp)
Mar 28 15:31:37 nd05 kernel: EIP is at do_dlm_unlock+0x8f/0xa4 [lock_dlm]
Mar 28 15:31:37 nd05 kernel: eax: 00000004 ebx: f560c180 ecx: f5cf7f10 edx: f89edf11
Mar 28 15:31:37 nd05 kernel: esi: ffffffea edi: f8a7f000 ebp: f8a61580 esp: f5cf7f0c
Mar 28 15:31:37 nd05 kernel: ds: 007b es: 007b ss: 0068
Mar 28 15:31:37 nd05 kernel: Process gfs_glockd (pid: 3979, threadinfo=f5cf6000 task=f6735030)
Mar 28 15:31:37 nd05 kernel: Stack: f89edf11 f8a7f000 f55517b0 f89e97f0 f560c180 f8a3c64f f560c180 00000003
Mar 28 15:31:37 nd05 kernel: f55517d4 f8a329d8 f8a7f000 f560c180 00000003 f55517b0 f8a61580 f55517b0
Mar 28 15:31:37 nd05 kernel: f8a7f000 f8a31f28 f55517b0 f55517b0 00000001 f8a31fdc d82c34c0 f55517b0
Mar 28 15:31:37 nd05 kernel: Call Trace:
Mar 28 15:31:37 nd05 kernel: [<f89e97f0>] lm_dlm_unlock+0x19/0x20 [lock_dlm]
Mar 28 15:31:37 nd05 kernel: [<f8a3c64f>] gfs_lm_unlock+0x2c/0x43 [gfs]
Mar 28 15:31:37 nd05 kernel: [<f8a329d8>] gfs_glock_drop_th+0xe8/0x122 [gfs]
Mar 28 15:31:37 nd05 kernel: [<f8a31f28>] rq_demote+0x76/0x92 [gfs]
Mar 28 15:31:37 nd05 kernel: [<f8a31fdc>] run_queue+0x54/0xb5 [gfs]
Mar 28 15:31:37 nd05 kernel: [<f8a320f4>] unlock_on_glock+0x1d/0x24 [gfs]
Mar 28 15:31:37 nd05 kernel: [<f8a34013>] gfs_reclaim_glock+0xbd/0x135 [gfs]
Mar 28 15:31:37 nd05 kernel: [<f8a28734>] gfs_glockd+0x3a/0xe3 [gfs]
Mar 28 15:31:37 nd05 kernel: [<c0116f3d>] default_wake_function+0x0/0x12
Mar 28 15:31:37 nd05 kernel: [<c010328a>] ret_from_fork+0x6/0x14
Mar 28 15:31:37 nd05 kernel: [<c0116f3d>] default_wake_function+0x0/0x12
Mar 28 15:31:37 nd05 kernel: [<f8a286fa>] gfs_glockd+0x0/0xe3 [gfs]
Mar 28 15:31:37 nd05 kernel: [<c0101ab5>] kernel_thread_helper+0x5/0xb
Mar 28 15:31:37 nd05 kernel: Code: 73 34 ff 73 2c ff 73 08 ff 73 04 ff 73 0c 56 8b 03 ff 70 18 68 09 e0 9e f8 e8 ac 14 73 c7 83 c4 34 68 11 df 9e f8 e8 9f 14 73 c7 <0f> 0b 65 01 58 de 9e f8 68 13 df 9e f8 e8 23 0d 73 c7 5b 5e c3
-->

What could be causing this? Thanks in advance for any replies!

Luckey

-- 
Linux-cluster@xxxxxxxxxx
https://www.redhat.com/mailman/listinfo/linux-cluster