I'm using ext4 on everything, but it's hard to judge which ext3 bugs
might affect ext4 as well. I really don't have the ability to
destructively test the array; I need all the data that's on it, and I
don't have enough spare space elsewhere to back it all up. You might
see if you can trigger it with dd, writing to the drive directly with
no filesystem?
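Something along these lines, as a rough sketch only (/dev/sdX here is a
placeholder; point the write test only at a spare partition or drive,
since it destroys whatever is on it):

    # Write straight to the block device, bypassing the filesystem and
    # page cache, then watch dmesg for hung-task warnings.
    dd if=/dev/zero of=/dev/sdX bs=1M count=10240 oflag=direct

    # Read-only variant for a drive you can't afford to overwrite.
    dd if=/dev/sdX of=/dev/null bs=1M count=10240 iflag=direct

If the stalls still show up with no filesystem in the path, that would
point at the drive or SATA layer rather than ext3/ext4 or md.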
Jim

On 30 March 2010 14:45, Mark Knecht <markknecht@xxxxxxxxx> wrote:
> Hi,
>    I am running the nvidia binary drivers. I'm not doing anything with
> X at this point, so I can just unload them, I think. I could even
> remove the card, I suppose.
>
> I built a machine for my dad a couple of months ago that uses the
> same 1TB WD drive that I am using now. I don't remember seeing
> anything like this on his machine, but I'm going to go check that.
>
> One other similarity I suspect we have is ext3? There were problems
> with ext3 priority inversion in earlier kernels. It's my understanding
> that they thought they had that worked out, but possibly we're
> triggering this somehow? Since I've got a lot of disk space I can set
> up some other partitions (ext4, reiser4, etc.) and try copying files
> to trigger it. However, it's difficult for me if it requires
> read/write, as I'm not set up to really use the machine yet. Is that
> something you have room to try?
>
> Also, we haven't discussed what drivers are loaded or the kernel
> config. Here's my current driver set:
>
> keeper ~ # lsmod
> Module                  Size  Used by
> ipv6                  207757  30
> usbhid                 21529  0
> nvidia              10611606  22
> snd_hda_codec_realtek 239530  1
> snd_hda_intel          17688  0
> ehci_hcd               30854  0
> snd_hda_codec          45755  2 snd_hda_codec_realtek,snd_hda_intel
> snd_pcm                58104  2 snd_hda_intel,snd_hda_codec
> snd_timer              15030  1 snd_pcm
> snd                    37476  5 snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
> soundcore                800  1 snd
> snd_page_alloc          5809  2 snd_hda_intel,snd_pcm
> rtc_cmos                7678  0
> rtc_core               11093  1 rtc_cmos
> sg                     23029  0
> uhci_hcd               18047  0
> usbcore               115023  4 usbhid,ehci_hcd,uhci_hcd
> agpgart                24341  1 nvidia
> processor              23121  0
> e1000e                111701  0
> firewire_ohci          20022  0
> rtc_lib                 1617  1 rtc_core
> firewire_core          36109  1 firewire_ohci
> thermal                11650  0
> keeper ~ #
>
> - Mark
>
> On Tue, Mar 30, 2010 at 1:32 PM, Jim Duchek <jim.duchek@xxxxxxxxx> wrote:
>> Hrm, I've never seen that kernel message. I don't think any of my
>> freezes have lasted for up to 120 seconds, though (my drives are half
>> as big -- might that matter?). It looks like we've both got WD drives,
>> and we both have nvidia 9500 GTs as well. Are you running the nvidia
>> binary drivers, or nouveau? (It seems like it shouldn't matter,
>> especially as, at least on my system, they don't share an interrupt
>> or anything, but I hate to ignore any hardware that we both have the
>> same of.) I did move to 2.6.33 for some time, but that didn't change
>> the behaviour.
>>
>> Jim
>>
>>
>> On 30 March 2010 13:05, Mark Knecht <markknecht@xxxxxxxxx> wrote:
>>> On Tue, Mar 30, 2010 at 10:47 AM, Jim Duchek <jim.duchek@xxxxxxxxx> wrote:
>>> <SNIP>
>>>> You're having this happen even if the disk in question is not in an
>>>> array? If so, perhaps it's a SATA issue and not a RAID one, and we
>>>> should move this discussion accordingly.
>>>
>>> Yes, in my case the delays are so long - sometimes 2 or 3 minutes -
>>> that when I tried to build the system using RAID1 I got the kernel
>>> message below in dmesg. It's just info - not a real failure - but
>>> because it's talking about long delays I gave up on RAID and tried a
>>> standard single-drive build. Turns out that it has (I think...)
>>> nothing to do with RAID at all. You'll note that there are
>>> instructions for turning the message off, but I've not tried them. I
>>> intend to do a parallel RAID1 build on this machine so I can test
>>> RAID vs non-RAID.
>>>
>>> - Mark
>>>
>>> INFO: task kjournald:17466 blocked for more than 120 seconds.
>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>> kjournald     D ffff8800280bbe00     0 17466      2 0x00000000
>>>  ffff8801adf9d890 0000000000000046 0000000000000000 0000000000000000
>>>  ffff8801adcbde44 0000000000004000 000000000000fe00 000000000000c878
>>>  0000000800000050 ffff88017a99aa40 ffff8801af90a150 ffff8801adf9db08
>>> Call Trace:
>>>  [<ffffffff812dd063>] ? md_make_request+0xb6/0xf1
>>>  [<ffffffff8109c248>] ? sync_buffer+0x0/0x40
>>>  [<ffffffff8137a4fc>] ? io_schedule+0x2d/0x3a
>>>  [<ffffffff8109c283>] ? sync_buffer+0x3b/0x40
>>>  [<ffffffff8137a879>] ? __wait_on_bit+0x41/0x70
>>>  [<ffffffff8109c248>] ? sync_buffer+0x0/0x40
>>>  [<ffffffff8137a913>] ? out_of_line_wait_on_bit+0x6b/0x77
>>>  [<ffffffff810438b2>] ? wake_bit_function+0x0/0x23
>>>  [<ffffffff8109c637>] ? sync_dirty_buffer+0x72/0xaa
>>>  [<ffffffff81131b8e>] ? journal_commit_transaction+0xa74/0xde2
>>>  [<ffffffff8103abcc>] ? lock_timer_base+0x26/0x4b
>>>  [<ffffffff81043884>] ? autoremove_wake_function+0x0/0x2e
>>>  [<ffffffff81134804>] ? kjournald+0xe3/0x206
>>>  [<ffffffff81043884>] ? autoremove_wake_function+0x0/0x2e
>>>  [<ffffffff81134721>] ? kjournald+0x0/0x206
>>>  [<ffffffff81043591>] ? kthread+0x8b/0x93
>>>  [<ffffffff8100bd3a>] ? child_rip+0xa/0x20
>>>  [<ffffffff81043506>] ? kthread+0x0/0x93
>>>  [<ffffffff8100bd30>] ? child_rip+0x0/0x20
>>> livecd ~ #
>>>
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html