I just finished a long compile on my dad's i5-661/DH55HC machine, which
uses this same WD drive, and I didn't spot any sign of this happening
there. That's a very recent Intel chipset also, and probably more or
less the same SATA controller. I'm going to turn on the kernel's
blocked-task message in dmesg for a while and see if anything pops up.

I can set up some additional partitions on my local drive to test other
file systems, but since you're on ext4 and I'm on ext3, it's not that
unless the problem carried forward in the code over time.

I like the idea of using dd, but I want to be careful about that sort
of thing. I've not used dd before, but if I could tell it to write a
gigabyte without messing up existing stuff then that could be helpful.

Back later,
Mark

On Tue, Mar 30, 2010 at 1:59 PM, Jim Duchek <jim.duchek@xxxxxxxxx> wrote:
> I'm using ext4 on everything, but it's hard to judge which ext3 bugs
> might affect ext4 as well. I really don't have the ability to
> destructively test the array, I need all the data that's on it and I
> don't have enough spare space elsewhere to back it all up. You might
> see if you can trigger it with dd, writing to the drive directly w/no
> filesystem?
>
> Jim
>
>
>
> On 30 March 2010 14:45, Mark Knecht <markknecht@xxxxxxxxx> wrote:
>> Hi,
>> I am running the nvidia binary drivers. I'm not doing anything with
>> X at this point so I can just unload them I think. I could even remove
>> the card I suppose.
>>
>> I built a machine for my dad a couple of months ago that uses the
>> same 1TB WD drive that I am using now. I don't remember seeing
>> anything like this on his machine but I'm going to go check that.
>>
>> One other similarity I suspect we have is ext3? There were problems
>> with ext3 priority inversion in earlier kernels. It's my understanding
>> that they thought they had that worked out but possibly we're
>> triggering this somehow?
>> Since I've got a lot of disk space I can set
>> up some other partitions, ext4, reiser4, etc., and try copying files
>> to trigger it. However it's difficult for me if it requires read/write
>> as I'm not set up to really use the machine yet. Is that something you
>> have room to try?
>>
>> Also, we haven't discussed what drivers are loaded or kernel
>> config. Here's my current driver set:
>>
>> keeper ~ # lsmod
>> Module                     Size  Used by
>> ipv6                     207757  30
>> usbhid                    21529  0
>> nvidia                 10611606  22
>> snd_hda_codec_realtek    239530  1
>> snd_hda_intel             17688  0
>> ehci_hcd                  30854  0
>> snd_hda_codec             45755  2 snd_hda_codec_realtek,snd_hda_intel
>> snd_pcm                   58104  2 snd_hda_intel,snd_hda_codec
>> snd_timer                 15030  1 snd_pcm
>> snd                       37476  5 snd_hda_codec_realtek,snd_hda_intel,snd_hda_codec,snd_pcm,snd_timer
>> soundcore                   800  1 snd
>> snd_page_alloc             5809  2 snd_hda_intel,snd_pcm
>> rtc_cmos                   7678  0
>> rtc_core                  11093  1 rtc_cmos
>> sg                        23029  0
>> uhci_hcd                  18047  0
>> usbcore                  115023  4 usbhid,ehci_hcd,uhci_hcd
>> agpgart                   24341  1 nvidia
>> processor                 23121  0
>> e1000e                   111701  0
>> firewire_ohci             20022  0
>> rtc_lib                    1617  1 rtc_core
>> firewire_core             36109  1 firewire_ohci
>> thermal                   11650  0
>> keeper ~ #
>>
>> - Mark
>>
>> On Tue, Mar 30, 2010 at 1:32 PM, Jim Duchek <jim.duchek@xxxxxxxxx> wrote:
>>> Hrm, I've never seen that kernel message. I don't think any of my
>>> freezes have lasted for up to 120 seconds though (my drives are half
>>> as big -- might matter?) It looks like we've both got WD drives --
>>> and we both have nvidia 9500gt's as well. Are you running the nvidia
>>> binary drivers, or nouveau? (It seems like it wouldn't matter
>>> especially as, at least on my system, they don't share an interrupt or
>>> anything, but I hate to ignore any hardware that we have in common.)
>>> I did move to 2.6.33 for some time, but that didn't change the
>>> behaviour.
>>>
>>> Jim
>>>
>>>
>>> On 30 March 2010 13:05, Mark Knecht <markknecht@xxxxxxxxx> wrote:
>>>> On Tue, Mar 30, 2010 at 10:47 AM, Jim Duchek <jim.duchek@xxxxxxxxx> wrote:
>>>> <SNIP>
>>>>> You're having this happen even if the disk in question is not in an
>>>>> array? If so perhaps it's an SATA issue and not a RAID one, and we
>>>>> should move this discussion accordingly.
>>>>
>>>> Yes, in my case the delays are so long - sometimes 2 or 3 minutes -
>>>> that when I tried to build the system using RAID1 I got this kernel
>>>> bug in dmesg. It's just info - not a real failure - but because it's
>>>> talking about long delays I gave up on RAID and tried a standard
>>>> single drive build. Turns out that it has (I think...) nothing to do
>>>> with RAID at all. You'll note that there are instructions for turning
>>>> the message off but I've not tried them. I intend to do a parallel
>>>> RAID1 build on this machine and be able to test both RAID vs non-RAID.
>>>>
>>>> - Mark
>>>>
>>>> INFO: task kjournald:17466 blocked for more than 120 seconds.
>>>> "echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
>>>> kjournald D ffff8800280bbe00 0 17466 2 0x00000000
>>>> ffff8801adf9d890 0000000000000046 0000000000000000 0000000000000000
>>>> ffff8801adcbde44 0000000000004000 000000000000fe00 000000000000c878
>>>> 0000000800000050 ffff88017a99aa40 ffff8801af90a150 ffff8801adf9db08
>>>> Call Trace:
>>>> [<ffffffff812dd063>] ? md_make_request+0xb6/0xf1
>>>> [<ffffffff8109c248>] ? sync_buffer+0x0/0x40
>>>> [<ffffffff8137a4fc>] ? io_schedule+0x2d/0x3a
>>>> [<ffffffff8109c283>] ? sync_buffer+0x3b/0x40
>>>> [<ffffffff8137a879>] ? __wait_on_bit+0x41/0x70
>>>> [<ffffffff8109c248>] ? sync_buffer+0x0/0x40
>>>> [<ffffffff8137a913>] ? out_of_line_wait_on_bit+0x6b/0x77
>>>> [<ffffffff810438b2>] ? wake_bit_function+0x0/0x23
>>>> [<ffffffff8109c637>] ? sync_dirty_buffer+0x72/0xaa
>>>> [<ffffffff81131b8e>] ? journal_commit_transaction+0xa74/0xde2
>>>> [<ffffffff8103abcc>] ? lock_timer_base+0x26/0x4b
>>>> [<ffffffff81043884>] ? autoremove_wake_function+0x0/0x2e
>>>> [<ffffffff81134804>] ? kjournald+0xe3/0x206
>>>> [<ffffffff81043884>] ? autoremove_wake_function+0x0/0x2e
>>>> [<ffffffff81134721>] ? kjournald+0x0/0x206
>>>> [<ffffffff81043591>] ? kthread+0x8b/0x93
>>>> [<ffffffff8100bd3a>] ? child_rip+0xa/0x20
>>>> [<ffffffff81043506>] ? kthread+0x0/0x93
>>>> [<ffffffff8100bd30>] ? child_rip+0x0/0x20
>>>> livecd ~ #
>>>>
>>>
>>
>
--
To unsubscribe from this list: send the line "unsubscribe linux-raid" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at http://vger.kernel.org/majordomo-info.html