On 10 March 2013 23:48, Javier Marcet <jmarcet@xxxxxxxxx> wrote:
> Hi,
>
> I have been using my 4x3TB RAID 5 array for the last 8 months without
> an issue, but last week I got some recoverable read errors. Initially
> I forced an array check and it finished without problems, but the
> problem showed up again a day later. I remembered that the last time I
> had to open the server case I saw a cable which I thought I should
> replace, but it was built into the case, so I tried not to worry.
>
> At first I tried to reassemble the array after checking all the
> connections inside the case and left it overnight. It should have
> finished today by noon. Instead I was greeted by a bunch of traces
> like this:
>
> [20614.984915] WARNING: at drivers/md/raid5.c:352 get_active_stripe+0x6bc/0x7c0()
> [20614.984916] Hardware name: To Be Filled By O.E.M.
> [20614.984916] Modules linked in: mt2063 drxk cx25840 cx23885 btcx_risc videobuf_dvb tveeprom cx2341x videobuf_dma_sg r8169 videobuf_core
> [20614.984920] Pid: 10125, comm: kworker/u:0 Tainted: G W 3.7.10-himawari #1
> [20614.984920] Call Trace:
> [20614.984922] [<ffffffff810b8eaa>] warn_slowpath_common+0x7a/0xb0
> [20614.984923] [<ffffffff810b8ef5>] warn_slowpath_null+0x15/0x20
> [20614.984925] [<ffffffff8163278c>] get_active_stripe+0x6bc/0x7c0
> [20614.984926] [<ffffffff810e99de>] ? __wake_up+0x4e/0x70
> [20614.984928] [<ffffffff81659ec4>] ? md_wakeup_thread+0x34/0x60
> [20614.984929] [<ffffffff810ddac6>] ? prepare_to_wait+0x56/0x90
> [20614.984931] [<ffffffff816368aa>] make_request+0x1aa/0x6f0
> [20614.984932] [<ffffffff810dd850>] ? finish_wait+0x80/0x80
> [20614.984934] [<ffffffff8165b935>] md_make_request+0x105/0x260
> [20614.984935] [<ffffffff813b0e92>] generic_make_request+0xc2/0x110
> [20614.984937] [<ffffffff81644aea>] bch_generic_make_request_hack+0x9a/0xa0
> [20614.984938] [<ffffffff81644eb3>] bch_generic_make_request+0x43/0x190
> [20614.984939] [<ffffffff816479f8>] write_dirty+0x78/0x120
> [20614.984941] [<ffffffff810d597a>] process_one_work+0x13a/0x4f0
> [20614.984942] [<ffffffff81647980>] ? read_dirty_submit+0xe0/0xe0
> [20614.984944] [<ffffffff810d73c5>] worker_thread+0x165/0x480
> [20614.984946] [<ffffffff810d7260>] ? busy_worker_rebind_fn+0x110/0x110
> [20614.984947] [<ffffffff810dd0cb>] kthread+0xbb/0xc0
> [20614.984949] [<ffffffff810dd010>] ? flush_kthread_worker+0x70/0x70
> [20614.984950] [<ffffffff8188872c>] ret_from_fork+0x7c/0xb0
> [20614.984951] [<ffffffff810dd010>] ? flush_kthread_worker+0x70/0x70
> [20614.984952] ---[ end trace d2db072c18819bc0 ]---
> [20614.984954] sector=8b909ff8 i=2 (null) (null) (null) (null) 1
> [20614.984955] ------------[ cut here ]------------
>
> Thinking that it could still be a loose cable, I decided to order a
> case better suited to hosting the RAID than the server case, where the
> drives share space with cards and cables. Meanwhile I rearranged the
> drives so that I could use reliable cables for the two that had faulty
> ones, and tried to assemble the array again.
>
> Initially it refused to assemble, even though I was using mdadm
> --force. It did start to rebuild after a few seconds, though. To my
> dismay it ended the same way.
> Only this time I went back through the logs and found when the first
> backtrace had appeared: http://bpaste.net/raw/82819/
>
> Here is my raid.status: http://bpaste.net/raw/82820/
>
> I have read all the info in
> https://raid.wiki.kernel.org/index.php/RAID_Recovery#Restore_array_by_recreating_.28after_multiple_device_failure.29
> and, before I lose any chance of copying the data (most of it at
> least), I am wondering whether I should try forcing a complete
> rebuild.
>
> I have 4.5 TB used, and right now the filesystem is mounted and I can
> still use it, yet the kernel is spitting out that same trace over and
> over again. I really don't know what the best thing to do would be
> right now and would appreciate any help.
>
>
> --
> Javier Marcet <jmarcet@xxxxxxxxx>

So how are the drives doing? smartctl -a for all HDDs, please.

Cheers,
Mathias
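
[Editor's note] A minimal way to collect the requested SMART output in
one go might look like the sketch below; the device names /dev/sd[a-d]
are only an assumption and need to be adjusted to the actual member
drives on the system (run as root, since smartctl needs raw device
access):

    # Dump the full SMART report for each drive into one file per device.
    for d in /dev/sd[a-d]; do
        smartctl -a "$d" > "smart_$(basename "$d").txt"
    done

The attributes usually worth checking first are Reallocated_Sector_Ct,
Current_Pending_Sector and UDMA_CRC_Error_Count; a rising CRC error
count in particular tends to point at cabling rather than the platters,
which matches the loose-cable suspicion above.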
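[Editor's note] Likewise, before anything as drastic as re-creating the
array per the wiki page linked above, it is common practice to capture
the array state read-only and save it off the array. A rough sketch,
assuming the members are /dev/sd[b-e]1 and the array is /dev/md0
(neither name comes from this thread):

    # Per-device superblock view: event counts, device roles, update times.
    mdadm --examine /dev/sd[b-e]1

    # Current kernel view of the array and any rebuild/check progress.
    cat /proc/mdstat
    mdadm --detail /dev/md0

    # Forced assembly as described in the report, attempted only after
    # the output above has been saved somewhere off the array.
    mdadm --assemble --force /dev/md0 /dev/sd[b-e]1

The --examine output in particular shows the per-device event counters,
which is what determines whether a forced assembly can bring a member
back in or whether mdadm will treat it as stale.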