Hi!

This morning I booted and, to my horror, found a bad superblock on /var!
fsck reported nothing, but mount said bad superblock. It's the best thing
that can happen after a project's due date but before finishing it, isn't it?

So I decided to switch to reiserfs, which has performance advantages too.
After about the fifth reboot I could mount /var, and I copied it to a new
partition together with the root partition.

And, terribly, I had the same problem with /usr/sbin/sshd startup, without
any changes to the binary, according to a diff against a probably-good
backup (who can be sure after all this...).

So the conclusion is that this possibly has nothing to do with ext3. It's
not openssh, because I had problems with other files/dirs too...
Maybe it's evms? Maybe it's the kernel? It's a stock 2.4.19, with only the
evms and vserver patches. I don't think it's a distro problem...

So sorry for talking about this on the ext3 list!
Thanks for all the help!

viktor

more comments below...

> >
> > Seems interesting.
> > I forgot to mention (yes, sorry, it's important piece of information),
> > that I have RAID 1 (mirrored disks), so HW problem is less possible.
> > And I have reiserfs partition on the mirror too, without any problem.
>
> Raid protects you against disk failures. It does not protect you from
> cable problems causing data corruption, or your RAID controller going
> insane. Unfortunately a lot of people seem to believe that just
> because they have RAID, they are immune from hardware problems, and
> then stop doing backups. I usually hear from them after they've
> gotten screwed, and when they ask if I can perform miracles....

Yes, RAID is completely different from a backup. RAID doesn't protect you
from rm -fr / ;))

>
> In any case, the scenario I described (a controller/cable problem, or
> an incorrectly configured IDE DMA settings) are all still possible
> with RAID; RAID does not help you prevent these sorts of problems.

It's SW RAID-1; the disks are on the same controller, but on different
buses/cables. Am I right that in this case HW errors are *very* unlikely?
Otherwise there would have to be exactly the same bit errors at exactly the
same time on different cables/disks...

> As far as your not noticing the problem with reiserfs that could be
> because you've been lucky, and not noticed because the block addresses
> causing the problem do not (yet) contain data. But the symptoms
> you've described sound very much like hardware induced errors.
>
> > Anyway, do you have an idea how to test for HW errors?
>
> Well, if you have a scratch partition that's not being used, you can
> try using the badblocks program. Try using the -w option, which will
> do a read/write test. This doesn't do a random access test, so it
> might not detect any problems, though.
>
> I'd suggest checking your internal cabling, and replacing the
> controller cable if it looks dubious. Making everything is well
> plugged in, too.
>

I use the most expensive, twisted, shielded, etc. cables, plugged in well,
at least visually...

Thanks for all the answers!

viktor

_______________________________________________
Ext3-users@redhat.com
https://listman.redhat.com/mailman/listinfo/ext3-users
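
On the point in the quoted reply that badblocks -w does not do a random
access test: a rough sketch of such a test on a scratch file could look like
the following. It is an illustration only, not a replacement for badblocks.
The function name random_rw_check, the block size and the block count are
all made up for this example. It writes random blocks in shuffled order,
syncs, asks the kernel to drop its cached copies (a best-effort hint only),
and then reads the blocks back in a different order, comparing checksums.
Unless the file is larger than RAM or the cache really is dropped, the reads
may be served from memory and never touch the disk.

```python
import hashlib
import os
import random


def random_rw_check(path, block_size=4096, blocks=256, seed=0):
    """Write random blocks in shuffled order, read them back, compare.

    Returns the sorted list of block indices whose contents did not match.
    All parameter names and defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    order = list(range(blocks))
    rng.shuffle(order)
    digests = {}

    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        # Write phase: each block gets fresh random data at its own offset.
        for idx in order:
            data = os.urandom(block_size)
            os.pwrite(fd, data, idx * block_size)
            digests[idx] = hashlib.sha256(data).digest()

        # Push the data to the device before reading it back.
        os.fsync(fd)

        # Best-effort hint to drop cached pages so reads hit the disk.
        # The kernel is free to ignore it.
        if hasattr(os, "posix_fadvise"):
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_DONTNEED)

        # Read phase: a different shuffled order than the write phase.
        rng.shuffle(order)
        bad = []
        for idx in order:
            data = os.pread(fd, block_size, idx * block_size)
            if hashlib.sha256(data).digest() != digests[idx]:
                bad.append(idx)
    finally:
        os.close(fd)
    return sorted(bad)
```

Usage would be something like random_rw_check("/mnt/scratch/rwtest.bin",
blocks=100000), where the path is a hypothetical scratch file on the suspect
filesystem. An empty list means no mismatch was seen, which by itself does
not prove the hardware is fine.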