Can anybody here give me a hint about the problem? Particularly:

> My question is: does the ext3 driver _ever_ write outside of its own
> space on disk - i.e. into 0x000-0x400? That is, can we say with
> certainty that it's _not_ the ext3 driver causing the problem?

*t

On 9/3/2007, "Tomas Pospisek ML" <tpo2@xxxxxxxxxxxxx> wrote:

>
>Hello everybody
>
>we're running a small population of lightly embedded machines with the
>following specs:
>
>System: +- standard intel box
>FS:     ext3 (defaults,errors=remount-ro,noatime)
>HD:     TRANSCEND, ATA DISK drive, Compact Flash (CF), 2000880 sectors
>        (1024 MB) w/2KiB Cache, CHS=1985/16/63
>Driver: Standard IDE Driver
>        ICH4: chipset revision 2
>        ICH4: not 100% native mode: will probe irqs later
>        ide0: BM-DMA at 0xf000-0xf007, BIOS settings: hda:pio, hdb:pio
>        ide1: BM-DMA at 0xf008-0xf00f, BIOS settings: hdc:pio, hdd:pio
>kernel: 2.6.15.6 #1 PREEMPT Sat Mar 11 00:56:41 CET 2006 i686 GNU/Linux
>
>ext3 was chosen in the hope of making the system more resilient to power
>failures. The systems run on a UPS, but unfortunately some operators
>will just pull the power plug (although they're instructed not to).
>
>What we have now experienced multiple times is that the systems run just
>fine, with absolutely no complaints in dmesg/kern.log, until they are
>rebooted (shutdown -r now). At that point, *very rarely*, GRUB will no
>longer be able to read the boot filesystem (Error 17).
>
>I've checked the on-disk data and have discovered that 0x200-0x1c00 is
>overwritten with 0xff, followed by a single 0x0f and after that 0x00
>until 0x207f.
>
>That is, the second to the sixteenth on-disk blocks have been overwritten:
>
>000001e0  53 59 53 4d 53 44 4f 53  20 20 20 53 59 53 7f 01  |SYSMSDOS   SYS..|
>000001f0  00 41 bb 00 07 60 66 6a  00 e9 3b ff 00 00 00 00  |.A»..`fj.é;ÿ....|
>00000200  ff ff ff ff ff ff ff ff  ff ff ff ff ff ff ff ff  |ÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿÿ|
>*
>00001c00  ff 0f 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |ÿ...............|
>00001c10  00 00 00 00 00 00 00 00  00 00 00 00 00 00 00 00  |................|
>*
>00002080  ed 41 00 00 00 04 00 00  1e 39 a0 46 a6 6a dd 45  |íA.......9 F¦jÝE|
>
>Our project performs no hardware-level operations. All access is through
>regular file operations only. Thus there's no way we're aware of in which
>our software could be changing on-disk blocks directly.
>
>What's striking about the problem above is that the first affected block
>(0x200) starts _before_ the on-disk filesystem, which starts at 0x400.
>
>My question is: does the ext3 driver _ever_ write outside of its own
>space on disk - i.e. into 0x000-0x400? That is, can we say with
>certainty that it's _not_ the ext3 driver causing the problem?
>
>What else could cause the problem then? We don't see any sign of a
>problem before reboot, only after. Could the IDE driver be the problem?
>Or is it the IDE CF card hardware?
>
>I've done a dd if=/dev/hdc of=/dev/null and there was absolutely no
>trouble visible (nothing in kern.log/dmesg), thus the card does not seem
>to be broken on the physical level and doesn't have bad blocks that
>would fail on read.
>
>Does this ring a bell with anybody?
>*t
>
>_______________________________________________
>Ext3-users mailing list
>Ext3-users@xxxxxxxxxx
>https://www.redhat.com/mailman/listinfo/ext3-users
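
For anyone who wants to check a card for the same pattern without reading hexdumps by eye, here is a minimal sketch in Python. It is only an illustration, not a diagnosis tool. It makes two assumptions: 512-byte sectors, and the filesystem starting at byte 0 of the device as described above, so that the primary ext2/ext3 superblock sits at 0x400 with its 0xEF53 magic 56 bytes further in. The function names (fill_runs, has_ext_magic, report) are made up for this example.

```python
import struct

SECTOR = 512                            # assumed sector size
EXT_SB_OFFSET = 0x400                   # primary ext2/ext3 superblock offset
EXT_MAGIC_OFFSET = EXT_SB_OFFSET + 56   # s_magic field, little-endian u16
EXT_MAGIC = 0xEF53


def fill_runs(data, min_len=SECTOR):
    """Return (first, last, value) for every run of one repeated byte
    that is at least min_len bytes long."""
    runs = []
    i, n = 0, len(data)
    while i < n:
        j = i + 1
        while j < n and data[j] == data[i]:
            j += 1
        if j - i >= min_len:
            runs.append((i, j - 1, data[i]))
        i = j
    return runs


def has_ext_magic(data):
    """True if the ext2/ext3 magic is present where the primary
    superblock is expected (hypothetical helper, see assumptions)."""
    if len(data) < EXT_MAGIC_OFFSET + 2:
        return False
    (magic,) = struct.unpack_from("<H", data, EXT_MAGIC_OFFSET)
    return magic == EXT_MAGIC


def report(data):
    """Summarise long fill runs and the state of the superblock magic."""
    lines = []
    for first, last, value in fill_runs(data):
        lines.append(
            "0x%05x-0x%05x filled with 0x%02x (sectors %d-%d)"
            % (first, last, value, first // SECTOR, last // SECTOR)
        )
    state = "present" if has_ext_magic(data) else "missing"
    lines.append("ext magic at 0x%x: %s" % (EXT_MAGIC_OFFSET, state))
    return lines
```

To use it, read the first 16 KiB or so of the device or of a dd image (for example /dev/hdc on the machines above) in binary mode and print what report() returns. Sector numbers in the output are zero-based.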