On 12:57, Dmitry Monakhov wrote:
> > - run stress -d 5 --hdd-bytes 10G --hdd-noclean until it dies
> what 'stress' process do? was it posted already?

stress is a simple yet useful program which imposes certain types of
stress on a machine. With the above command line options, it simply
writes 5 files in parallel, each 10G large, in an endless loop until
the file system is full (or becomes read-only due to errors).

It has helped me more than once to identify hardware or software
problems _before_ the machine went into production use.

> > Summary: Increasing the device timeout to 60s _or_ disabling barriers
> > makes the problem go away. Deactivating delayed allocation makes the
> > problem worse.
> 2Gb cache is really huge.

Really? This is a four-year-old el-cheapo hardware RAID system with 16
SATA slots. You can easily spend twice the money and get much more
cache memory.

> barriers=0 , result in less disk wcache activity, but more real IO
> And nodelaloc result in more real IO due, so imho this is looks like
> device issue.

Yes, I think we all agree that the problem is not ext4-related but is
most likely an issue with the Infortrend hardware. However, ext4 seems
to be very good at triggering that particular problem.

> about nodelalloc: It is unlikely to see "This should not happen!!
> Data will be lost" because this message appear from writepage
> so may happens only when you rewrite an existing file(below i_size).

Nope, this definitely occurred while stress was writing new files and
the file system was nearly full.

> BTW, you already noted that you have performed some stress on the device
> without filesystem. What was they doing?

I only ran ddrescue /dev/sda /dev/null once to make sure everything is
readable. This completed with no problems, so I created an ext4 file
system and ran the above stress command, which resulted in write
errors. I then used ddrescue again to rewrite the sector on which the
error occurred. This also succeeded, which indicates a transient
problem, i.e. no problem with that particular sector.

Regards
Andre
-- 
The only person who always got his work done by Friday was Robinson Crusoe