Re: DIO process stuck apparently due to dioread_nolock (3.0)

Tao Ma <tm@xxxxxx> · Mon, 15 Aug 2011 18:28:37 +0800

On 08/15/2011 05:03 PM, Michael Tokarev wrote:
> 15.08.2011 12:56, Michael Tokarev wrote:
>> 15.08.2011 12:00, Michael Tokarev wrote:
>> [....]
>>
>> So, it looks like this (starting with cold cache):
>>
>> 1. rename the redologs and copy them over - this will
>>    make a hot copy of redologs
>> 2. startup oracle - it will complain that the redologs aren't
>>    redologs, the header is corrupt
>> 3. shut down oracle, start it up again - it will succeed.
>>
>> If you issue sync(1) between steps 1 and 2, everything works.
>> When shutting down, oracle calls fsync(), so that acts like
>> sync(1) again.
>>
>> If some time passes between steps 1 and 2, everything
>> works too.
>>
>> Without dioread_nolock I can't trigger the problem no matter
>> how I try.
>>
>>
>> A smaller test case.  I used redo1.odf file (one of the
>> redologs) as a test file, any will work.
>>
>>  $ cp -p redo1.odf temp
>>  $ dd if=temp of=foo iflag=direct count=20
>>
>> Now, the first 512 bytes of "foo" will contain all zeros, while
>> the beginning of redo1.odf is _not_ zeros.
>>
>> Again, without dioread_nolock it works as expected.
>>
>>
>> And the most important note: without the patch there's no
>> data corruption like that.  But instead, there is the
>> lockup... ;)
> 
> Actually I can reproduce this data corruption without the
> patch too, just not that easily.  Oracle testcase (with
> copying redologs over) does that nicely.  So that's a
> separate bug which was here before.
Cool, thanks for the test.

btw, I can reproduce the bug with
 $ cp -p redo1.odf temp
 $ dd if=temp of=foo iflag=direct count=20
It is not easy to hit, but I did see it once in more than 20 tries.
I hope to have something out soon.
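
For anyone who wants to retry this in a loop, here is a rough sketch of
how the two commands above could be scripted. It is not taken from the
thread: the function names first_sector_is_zero and try_reproduce are
made up for illustration, and it assumes the source file (e.g. redo1.odf)
does not itself begin with 512 zero bytes and that the filesystem is
mounted with dioread_nolock.

```shell
#!/bin/sh
# Hypothetical helper: succeeds if file "$1" is non-empty and its
# first 512 bytes (or fewer, if the file is shorter) are all zeros.
first_sector_is_zero() {
    [ -s "$1" ] || return 1
    # Count the non-NUL bytes among the first 512 bytes.
    nonzero=$(head -c 512 "$1" | tr -d '\000' | wc -c)
    [ "$nonzero" -eq 0 ]
}

# Hypothetical driver: repeat the cp + direct-read sequence on source
# file "$1" up to "$2" times (default 20), stopping at the first run
# where the direct read returned a zeroed first sector.
try_reproduce() {
    src=$1
    tries=${2:-20}
    i=1
    while [ "$i" -le "$tries" ]; do
        rm -f temp foo
        cp -p "$src" temp
        # Read the fresh copy back with O_DIRECT, as in the test case.
        dd if=temp of=foo iflag=direct count=20 2>/dev/null || return 2
        if first_sector_is_zero foo; then
            echo "try $i: first 512 bytes of foo are zeros"
            return 0
        fi
        i=$((i + 1))
    done
    echo "not reproduced in $tries tries"
    return 1
}
```

After sourcing the file, something like "try_reproduce redo1.odf 20"
would run the sequence; exit status 0 means a zeroed first sector was
seen, 1 means it was not, and 2 means dd itself failed.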

Thanks
Tao
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html