Re: [RFC PATCH 1/4] ext4: DIO get_block code cleanup

Jiaying Zhang <jiayingz@xxxxxxxxxx> · Sun, 27 Dec 2009 01:24:07 -0800

Hi Ted,

Thanks a lot for the feedback on Christmas and happy holiday!

On Fri, Dec 25, 2009 at 11:03 PM,  <tytso@xxxxxxx> wrote:
> On Tue, Dec 15, 2009 at 05:37:53PM -0800, Jiaying Zhang wrote:
>> ext4: dio get_block code cleanup in prepare for it to be used by buffer write
>>
>> Renaming the dio block allocation flags, variables and functions
>> introduced in Mingming's "Direct IO for holes and fallocate"
>> patches so that they can be used by ext4 buffer write as well.
>>
>> Signed-off-by: Jiaying Zhang <jiayingz@xxxxxxxxxx>
>
> Hi Jiaying,
>
> Merry Christmas!
>
> Sorry, between trying to prepare patches for the 2.6.33 merge window,
> and closing out my current position at the Linux Foundation, and the
> resulting job transition, this has fallen through the cracks.  I've
> started looking at your patch now.   A couple of comments so far.
>
> 1) The patches were white-space damaged, so applying them is a bit
> difficult.  I gave up entirely on patch #4, which hopefully is a
> mechnical only patch, but it's also one which is apparently only a
> cleanup.

Really sorry about this. I guess my email client turned tab into spaces.
The fourth patch is only code moving around. As you said, you can easily
regenerate it later so I think we can just leave it out during the review.

>
> Patch #1 had a combination of mechanical changes (i.e., naming of
> variables and constants) as well as some non-mechnical changes.
> Especially in the case of patch #4, when large chunks of code are
> being moved around, the patch which mechnical changes is very likely
> to break if other patches are applied (or reverted) compared to
> whatever version the patch was based off of.  If the mechnical changes
> are separated out (and clearly described in the commits), then I can
> easily regenerate the mechnical changes, and not worry about
> accidentally dropping changes.

I intended to have the first patch only include the mechanical changes,
but left some other changes in it after code rebase. I will move those
changes to the second patch in the next version.

>
> 2) In the first patch, in ext4_sync_file() in fs/ext4/fsync.c, the
> call to flush_aio_dio_completed_IO() is renamed to
> flush_completed_IO().  But then in the second patch, a second call to
> flush_completed_IO() is added at the end of ext4_sync_file().  That
> doesn't look right to me.

The reason I added another flush_completed_IO() at the end of
ext4_sync_file is to finish uninitialized-to-initialized extent conversion
for any buffer writes that may be generated during sync_mapping_buffers.
The completed_io_list may not be empty at this time because
ext4_end_io_buffer_write only queues io_end work that is processed
by ext4_end_io_work later. Is there any problem that you think this
change may introduce?

>
> I need to look at the code more closely, which I will do later.  In
> the mean time, I've added the first three patches which you sent to
> the unstable portion of the ext4 patch queue.  When run against the
> xfsqa regression test suite, it quickly shows a failure:
>
> BUG: unable to handle kernel NULL pointer dereference at 00000004
> IP: [<c0262b58>] __journal_remove_journal_head+0xf/0xcf
> *pdpt = 0000000019b91001 *pde = 0000000000000000
> Oops: 0000 [#1] SMP DEBUG_PAGEALLOC
> last sysfs file:
> Modules linked in:
>
> Pid: 6536, comm: jbd2/sdb-8 Not tainted 2.6.32-06814-g26716e4 #225 /
> EIP: 0060:[<c0262b58>] EFLAGS: 00010246 CPU: 0
> EIP is at __journal_remove_journal_head+0xf/0xcf
> EAX: de570e40 EBX: 00000000 ECX: 00000000 EDX: de570e40
> ESI: de570e40 EDI: 00000002 EBP: d8a94e5c ESP: d8a94e54
>  DS: 007b ES: 007b FS: 00d8 GS: 0000 SS: 0068
> Process jbd2/sdb-8 (pid: 6536, ti=d8a94000 task=df3bbf90 task.ti=d8a94000)
> Stack:
>  de570e40 de570e40 d8a94e68 c0262c87 d4b7c140 d8a94e8c c025fe4b 00000001
> <0> d8a94e9c d4b7c140 d8f630c0 00000000 cf6ba540 d8a94e9c d8a94eac c025fed9
> <0> dbc50d80 dbc50d80 00000000 dc754800 00000000 dc754800 d8a94f5c c025e20a
> Call Trace:
>  [<c0262c87>] ? jbd2_journal_remove_journal_head+0x1e/0x2d
>  [<c025fe4b>] ? journal_clean_one_cp_list+0x68/0xbd
>  [<c025fed9>] ? __jbd2_journal_clean_checkpoint_list+0x39/0x7b
>  [<c025e20a>] ? jbd2_journal_commit_transaction+0x2c5/0x1173
>  [<c017bb17>] ? trace_hardirqs_off+0xb/0xd
>  [<c01631ec>] ? try_to_del_timer_sync+0x5e/0x66
>  [<c062eae5>] ? _spin_unlock_irqrestore+0x41/0x4d
>  [<c017d019>] ? trace_hardirqs_on+0xb/0xd
>  [<c01631ec>] ? try_to_del_timer_sync+0x5e/0x66
>  [<c01635bb>] ? del_timer_sync+0x78/0x88
>  [<c0263b66>] ? kjournald2+0x116/0x309
>  [<c016d9b0>] ? autoremove_wake_function+0x0/0x34
>  [<c0263a50>] ? kjournald2+0x0/0x309
>  [<c016d785>] ? kthread+0x64/0x69
>  [<c016d721>] ? kthread+0x0/0x69
>  [<c012567b>] ? kernel_thread_helper+0x7/0x10
> Code: f6 ff eb 15 89 d8 e8 97 91 f7 ff eb 0c e8 71 ff ff ff 89 da e8 8c 3e f8 ff 5b 5d c3 55 89 e5 56 53 0f 1f 44 00 00 8b 58 24 89 c6 <83> 7b 04 00 79 04 0f 0b eb fe f0 ff 40 34 83 7b 04 00 0f 85 a1
> EIP: [<c0262b58>] __journal_remove_journal_head+0xf/0xcf SS:ESP 0068:d8a94e54
> CR2: 0000000000000004
>

>From the stack trace, the problem seems to be caused by
the use of bh->b_private to hold end_io in my patch. I noticed
that b_private is used by jdb2 to hold journal handle so the
introduced change is disabled if the ext4 is mounted with
data=journal. As I understand, with data=ordered, any buffer
heads held in the journal should be metadata. But I guess either
this is not sure or there is some case where I also changed
b_private field for metadata write.

Jiaying

> Removing your three patches makes the failure go away.  I'll analyze
> this some more later this weekend.
>
> Best regards,
>
>                                                        - Ted
>
--
To unsubscribe from this list: send the line "unsubscribe linux-ext4" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html