On Thu, Aug 16, 2007 at 06:08:35PM +0200, Jan Kara wrote:
>   Hello,
>
> > > > It is possible to panic the box by a race condition that exists in the
> > > > journalling code where we do not take the j_revoke_lock when traversing the
> > > > journal's revoked record list. This patch has been tested and we haven't seen
> > > > the issue since; it's a rather straightforward and correct (at least I think so :)
> > > > fix. Thank you,
> > > In principle, the patch looks fine. The only thing I'm wondering about
> > > is how that panic can happen... journal_write_revoke_records() is called
> > > from journal_commit_transaction() when the revoke table for the committing
> > > transaction shouldn't see any further changes. So maybe the patch is
> > > masking a different problem.
> > > Do you have a way of reproducing the problem? Any stack trace
> > > available?
> >
> > Reproducing the problem is tricky, as there's no sure way to make it happen;
> > basically we run the box with a lot of memory pressure while doing a lot of
> > operations that require journalling. Here is the backtrace of the panic (note
> > this is on a RHEL4 kernel, so 2.6.9, but the same problem exists upstream):
>   OK.
>
> > <1>Unable to handle kernel paging request at virtual address 60000000002b1110
> > <4>kjournald[1310]: Oops 11012296146944 [1]
> > <4>Modules linked in: ltd(U) vfat fat dm_mod button uhci_hcd shpchp e1000
> >    bond1(U) bond0(U) ext3 jbd hfcldd(U) hfcldd_conf(U) sd_mod scsi_mod
> > <4>
> > <4>Pid: 1310, CPU 0, comm: kjournald
> > <4>psr : 0000121008026018 ifs : 8000000000000c9c ip  : [<a000000200045161>]    Tainted: P
> > <4>ip is at journal_write_revoke_records+0x221/0x4e0 [jbd]
> > <4>unat: 0000000000000000 pfs : 0000000000000c9c rsc : 0000000000000003
> > <4>rnat: 0000000000000000 bsps: 0000000000000000 pr  : 000000000000a681
> > <4>ldrs: 0000000000000000 ccv : 0000000000000000 fpsr: 0009804c8a70033f
> > <4>csd : 0000000000000000 ssd : 0000000000000000
> > <4>b0  : a000000200045270 b6  : a00000020026a240 b7  : a00000010000ee90
> > <4>f6  : 0fffbe38e38e381b23800 f7  : 0ffe9edc3d22c00000000
> > <4>f8  : 1000e86fb000000000000 f9  : 100029000000000000000
> > <4>f10 : 1000aeff71c71b9e6e61a f11 : 1003e0000000000000eff
> > <4>r1  : a000000200234000 r2  : 000000000000048c r3  : e0000002791a7a90
> > <4>r8  : 0000000000000000 r9  : e0000002791a0400 r10 : 0000000000000000
> > <4>r11 : e000000001000000 r12 : e0000002791a7b00 r13 : e0000002791a0000
> > <4>r14 : e00000027b7ee6c0 r15 : e0000002791a7b00 r16 : e000000272d48018
> > <4>r17 : 0000000000000000 r18 : 0000000000000000 r19 : 0009804c8a70033f
> > <4>r20 : 60000000002b1118 r21 : a00000010006ad70 r22 : 0000000000000019
> > <4>r23 : 0000000000000000 r24 : 0000000000000000 r25 : 0000000000000019
> > <4>r26 : 0000000000000000 r27 : 0000000000000000 r28 : 0000000000006a41
> > <4>r29 : 0000000000000000 r30 : 0000000000000000 r31 : e00000027b7ee5a4
> > <4>
> > <4>Call Trace:
> > <4> [<a000000100016da0>] show_stack+0x80/0xa0
> >                 sp=e0000002791a7690 bsp=e0000002791a1170
> > <4> [<a0000001000176b0>] show_regs+0x890/0x8c0
> >                 sp=e0000002791a7860 bsp=e0000002791a1128
> > <4> [<a00000010003e910>] die+0x150/0x240
> >                 sp=e0000002791a7880 bsp=e0000002791a10e8
> > <4> [<a0000001000644c0>] ia64_do_page_fault+0x8c0/0xbc0
> >                 sp=e0000002791a7880 bsp=e0000002791a1080
> > <4> [<a00000010000f600>] ia64_leave_kernel+0x0/0x260
> >                 sp=e0000002791a7930 bsp=e0000002791a1080
> > <4> [<a000000200045160>] journal_write_revoke_records+0x220/0x4e0 [jbd]
> >                 sp=e0000002791a7b00 bsp=e0000002791a0f98
> > <4> [<a00000020003d940>] journal_commit_transaction+0xf80/0x3080 [jbd]
> >                 sp=e0000002791a7b10 bsp=e0000002791a0ea0
> > <4> [<a0000002000458d0>] kjournald+0x170/0x580 [jbd]
> >                 sp=e0000002791a7d80 bsp=e0000002791a0e38
> > <4> [<a000000100018c70>] kernel_thread_helper+0x30/0x60
> >                 sp=e0000002791a7e30 bsp=e0000002791a0e10
> > <4> [<a000000100008c60>] start_kernel_thread+0x20/0x40
> >                 sp=e0000002791a7e30 bsp=e0000002791a0e10
>   Do you know (or could you find out) where exactly in the code
> journal_write_revoke_records+0x221/0x4e0 is?

Yeah, sorry:

500 void journal_write_revoke_records(journal_t *journal,
501                                   transaction_t *transaction)
502 {
503         struct journal_head *descriptor;
504         struct jbd_revoke_record_s *record;
505         struct jbd_revoke_table_s *revoke;
506         struct list_head *hash_list;
507         int i, offset, count;
508
509         descriptor = NULL;
510         offset = 0;
511         count = 0;
512
513         /* select revoke table for committing transaction */
514         revoke = journal->j_revoke == journal->j_revoke_table[0] ?
515                  journal->j_revoke_table[1] : journal->j_revoke_table[0];
516
517         for (i = 0; i < revoke->hash_size; i++) {
518                 hash_list = &revoke->hash_table[i];
519
520                 while (!list_empty(hash_list)) {
521                         record = (struct jbd_revoke_record_s *)
522                                 hash_list->next;
523                         write_one_revoke_record(journal, transaction,
524                                                 &descriptor, &offset,
525                                                 record);
526                         count++;
527                         list_del(&record->hash);        <-- here

>   Thanks for the details. I'm still not convinced. What they essentially
> write is that the slab cache revoke_record_cache is not guarded by any spin
> lock. It's not, and that should be fine, as slab caches are SMP safe by
> themselves.

No, it's the list_del() that's not guarded, so the hash list gets corrupted
outside of a lock. If there are other problems that need to be addressed then
OK, but I still think that all of the list traversal and modification should
be protected by the lock. Thank you,

Josef
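For what it's worth, here is a minimal sketch of the kind of locking Josef
is arguing for, written against the loop quoted above rather than taken
from the actual posted patch. It assumes the kmem_cache_free() of the
record that follows the list_del() in the full function, and it unlinks the
record under j_revoke_lock but drops the spinlock before calling
write_one_revoke_record(), since that call can block doing buffer I/O:

        /*
         * Sketch only -- not the posted patch.  Unlink each record
         * from its hash bucket while holding j_revoke_lock, so a
         * racing update of the bucket can never see the list
         * half-modified.  The spinlock is released before
         * write_one_revoke_record(), which may block, so the unlink
         * is moved ahead of the write relative to the original code.
         */
        for (i = 0; i < revoke->hash_size; i++) {
                hash_list = &revoke->hash_table[i];

                for (;;) {
                        spin_lock(&journal->j_revoke_lock);
                        if (list_empty(hash_list)) {
                                spin_unlock(&journal->j_revoke_lock);
                                break;
                        }
                        record = (struct jbd_revoke_record_s *)
                                hash_list->next;
                        list_del(&record->hash);
                        spin_unlock(&journal->j_revoke_lock);

                        /* record is now private to us; safe to write
                         * and free without the lock held */
                        write_one_revoke_record(journal, transaction,
                                                &descriptor, &offset,
                                                record);
                        count++;
                        kmem_cache_free(revoke_record_cache, record);
                }
        }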