Re: Problems with 2.6.8.1, loop-AES and ext3

David Gümbel <david.guembel@xxxxxx> · Mon, 23 Aug 2004 18:54:54 +0200

On Sunday 22 August 2004 18:24, Jari Ruusu wrote:
> David Gümbel wrote:
> > > Aug 21 15:30:09 [kernel] EXT3-fs error (device loop0): ext3_readdir:
> > > directory # 802557 contains a hole at offset 348160
> > > Aug 21 15:30:09 [kernel] ext3_abort called.
> > > Aug 21 15:30:09 [kernel] EXT3-fs error (device loop0) in
> > > start_transaction: Journal has aborted
> > >                 - Last output repeated 3 times -
> >
> > So the problem doesn't seem to be related to register parameters.
> >
> > Any ideas?
>
> Even though I have been unable to reproduce these errors, I believe I
> found the cause of these errors.
>
> Kernel 2.6.7 to 2.6.8.1 change involved block driver API change which is
> rather nasty thing to do in "stable" kernel series.
>
> When file system code (or in this case loop code) sends I/O requests to
> block driver, it fills up a struct bio that tells driver what it should
> do. In this case two new entries were added to struct bio, and 2.6 loop
> code didn't initialize those value to sane default values that I/O
> elevator code was expecting. IOW, elevator code was accessing a structure
> that had two uninitialized variables.
>
> Does the patch below fix the problem? David, if you ack that this patch
> fixes it for you, then I will have to make new version of loop-AES ASAP.

I tried to apply the patch using patch, which didn't work (as far as I can
see, because tabs and spaces got mangled), so I typed in the changes
manually. A diff of the old vs. the new file reads like this (just in case
I made any mistakes ;):

marsupilami loop-AES-v2.1c # diff -u loop.c-2.6.patched.bak loop.c-2.6.patched-by-david

--- loop.c-2.6.patched.bak      2004-08-23 10:46:54.000000000 +0200
+++ loop.c-2.6.patched-by-david 2004-08-23 16:06:03.000000000 +0200
@@ -508,6 +508,10 @@
        bio->bi_phys_segments = 0;
        bio->bi_hw_segments = 0;
        bio->bi_size = len = orig_bio->bi_io_vec[merge->bi_idx].bv_len;
+#if defined(BIOVEC_VIRT_START_SIZE)
+       bio->bi_hw_front_size = 0;
+       bio->bi_hw_back_size = 0;
+#endif
        /* bio->bi_max_vecs not touched */
        bio->bi_io_vec[0].bv_len = len;
        bio->bi_io_vec[0].bv_offset = 0;


Anyway, I am getting the old error again, pretty soon after bootup and
after starting a compile to put some load on the system (the compile runs
on /usr and /var, both unencrypted ext3; loop0 is the encrypted,
device-backed /home, also ext3):

Aug 23 18:33:08 [kernel] EXT3-fs error (device loop0): ext3_readdir: 
directory #783444 contains a hole at offset 4096
Aug 23 18:33:09 [kernel] ext3_abort called.
Aug 23 18:35:18 [kernel] EXT3-fs error (device loop0) in start_transaction: 
Journal has aborted
                - Last output repeated 22 times -
Aug 23 18:35:34 [login(pam_unix)] session closed for user root
Aug 23 18:35:35 [kernel] agpgart: Found an AGP 2.0 compliant device at 
0000:00:00.0.
Aug 23 18:35:40 [kernel] EXT3-fs error (device loop0) in start_transaction: 
Journal has aborted


In addition, I noticed that after a fresh bootup with 2.6.8.1-loop-AES-2.1c
(fresh = bootup and root console login, but with KDM started), "ls -la /tmp"
failed with an error from ls saying it could not read /tmp (encrypted
reiserfs, mounted fine). echo "test" > /tmp/tryit && cat /tmp/tryit worked
fine, though, and so did "ls /tmp/tryit" and e.g. "file /tmp/tryit".
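
The difference between those commands is that "ls -la /tmp" has to enumerate the directory, while the others only look up a single path by name. A small test program can separate the two cases. The following is a rough sketch, not something that was run on the affected machine; count_dir_entries() and path_lookup_ok() are names made up for this example, and only standard POSIX calls (opendir, readdir, closedir, stat) are used.

```c
#include <assert.h>
#include <dirent.h>
#include <errno.h>
#include <stddef.h>
#include <sys/stat.h>

/*
 * Enumerate a directory the way ls has to.
 * Returns the number of entries read, or -1 with errno set if the
 * directory cannot be opened or readdir() reports an error.
 */
long count_dir_entries(const char *path)
{
    DIR *dir;
    long n = 0;
    int saved;

    assert(path != NULL);
    dir = opendir(path);
    if (dir == NULL)
        return -1;

    for (;;) {
        errno = 0;               /* readdir() returns NULL for both EOF and error */
        if (readdir(dir) == NULL)
            break;
        n++;
    }
    saved = errno;               /* non-zero here means a read error, not EOF */
    closedir(dir);

    if (saved != 0) {
        errno = saved;
        return -1;
    }
    return n;
}

/* Look up one path by name, as "ls /tmp/tryit" or "file /tmp/tryit" would. */
int path_lookup_ok(const char *path)
{
    struct stat st;

    assert(path != NULL);
    return stat(path, &st) == 0;
}
```

If count_dir_entries("/tmp") returns -1 while path_lookup_ok("/tmp/tryit") succeeds, the failure is limited to reading the directory itself, and errno shows which error the kernel returned.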


So, no - unfortunately the patch doesn't fix the problem for me with
2.6.8.1, loop-AES-2.1c (and mregparm enabled, although that shouldn't
matter). The kernel boot parameter was "elevator=cfq" (as in the earlier
tests), in case that is of any importance. Anyway, thank you, Jari, for
your efforts so far. If you have any other ideas, please let me know.


Regards,



David

