Re: XFS - issues with writes using sync

Amit Sahrawat <amit.sahrawat83@xxxxxxxxx> · Thu, 20 Jan 2011 12:08:21 +0530

Hi,
I tried the same test case with the 'sync' command removed from the script (see the script below) to check the behaviour.

Please find the logs:
#> ./createsetup.sh 
------------[ cut here ]------------
WARNING: at lib/list_debug.c:30 __list_add+0x6c/0x90()
list_add corruption. prev->next should be next (c78699c0), but was c78299c0. (prev=c78699c0).
Modules linked in:
Backtrace: 
[<c04486ac>] (dump_backtrace+0x0/0x110) from [<c06ee0e0>] (dump_stack+0x18/0x1c)
 r6:c07bc5e1 r5:0000001e r4:c7c01d38 r3:00000000
[<c06ee0c8>] (dump_stack+0x0/0x1c) from [<c046b5fc>] (warn_slowpath_common+0x54/0x6c)
[<c046b5a8>] (warn_slowpath_common+0x0/0x6c) from [<c046b6b8>] (warn_slowpath_fmt+0x38/0x40)
 r8:00000000 r7:c05cabd8 r6:c7c01d80 r5:c78699c0 r4:c78699c0
r3:00000009
[<c046b680>] (warn_slowpath_fmt+0x0/0x40) from [<c060b89c>] (__list_add+0x6c/0x90)
 r3:c78699c0 r2:c07bc6b3
[<c060b830>] (__list_add+0x0/0x90) from [<c06f0710>] (__down_write_nested+0xbc/0x10c)
 r6:c78699bc r5:60000013 r4:c302d820
[<c06f0654>] (__down_write_nested+0x0/0x10c) from [<c06f0774>] (__down_write+0x14/0x18)
 r6:c31324a0 r5:00000005 r4:c78699bc
[<c06f0760>] (__down_write+0x0/0x18) from [<c06efd50>] (down_write+0x28/0x30)
[<c06efd28>] (down_write+0x0/0x30) from [<c05a21ac>] (xfs_ilock+0x28/0xe8)
 r4:c7869940 r3:00000000
[<c05a2184>] (xfs_ilock+0x0/0xe8) from [<c05cabd8>] (xfs_file_aio_write+0x1d4/0x8cc)
 r7:00000001 r6:c31324a0 r5:00000001 r4:c7869940
[<c05caa04>] (xfs_file_aio_write+0x0/0x8cc) from [<c04eb404>] (do_sync_write+0xa0/0xe0)
[<c04eb364>] (do_sync_write+0x0/0xe0) from [<c04ec028>] (vfs_write+0xbc/0x178)
 r6:bee925a0 r5:c31324a0 r4:00000f38
[<c04ebf6c>] (vfs_write+0x0/0x178) from [<c04ec1ac>] (sys_write+0x44/0x70)
 r7:00000004 r6:00000f38 r5:bee925a0 r4:c31324a0
[<c04ec168>] (sys_write+0x0/0x70) from [<c04449a0>] (ret_fast_syscall+0x0/0x30)
 r9:c7c00000 r8:c0444b48 r6:bee925a0 r5:00000f38 r4:001854e0
---[ end trace 8124d49a241e0763 ]---

INFO: task cp:5445 blocked for more than 120 seconds.
"echo 0 > /proc/sys/kernel/hung_task_timeout_secs" disables this message.
cp            D c06ee75c     0  5445   2173 0x00000000
Backtrace: 
[<c06ee398>] (schedule+0x0/0x454) from [<c06f0748>] (__down_write_nested+0xf4/0x10c)
 r9:c7869a60 r8:00000000 r7:c05cabd8 r6:c78699bc r5:60000013
r4:c302d820
[<c06f0654>] (__down_write_nested+0x0/0x10c) from [<c06f0774>] (__down_write+0x14/0x18)
 r6:c31324a0 r5:00000005 r4:c78699bc
[<c06f0760>] (__down_write+0x0/0x18) from [<c06efd50>] (down_write+0x28/0x30)
[<c06efd28>] (down_write+0x0/0x30) from [<c05a21ac>] (xfs_ilock+0x28/0xe8)
 r4:c7869940 r3:00000000
[<c05a2184>] (xfs_ilock+0x0/0xe8) from [<c05cabd8>] (xfs_file_aio_write+0x1d4/0x8cc)
 r7:00000001 r6:c31324a0 r5:00000001 r4:c7869940
[<c05caa04>] (xfs_file_aio_write+0x0/0x8cc) from [<c04eb404>] (do_sync_write+0xa0/0xe0)
[<c04eb364>] (do_sync_write+0x0/0xe0) from [<c04ec028>] (vfs_write+0xbc/0x178)
 r6:bee925a0 r5:c31324a0 r4:00000f38
[<c04ebf6c>] (vfs_write+0x0/0x178) from [<c04ec1ac>] (sys_write+0x44/0x70)
 r7:00000004 r6:00000f38 r5:bee925a0 r4:c31324a0
[<c04ec168>] (sys_write+0x0/0x70) from [<c04449a0>] (ret_fast_syscall+0x0/0x30)
 r9:c7c00000 r8:c0444b48 r6:bee925a0 r5:00000f38 r4:001854e0
^C^Z[1] + Stopped                    ./createsetup.sh

I have serious doubts about the stability of 2.6.35.9; it is not passing our basic tests. I am checking a few more things before looking into the patches that were introduced between 2.6.34 and 2.6.35.9 (around 102).

Thanks,
Amit Sahrawat

On Thu, Jan 20, 2011 at 11:37 AM, Amit Sahrawat <amit.sahrawat83@xxxxxxxxx> wrote:

Hi,

I will try to find out the cause for this.
Meanwhile, a small request/suggestion: in the past, test cases of this type have helped us find many problems in XFS.
Could something like this be added to xfstests? It might help.

Thanks,
Amit Sahrawat

On Thu, Jan 20, 2011 at 10:47 AM, Dave Chinner <david@xxxxxxxxxxxxx> wrote:

On Thu, Jan 20, 2011 at 10:34:30AM +0530, Amit Sahrawat wrote:
> Hi,
>
> I am facing issues in XFS for a simple test case.
> *Target:* ARM
> *Kernel version:* 2.6.35.9
>
> *Test case:*
> mkfs.xfs -f /dev/sda2
> mount -t xfs /dev/sda2 /mnt/usb/sda2
> (Run script - trying to fragment the XFS formatted partition)
> #!/bin/sh
> index=0
> while [ "$?" == 0 ]
> do
> index=$((index+1))
> sync
> cp /mnt/usb/sda1/setupfile /mnt/usb/sda2/setupfile.$index
> done
>
> Size of the partition on which the files are being created - 1GB (I need to fragment
> this first to run other cases)
> Size of *'setupfile'* - 16K
>
> There used to be no such issues up to *2.6.34* (the last XFS version where we tried
> to create the setup). There is no reset involved this time; simply running
> the script caused this issue.

You have a known good version, a known bad version and a
reproducible test case, i.e. everything you need to run a git bisect
and find the commit that introduced the regression. Can you do this and
tell us what that commit is?
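
For reference, such a bisect could be sketched roughly as below. This is only a sketch under assumptions: it expects to run inside a kernel git tree that has both the v2.6.34 and v2.6.35.9 tags (the latter comes from the stable tree), and `start_bisect` and `classify_log` are hypothetical helper names, not part of git or xfstests. The two strings it looks for are the symptoms seen in the logs in this thread.

```shell
#!/bin/sh
# Sketch only: helper names and tag availability are assumptions.

# Begin bisecting between the known-good and known-bad kernels.
# Must be run from the top of a kernel git tree with both tags present.
start_bisect() {
    git bisect start &&
    git bisect bad v2.6.35.9 &&
    git bisect good v2.6.34
}

# Classify one test run from a saved console log ($1).
# Prints "bad" if either symptom from this thread appears in the log,
# "good" if neither does, and "skip" if the log cannot be read.
classify_log() {
    if [ ! -r "$1" ]; then
        echo skip
        return 0
    fi
    if grep -q -e 'list_add corruption' -e 'blocked for more than' "$1"; then
        echo bad
    else
        echo good
    fi
}
```

At each step, build and boot the kernel that git has checked out, run the copy loop on the target, save the console output (here to a hypothetical `console.log`), and report the verdict with `git bisect "$(classify_log console.log)"`. Once git names the first bad commit, `git bisect reset` returns the tree to where it started.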

Cheers,

Dave.
--
Dave Chinner
david@xxxxxxxxxxxxx
