I've sent two patches to solve this problem; you can try them.

[PATCH] pnfs: set pnfs_curr_ld before calling initialize_mountpoint
[PATCH] pnfs: set pnfs_blksize before calling set_pnfs_layoutdriver

2010/5/18 J. Bruce Fields <bfields@xxxxxxxxxxxx>:
> On Mon, May 17, 2010 at 10:53:11AM -0400, J. Bruce Fields wrote:
>> On Mon, May 17, 2010 at 05:24:39PM +0300, Boaz Harrosh wrote:
>> > On 05/17/2010 04:53 PM, J. Bruce Fields wrote:
>> > > On Wed, May 12, 2010 at 04:28:12PM -0400, bfields wrote:
>> > >> On Wed, May 12, 2010 at 09:46:43AM +0300, Benny Halevy wrote:
>> > >>> On May. 10, 2010, 6:36 +0300, Zhang Jingwang <zhangjingwang@xxxxxxxxxxxx> wrote:
>> > >>>> Optimize for sequential write. Layout info and tags are organized by
>> > >>>> file offset. When appending data to a file, the whole list will be
>> > >>>> examined, which introduces a notable performance decrease.
>> > >>>
>> > >>> Looks good to me.
>> > >>>
>> > >>> Fred, can you please double-check?
>> > >>
>> > >> I don't know if Fred's still up for reviewing block stuff?
>> > >>
>> > >> I've been trying to keep up with at least some minimal testing, but not
>> > >> as well as I'd like.
>> > >>
>> > >> The one thing I've noticed is that the connectathon general test has
>> > >> started failing right at the start with an IO error. The last good
>> > >> version I tested was b5c09c21, which was based on 33-rc6. The earliest
>> > >> bad version I tested was 419312ada, based on 34-rc2. A quick look at
>> > >> network traces from the two versions didn't turn up anything obvious. I
>> > >> haven't had the chance yet to look closer.
>> > >
>> > > As of the latest (6666f47d), in my tests the client is falling back on
>> > > IO to the MDS and doing no block IO at all. b5c09c21 still works, so
>> > > the problem isn't due to a change in the server I'm testing against. I
>> > > haven't investigated any more closely.
>> > >
>> >
>> > You might be hitting the .commit bug, no? Still no fix.
>> > I'm using a workaround for objects. I'm not sure how it affects blocks.
>> > I think you should see that the very first IO goes through the layout
>> > driver, and then the IO is redone through the MDS, for each node, even
>> > though write/read returned success, because commit returns
>> > NOT_ATTEMPTED. But I might be totally off.
>>
>> I don't believe it's even attempting a GETLAYOUT.
>>
>> I'll take a look at the network....
>>
>> --b.
>
> Everything on the network looks fine, the server's doing the right
> stuff; the client just never asks for a layout.
>
> In fact, bl_initialize_mountpoint is failing on the very first check:
>
> 	if (server->pnfs_blksize == 0) {
> 		dprintk("%s Server did not return blksize\n", __func__);
> 		...
>
> After rearranging the caller:
>
> @@ -880,9 +880,9 @@ static void nfs4_init_pnfs(struct nfs_server *server, struct nfs_fh *mntfh, stru
>
>  	if (nfs4_has_session(clp) &&
>  	   (clp->cl_exchange_flags & EXCHGID4_FLAG_USE_PNFS_MDS)) {
> -		set_pnfs_layoutdriver(server, mntfh, fsinfo->layouttype);
>  		pnfs_set_ds_iosize(server);
>  		server->pnfs_blksize = fsinfo->blksize;
> +		set_pnfs_layoutdriver(server, mntfh, fsinfo->layouttype);
>  	}
>  #endif /* CONFIG_NFS_V4_1 */
>  }
>
> it just fails a little later (see below). I haven't tried to go any
> farther yet.
>
> (But: why are the layout drivers using this odd pnfs_client_operations
> indirection to call back to the common pnfs code? As far as I can tell
> there's only one definition of pnfs_client_operations, so we should
> just remove that structure and call pnfs_getdevicelist, etc., by name.)
>
> --b.
>
> May 17 16:36:14 pearlet4 kernel: BUG: unable to handle kernel NULL pointer dereference at (null)
> May 17 16:36:14 pearlet4 kernel: IP: [<ffffffff8122bc36>] _nfs4_pnfs_getdevicelist+0x26/0x110
> May 17 16:36:14 pearlet4 kernel: PGD 6e11067 PUD 6e12067 PMD 0
> May 17 16:36:14 pearlet4 kernel: Oops: 0000 [#1] PREEMPT
> May 17 16:36:14 pearlet4 kernel: last sysfs file: /sys/kernel/uevent_seqnum
> May 17 16:36:14 pearlet4 kernel: CPU 0
> May 17 16:36:14 pearlet4 kernel: Modules linked in: iscsi_tcp libiscsi_tcp libiscsi scsi_transport_iscsi
> May 17 16:36:14 pearlet4 kernel:
> May 17 16:36:14 pearlet4 kernel: Pid: 2794, comm: mount.nfs4 Not tainted 2.6.34-rc6-pnfs-00314-ga35e9c3 #136 /
> May 17 16:36:14 pearlet4 kernel: RIP: 0010:[<ffffffff8122bc36>] [<ffffffff8122bc36>] _nfs4_pnfs_getdevicelist+0x26/0x110
> May 17 16:36:14 pearlet4 kernel: RSP: 0018:ffff880004e99538 EFLAGS: 00010246
> May 17 16:36:14 pearlet4 kernel: RAX: 0000000000000000 RBX: ffff880005fff378 RCX: ffff880004e99548
> May 17 16:36:14 pearlet4 kernel: RDX: ffff880004ca24c8 RSI: ffff880004e99a28 RDI: ffff880005fff378
> May 17 16:36:14 pearlet4 kernel: RBP: ffff880004e995c8 R08: 0000000000000000 R09: ffff880004ca24c8
> May 17 16:36:14 pearlet4 kernel: R10: 0000000000000000 R11: 0000000000000001 R12: ffff880004ca24c8
> May 17 16:36:14 pearlet4 kernel: R13: ffff880004ca24c8 R14: ffff880004e995d8 R15: ffff880004e99a28
> May 17 16:36:14 pearlet4 kernel: FS: 00007fed29c476f0(0000) GS:ffffffff81e1c000(0000) knlGS:0000000000000000
> May 17 16:36:14 pearlet4 kernel: CS: 0010 DS: 0000 ES: 0000 CR0: 000000008005003b
> May 17 16:36:14 pearlet4 kernel: CR2: 0000000000000000 CR3: 0000000004e77000 CR4: 00000000000006f0
> May 17 16:36:14 pearlet4 kernel: DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
> May 17 16:36:14 pearlet4 kernel: DR3: 0000000000000000 DR6: 00000000ffff0ff0 DR7: 0000000000000400
> May 17 16:36:14 pearlet4 kernel: Process mount.nfs4 (pid: 2794, threadinfo ffff880004e98000, task ffff880004e78040)
> May 17 16:36:14 pearlet4 kernel: Stack:
> May 17 16:36:14 pearlet4 kernel:  ffff880004e995c8 ffff880004e995c8 ffff880004e99588 ffffffff8190e5dc
> May 17 16:36:14 pearlet4 kernel: <0> ffff880004e98000 ffff880004e995c8 ffff880004ca24c0 ffff880007800a80
> May 17 16:36:14 pearlet4 kernel: <0> 0000000000000000 ffff880007800a80 ffff880004ca24c0 ffffffff810d46c6
> May 17 16:36:14 pearlet4 kernel: Call Trace:
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff8190e5dc>] ? klist_next+0x8c/0xf0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810d46c6>] ? poison_obj+0x36/0x50
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810d4a18>] ? cache_alloc_debugcheck_after+0xe8/0x1f0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff8122c21e>] nfs4_pnfs_getdevicelist+0x4e/0xa0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810d677d>] ? kmem_cache_alloc_notrace+0xfd/0x1a0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff81250e81>] bl_initialize_mountpoint+0x161/0x6a0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff812497c9>] set_pnfs_layoutdriver+0x89/0x120
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff8120c71f>] nfs_probe_fsinfo+0x54f/0x5f0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff8120d789>] nfs_clone_server+0x129/0x270
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810d46c6>] ? poison_obj+0x36/0x50
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810d4a18>] ? cache_alloc_debugcheck_after+0xe8/0x1f0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810f6db1>] ? alloc_vfsmnt+0xa1/0x180
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810d627d>] ? __kmalloc_track_caller+0x16d/0x2b0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810f6db1>] ? alloc_vfsmnt+0xa1/0x180
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff81219fa1>] nfs4_xdev_get_sb+0x61/0x340
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810dd15a>] vfs_kern_mount+0x8a/0x1e0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff81224f23>] nfs_follow_mountpoint+0x3b3/0x4b0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810e73b7>] link_path_walk+0xb67/0xd20
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810e76b0>] path_walk+0x60/0xd0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810e778d>] vfs_path_lookup+0x6d/0x90
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff8121988d>] nfs_follow_remote_path+0x6d/0x170
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810637fd>] ? trace_hardirqs_on_caller+0x14d/0x190
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff812197fb>] ? nfs_do_root_mount+0x8b/0xb0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff81219abf>] nfs4_try_mount+0x6f/0xd0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff81219bc2>] nfs4_get_sb+0xa2/0x360
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810dd15a>] vfs_kern_mount+0x8a/0x1e0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810dd322>] do_kern_mount+0x52/0x130
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff81926cda>] ? _lock_kernel+0x6a/0x16a
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810f788e>] do_mount+0x2de/0x850
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810f585a>] ? copy_mount_options+0xea/0x190
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff810f7e98>] sys_mount+0x98/0xf0
> May 17 16:36:14 pearlet4 kernel:  [<ffffffff81002518>] system_call_fastpath+0x16/0x1b
> May 17 16:36:14 pearlet4 kernel: Code: 00 00 00 00 00 55 48 89 e5 53 48 81 ec 88 00 00 00 0f 1f 44 00 00 48 8b 87 70 02 00 00 f6 05 75 38 7e 01 10 48 8d 4d 80 48 89 fb <8b> 00 48 89 55 80 48 8d 55 d0 48 c7 45 d8 00 00 00 00 48 c7 45
> May 17 16:36:14 pearlet4 kernel: RIP [<ffffffff8122bc36>] _nfs4_pnfs_getdevicelist+0x26/0x110
> May 17 16:36:14 pearlet4 kernel: RSP <ffff880004e99538>
> May 17 16:36:14 pearlet4 kernel: CR2: 0000000000000000
> May 17 16:36:14 pearlet4 kernel: ---[ end trace 3956532521eb7ba1 ]---
> May 17 16:36:14 pearlet4 kernel: mount.nfs4 used greatest stack depth: 2104 bytes left
> May 17 16:36:21 pearlet4 kernel: eth0: no IPv6 routers present
> May 17 16:40:32 pearlet4 ntpd[2255]: synchronized to 91.189.94.4, stratum 2
> May 17 16:40:32 pearlet4 ntpd[2255]: kernel time sync status change 2001
> --
> To unsubscribe from this list: send the line "unsubscribe linux-nfs" in
> the body of a message to majordomo@xxxxxxxxxxxxxxx
> More majordomo info at http://vger.kernel.org/majordomo-info.html

-- 
Zhang Jingwang
National Research Centre for High Performance Computers
Institute of Computing Technology, Chinese Academy of Sciences
No. 6, South Kexueyuan Road, Haidian District
Beijing, China