Re: [PATCH] CIFS: Do not kmalloc under the flocks spinlock

On Fri, 24 Feb 2012 15:32:32 +0400
Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:

> 2012/2/23 Jeff Layton <jlayton@xxxxxxxxx>:
> > On Thu, 23 Feb 2012 10:56:40 +0400
> > Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:
> >
> >> 2012/2/22 Jeff Layton <jlayton@xxxxxxxxx>:
> >> >
> >> > Hmm...looking at my machine:
> >> >
> >> > (gdb) p sizeof(struct cifsLockInfo)
> >> > $1 = 144
> >> >
> >> > So that's 144 bytes per struct...what's the upper limit on "count"?
> >> >
> >> > It seems like that could potentially be a large allocation, and we have
> >> > to do this in the oplock break codepath.
> >> >
> >> > Instead of doing this, would it be better to allocate these when
> >> > the VFS first calls ->lock, and attach them to a new member of
> >> > file_lock.fl_u?
> >>
> >> Every time we can have a different number of locks - so, can't
> >> pre-allocate it.
> >
> > You could preallocate these:
> >
> > 1/ Do the allocation in cifs_setlk
> > 2/ Attach it to a new fl_u member in the file_lock struct
> 
> I am not sure I got the idea. Do you mean to allocate the memory in inode->i_flock?
> 

Well, no...I meant that you would add a new member to the file_lock->fl_u
union, stick the pointer there, and do the allocation earlier, in
cifs_setlk.
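
Something along these lines, just to illustrate (untested, and the
"cifs" member name is made up -- the fl_u union itself is already there
in include/linux/fs.h):

        /* include/linux/fs.h: add an entry for cifs to the existing
         * fl_u union in struct file_lock */
        union {
                struct nfs_lock_info    nfs_fl;
                struct nfs4_lock_info   nfs4_fl;
                struct {
                        struct list_head link;
                        int state;
                } afs;
                struct cifsLockInfo     *cifs;  /* set in cifs_setlk */
        } fl_u;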

> > 3/ Add a fl_release_private op to free that object
> >
> > Of course, you will need to consider how to deal with lock splitting
> > and merging if you do that...
> 
> Yes, how can we deal with this sort of thing? We can recalculate the
> lock count after every cifs_setlk, but that's rather time-consuming.
> 
> If we use this struct:
> 
> struct lock_to_push {
>         __u64 offset;
>         __u64 length;
>         __u32 pid;
>         __u16 netfid;
>         __u8 type;
> };
> 
> its sizeof is 24 bytes. If we have e.g. 10^5 locks we end up with
> 2400000 bytes = 2.4 megabytes - I don't think this allocation plays any
> big role in the overall time we need to push 10^5 locks to the server.
> What's the problem with allocating it here?
> 

2.4 M would be an enormous allocation in the kernel and would almost
certainly fail. :)

In order to do a large allocation in the kernel via kmalloc, you need
contiguous physical memory. Over time, memory gets fragmented. The page
allocator may need to do things like initiate writeback or migrate
pages in order to satisfy your request. That can be very slow and
creates extra load on the box.

It's best to avoid that if possible, especially in codepaths like this
where you're doing this in workqueue context and reporting an error
back to the user is pretty much impossible.

If you do these allocations up front, when you set the lock, then the
caller can get back an ENOMEM or whatever on the fcntl call. At oplock
break time, recovering from an allocation error will always be
problematic.
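
IOW, something like this (again untested; the cifs-specific names are
made up for illustration -- file_lock_operations and fl_release_private
are real, in include/linux/fs.h):

        /* fs/cifs/file.c: release the private data when the VFS tears
         * down the file_lock */
        static void
        cifs_fl_release_private(struct file_lock *fl)
        {
                kfree(fl->fl_u.cifs);
                fl->fl_u.cifs = NULL;
        }

        static const struct file_lock_operations cifs_fl_ops = {
                .fl_release_private     = cifs_fl_release_private,
                /* .fl_copy_lock would also be needed to cope with the
                 * lock splitting/merging mentioned earlier */
        };

        /* early in cifs_setlk(): cifs_setlk is reached from cifs_lock,
         * our ->lock file operation, so an error returned here goes
         * straight back to the fcntl() caller */
        flock->fl_u.cifs = kmalloc(sizeof(struct cifsLockInfo), GFP_KERNEL);
        if (!flock->fl_u.cifs)
                return -ENOMEM;
        flock->fl_ops = &cifs_fl_ops;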

> >
> >> I can create new struct for this only - without extra
> >> llist, blist, block_q fields, that can save enough memory.
> >>
> >
> > That sounds preferable. We really don't want to allocate more memory
> > than is needed. Plus, anytime you duplicate information at least one
> > copy is going to be wrong at some point in time.
> >
> >> >
> >> >>       lock_flocks();
> >> >>       cifs_for_each_lock(cfile->dentry->d_inode, before) {
> >> >> @@ -950,35 +962,38 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
> >> >>                       type = CIFS_RDLCK;
> >> >>               else
> >> >>                       type = CIFS_WRLCK;
> >> >> -
> >> >> -             lck = cifs_lock_init(flock->fl_start, length, type,
> >> >> -                                  cfile->netfid);
> >> >> -             if (!lck) {
> >> >> -                     rc = -ENOMEM;
> >> >> -                     goto send_locks;
> >> >> -             }
> >> >> +             lck = &lcks[i];
> >> >>               lck->pid = flock->fl_pid;
> >> >> -
> >> >> -             list_add_tail(&lck->llist, &locks_to_send);
> >> >> +             lck->netfid = cfile->netfid;
> >> >> +             lck->length = length;
> >> >> +             lck->type = type;
> >> >> +             lck->offset = flock->fl_start;
> >> >> +             i++;
> >> >> +             if (i == count)
> >> >> +                     break;
> >> >>       }
> >> >> -
> >> >> -send_locks:
> >> >>       unlock_flocks();
> >> >>
> >> >> -     list_for_each_entry_safe(lck, tmp, &locks_to_send, llist) {
> >> >> +     if (i == count)
> >> >> +             cERROR(1, "Can't push all brlocks!");
> >> >> +
> >> >
> >> > This sort of thing always makes me a little worried. Sure, you're
> >> > allocating a bunch of extra locks, but what's the potential effect
> >> > of not being able to push all the locks here? The user is going to
> >> > see this message and have no clue of what it means.
> >> >
> >> > Even I'm not sure what this means in practice. The VFS will have a
> >> > record of this lock, but the server won't...
> >> >
> >>
> >> I really think that there will be no such situation, because we hold
> >> inode->lock_mutex - the locking state should stay the same.
> >>
> >
> > I don't share that optimism.
> >
> > The lock_mutex protects the cinode->llist, but here you're walking the
> > inode->i_flock list. It's certainly possible that it could change in
> > between unlock_flocks and lock_flocks.
> >
> 
> The lock_mutex now protects not only cinode->llist, but also the locks
> on the inode structure that are reached from cifs code. The only thing
> I missed is the posix_lock_file_wait at the end of cifs_setlk - it
> should be surrounded with lock_mutex as well. Given that, there is no
> way to change the number of POSIX locks on the inode without taking
> lock_mutex.
> 

Ok, that will probably cover it for now. Here's what I'd suggest:

1/ that locking needs to be documented, or we're almost certain to break
it later.

2/ don't allocate more structs than you need. If you know that the list
can't change while you're holding the lock_mutex then there's no reason
to overallocate - see the sketch below.
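
For 1/ and 2/, roughly what I have in mind (an untested sketch; it
reuses your lock_to_push struct, but assumes a struct list_head llist
gets added to it):

        /* fs/cifs/cifsglob.h: something like this above lock_mutex in
         * struct cifsInodeInfo, for 1/ */
        /*
         * Protects llist and, by convention, any change to this inode's
         * POSIX locks (inode->i_flock) made from cifs code, so the
         * number of locks cannot change while it is held.
         */
        struct mutex lock_mutex;

        /* fs/cifs/file.c, in cifs_push_posix_locks(): count first... */
        struct lock_to_push *lck;
        struct file_lock **before;
        struct list_head locks_to_send;
        unsigned int count = 0, i;
        int rc = 0;

        lock_flocks();
        cifs_for_each_lock(cfile->dentry->d_inode, before) {
                if ((*before)->fl_flags & FL_POSIX)
                        count++;
        }
        unlock_flocks();

        /* ...then allocate exactly that many entries, each one a small
         * allocation that can fail without leaving us in trouble */
        INIT_LIST_HEAD(&locks_to_send);
        for (i = 0; i < count; i++) {
                lck = kmalloc(sizeof(struct lock_to_push), GFP_KERNEL);
                if (!lck) {
                        rc = -ENOMEM;
                        break;  /* free locks_to_send and bail out */
                }
                list_add_tail(&lck->llist, &locks_to_send);
        }

        /* then retake lock_flocks(), walk i_flock again to fill in the
         * preallocated entries, drop it, and push them to the server */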

That said...this whole scheme seems quite fragile. I wonder if there's
a better way to do this that doesn't require quite so much to happen at
oplock break time...

-- 
Jeff Layton <jlayton@xxxxxxxxx>

