Re: [PATCH] CIFS: Do not kmalloc under the flocks spinlock

2012/2/23 Jeff Layton <jlayton@xxxxxxxxx>:
> On Thu, 23 Feb 2012 10:56:40 +0400
> Pavel Shilovsky <piastry@xxxxxxxxxxx> wrote:
>
>> 2012/2/22 Jeff Layton <jlayton@xxxxxxxxx>:
>> >
>> > Hmm...looking at my machine:
>> >
>> > (gdb) p sizeof(struct cifsLockInfo)
>> > $1 = 144
>> >
>> > So that's 144 bytes per struct...what's the upper limit on "count"?
>> >
>> > It seems like that could potentially be a large allocation, and we have
>> > to do this in the oplock break codepath.
>> >
>> > Instead of doing this, would it be better to allocate these when
>> > the VFS first calls ->lock, and attach them to a new member of
>> > file_lock.fl_u?
>>
>> We can have a different number of locks every time - so we can't
>> pre-allocate them.
>
> You could preallocate these:
>
> 1/ Do the allocation in cifs_setlk
> 2/ Attach it to a new fl_u member in the file_lock struct

I am not sure I got the idea. Do you mean to attach the allocated memory to the locks on inode->i_flock?

> 3/ Add a fl_release_private op to free that object
>
> Of course, you will need to consider how to deal with lock splitting
> and merging if you do that...

Yes, how can we deal with that sort of thing? We could recalculate the
number of locks after every cifs_setlk, but that is rather time-consuming.
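
So that I am sure I read the preallocation idea right, here is a rough
sketch of what I think you mean by 1/, 2/ and 3/ - struct cifs_fl_private,
the fl_u.cifs member and cifs_fl_ops are made-up names just for
illustration, and splitting/merging is ignored for now:

/* Sketch only: fl_u.cifs would be a new member of the fl_u union in
 * struct file_lock, and struct cifs_fl_private is a made-up type. */
struct cifs_fl_private {
        __u32 pid;
        __u16 netfid;
};

/* 3/ freed by the VFS through fl_ops->fl_release_private */
static void cifs_fl_release_private(struct file_lock *fl)
{
        kfree(fl->fl_u.cifs);
        fl->fl_u.cifs = NULL;
}

static const struct file_lock_operations cifs_fl_ops = {
        .fl_release_private = cifs_fl_release_private,
};

/* 1/ + 2/ done in cifs_setlk() before handing the lock to the VFS */
static int cifs_attach_fl_private(struct file_lock *flock, __u16 netfid)
{
        struct cifs_fl_private *priv;

        priv = kmalloc(sizeof(*priv), GFP_KERNEL);
        if (!priv)
                return -ENOMEM;

        priv->pid = flock->fl_pid;
        priv->netfid = netfid;
        flock->fl_u.cifs = priv;        /* needs the new union member */
        flock->fl_ops = &cifs_fl_ops;
        return 0;
}

Then cifs_push_posix_locks() could read everything straight from each
file_lock instead of allocating anything - assuming the copy/split/merge
paths are taught to duplicate or drop the private data correctly.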

If we use this struct:

struct lock_to_push {
        __u64 offset;
        __u64 length;
        __u32 pid;
        __u16 netfid;
        __u8 type;
};

its sizeof is 24 bytes. If we have e.g. 10^5 locks we end up with
2400000 bytes = 2.4 megabytes - I don't think this allocation plays any
big role in the whole time we need to push 10^5 locks to the server.
What's the problem with allocating it here?
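
For concreteness, the two-pass shape I have in mind for
cifs_push_posix_locks looks roughly like this (error handling trimmed;
cifs_push_one_posix_lock is only a stand-in for whatever actually sends a
lock to the server):

static int cifs_push_posix_locks_sketch(struct cifsFileInfo *cfile)
{
        struct inode *inode = cfile->dentry->d_inode;
        struct file_lock **before, *flock;
        struct lock_to_push *lcks;
        unsigned int count = 0, i = 0, j;
        int rc = 0;

        /* pass 1: count the POSIX locks without allocating anything */
        lock_flocks();
        cifs_for_each_lock(inode, before) {
                if ((*before)->fl_flags & FL_POSIX)
                        count++;
        }
        unlock_flocks();

        if (count == 0)
                return 0;

        /* allocate outside the spinlock - that is the whole point */
        lcks = kcalloc(count, sizeof(struct lock_to_push), GFP_KERNEL);
        if (!lcks)
                return -ENOMEM;

        /* pass 2: snapshot the locks; stop if more appeared in between */
        lock_flocks();
        cifs_for_each_lock(inode, before) {
                flock = *before;
                if (!(flock->fl_flags & FL_POSIX))
                        continue;
                lcks[i].offset = flock->fl_start;
                lcks[i].length = 1 + flock->fl_end - flock->fl_start;
                lcks[i].pid = flock->fl_pid;
                lcks[i].netfid = cfile->netfid;
                lcks[i].type = flock->fl_type == F_RDLCK ? CIFS_RDLCK :
                                                           CIFS_WRLCK;
                if (++i == count)
                        break;
        }
        unlock_flocks();

        /* push the snapshot to the server without any spinlock held */
        for (j = 0; j < i; j++)
                rc = cifs_push_one_posix_lock(cfile, &lcks[j]);

        kfree(lcks);
        return rc;
}

The only thing the second pass has to cope with is the i_flock list
changing between the two lock_flocks sections - which is exactly the case
discussed further down.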

>
>> I can create a new struct for this purpose only - without the extra
>> llist, blist, block_q fields - that can save enough memory.
>>
>
> That sounds preferable. We really don't want to allocate more memory
> than is needed. Plus, anytime you duplicate information at least one
> copy is going to be wrong at some point in time.
>
>> >
>> >>       lock_flocks();
>> >>       cifs_for_each_lock(cfile->dentry->d_inode, before) {
>> >> @@ -950,35 +962,38 @@ cifs_push_posix_locks(struct cifsFileInfo *cfile)
>> >>                       type = CIFS_RDLCK;
>> >>               else
>> >>                       type = CIFS_WRLCK;
>> >> -
>> >> -             lck = cifs_lock_init(flock->fl_start, length, type,
>> >> -                                  cfile->netfid);
>> >> -             if (!lck) {
>> >> -                     rc = -ENOMEM;
>> >> -                     goto send_locks;
>> >> -             }
>> >> +             lck = &lcks[i];
>> >>               lck->pid = flock->fl_pid;
>> >> -
>> >> -             list_add_tail(&lck->llist, &locks_to_send);
>> >> +             lck->netfid = cfile->netfid;
>> >> +             lck->length = length;
>> >> +             lck->type = type;
>> >> +             lck->offset = flock->fl_start;
>> >> +             i++;
>> >> +             if (i == count)
>> >> +                     break;
>> >>       }
>> >> -
>> >> -send_locks:
>> >>       unlock_flocks();
>> >>
>> >> -     list_for_each_entry_safe(lck, tmp, &locks_to_send, llist) {
>> >> +     if (i == count)
>> >> +             cERROR(1, "Can't push all brlocks!");
>> >> +
>> >
>> > This sort of thing always makes me a little worried. Sure, you're
>> > allocating a bunch of extra locks, but what's the potential effect
>> > of not being able to push all the locks here? The user is going to
>> > see this message and have no clue of what it means.
>> >
>> > Even I'm not sure what this means in practice. The VFS will have a
>> > record of this lock, but the server won't...
>> >
>>
>> I really think that there will be no such situations, because we hold
>> inode->lock_mutex - the locking state should stay the same.
>>
>
> I don't share that optimism.
>
> The lock_mutex protects the cinode->llist, but here you're walking the
> inode->i_flock list. It's certainly possible that it could change in
> between unlock_flocks and lock_flocks.
>

The lock_mutex now protects not only cinode->llist but also the locks on
the inode that are reached from CIFS code. The only thing I missed is the
posix_lock_file_wait call at the end of cifs_setlk - it should be
surrounded with lock_mutex as well. Given that, there is no way to
increase the number of POSIX locks on the inode other than by taking and
releasing lock_mutex.
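
Concretely, the piece I missed would be something like this at the end of
cifs_setlk (a sketch only, untested):

        if (flock->fl_flags & FL_POSIX && !rc) {
                struct cifsInodeInfo *cinode =
                        CIFS_I(file->f_path.dentry->d_inode);

                /* take the same mutex cifs_push_posix_locks() holds, so
                 * the inode's POSIX lock list cannot change while it is
                 * being walked and pushed */
                mutex_lock(&cinode->lock_mutex);
                rc = posix_lock_file_wait(file, flock);
                mutex_unlock(&cinode->lock_mutex);
        }

With that, every path that adds a POSIX lock to the inode has to go
through lock_mutex first.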

-- 
Best regards,
Pavel Shilovsky.