Re: [PATCH] libceph: ceph_pagelist_append might sleep while atomic

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/14/2013 10:44 AM, Alex Elder wrote:
> On 05/09/2013 09:42 AM, Jim Schutt wrote:
>> Ceph's encode_caps_cb() worked hard to not call __page_cache_alloc while
>> holding a lock, but it's spoiled because ceph_pagelist_addpage() always
>> calls kmap(), which might sleep.  Here's the result:
> 
> I finally took a close look at this today, Jim.  Sorry
> for the delay.
> 

No worries - thanks for taking a look.

> The issue is formatting the reconnect message--which will
> hold an arbitrary amount of data and therefore which we'll
> need to do some allocation (and kmap) for--in the face of
> having to hold the flock spinlock while doing so.
> 
> And as you found, ceph_pagelist_addpage(), which is called
> by ceph_pagelist_append(), calls kmap() even if it doesn't
> need to allocate anything.  This means that despite reserving
> the pages, those pages are in the free list and because they'll
> need to be the subject of kmap() their preallocation doesn't
> help.
> 
> Your solution was to pre-allocate a buffer, format the locks
> into that buffer while holding the lock, then append the
> buffer contents to a pagelist after releasing the lock.  You
> check for a changing (increasing) lock count while you format
> the locks, which is good.
> 
> So...  Given that, I think your change looks good.  It's a shame
> we can't format directly into the pagelist buffer but this won't
> happen much so it's not a big deal.  I have a few small suggestions,
> below.
> 
> I do find some byte order bugs though.   They aren't your doing,
> but I think they ought to be fixed first, as a separate patch
> that would precede this one.  The bug is that the lock counts
> that are put into the buffer (num_fcntl_locks and num_flock_locks)
> are not properly byte-swapped.  I'll point it out inline
> in your code, below.
> 
> I'll say that what you have is OK.  Consider my suggestions, and
> if you choose not to fix the byte order bugs, please let me know.

I'll happily fix up a v2 series with your suggestions addressed.
Thanks for catching those issues.  Stay tuned...

Thanks -- Jim


--
To unsubscribe from this list: send the line "unsubscribe ceph-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]
  Powered by Linux