Re: [PATCH 3/3] ceph: fix potential races in ceph_uninline_data

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



"Jeff Layton" <jlayton@xxxxxxxxxx> writes:

> On Wed, 2019-07-10 at 19:03 +0100, Luis Henriques wrote:
>> Jeff Layton <jlayton@xxxxxxxxxx> writes:
>> 
>> > The current code will do the uninlining but it relies on the caller to
>> > set the i_inline_version appropriately afterward. There are several
>> > potential races here.
>> > 
>> > Protect against competing uninlining attempts by having the callers
>> > take the i_truncate_mutex and then have them update the version
>> > themselves before dropping it.
>> > 
>> > Other callers can then re-check the i_inline_version after acquiring the
>> > mutex and if it has changed to CEPH_INLINE_NONE, they can just drop it
>> > and do nothing.
>> > 
>> > Finally since we are doing a lockless check first in all cases, just
>> > move that into ceph_uninline_data as well, and have the callers call
>> > it unconditionally.
>> > 
>> > Signed-off-by: Jeff Layton <jlayton@xxxxxxxxxx>
>> > ---
>> >  fs/ceph/addr.c | 33 ++++++++++++++++++++++++---------
>> >  fs/ceph/file.c | 18 ++++++------------
>> >  2 files changed, 30 insertions(+), 21 deletions(-)
>> > 
>> > diff --git a/fs/ceph/addr.c b/fs/ceph/addr.c
>> > index 5f1e2b6577fb..e9700c997d12 100644
>> > --- a/fs/ceph/addr.c
>> > +++ b/fs/ceph/addr.c
>> > @@ -1541,11 +1541,9 @@ static vm_fault_t ceph_page_mkwrite(struct vm_fault *vmf)
>> >  
>> >  	ceph_block_sigs(&oldset);
>> >  
>> > -	if (ci->i_inline_version != CEPH_INLINE_NONE) {
>> > -		err = ceph_uninline_data(inode, off == 0 ? page : NULL);
>> > -		if (err < 0)
>> > -			goto out_free;
>> > -	}
>> > +	err = ceph_uninline_data(inode, off == 0 ? page : NULL);
>> > +	if (err < 0)
>> > +		goto out_free;
>> >  
>> >  	if (off + PAGE_SIZE <= size)
>> >  		len = PAGE_SIZE;
>> > @@ -1593,7 +1591,6 @@ static vm_fault_t ceph_page_mkwrite(struct vm_fault *vmf)
>> >  	    ci->i_inline_version != CEPH_INLINE_NONE) {
>> >  		int dirty;
>> >  		spin_lock(&ci->i_ceph_lock);
>> > -		ci->i_inline_version = CEPH_INLINE_NONE;
>> >  		dirty = __ceph_mark_dirty_caps(ci, CEPH_CAP_FILE_WR,
>> >  					       &prealloc_cf);
>> >  		spin_unlock(&ci->i_ceph_lock);
>> > @@ -1656,6 +1653,10 @@ void ceph_fill_inline_data(struct inode *inode, struct page *locked_page,
>> >  	}
>> >  }
>> >  
>> > +/*
>> > + * We borrow the i_truncate_mutex to serialize callers that may be racing to
>> > + * uninline the data.
>> > + */
>> >  int ceph_uninline_data(struct inode *inode, struct page *page)
>> >  {
>> >  	struct ceph_inode_info *ci = ceph_inode(inode);
>> > @@ -1665,15 +1666,23 @@ int ceph_uninline_data(struct inode *inode, struct page *page)
>> >  	int err = 0;
>> >  	bool from_pagecache = false;
>> >  
>> > -	spin_lock(&ci->i_ceph_lock);
>> > -	inline_version = ci->i_inline_version;
>> > -	spin_unlock(&ci->i_ceph_lock);
>> > +	/* Do a lockless check first -- paired with i_ceph_lock for changes */
>> > +	inline_version = READ_ONCE(ci->i_inline_version);
>> >  
>> >  	dout("uninline_data %p %llx.%llx inline_version %llu\n",
>> >  	     inode, ceph_vinop(inode), inline_version);
>> >  
>> >  	if (inline_version == 1 || /* initial version, no data */
>> >  	    inline_version == CEPH_INLINE_NONE)
>> > +		return 0;
>> 
>> We may need to do the unlock_page(page) before returning.
>> 
>
> No, we shouldn't here, because of patch #2.

Doh!  Right, sorry.

Regarding your earlier inquire (on IRC) about copy_file_range handling
of inline data: a quick look shows that it does not really handle it.  I
guess that the right thing to do is to simply return -EOPNOTSUPP if
either src or dst inodes have i_inline_version = CEPH_INLINE_NONE.

In practice, this may happen already because due to:

	if (len < src_ci->i_layout.object_size)
        	return -EOPNOTSUPP;

but I couldn't dig all the details on how big the inlined data can be.
I could see client_max_inline_size being mentioned in the docs, but at a
first glance this isn't enforced by the MDS but only validated by the
fuse client (I could be wrong, as I just did a quick grep in the code).

Anyway, I'll soon send a patch to do an explicit check for inline data
in the cfr implementation.

Cheers,
-- 
Luis

>
>> > + + mutex_lock(&ci->i_truncate_mutex); + + /* Double check the version
>> > after taking mutex */ + spin_lock(&ci->i_ceph_lock); + inline_version
>> > = ci->i_inline_version; + spin_unlock(&ci->i_ceph_lock); + if
>> > (inline_version == CEPH_INLINE_NONE) goto out;
>> >  
>> >  	if (page) {
>> > @@ -1770,11 +1779,17 @@ int ceph_uninline_data(struct inode *inode, struct page *page)
>> >  	err = ceph_osdc_start_request(&fsc->client->osdc, req, false);
>> >  	if (!err)
>> >  		err = ceph_osdc_wait_request(&fsc->client->osdc, req);
>> > +	if (!err) {
>> > +		spin_lock(&ci->i_ceph_lock);
>> > +		inline_version = CEPH_INLINE_NONE;
>> 
>> Shouldn't this be ci->i_inline_version = CEPH_INLINE_NONE ?  Or maybe
>> both since the dout() below uses inline_version.
>> 
>
> Yes!




[Index of Archives]     [CEPH Users]     [Ceph Large]     [Ceph Dev]     [Information on CEPH]     [Linux BTRFS]     [Linux USB Devel]     [Video for Linux]     [Linux Audio Users]     [Yosemite News]     [Linux Kernel]     [Linux SCSI]

  Powered by Linux