Re: [RFC] exofs: New truncate sequence

[Date Prev][Date Next][Thread Prev][Thread Next][Date Index][Thread Index]

 



On 05/31/2010 04:44 PM, Nick Piggin wrote:
> On Mon, May 31, 2010 at 03:30:02PM +0300, Boaz Harrosh wrote:
>>
>> These changes are crafted based on the similar
>> conversion done to ext2 by Nick Piggin.
>>
>> Please check me out I bound to have miss-looked
>> something.
>>
>> - Remove the deprecated ->truncate vector, call a new
>>   exofs_setsize for on-disk size update, from exofs_setattr
>> - Call truncate_pagecache on the unused pages if
>>   write_begin/end fails.
>> - Cleanup exofs_delete_inode that did stupid inode
>>   writes and updates on an inode that will be
>>   removed.
>> - And finally get rid of exofs_get_block. We never
>>   had any blocks it was all for calling nobh_truncate_page.
>>   nobh_truncate_page is not actually needed in exofs since
>>   the last page is complete and gone just like all the other
>>   pages. There is no partial blocks in exofs.
>>   [OK do I might need a partial read here upto i_size ???]
>>
>> I've tested with this patch, and there are no apparent
>> failures, so far.
>>
>> These patches are based on Al's vfs for-next branch, which
>> contain Nick's last patches. But they do not contain,
>> (and will lightly conflict with) latest Christoph's
>> patches. Christoph do you have a tree I can rebase on? or
>> maybe you want to take this patch into your patchset?
>>
>> CC: Christoph Hellwig <hch@xxxxxx>
>> CC: Nick Piggin <npiggin@xxxxxxx>
>> Signed-off-by: Boaz Harrosh <bharrosh@xxxxxxxxxxx>
>> ---
>>  fs/exofs/exofs.h |    1 -
>>  fs/exofs/file.c  |    1 -
>>  fs/exofs/inode.c |  115 +++++++++++++++++++++++++++---------------------------
> 
> Can you rip out all the rest of the buffer_head stuff too?
> 

I hope I don't have any left, that was the last, have I missed
something?

Thanks for looking. Please talk to me a bit more, the exofs
is a bit special around here and I would like another pair
of eyes on this.

> 
>>  3 files changed, 58 insertions(+), 59 deletions(-)
>>
>> diff --git a/fs/exofs/exofs.h b/fs/exofs/exofs.h
>> index 22721b2..0706ce9 100644
>> --- a/fs/exofs/exofs.h
>> +++ b/fs/exofs/exofs.h
>> @@ -256,7 +256,6 @@ static inline int exofs_oi_read(struct exofs_i_info *oi,
>>  }
>>  
>>  /* inode.c               */
>> -void exofs_truncate(struct inode *inode);
>>  int exofs_setattr(struct dentry *, struct iattr *);
>>  int exofs_write_begin(struct file *file, struct address_space *mapping,
>>  		loff_t pos, unsigned len, unsigned flags,
>> diff --git a/fs/exofs/file.c b/fs/exofs/file.c
>> index fef6899..f9bfe2b 100644
>> --- a/fs/exofs/file.c
>> +++ b/fs/exofs/file.c
>> @@ -86,6 +86,5 @@ const struct file_operations exofs_file_operations = {
>>  };
>>  
>>  const struct inode_operations exofs_file_inode_operations = {
>> -	.truncate	= exofs_truncate,
>>  	.setattr	= exofs_setattr,
>>  };
>> diff --git a/fs/exofs/inode.c b/fs/exofs/inode.c
>> index 4bb6ef8..1c2666c 100644
>> --- a/fs/exofs/inode.c
>> +++ b/fs/exofs/inode.c
>> @@ -710,7 +710,7 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
>>  					 fsdata);
>>  		if (ret) {
>>  			EXOFS_DBGMSG("simple_write_begin faild\n");
>> -			return ret;
>> +			goto out;
>>  		}
>>  
>>  		page = *pagep;
>> @@ -725,7 +725,17 @@ int exofs_write_begin(struct file *file, struct address_space *mapping,
>>  			EXOFS_DBGMSG("__readpage_filler faild\n");
>>  		}
>>  	}
>> +out:
>> +	if (unlikely(ret)) {
>> +		struct inode *inode = mapping->host;
>> +		loff_t to = pos + len;
>>  
>> +		if (to > inode->i_size)
>> +			truncate_pagecache(inode, to, inode->i_size);
>> +
>> +		mark_inode_dirty(inode); /* write the new size */
> 
> Is this required? I don't think the i_size should be changed here?

Right and write_begin has marked it dirty anyway

> 
>> +		return ret;
>> +	}
>>  	return ret;
>>  }
>>  
>> @@ -750,6 +760,10 @@ static int exofs_write_end(struct file *file, struct address_space *mapping,
>>  	int ret;
>>  
>>  	ret = simple_write_end(file, mapping,pos, len, copied, page, fsdata);
>> +	if (unlikely(ret && pos + len > inode->i_size))
>> +		truncate_pagecache(inode, pos + len, inode->i_size);
>> +
> 
> So there is no need to do any oi_truncate? Even if _readpage in
> write_begin has set up some blocks?
> 

No blocks are setup. Exofs does not have any blocks. If the write
failed then the object on OSD has the last written offset as objects
size. If error handling was done right residual is subtracted from IO
size and should be reflected here.

If IO was never attempted then object size did not grow and we revert
in memory i_size here. (size is keept in two places and is checked
for consistency in fsck)

TODO:
	eject short writes and see if this works correctly.

> 
>> +	/* TODO: once simple_write_end marks inode dirty remove */
>>  	if (i_size != inode->i_size)
>>  		mark_inode_dirty(inode);
>>  	return ret;
> 
> Hmm, I suppose simple_write_end probably should mark the inode dirty?
> 

I think Christoph has a patch for that.

> 
>> @@ -1335,28 +1339,25 @@ void exofs_delete_inode(struct inode *inode)
>>  
>>  	truncate_inode_pages(&inode->i_data, 0);
>>  
>> +	/* TODO: should do better here */
>>  	if (is_bad_inode(inode))
>>  		goto no_delete;
>>  
>> -	mark_inode_dirty(inode);
>> -	exofs_update_inode(inode, inode_needs_sync(inode));
>> -
>>  	inode->i_size = 0;
>> -	if (inode->i_blocks)
>> -		exofs_truncate(inode);
> 
> This guy has gone missing -- I assume exofs_sbi_remove is a more
> efficient way to do this anyway?
> 

You see this is where exofs is different a file is an object_no on
multiple OSD devices. The inode is kept as an attribute of the
object. (data as object's data) so a exofs_sbi_remove will just
obliterate any association to the object. It was historically
called because exofs_truncate used to do what truncate_inode_pages
does today. (And some other in memory book keeping.) But with
your help all this was cleaned up.

Do you see any operation I missed that might need cleaning from the
generic VFS inode, that might now leak. As far as storage is concerned
I'm covered.

[I ran git clone linux; rm -rf linux; 100 times in a loop and the OSD
 storage stayed constant size. So I presume there is no storage leak.
 OSD is good in this respect]

> 
>> -
>>  	clear_inode(inode);
>>  
>> -	ret = exofs_get_io_state(&sbi->layout, &ios);
>> -	if (unlikely(ret)) {
>> -		EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
>> -		return;
>> -	}
>> -
>>  	/* if we are deleting an obj that hasn't been created yet, wait */
>>  	if (!obj_created(oi)) {
>>  		BUG_ON(!obj_2bcreated(oi));
>>  		wait_event(oi->i_wq, obj_created(oi));
>> +		/* ignore the error attempt a remove anyway */
>> +	}
>> +
>> +	/* Now Remove the OSD objects */
>> +	ret = exofs_get_io_state(&sbi->layout, &ios);
>> +	if (unlikely(ret)) {
>> +		EXOFS_ERR("%s: exofs_get_io_state failed\n", __func__);
>> +		return;
>>  	}
>>  
>>  	ios->obj.id = exofs_oi_objno(oi);
>> -- 
>> 1.6.6.1

Thanks for lookin. And thanks for making this patch possible. I wanted
this cleaned, long ago, but it was only made easy and simple after your
changes.

Boaz
--
To unsubscribe from this list: send the line "unsubscribe linux-fsdevel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html


[Index of Archives]     [Linux Ext4 Filesystem]     [Union Filesystem]     [Filesystem Testing]     [Ceph Users]     [Ecryptfs]     [AutoFS]     [Kernel Newbies]     [Share Photos]     [Security]     [Netfilter]     [Bugtraq]     [Yosemite News]     [MIPS Linux]     [ARM Linux]     [Linux Security]     [Linux Cachefs]     [Reiser Filesystem]     [Linux RAID]     [Samba]     [Device Mapper]     [CEPH Development]
  Powered by Linux