Re: Copying Data Blocks

Hi Peter,

On Mon, Jan 12, 2009 at 9:49 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
> On Mon, Jan 12, 2009 at 4:26 PM, Sandeep K Sinha
> <sandeepksinha@xxxxxxxxx> wrote:
>> Hi Peter,
>>
>> Don't you think that it will restrict this to a specific file system?
>> Shouldn't the VFS inode be used rather than the FS in-core inode?
>>
>
> The VFS has APIs for this: fsync_buffer_list() and
> invalidate_inode_buffers(), and these APIs appear to use a spinlock
> for syncing:
>
> void invalidate_inode_buffers(struct inode *inode)
> {
>        if (inode_has_buffers(inode)) {
>                struct address_space *mapping = &inode->i_data;
>                struct list_head *list = &mapping->private_list;
>                struct address_space *buffer_mapping = mapping->assoc_mapping;
>
>                spin_lock(&buffer_mapping->private_lock);
>                while (!list_empty(list))
>
>                        __remove_assoc_queue(BH_ENTRY(list->next));
>                        /* ======> modify this call to write out the data instead */
>                spin_unlock(&buffer_mapping->private_lock);
>        }
> }
> EXPORT_SYMBOL(invalidate_inode_buffers);
>
>
>> The purpose is to block all the I/Os while we are updating the i_data
>> from the new inode to the old inode (updating the data blocks).
>>
>> I think i_alloc_sem should work here, but I could not find any
>> instance of its use in the code.
>
> For the case of ext3's block allocation, the lock seems to be
> truncate_mutex - read the comment:
>
>        /*
>         * From here we block out all ext3_get_block() callers who want to
>         * modify the block allocation tree.
>         */
>        mutex_lock(&ei->truncate_mutex);
>
> So while it is building the tree, the mutex will lock it.
>
> And the remarks for ext3_get_blocks_handle() are:
>
> /*
>  * Allocation strategy is simple: if we have to allocate something, we will
>  * have to go the whole way to leaf. So let's do it before attaching anything
>  * to tree, set linkage between the newborn blocks, write them if sync is
>  * required, recheck the path, free and repeat if check fails, otherwise
>  * set the last missing link (that will protect us from any truncate-generated
> ...
>
> Reading the source, go down to the mutex_lock() (where multiblock
> allocation is needed); after the lock, all the block allocation,
> merging etc. is done:
>
>        /* Next simple case - plain lookup or failed read of indirect block */
>        if (!create || err == -EIO)
>                goto cleanup;
>
>        mutex_lock(&ei->truncate_mutex);
> <snip>
>        count = ext3_blks_to_allocate(partial, indirect_blks,
>                                        maxblocks, blocks_to_boundary);
> <snip>
>        err = ext3_alloc_branch(handle, inode, indirect_blks, &count, goal,
>                                offsets + (partial - chain), partial);
>
>
>> It's working fine currently with i_mutex, meaning if we hold an i_mutex
>
> As far as I know, i_mutex is used for modifying an inode's structural information:
>
> grep for i_mutex in fs/ext3/ioctl.c - every time there is a need to
> maintain the inode's structural info, a lock on i_mutex is taken.
>
>> lock on the inode while updating the i_data pointers, and then try to
>> perform I/O from user space, the I/Os are queued. The file was
>> opened in r/w mode prior to taking the lock inside the kernel.
>>
>> But, I still feel i_alloc_sem would be the right option to go ahead with.
>>
>> On Mon, Jan 12, 2009 at 1:11 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>> If you grep for spinlock, mutex, or "sem" in the fs/ext4 directory,
>>> you can find that all three types of lock are used - for different
>>> classes of object.
>>>
>>> For data blocks my guess is a semaphore - read this in
>>> fs/ext4/inode.c:ext4_get_branch():
>>>
>>> /**
>>>  *      ext4_get_branch - read the chain of indirect blocks leading to data
>>> <snip>
>>>  *
>>>  *      Need to be called with
>>>  *      down_read(&EXT4_I(inode)->i_data_sem)
>>>  */
>>>
>>> I guess you have no choice: as it is a semaphore, you have to follow
>>> the rest of the kernel for consistency - don't create your own
>>> semaphore :-).
>>>
>>> There also exists i_lock, a spinlock, which as far as I know is for
>>> i_blocks accounting purposes:
>>>
>>>       spin_lock(&inode->i_lock);
>>>        inode->i_blocks += tmp_inode->i_blocks;
>>>        spin_unlock(&inode->i_lock);
>>>        up_write(&EXT4_I(inode)->i_data_sem);
>>>
>>> But for data it should be i_data_sem.   Is that correct?
>>>
>>> On Mon, Jan 12, 2009 at 2:18 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>> Hi,
>>>>
>>>> I am having some issues in locking inode while copying data blocks.
>>>> We are trying to keep file system live during this operation, so
>>>> both read and write operations should work.
>>>> In this case what type of lock on inode should be used, semaphore,
>>>> mutex or spinlock?
>>>>
>>>>
>>>> On Sun, Jan 11, 2009 at 8:45 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>>> Sorry, some mistakes in my last mail - a resend:
>>>>>
>>>>> Here are some tips on the blockdevice API:
>>>>>
>>>>> http://lkml.org/lkml/2006/1/24/287
>>>>> http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-01/msg09388.html
>>>>>
>>>>> as indicated, documentation is rather sparse in this area.
>>>>>
>>>>> Not sure if anyone else has a summary list of the blockdevice API
>>>>> and an explanation of it?
>>>>>
>>>>> Note: wrt the following "cleanup patch", I am not sure how the API will change:
>>>>>
>>>>> http://lwn.net/Articles/304485/
>>>>>
>>>>> thanks.
>>>>>
>>>>> On Tue, Jan 6, 2009 at 6:36 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>>>>
>>>>>> I want to read data blocks from one inode
>>>>>> and copy them to another inode.
>>>>>>
>>>>>> I mean to copy data from the data blocks associated with one inode
>>>>>> to the data blocks associated with another inode.
>>>>>>
>>>>>> Is that possible in kernel space?
>>>>>> --
>>>
>
> comments ????

That's very right!

So, finally we were able to perform the copy operation successfully.

We did something like this, and we named it "ohsm's tricky copy".
Rohit will soon be uploading a new doc on the fscops page which will
detail it further.

1. Read the source inode.
2. Allocate a new ghost inode.
3. Take a lock on the source inode. /* a mutex, because nr_blocks
can change if a write comes in now from user space */
4. Read the number of blocks.
5. Allocate the same number of blocks for the dummy ghost inode. /*
the chain will be created automatically */
6. Read the buffer heads of the blocks of the source inode and the
buffer heads of the blocks of the destination inode.

7. dest_buffer->b_data = source_buffer->b_data; /* it's a char * and
this is where the trick is */
8. Mark the destination buffer dirty.

Perform steps 6, 7 and 8 for all the blocks.

9. Swap src_inode->i_data[15] and dest_dummy_inode->i_data[15]. /*
This lets us simply avoid copying the block numbers back from the
destination dummy inode to the source inode. */
/* It also lets us simply destroy the dummy inode, which will
eventually free all the old blocks, which otherwise we would have had
to do separately. */

9.1 Release the mutex on the src inode.

10. Set the I_FREEING bit in dest_inode->i_state.

11. Call FS_delete_inode(dest_inode);

Any application which has already opened this inode for read/write and
tries to read or write while the mutex lock is held will be queued.

Thanks a lot Greg, Manish, Peter and all the others for your valuable
inputs and help.

> --
> Regards,
> Peter Teoh
>

-- 
Regards,
Sandeep.




"To learn is to change. Education is a process that changes the learner."
