Re: Copying Data Blocks

On Mon, Jan 12, 2009 at 11:48 PM, Manish Katiyar <mkatiyar@xxxxxxxxx> wrote:
> On Mon, Jan 12, 2009 at 11:31 PM, Sandeep K Sinha
> <sandeepksinha@xxxxxxxxx> wrote:
>> Hi Peter,
>>
>> On Mon, Jan 12, 2009 at 9:49 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>> On Mon, Jan 12, 2009 at 4:26 PM, Sandeep K Sinha
>>> <sandeepksinha@xxxxxxxxx> wrote:
>>>> Hi Peter,
>>>>
>>>> Don't you think that this will restrict it to a specific file system?
>>>> Shouldn't the VFS inode be used rather than the FS in-core inode?
>>>>
>>>
>>> The VFS has APIs fsync_buffers_list() and
>>> invalidate_inode_buffers(), and these APIs seem to use a spinlock for
>>> syncing:
>>>
>>> void invalidate_inode_buffers(struct inode *inode)
>>> {
>>>        if (inode_has_buffers(inode)) {
>>>                struct address_space *mapping = &inode->i_data;
>>>                struct list_head *list = &mapping->private_list;
>>>                struct address_space *buffer_mapping = mapping->assoc_mapping;
>>>
>>>                spin_lock(&buffer_mapping->private_lock);
>>>                while (!list_empty(list))
>>>                        __remove_assoc_queue(BH_ENTRY(list->next)); /* ======> modify this for writing out the data instead */
>>>                spin_unlock(&buffer_mapping->private_lock);
>>>        }
>>> }
>>> EXPORT_SYMBOL(invalidate_inode_buffers);
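A rough, untested sketch of that modification - write each associated
buffer back instead of just dropping it. Note that __remove_assoc_queue()
is static to fs/buffer.c, so this is only to show the idea, not something
a module can call as-is:

void flush_inode_buffers(struct inode *inode)
{
        if (inode_has_buffers(inode)) {
                struct address_space *mapping = &inode->i_data;
                struct list_head *list = &mapping->private_list;
                struct address_space *buffer_mapping = mapping->assoc_mapping;

                spin_lock(&buffer_mapping->private_lock);
                while (!list_empty(list)) {
                        struct buffer_head *bh = BH_ENTRY(list->next);

                        get_bh(bh);                     /* pin it across the unlock */
                        __remove_assoc_queue(bh);
                        spin_unlock(&buffer_mapping->private_lock);
                        if (buffer_dirty(bh))
                                sync_dirty_buffer(bh);  /* write it out and wait */
                        brelse(bh);
                        spin_lock(&buffer_mapping->private_lock);
                }
                spin_unlock(&buffer_mapping->private_lock);
        }
}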
>>>
>>>
>>>> The purpose is to put all the I/Os to sleep while we are updating the
>>>> i_data from the new inode to the old inode (the update of the data blocks).
>>>>
>>>> I think i_alloc_sem should work here, but I could not find any instance
>>>> of its use in the code.
>>>
>>> For the case of ext3's block allocation, the lock seems to be
>>> truncate_mutex - read the remark:
>>>
>>>        /*
>>>         * From here we block out all ext3_get_block() callers who want to
>>>         * modify the block allocation tree.
>>>         */
>>>        mutex_lock(&ei->truncate_mutex);
>>>
>>> So while it is building the tree, the mutex will lock it.
>>>
>>> And the remarks for ext3_get_blocks_handle() are:
>>>
>>> /*
>>>  * Allocation strategy is simple: if we have to allocate something, we will
>>>  * have to go the whole way to leaf. So let's do it before attaching anything
>>>  * to tree, set linkage between the newborn blocks, write them if sync is
>>>  * required, recheck the path, free and repeat if check fails, otherwise
>>>  * set the last missing link (that will protect us from any truncate-generated
>>> ...
>>>
>>> Reading the source... go down to the mutex_lock() (where multiblock
>>> allocation is needed); after the lock, all the block allocation/merging
>>> etc. is done:
>>>
>>>        /* Next simple case - plain lookup or failed read of indirect block */
>>>        if (!create || err == -EIO)
>>>                goto cleanup;
>>>
>>>        mutex_lock(&ei->truncate_mutex);
>>> <snip>
>>>        count = ext3_blks_to_allocate(partial, indirect_blks,
>>>                                        maxblocks, blocks_to_boundary);
>>> <snip>
>>>        err = ext3_alloc_branch(handle, inode, indirect_blks, &count, goal,
>>>                                offsets + (partial - chain), partial);
>>>
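So if we went the ext3 route, our copy code would presumably need to hold
the same lock while it rewrites the block tree - a sketch, with the copy
step as a placeholder (EXT3_I() and truncate_mutex are real in 2.6 ext3):

        struct ext3_inode_info *ei = EXT3_I(inode);

        mutex_lock(&ei->truncate_mutex);   /* block out ext3_get_block() callers */
        /* ... rewrite/copy the block allocation tree here ... */
        mutex_unlock(&ei->truncate_mutex);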
>>>
>>>> It's working fine currently with i_mutex, meaning that if we hold an i_mutex
>>>
>>> as far as I know, i_mutex is used for modifying the inode's structural information:
>>>
>>> grep for i_mutex in fs/ext3/ioctl.c: every time there is a need to
>>> maintain the inode's structural info, i_mutex is taken.
>>>
>>>> lock on the inode while updating the i_data pointers, and then try to
>>>> perform I/O from user space, the I/Os are queued. The file was
>>>> opened in r/w mode prior to taking the lock inside the kernel.
>>>>
>>>> But, I still feel i_alloc_sem would be the right option to go ahead with.
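For the record, the i_mutex pattern we rely on is just the standard
2.6 per-inode mutex; the update in the middle stands in for our i_data
rewrite:

        mutex_lock(&inode->i_mutex);    /* user-space read/write now waits here */
        /* ... update the inode's i_data block pointers ... */
        mutex_unlock(&inode->i_mutex);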
>>>>
>>>> On Mon, Jan 12, 2009 at 1:11 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>>> If you grep for spinlock, mutex, or "sem" in the fs/ext4 directory, you
>>>>> can find that all three types of locks are used - for different classes
>>>>> of objects.
>>>>>
>>>>> For data blocks I guess it is a semaphore - read
>>>>> fs/ext4/inode.c:ext4_get_branch():
>>>>>
>>>>> /**
>>>>>  *      ext4_get_branch - read the chain of indirect blocks leading to data
>>>>> <snip>
>>>>>  *
>>>>>  *      Need to be called with
>>>>>  *      down_read(&EXT4_I(inode)->i_data_sem)
>>>>>  */
>>>>>
>>>>> I guess you have no choice; as it is a semaphore, you have to follow the
>>>>> rest of the kernel for consistency - don't create your own semaphore :-).
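The usage that comment prescribes is simply the following (down_read()/
up_read() and EXT4_I() are real; the body is a placeholder):

        down_read(&EXT4_I(inode)->i_data_sem);  /* shared: walking the block map */
        /* ... ext4_get_branch() and friends go here ... */
        up_read(&EXT4_I(inode)->i_data_sem);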
>>>>>
>>>>> There exists i_lock as a spinlock - which, as far as I know, is for
>>>>> i_blocks counting purposes:
>>>>>
>>>>>        spin_lock(&inode->i_lock);
>>>>>        inode->i_blocks += tmp_inode->i_blocks;
>>>>>        spin_unlock(&inode->i_lock);
>>>>>        up_write(&EXT4_I(inode)->i_data_sem);
>>>>>
>>>>> But for data it should be i_data_sem.   Is that correct?
>>>>>
>>>>> On Mon, Jan 12, 2009 at 2:18 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am having some issues with locking the inode while copying data blocks.
>>>>>> We are trying to keep the file system live during this operation, so
>>>>>> both read and write operations should keep working.
>>>>>> In this case, what type of lock should be used on the inode: a semaphore,
>>>>>> a mutex, or a spinlock?
>>>>>>
>>>>>>
>>>>>> On Sun, Jan 11, 2009 at 8:45 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>>>>> Sorry... some mistakes... a resend:
>>>>>>>
>>>>>>> Here are some tips on the blockdevice API:
>>>>>>>
>>>>>>> http://lkml.org/lkml/2006/1/24/287
>>>>>>> http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-01/msg09388.html
>>>>>>>
>>>>>>> as indicated, documentation is rather sparse in this area.
>>>>>>>
>>>>>>> Not sure if anyone else has a summary list of the blockdevice API and
>>>>>>> its explanation?
>>>>>>>
>>>>>>> Note: wrt the following "cleanup patch", I am not sure how the API will change:
>>>>>>>
>>>>>>> http://lwn.net/Articles/304485/
>>>>>>>
>>>>>>> thanks.
>>>>>>>
>>>>>>> On Tue, Jan 6, 2009 at 6:36 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> I want to read data blocks from one inode
>>>>>>>> and copy them to another inode.
>>>>>>>>
>>>>>>>> I mean, to copy data from the data blocks associated with one inode
>>>>>>>> to the data blocks associated with another inode.
>>>>>>>>
>>>>>>>> Is that possible in kernel space?
>>>>>>>> --
>>>>>
>>>
>>> comments ????
>>
>> That's very right!!!
>>
>> So, finally, we were able to perform the copy operation successfully.
>>
>> We did something like this, and we named it "ohsm's tricky copy".
>> Rohit will soon be uploading a new doc on the fscops page which
>> will detail it further.
>

Hi Manish,

> Thanks, let us know when the docs and the *source code* are available ;-)

We have been working on other modules of OHSM; most of the modules
have been coded (the framework is ready). Once we finish with
this, I'll mail you the doc.

>
>>
>> 1. Read the source inode.
>> 2. Allocate a new ghost inode.
>> 3. Take a lock on the source inode. /* a mutex, because nr_blocks
>> can change if a write comes in from user space now */
>> 4. Read the number of blocks.
>> 5. Allocate the same number of blocks for the dummy ghost inode. /*
>> the chain will be created automatically */
>> 6. Read the source buffer heads of the blocks from the source inode and
>> the destination buffer heads of the blocks of the destination inode.
>>
>> 7. dest_buffer->b_data = source_buffer->b_data; /* it's a char * and
>> this is where the trick is */
>> 8. Mark the destination buffer dirty.
>>
>> Perform 6, 7 and 8 for all the blocks (see the sketch further down).
>>
>> 9. Swap src_inode->i_data[15] and dest_dummy_inode->i_data[15]. /*
>> This helps us simply avoid copying the block numbers back from the
>> destination dummy inode to the source inode */
>
> I don't know anything about LVM, so this might be a dumb question. Why
> is this required?

LVM is similar to mvfs; it allows an FS to span multiple volumes.

> Did you mean swapping all the block numbers rather
> than just the [15]?

Yes, we are swapping the block numbers between src_inode and dest_inode,
and finally src_inode ends up with the new data blocks.

> Here, is src_inode the VFS "struct inode" or the
> FS-specific struct FS_inode_info? I didn't get this completely;
> can you explain this point a bit more?

Here src_inode is the VFS inode; from it we can get the FS_inode_info
structure.
For example, we can get the ext2_inode_info from the VFS inode like this:
ext2_inode = EXT2_I(vfs_inode);
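EXT2_I() is essentially just a container_of() from the embedded VFS inode
back to the fs-private structure; in the 2.6 tree it is defined (roughly) as:

static inline struct ext2_inode_info *EXT2_I(struct inode *inode)
{
        return container_of(inode, struct ext2_inode_info, vfs_inode);
}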

>
> Thanks -
> Manish
>
>
>> /* This also helps us simply destroy the dummy inode, which will
>> eventually free all the old blocks - something we would otherwise
>> have had to do separately */
>>
>> 9.1 Release the mutex on the src inode.
>>
>> 10. Set the I_FREEING bit in dest_inode->i_state.
>>
>> 11. Call FS_delete_inode(dest_inode);
>>
>> Any application which has already opened this inode for read/write
>> and tries to do a read/write while the mutex is held will simply be
>> queued.
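
To make steps 6 to 11 concrete, here is a rough sketch - not the actual
OHSM code, and with error checking omitted. sb_bread(), sb_getblk(),
mark_buffer_dirty() and brelse() are the real buffer-cache API; sb,
nr_blocks and the src_blk[]/dest_blk[] arrays are placeholder names for
whatever your block-walking code produces (this assumes ext2):

        int i;
        __le32 tmp;
        struct ext2_inode_info *src_ei  = EXT2_I(src_inode);
        struct ext2_inode_info *dest_ei = EXT2_I(dest_dummy_inode);

        for (i = 0; i < nr_blocks; i++) {
                /* step 6: buffer heads for the source and destination blocks */
                struct buffer_head *src_bh  = sb_bread(sb, src_blk[i]);
                struct buffer_head *dest_bh = sb_getblk(sb, dest_blk[i]);

                dest_bh->b_data = src_bh->b_data;  /* step 7: the pointer trick */
                mark_buffer_dirty(dest_bh);        /* step 8: queue for writeback */

                brelse(src_bh);
                brelse(dest_bh);
        }

        /* step 9: swap the whole i_data[] arrays (EXT2_N_BLOCKS == 15) */
        for (i = 0; i < EXT2_N_BLOCKS; i++) {
                tmp = src_ei->i_data[i];
                src_ei->i_data[i] = dest_ei->i_data[i];
                dest_ei->i_data[i] = tmp;
        }

        /* steps 10 and 11: let the FS reap the dummy inode and the old blocks */
        dest_dummy_inode->i_state |= I_FREEING;
        ext2_delete_inode(dest_dummy_inode);  /* the FS_delete_inode() of step 11 */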
>>
>>
>> Thanks a lot Greg, Manish, Peter and all the others for all your
>> valuable inputs and help.
>>
>>> --
>>> Regards,
>>> Peter Teoh
>>>
>>
>> --
>> Regards,
>> Sandeep.
>>
>>
>>
>>
>>
>> "To learn is to change. Education is a process that changes the learner."
>>
>

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ

