Re: Copying Data Blocks

"Manish Katiyar" <mkatiyar@xxxxxxxxx> · Mon, 12 Jan 2009 23:48:14 +0530

On Mon, Jan 12, 2009 at 11:31 PM, Sandeep K Sinha
<sandeepksinha@xxxxxxxxx> wrote:
> Hi Peter,
>
> On Mon, Jan 12, 2009 at 9:49 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>> On Mon, Jan 12, 2009 at 4:26 PM, Sandeep K Sinha
>> <sandeepksinha@xxxxxxxxx> wrote:
>>> Hi Peter,
>>>
>>> Don't you think that it will restrict this to a specific file system?
>>> Shouldn't the VFS inode be used rather than the FS in-core inode?
>>>
>>
>> The VFS has APIs such as fsync_buffers_list() and
>> invalidate_inode_buffers(), and these seem to use a spinlock for
>> syncing:
>>
>> void invalidate_inode_buffers(struct inode *inode)
>> {
>>        if (inode_has_buffers(inode)) {
>>                struct address_space *mapping = &inode->i_data;
>>                struct list_head *list = &mapping->private_list;
>>                struct address_space *buffer_mapping = mapping->assoc_mapping;
>>
>>                spin_lock(&buffer_mapping->private_lock);
>>                while (!list_empty(list))
>>                        /* ======> modify this to write out the data instead */
>>                        __remove_assoc_queue(BH_ENTRY(list->next));
>>                spin_unlock(&buffer_mapping->private_lock);
>>        }
>> }
>> EXPORT_SYMBOL(invalidate_inode_buffers);
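A note on the "write out the data instead" idea: waiting for buffer I/O can sleep, and a spinlock must not be held across a sleep, so the write cannot simply take the place of __remove_assoc_queue() inside that loop. One common pattern is to detach the entries while holding the lock and do the slow work after dropping it. Below is a minimal user-space sketch of that pattern, not kernel code: pthread_mutex_t stands in for private_lock, and struct buf, detach_all() and write_out() are hypothetical names made up for illustration.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a buffer queued on an inode's private list. */
struct buf {
    struct buf *next;
    int blocknr;
    int written;
};

static pthread_mutex_t private_lock = PTHREAD_MUTEX_INITIALIZER;
static struct buf *private_list;   /* protected by private_lock */

/* Detach the whole list under the lock and hand it to the caller. */
static struct buf *detach_all(void)
{
    struct buf *head;

    pthread_mutex_lock(&private_lock);
    head = private_list;
    private_list = NULL;
    pthread_mutex_unlock(&private_lock);
    return head;
}

/* "Write out" every detached buffer; the lock is not held here. */
static int write_out(struct buf *head)
{
    int n = 0;

    for (struct buf *b = head; b != NULL; b = b->next) {
        assert(!b->written);   /* each buffer is handled exactly once */
        b->written = 1;        /* real code would submit and wait for I/O */
        n++;
    }
    return n;
}
```

Calling write_out(detach_all()) drains the list; the lock is only held for the pointer swap, never while a buffer is being processed.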
>>
>>
>>> The purpose is to put all I/O to sleep while we are updating the i_data
>>> from the new inode to the old inode (updating the data blocks).
>>>
>>> I think i_alloc_sem should work here, but could not find any instance
>>> of its use in the code.
>>
>> In the case of ext3's block allocation, the lock seems to be
>> truncate_mutex. Read the comment:
>>
>>        /*
>>         * From here we block out all ext3_get_block() callers who want to
>>         * modify the block allocation tree.
>>         */
>>        mutex_lock(&ei->truncate_mutex);
>>
>> So while the tree is being built, the mutex protects it.
>>
>> And the comments for ext3_get_blocks_handle() are:
>>
>> /*
>>  * Allocation strategy is simple: if we have to allocate something, we will
>>  * have to go the whole way to leaf. So let's do it before attaching anything
>>  * to tree, set linkage between the newborn blocks, write them if sync is
>>  * required, recheck the path, free and repeat if check fails, otherwise
>>  * set the last missing link (that will protect us from any truncate-generated
>> ...
>>
>> Reading the source, go down to the mutex_lock() call (where
>> multiblock allocation is needed); after the lock is taken, all the
>> block allocation, merging etc. is done:
>>
>>        /* Next simple case - plain lookup or failed read of indirect block */
>>        if (!create || err == -EIO)
>>                goto cleanup;
>>
>>        mutex_lock(&ei->truncate_mutex);
>> <snip>
>>        count = ext3_blks_to_allocate(partial, indirect_blks,
>>                                        maxblocks, blocks_to_boundary);
>> <snip>
>>        err = ext3_alloc_branch(handle, inode, indirect_blks, &count, goal,
>>                                offsets + (partial - chain), partial);
>>
>>
>>> It's working fine currently with i_mutex, meaning if we hold an i_mutex
>>
>> As far as I know, i_mutex is used when modifying the inode's structural information:
>>
>> grep for i_mutex in fs/ext3/ioctl.c: every time the inode's
>> structural info needs to be changed, i_mutex is taken.
>>
>>> lock on the inode while updating the i_data pointers
>>> and try to perform I/O from user space, the requests are queued. The file was
>>> opened in r/w mode prior to taking the lock inside the kernel.
>>>
>>> But, I still feel i_alloc_sem would be the right option to go ahead with.
>>>
>>> On Mon, Jan 12, 2009 at 1:11 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>> If you grep for spinlock, mutex, or "sem" in the fs/ext4 directory, you
>>>> will find that all three types of lock are used, each for a different
>>>> class of object.
>>>>
>>>> For data blocks I guess it is a semaphore. Read this in
>>>> fs/ext4/inode.c:ext4_get_branch():
>>>>
>>>> /**
>>>>  *      ext4_get_branch - read the chain of indirect blocks leading to data
>>>> <snip>
>>>>  *
>>>>  *      Need to be called with
>>>>  *      down_read(&EXT4_I(inode)->i_data_sem)
>>>>  */
>>>>
>>>> I guess you have no choice: as it is a semaphore, you have to follow the
>>>> rest of the kernel for consistency. Don't create your own semaphore :-).
>>>>
>>>> There is also i_lock, a spinlock, which as far as I know is for i_blocks
>>>> counting purposes:
>>>>
>>>>        spin_lock(&inode->i_lock);
>>>>        inode->i_blocks += tmp_inode->i_blocks;
>>>>        spin_unlock(&inode->i_lock);
>>>>        up_write(&EXT4_I(inode)->i_data_sem);
>>>>
>>>> But for data it should be i_data_sem.   Is that correct?
>>>>
>>>> On Mon, Jan 12, 2009 at 2:18 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>>> Hi,
>>>>>
>>>>> I am having some issues with locking the inode while copying data blocks.
>>>>> We are trying to keep the file system live during this operation, so
>>>>> both read and write operations should work.
>>>>> In this case, what type of lock should be used on the inode: semaphore,
>>>>> mutex or spinlock?
>>>>>
>>>>>
>>>>> On Sun, Jan 11, 2009 at 8:45 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>>>> Sorry, some mistakes. A resend:
>>>>>>
>>>>>> Here are some tips on the blockdevice API:
>>>>>>
>>>>>> http://lkml.org/lkml/2006/1/24/287
>>>>>> http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-01/msg09388.html
>>>>>>
>>>>>> As indicated, documentation is rather sparse in this area.
>>>>>>
>>>>>> Not sure if anyone else has a summary list of the blockdevice API with
>>>>>> explanations?
>>>>>>
>>>>>> Now, wrt the following "cleanup patch", I am not sure how the API will change:
>>>>>>
>>>>>> http://lwn.net/Articles/304485/
>>>>>>
>>>>>> thanks.
>>>>>>
>>>>>> On Tue, Jan 6, 2009 at 6:36 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>>>>>
>>>>>>> I want to read data blocks from one inode
>>>>>>> and copy them to another inode.
>>>>>>>
>>>>>>> I mean to copy data from the data blocks associated with one inode
>>>>>>> to the data blocks associated with another inode.
>>>>>>>
>>>>>>> Is that possible in kernel space?
>>>>>>> --
>>>>
>>
>> comments ????
>
> That's very right!
>
> So, finally we were able to perform the copy operation successfully.
>
> We did something like this and we named it "ohsm's tricky copy".
> Rohit will soon be uploading a new doc on the fscops page which
> will detail it further.

Thanks, let us know when the docs and the *source code* are available ;-)

>
> 1. Read the source inode.
> 2. Allocate a new ghost inode.
> 3. Take a lock on the source inode. /* a mutex, because nr_blocks
> can change if a write comes in from user space now */
> 4. Read the number of blocks.
>
> 5. Allocate the same number of blocks for the dummy ghost inode. /*
> the chain will be created automatically */
> 6. Read the source buffer head of each block of the source inode and
> the destination buffer head of each block of the destination inode.
>
> 7. dest_buffer->b_data = source_buffer->b_data; /* it's a char * and
> this is where the trick is */
> 8. Mark the destination buffer dirty.
>
> Perform 6, 7, 8 for all the blocks.
>
> 9. Swap src_inode->i_data[15] and dest_dummy_inode->i_data[15]. /*
> This lets us avoid copying the block numbers back from the
> destination dummy inode to the source inode */

I don't know anything about LVM, so this might be a dumb question: why
is this required? Did you mean swapping all the block numbers rather
than just the [15]? And is src_inode here the VFS "struct inode" or the
FS-specific struct FS_inode_info? I didn't get this completely;
can you explain this point a bit more?

Thanks -
Manish

> /* This also helps to simply destroy the inode, which will eventually
> free all the blocks, which otherwise we would have been doing
> separately */
>
> 9.1 Release the mutex on the src inode.
>
> 10. Set the I_FREEING bit in dest_inode->i_state.
>
> 11. Call FS_delete_inode(dest_inode);
>
> If any application that has already opened this inode for read/write
> tries to do a read/write while the mutex is held, the request will be
> queued.
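Steps 6 to 9 can be pictured with a small user-space model. This is only a sketch under assumptions, not the OHSM code: struct bh_model, struct inode_model, write_back(), tricky_copy_block() and swap_block_maps() are names invented here, the "disk" is an in-memory array, and the mutex from step 3 is left out. It reads i_data[15] as the whole 15-slot block map, so the swap in step 9 exchanges every block number at once. In a real kernel b_data points into a page, and re-pointing it has lifetime consequences that this model ignores.

```c
#include <assert.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 16   /* tiny blocks keep the model readable */
#define DISK_BLOCKS 8
#define N_BLOCKS 15           /* slots in i_data[15] */

static char disk[DISK_BLOCKS][MODEL_BLOCK_SIZE];   /* pretend block device */

/* Hypothetical miniature of a buffer head: only the fields the steps use. */
struct bh_model {
    char *b_data;             /* where the cached block contents live */
    unsigned int b_blocknr;   /* which disk block this buffer maps */
    int dirty;
};

struct inode_model {
    unsigned int i_data[N_BLOCKS];   /* block map; 0 means "no block" */
};

/* Writeback: a dirty buffer's contents land on ITS OWN block number. */
static void write_back(struct bh_model *bh)
{
    assert(bh->b_blocknr < DISK_BLOCKS);
    if (!bh->dirty)
        return;
    memcpy(disk[bh->b_blocknr], bh->b_data, MODEL_BLOCK_SIZE);
    bh->dirty = 0;
}

/* Steps 7-8 for one block: alias the data pointer and mark the buffer
 * dirty. No bytes are copied here; writeback does the copy later. */
static void tricky_copy_block(struct bh_model *dst, const struct bh_model *src)
{
    dst->b_data = src->b_data;
    dst->dirty = 1;
}

/* Step 9: exchange the entire block maps of the two inodes. */
static void swap_block_maps(struct inode_model *a, struct inode_model *b)
{
    unsigned int tmp[N_BLOCKS];

    memcpy(tmp, a->i_data, sizeof(tmp));
    memcpy(a->i_data, b->i_data, sizeof(tmp));
    memcpy(b->i_data, tmp, sizeof(tmp));
}
```

With a source buffer caching block 1 and a ghost buffer mapped to block 5, calling tricky_copy_block() and then write_back() on the ghost buffer leaves the source contents in disk[5]; swap_block_maps() then makes the source inode point at block 5, and the ghost inode is left holding the old block for deletion.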
>
>>
>
> Thanks a lot Greg, Manish, Peter and all others for all your valuable
> inputs and help.
>
>> --
>> Regards,
>> Peter Teoh
>>
>
> --
> Regards,
> Sandeep.
>
> "To learn is to change. Education is a process that changes the learner."
>
> --
> To unsubscribe from this list: send an email with
> "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
> Please read the FAQ at http://kernelnewbies.org/FAQ