Re: Copying Data Blocks

Hi Manish,

On Wed, Jan 14, 2009 at 6:44 PM, Manish Katiyar <mkatiyar@xxxxxxxxx> wrote:
> On Wed, Jan 14, 2009 at 12:39 PM, Sandeep K Sinha
> <sandeepksinha@xxxxxxxxx> wrote:
>> On Wed, Jan 14, 2009 at 10:44 AM, Manish Katiyar <mkatiyar@xxxxxxxxx> wrote:
>>> On Wed, Jan 14, 2009 at 10:32 AM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>> thinking deeper, I have a concern for you.
>>>>
>>>> based on the VFS layering concept, you have no problem if the internals
>>>> of ext3 are not touched.   but now you seem to be doing a lot of stuff at
>>>> the ext2/ext3 layer.
>>>
>>> IIRC they are dealing only with ext2 currently
>>>
>> That's very right!
>> The code-level work is being done for ext2. But yeah, nothing much needs
>> to be done for ext3.
>> We have looked into its design.
>>
>>>>
>>>> I was reading about the reservation concept in ext2/ext3 and found it
>>>> quite complex.   blocks can be preallocated and put on the reservation
>>>> list, but at the same time, a block can and should be able to be taken
>>>> away by another file, if that file needs the block.   ie,
>>>> reservation does not guarantee you ownership of that block, but based on
>>>> "polite courtesy" (as I read somewhere), other parts of ext2/ext3 are
>>>> supposed to avoid using those blocks.   but if storage space is really
>>>> low....well...and since that block is not being used...it should be
>>>> reassigned to someone.
>>>
>>> Correct.... but that also raises a few more questions. Sandeep, do you
>>> have any pre-requisites about the sizing of disks for OHSM to work ??
>>> For example, let's say I have 3 disks d1, d2 & d3 in descending order of
>>> speed. Do all of them have to be of the same size ? If they are then you
>>
>> No, they don't need to be. We would never like to have such
>> restrictions. Obviously, the cheaper disks would be bigger than the
>> expensive disks, and the same goes for speed.
>> Also, the TIERs will never be logically identical at any instant,
>> even if the disk sizes are the same.
>> Suppose during the first relocation only a few files qualify and they are
>> relocated to some other tier. Then your assumption about equal sizes will
>> fail, right ?
>> Correct me if I got you wrong.
>>
>>> really don't need to worry much about space preallocation, because you
>>> know that if you had space in d1 to allocate an inode in the first place,
>>> you can replicate the same layout in d2.
>>>
>>
>> Manish, if you can refer
>> http://fscops.googlecode.com/files/OHSM_Relocation_faqs_0.1.pdf
>
> Hi Sandeep,
>
> I was going through your nice FAQ on OHSM. However, it has a line "Then

Thanks.

> we use the OHSM's very own 'Tricky Copy' algorithm." I am not sure
> about this, because it seems ext3 already does this for its
> journalling. Refer to the below paragraph from the link
> http://olstrans.sourceforge.net/release/OLS2000-ext3/OLS2000-ext3.html
>

Thanks for pointing this out. Actually, the "Tricky Copy Algorithm" is called
tricky because of the way it copies the blocks compared to how other
similar tools do it.

I agree, ext3 has a similar mechanism, but that one is different. I remember
implementing something similar to ext3's approach when I was implementing my
block-level CDP.

> "So, for example, when we're doing journaling, the filesystem may say:
> take some buffer that's in memory and I'm going to write this to disk
> for that application there and journal it. And the journaling code has
> to make sure that block gets written to the journal first, and then,
> after the commit, it goes to its main location on disk. And the
> journaling layer will do zero-copy from that; it will actually create
> a new I/O request that points to the old disk buffer location and use
> that to journal the data to the journal file, without copying the data
> block. Now all that kind of thing is handled by the JFS layer"
>
> And IIRC journal devices can be other disks too, so Mr. Stephen
> Tweedie might get offended :-)
>
Generally, they are separate disks in huge database environments.
No offence intended; I have just named the whole transaction as our very
own, that's it.

> Thanks -
> Manish
>



>
>
>
>>
>> It mentions that we check for the amount of space required on the
>> destination tier before proceeding with relocation.
>> If it's not there, we ask the admin to free some space and then issue an
>> IOCTL (which will be a command for him) to tell us that he is done, so
>> that we can start relocating. He will have the facility to either
>> re-trigger or just say that he has freed space. The reason is that
>> re-triggering will cause the whole FS to be scanned again.
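To make the admin-side flow above concrete, here is a rough userspace sketch of
what the "I have freed space, continue" command could look like. The control
path and the OHSM_IOC_SPACE_FREED command number are made-up placeholders for
illustration, not OHSM's actual interface:

#include <fcntl.h>
#include <stdio.h>
#include <sys/ioctl.h>
#include <unistd.h>

/* Illustrative command number only; the real OHSM ioctl may differ. */
#define OHSM_IOC_SPACE_FREED  _IO('O', 1)

int main(void)
{
        /* Assume the OHSM-managed filesystem is mounted at /mnt/ohsm. */
        int fd = open("/mnt/ohsm", O_RDONLY);

        if (fd < 0) {
                perror("open");
                return 1;
        }

        /* Tell OHSM that space was freed, so relocation can resume
         * without rescanning the whole FS. */
        if (ioctl(fd, OHSM_IOC_SPACE_FREED) < 0)
                perror("ioctl");

        close(fd);
        return 0;
}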
>>
>>> The problem comes when d2 is smaller than d1. Is it possible that you
>>> migrate only some of the blocks to d2 and leave some in d1 if d2 runs
>>> out of space ?
>>>
>>
>> No, that won't make sense at all. See, the home_tier_id of a file
>> signifies its property based on the initial allocation policy. Say the
>> admin applies one allocation policy; the admin will then think "I have
>> allocated mp3 files on TIER1", and after triggering relocation (the
>> policy was to move all mp3s to tier 3) he will have the feeling that
>> all mp3s are fetched from tier 3, whereas it would be a mixture.
>> So, in the design we decided two things:
>> first, as recalled, "fail but shout clearly"; and secondly, we warn the
>> user if the tier is 80% full and also, at the time of relocation, if the
>> destination tier has less space than required.
>>
>> Does that sound OK to you guys ?
>>
>>> Thanks -
>>> Manish
>>>
>>
>>>>
>>>> ie.....when you allocate blocks and use them....do you actually update any
>>>> reservation list?   or is it necessary to do so?   or are you supposed to
>>>> read the reservation list before allocating blocks?   i am not
>>>> sure.   all these are protocols obeyed internally within ext3.   and
>>>> since block allocation is not part of ext3 but at the block level,
>>>> the API will not care about the existence of any reservation lists,
>>>> which are part of ext3.
>>>>
>>>> In general, if your software is not going into the mainline kernel, my
>>>> personal preference is NOT to do it at the ext3 layer....but higher
>>>> than that.......notice that fs/ext2, fs/ext3 and fs/ext4 all do not
>>>> have any exported API for other subsystems to call?   well....this is for
>>>> internal consistency as described above.
>>>>
>>>> but ext3 uses jbd, so fs/jbd does have exported APIs.   so anyone can
>>>> call these exported APIs without messing up the internal consistency of
>>>> jbd.
>>>>
>>>> At the end of the day, I may be plain wrong :-).
>>>>
>>>> comments?
>>>>
>>>> On Tue, Jan 13, 2009 at 6:41 PM, Sandeep K Sinha
>>>> <sandeepksinha@xxxxxxxxx> wrote:
>>>>> Hey Manish,
>>>>>
>>>>> On Tue, Jan 13, 2009 at 2:00 PM, Manish Katiyar <mkatiyar@xxxxxxxxx> wrote:
>>>>>> On Tue, Jan 13, 2009 at 1:21 PM, Sandeep K Sinha
>>>>>> <sandeepksinha@xxxxxxxxx> wrote:
>>>>>>> Hi Manish,
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Jan 12, 2009 at 11:48 PM, Manish Katiyar <mkatiyar@xxxxxxxxx> wrote:
>>>>>>>> On Mon, Jan 12, 2009 at 11:31 PM, Sandeep K Sinha
>>>>>>>> <sandeepksinha@xxxxxxxxx> wrote:
>>>>>>>>> Hi Peter,
>>>>>>>>>
>>>>>>>>> On Mon, Jan 12, 2009 at 9:49 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>>>>>>>> On Mon, Jan 12, 2009 at 4:26 PM, Sandeep K Sinha
>>>>>>>>>> <sandeepksinha@xxxxxxxxx> wrote:
>>>>>>>>>>> Hi Peter,
>>>>>>>>>>>
>>>>>>>>>>> Don't you think that this will restrict it to a specific file system?
>>>>>>>>>>> Shouldn't the VFS inode be used rather than the FS incore inode ?
>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> VFS has APIs:   fsync_buffer_list() and
>>>>>>>>>> invalidate_inode_buffers(), and these APIs seem to use a spinlock for
>>>>>>>>>> syncing:
>>>>>>>>>>
>>>>>>>>>> void invalidate_inode_buffers(struct inode *inode)
>>>>>>>>>> {
>>>>>>>>>>        if (inode_has_buffers(inode)) {
>>>>>>>>>>                struct address_space *mapping = &inode->i_data;
>>>>>>>>>>                struct list_head *list = &mapping->private_list;
>>>>>>>>>>                struct address_space *buffer_mapping = mapping->assoc_mapping;
>>>>>>>>>>
>>>>>>>>>>                spin_lock(&buffer_mapping->private_lock);
>>>>>>>>>>                while (!list_empty(list))
>>>>>>>>>>                        __remove_assoc_queue(BH_ENTRY(list->next)); /* ======> modify this for writing out the data instead */
>>>>>>>>>>                spin_unlock(&buffer_mapping->private_lock);
>>>>>>>>>>        }
>>>>>>>>>> }
>>>>>>>>>> EXPORT_SYMBOL(invalidate_inode_buffers);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> The purpose is to put to sleep all the I/Os while we are updating the i_data
>>>>>>>>>>> from the new inode to the old inode ( updating the data blocks ).
>>>>>>>>>>>
>>>>>>>>>>> I think i_alloc_sem should work here, but I could not find any instance
>>>>>>>>>>> of its use in the code.
>>>>>>>>>>
>>>>>>>>>> for the case of ext3's block allocation, the lock seems to be
>>>>>>>>>> truncate_mutex - read the remark:
>>>>>>>>>>
>>>>>>>>>>        /*
>>>>>>>>>>         * From here we block out all ext3_get_block() callers who want to
>>>>>>>>>>         * modify the block allocation tree.
>>>>>>>>>>         */
>>>>>>>>>>        mutex_lock(&ei->truncate_mutex);
>>>>>>>>>>
>>>>>>>>>> So while it is building the tree, the mutex will lock it.
>>>>>>>>>>
>>>>>>>>>> And the remarks for ext3_get_blocks_handle() are:
>>>>>>>>>>
>>>>>>>>>> /*
>>>>>>>>>>  * Allocation strategy is simple: if we have to allocate something, we will
>>>>>>>>>>  * have to go the whole way to leaf. So let's do it before attaching anything
>>>>>>>>>>  * to tree, set linkage between the newborn blocks, write them if sync is
>>>>>>>>>>  * required, recheck the path, free and repeat if check fails, otherwise
>>>>>>>>>>  * set the last missing link (that will protect us from any truncate-generated
>>>>>>>>>> ...
>>>>>>>>>>
>>>>>>>>>> reading the source....go down and see the mutex_lock() (where
>>>>>>>>>> multiblock allocation are needed) and after the lock, all the blocks
>>>>>>>>>> allocation/merging etc are done:
>>>>>>>>>>
>>>>>>>>>>        /* Next simple case - plain lookup or failed read of indirect block */
>>>>>>>>>>        if (!create || err == -EIO)
>>>>>>>>>>                goto cleanup;
>>>>>>>>>>
>>>>>>>>>>        mutex_lock(&ei->truncate_mutex);
>>>>>>>>>> <snip>
>>>>>>>>>>        count = ext3_blks_to_allocate(partial, indirect_blks,
>>>>>>>>>>                                        maxblocks, blocks_to_boundary);
>>>>>>>>>> <snip>
>>>>>>>>>>        err = ext3_alloc_branch(handle, inode, indirect_blks, &count, goal,
>>>>>>>>>>                                offsets + (partial - chain), partial);
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>>> It's working fine currently with i_mutex, meaning if we hold an i_mutex
>>>>>>>>>>
>>>>>>>>>> as far as I know, i_mutex is used for modifying the inode's structural information:
>>>>>>>>>>
>>>>>>>>>> grep for i_mutex in fs/ext3/ioctl.c and every time there is a need to
>>>>>>>>>> maintain the inode's structural info, the lock on i_mutex is taken.
>>>>>>>>>>
>>>>>>>>>>> lock on the inode while updating the i_data pointers,
>>>>>>>>>>> and then try to perform I/O from user space, the I/Os are queued. The file was
>>>>>>>>>>> opened in r/w mode prior to taking the lock inside the kernel.
>>>>>>>>>>>
>>>>>>>>>>> But, I still feel i_alloc_sem would be the right option to go ahead with.
>>>>>>>>>>>
>>>>>>>>>>> On Mon, Jan 12, 2009 at 1:11 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>>>>>>>>>> If you grep for spinlock, mutex, or "sem" in the fs/ext4 directory, you
>>>>>>>>>>>> can find that all three types of lock are used - for different classes of
>>>>>>>>>>>> object.
>>>>>>>>>>>>
>>>>>>>>>>>> For data blocks I guess it is a semaphore - read this
>>>>>>>>>>>> fs/ext4/inode.c:ext4_get_branch():
>>>>>>>>>>>>
>>>>>>>>>>>> /**
>>>>>>>>>>>>  *      ext4_get_branch - read the chain of indirect blocks leading to data
>>>>>>>>>>>> <snip>
>>>>>>>>>>>>  *
>>>>>>>>>>>>  *      Need to be called with
>>>>>>>>>>>>  *      down_read(&EXT4_I(inode)->i_data_sem)
>>>>>>>>>>>>  */
>>>>>>>>>>>>
>>>>>>>>>>>> I guess you have no choice; as it is a semaphore, you have to follow the rest
>>>>>>>>>>>> of the kernel for consistency - don't create your own semaphore :-).
>>>>>>>>>>>>
>>>>>>>>>>>> There exists i_lock as a spinlock - which as far as I know is for i_blocks
>>>>>>>>>>>> counting purposes:
>>>>>>>>>>>>
>>>>>>>>>>>>       spin_lock(&inode->i_lock);
>>>>>>>>>>>>        inode->i_blocks += tmp_inode->i_blocks;
>>>>>>>>>>>>        spin_unlock(&inode->i_lock);
>>>>>>>>>>>>        up_write(&EXT4_I(inode)->i_data_sem);
>>>>>>>>>>>>
>>>>>>>>>>>> But for data it should be i_data_sem.   Is that correct?
>>>>>>>>>>>>
>>>>>>>>>>>> On Mon, Jan 12, 2009 at 2:18 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>>>>>>>>>>> Hi,
>>>>>>>>>>>>>
>>>>>>>>>>>>> I am having some issues in locking the inode while copying data blocks.
>>>>>>>>>>>>> We are trying to keep the file system live during this operation, so
>>>>>>>>>>>>> both read and write operations should work.
>>>>>>>>>>>>> In this case what type of lock on the inode should be used: semaphore,
>>>>>>>>>>>>> mutex or spinlock?
>>>>>>>>>>>>>
>>>>>>>>>>>>>
>>>>>>>>>>>>> On Sun, Jan 11, 2009 at 8:45 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>>>>>>>>>>>> Sorry.....some mistakes...a resent:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> Here are some tips on the blockdevice API:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://lkml.org/lkml/2006/1/24/287
>>>>>>>>>>>>>> http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-01/msg09388.html
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> as indicated, documentation is rather sparse in this area.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> not sure if anyone else has a summary list of the blockdevice API and its
>>>>>>>>>>>>>> explanation?
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> now, wrt the following "cleanup patch", I am not sure how the API will change:
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> http://lwn.net/Articles/304485/
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> thanks.
>>>>>>>>>>>>>>
>>>>>>>>>>>>>> On Tue, Jan 6, 2009 at 6:36 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I want to read data blocks from one inode
>>>>>>>>>>>>>>> and copy them to another inode.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> I mean to copy data from the data blocks associated with one inode
>>>>>>>>>>>>>>> to the data blocks associated with another inode.
>>>>>>>>>>>>>>>
>>>>>>>>>>>>>>> Is that possible in kernel space?
>>>>>>>>>>>>>>> --
>>>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> comments ????
>>>>>>>>>
>>>>>>>>> That's very right !!!
>>>>>>>>>
>>>>>>>>> So, finally we were able to perform the copy operation successfully.
>>>>>>>>>
>>>>>>>>> We did something like this and we named it "ohsm's tricky copy".
>>>>>>>>> Rohit will soon be uploading a new doc on the fscops page which
>>>>>>>>> will detail it further.
>>>>>>>>
>>>>>>>> Thanks, let us know when the docs and the *source code* are available ;-)
>>>>>>>>
>>>>>>>>>
>>>>>>>>> 1. Read the source inode.
>>>>>>>>> 2. Allocate a new ghost inode.
>>>>>>>>> 3. Take a lock on the source inode. /* mutex, because the nr_blocks
>>>>>>>>> can change if a write comes in from user space */
>>>>>>>>> 4. Read the number of blocks.
>>>>>>>>> 5. Allocate the same number of blocks for the dummy ghost inode. /*
>>>>>>>>> the chain will be created automatically */
>>>>>>>>> 6. Read the source buffer heads of the blocks of the source inode and
>>>>>>>>> the destination buffer heads of the blocks of the destination inode.
>>>>>>>>>
>>>>>>>>> 7. dest_buffer->b_data = source_buffer->b_data; /* it's a char * and
>>>>>>>>> this is where the trick is */
>>>>>>>>> 8. Mark the destination buffer dirty.
>>>>>>>>>
>>>>>>>>> Perform 6, 7, 8 for all the blocks.
>>>>>>>>>
>>>>>>>>> 9. Swap src_inode->i_data[15] and dest_dummy_inode->i_data[15] (i.e. the
>>>>>>>>> whole 15-slot arrays). /* This helps us simply avoid copying the block
>>>>>>>>> numbers back from the destination dummy inode to the source inode */
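Reading the steps above, a minimal kernel-side sketch of steps 1-8 might look like
the following. ohsm_alloc_ghost_inode(), ohsm_block_count() and ohsm_nth_block()
are hypothetical helpers (allocate the dummy inode with matching blocks, count the
file's blocks, and map the i-th logical block to its physical block number); only
sb_bread()/sb_getblk()/mark_buffer_dirty()/brelse() are real kernel APIs, and error
handling is omitted:

#include <linux/fs.h>
#include <linux/buffer_head.h>
#include <linux/err.h>

/* Sketch of steps 1-8 of the "tricky copy"; not OHSM's actual code. */
static int ohsm_tricky_copy(struct super_block *sb, struct inode *src,
                            struct inode **ghost_out)
{
        struct inode *ghost;
        struct buffer_head *src_bh, *dst_bh;
        unsigned long i, nr_blocks;

        ghost = ohsm_alloc_ghost_inode(sb, src);    /* steps 1, 2 and 5 (hypothetical) */
        if (IS_ERR(ghost))
                return PTR_ERR(ghost);

        mutex_lock(&src->i_mutex);                  /* step 3: block concurrent writers */
        nr_blocks = ohsm_block_count(src);          /* step 4 (hypothetical helper) */

        for (i = 0; i < nr_blocks; i++) {           /* steps 6-8, once per block */
                src_bh = sb_bread(sb, ohsm_nth_block(src, i));    /* source buffer head */
                dst_bh = sb_getblk(sb, ohsm_nth_block(ghost, i)); /* destination buffer head */

                dst_bh->b_data = src_bh->b_data;    /* step 7: the "trick" from the thread */
                mark_buffer_dirty(dst_bh);          /* step 8: schedule write-out */

                brelse(src_bh);
                brelse(dst_bh);
        }

        *ghost_out = ghost;
        /* Step 9 (swapping i_data) and the cleanup happen next,
         * still under the mutex taken in step 3. */
        return 0;
}

The b_data assignment is a literal transcription of step 7, since that pointer
aliasing (rather than a memcpy) is exactly the part the thread calls "tricky".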
>>>>>>>>
>>>>>>>> I don't know anything about LVM, so this might be a dumb question. Why
>>>>>>>> is this required ?
>>>>>>>>
>>>>>>> See, the point here is that we will have a single namespace, meaning a
>>>>>>> single file system over all this underlying storage. LVM is a tool
>>>>>>> which provides you an API to create logical devices over physical ones. It
>>>>>>> uses Device Mapper inside the kernel. The device mapper keeps the
>>>>>>> mapping between the logical device and all the underlying
>>>>>>> physical devices.
>>>>>>>
>>>>>>> Now, we require this for defining our storage classes (tiers).  At the
>>>>>>> time of defining the allocation and relocation policy itself, we
>>>>>>> accept the information as a (dev, tier) list.
>>>>>>> And we pass this information to our OHSM module inside the kernel,
>>>>>>> extract the mapping from the device mapper and keep it in the OHSM
>>>>>>> metadata, which is later referred to for all allocation and
>>>>>>> relocation processes.
>>>>>>>
>>>>>>>> Did you mean swapping all the block numbers rather than just the [15] ??
>>>>>>> See, the point here is that if we copy each and every new block number
>>>>>>> to the old inode
>>>>>>
>>>>>> Ohh yes.... I got thoroughly confused by your word "swap". I thought
>>>>>> it was just like your "tricky swap" :-) and not the "copy and swap"
>>>>>> which you actually meant.
>>>>>>
>>>>>>> and try to free each block from the old inode, then we
>>>>>>> will have the overhead of freeing each and every old block and, at the
>>>>>>> end, freeing the dummy inode that was created.
>>>>>>>
>>>>>>> So, what we did was swap the 15 i_data pointers of both inodes.
>>>>>>> See, on Linux the arrangement is something like this:
>>>>>>>
>>>>>>> i_data[0]  -> direct pointer to a data block
>>>>>>> i_data[1]  -> direct pointer to a data block
>>>>>>> i_data[2]  -> direct pointer to a data block
>>>>>>> i_data[3]  -> direct pointer to a data block
>>>>>>> i_data[4]  -> direct pointer to a data block
>>>>>>> i_data[5]  -> direct pointer to a data block
>>>>>>> i_data[6]  -> direct pointer to a data block
>>>>>>> i_data[7]  -> direct pointer to a data block
>>>>>>> i_data[8]  -> direct pointer to a data block
>>>>>>> i_data[9]  -> direct pointer to a data block
>>>>>>> i_data[10] -> direct pointer to a data block
>>>>>>> i_data[11] -> direct pointer to a data block
>>>>>>> i_data[12] -> single indirect block
>>>>>>> i_data[13] -> double indirect block
>>>>>>> i_data[14] -> triple indirect block
>>>>>>>
>>>>>>>
>>>>>>> For all these pointers, we just swap them between the inodes. It works
>>>>>>> because for direct blocks it's pretty trivial, and for indirect blocks
>>>>>>> it's just like swapping the roots of the chains of blocks, which
>>>>>>> eventually moves everything.
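As a sketch (assuming ext2's incore inode, struct ext2_inode_info, whose i_data[]
holds the 12 direct slots plus the three indirect roots; ohsm_swap_i_data is an
illustrative name, not OHSM code), the swap in step 9 could be as simple as:

/* struct ext2_inode_info is declared in the ext2 headers
 * (their exact location varies by kernel version). */
static void ohsm_swap_i_data(struct ext2_inode_info *src_ei,
                             struct ext2_inode_info *ghost_ei)
{
        __le32 tmp;
        int i;

        /* Swap the block-pointer roots between source and ghost inode.
         * Because the indirect blocks hang off slots 12-14, swapping the
         * roots moves the whole chains, as described above. */
        for (i = 0; i < 15; i++) {
                tmp = src_ei->i_data[i];
                src_ei->i_data[i] = ghost_ei->i_data[i];
                ghost_ei->i_data[i] = tmp;
        }
}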
>>>>>>
>>>>>> Of course........another thing which you might want is to copy only
>>>>>> the block numbers which fall within the range of your inode->i_size. This
>>>>>> might help in case of corruption.
>>>>>>
>>>>>>>
>>>>>>> Now, we simply free the dummy inode with a standard FS function, which
>>>>>>> performs the cleanup of the inode and of the blocks we
>>>>>>> wanted to free.
>>>>>>> It reduces our work, and obviously the cleanup code already existing in
>>>>>>> the FS would be more trustworthy :P
>>>>>>>
>>>>>>>> Here, is src_inode the
>>>>>>>> VFS "struct inode" or the
>>>>>>>> FS-specific struct FS_inode_info ???  I didn't get this completely;
>>>>>>>> can you explain this point a bit more?
>>>>>>>>
>>>>>>>
>>>>>>> See, what we do is take a lock on the VFS inode and then
>>>>>>> perform the job of moving the data blocks via the FS incore inode
>>>>>>> (FS_inode_info).
>>>>>>>
>>>>>>> So, this will be the incore inode.
>>>>>>> Also, the VFS inode doesn't have pointers to the data blocks. The data
>>>>>>> block pointers (i_data) are present in the incore and on-disk inode
>>>>>>> structures.
>>>>>>>
>>>>>>> Hope this answers your query. Let me know if you have more.
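In other words (taking ext2 as the example, and assuming the i_mutex-based locking
described earlier; ohsm_touch_incore is an illustrative name only), the relationship
between the two inode structures looks roughly like this:

#include <linux/fs.h>
/* EXT2_I() and struct ext2_inode_info come from the ext2 headers
 * (their exact location varies by kernel version). */

static void ohsm_touch_incore(struct inode *inode)
{
        struct ext2_inode_info *ei;

        mutex_lock(&inode->i_mutex);   /* lock is taken on the VFS inode */
        ei = EXT2_I(inode);            /* FS incore inode embedding the VFS inode */
        /* ... the block pointers live in ei->i_data[0..14];
         *     the VFS struct inode has no such array ... */
        mutex_unlock(&inode->i_mutex);
}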
>>>>>>
>>>>>> Unless I missed it, I didn't get answers to my earlier question about
>>>>>> *special writing* the inode and maintaining consistency.
>>>>>>
>>>>>
>>>>> Here is the question that you asked....
>>>>>
>>>>>>> btw how are you going to *special write* the inode ? If I remember
>>>>>>> correctly you said that you will make the filesystem read-only. I
>>>>>>> don't know at what all places in the write stack we assert the read-only
>>>>>>> flag on the FS. One of the places IIRC is do_open(): when you try opening
>>>>>>> the file for the first time it checks for permission. How do you plan to
>>>>>>> deal with already-open file descriptors which are in write mode? If
>>>>>>> you have already investigated all the paths for the MS_RDONLY flag, it
>>>>>>> would be great if you can push it somewhere on the web. It might be
>>>>>>> helpful for others. And what about the applications which were happily
>>>>>>> doing writes till now, if suddenly their operations start failing ?
>>>>>
>>>>> Well, now we have a complete change in the design here. You will
>>>>> understand things better when we release our design doc, which we will
>>>>> be doing soon.
>>>>>
>>>>> So, as you must have seen by now, we are not creating a new inode
>>>>> as a replacement for the old one.
>>>>>
>>>>> We just create a dummy inode, allocate blocks for it, copy the data
>>>>> from the source blocks and finally swap.
>>>>>
>>>>> Here we take a lock on the inode while making any changes to the
>>>>> inode. Kindly refer to the algo that I provided in my previous mails.
>>>>>
>>>>> Case 1: trying to open a file while relocation is going on.
>>>>> Case 2: an already-open file descriptor tries to read/write.
>>>>>
>>>>> In both cases, as we have taken a lock on the inode, the user
>>>>> application will queue itself.
>>>>>
>>>>> Now, looking at the time for which the process will have to wait:
>>>>> as we are not spending time physically copying data and releasing data
>>>>> blocks and the inode, we expect this time to be quite small.
>>>>> Vineet is working on the timing and performance stuff. Vineet, can you
>>>>> provide some kind of time metrics for, say, a file that is 10 gigs?
>>>>>
>>>>> PS: We have not made any changes to the write code path at all.
>>>>> The lock synchronizes everything.
>>>>>
>>>>> Manish, does that answer your question, or am I getting it wrong somewhere ?
>>>>>
>>>>>
>>>>>> Thanks -
>>>>>> Manish
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>> Thanks -
>>>>>>>> Manish
>>>>>>>>
>>>>>>>>
>>>>>>>>> /* This also helps us simply destroy the inode, which will eventually
>>>>>>>>> free all the blocks, which otherwise we would have had to do
>>>>>>>>> separately */
>>>>>>>>>
>>>>>>>>> 9.1 Release the mutex on the src inode.
>>>>>>>>>
>>>>>>>>> 10. Set the I_FREEING bit in dest_inode->i_state.
>>>>>>>>>
>>>>>>>>> 11. Call FS_delete_inode(dest_inode);
>>>>>>>>>
>>>>>>>>>  Any application which has already opened this inode for read/write
>>>>>>>>> and tries to do a read/write while the mutex lock is held will be
>>>>>>>>> queued.
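A rough sketch of this tail end (steps 9.1-11), with ext2_delete_inode() standing in
for the generic FS_delete_inode() named above; manipulating i_state and calling the
delete path directly like this is the thread's idea, not an established kernel
pattern, and ohsm_finish is an illustrative name only:

static void ohsm_finish(struct inode *src, struct inode *ghost)
{
        mutex_unlock(&src->i_mutex);     /* step 9.1: queued readers/writers resume */

        ghost->i_state |= I_FREEING;     /* step 10 */
        ext2_delete_inode(ghost);        /* step 11: also frees the swapped-in old blocks */
}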
>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> Thanks a lot Greg, Manish, Peter and all others for your valuable
>>>>>>>>> inputs and help.
>>>>>>>>>
>>>>>>>>>> --
>>>>>>>>>> Regards,
>>>>>>>>>> Peter Teoh
>>>>>>>>>>
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> Regards,
>>>>>>>>> Sandeep.
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> "To learn is to change. Education is a process that changes the learner."
>>>>>>>>>
>>>>>>>>> --
>>>>>>>>> To unsubscribe from this list: send an email with
>>>>>>>>> "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
>>>>>>>>> Please read the FAQ at http://kernelnewbies.org/FAQ
>>>>>>>>>
>>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Regards,
>>>>>>> Sandeep.
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> "To learn is to change. Education is a process that changes the learner."
>>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Regards,
>>>>> Sandeep.
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>>
>>>>> "To learn is to change. Education is a process that changes the learner."
>>>>>
>>>>
>>>>
>>>>
>>>> --
>>>> Regards,
>>>> Peter Teoh
>>>>
>>>
>>
>>
>>
>> --
>> Regards,
>> Sandeep.
>>
>>
>>
>>
>>
>>
>> "To learn is to change. Education is a process that changes the learner."
>>
>



-- 
Regards,
Sandeep.




"To learn is to change. Education is a process that changes the learner."

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ

