Re: Copying Data Blocks

"Sandeep K Sinha" <sandeepksinha@xxxxxxxxx> · Thu, 8 Jan 2009 11:38:55 +0530

Hey Greg,

On Thu, Jan 8, 2009 at 10:14 AM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
> On Wed, Jan 7, 2009 at 11:05 PM, Sandeep K Sinha
> <sandeepksinha@xxxxxxxxx> wrote:
>> Hi Greg,
>>
>> Just to give you some context for the problem,
>> see:
>> http://code.google.com/p/fscops/
>>
>> reply inline.
>>
>> On Wed, Jan 7, 2009 at 9:03 PM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
>>> On Wed, Jan 7, 2009 at 5:48 AM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>> On Wed, Jan 7, 2009 at 12:44 PM, Manish Katiyar <mkatiyar@xxxxxxxxx> wrote:
>>>>> On Wed, Jan 7, 2009 at 12:17 PM, Sandeep K Sinha
>>>>> <sandeepksinha@xxxxxxxxx> wrote:
>>>>>> OK, let me rephrase what Rohit is actually asking.
>>>>>>
>>>>>> There is an inode X which has, say, N data blocks.
>>>>>> Now, through his own kernel module and some changes to the file system,
>>>>>> he wants to create a new inode Y in the FS and physically copy all the
>>>>>> data from the old inode to the new inode.
>>>>>
>>>>> Errr....... I must be missing something...... For this why do you need
>>>>> to copy the data blocks ? if you just copy the old inode to new inode,
>>>>> you have already copied the direct and indirect block pointers right ?
>>>>> That will not take much time, and now if you free the old inode, you
>>>>> have virtually changed the ownership of old blocks to the new inode.
>>>>>
>>>> The problem is not replacing the inode; I want to physically move the data.
>>>> That means if inode X and its data blocks are in block group 1, and
>>>> the new inode is in block group 100,
>>>> then I will allocate data blocks in block group 100 and copy the data
>>>> from inode X to inode Y.
>>>> That way I will be able to physically relocate a file, and change the
>>>> directory entry to point to inode Y.
>>>>
>>>>> The problem I can see with this approach is that if the new inode is
>>>>> not in the same block group as the old inode, you have *kind of broken*
>>>>> ext2's intelligence of allocating the blocks in the same block
>>>>> group.
>>>>>
>>>>> CMIIW. BTW, this thread is interesting :-)
>>>>
>>>> Yes, it's interesting. :-)
>>>>
>>>>>
>>>> I haven't actually broken ext2's intelligence completely; I have only put
>>>> restrictions on the allocation of inodes and data blocks.
>>>> And it works fine with the existing optimizations.
>>>>
>>>> The major issue is relocating files between different block group ranges.
>>>
>>> So if I understand your high level desire, you want to write a
>>> filesystem re-org (or defrag or something) that works one file at a
>>> time.
>>>
>> Yes kind of, you can say that.
>>
>>> You have to do it in the kernel because you want to control the inode
>>> and data block allocation.
>>>
>>
>> Because I want to control the allocation of a file's data blocks
>> to a specific device underneath an LVM. The mapping of blocks
>> from FS->LVM->DEVICES resides only inside the kernel.
>>
>>> Your current thought is to mount the entire filesystem readonly, do
>>> the re-org, remount r/w.
>>>
>> Well, this should work for now, but yes, we will certainly look for an
>> alternative, something like freeze/thaw.
>>
>>> If this is just for yourself, it might be acceptable.  If this is for
>>> the community, it is not (IMO).
>>>
>> Surely this is for our personal use.
>>
>>> To be of value to the community, you need to be more aggressive and
>>> get this to work on a running filesystem.
>>>
>>
>> That will surely be a milestone, soon.
>>
>>> My first attempt at high-level pseudo code would be:
>>>
>>> ===========
>>> re_org_file()
>>> {
>>>  read_orig_inode()
>>>  set inode.re_org_in_progress = true
>>>
>>>  lock(inode.re_org_in_progress)
>>>  allocate destination inode                        // do not initiate any real i/o
>>>  allocate all destination indirect pointer blocks  // do not initiate any real i/o
>>>  allocate all destination data blocks              // do not initiate any real i/o
>>>
>>>  allocate_file_re_org_done_array   // one bit per data block
>>>  memset (file_re_org_done_array, false)
>>>  release_lock(inode.re_org_in_progress)
>>>
>>>  for each block in file_re_org_done_array[]  {
>>>     if (not file_re_org_done_array[block]) {
>>>          lock(inode.re_org_in_progress)
>>>          copy_block()   // I know, your question is how to do this
>>>          set file_re_org_done_array[block] = true
>>>          release_lock(inode.re_org_in_progress)
>>>     }
>>>  }
>>>
>>>  lock(inode.re_org_in_progress)
>>>  copy_inode_info()
>>
>> Well, currently I don't intend to move the inode to a new location. I
>> would prefer to leave the original inode intact and just update it with
>> the new data block pointers. Whether to relocate the inode or not is
>> still under debate.
>>
>>>  update_directory_entries()
>>>  release_lock(inode.re_org_in_progress)
>>>
>>>  set inode.re_org_in_progress = false
>>> }
>>>
>>> Then insert logic into the write() code that does:
>>>
>>> // inserted write logic
>>> if inode.re_org_in_progress == true {
>>>    lock(inode.re_org_in_progress)
>>>    send data to orig block                   // no real i/o needed
>>>    send data to new dest block           // no real i/o needed
>>>    file_re_org_done_array[block] = true
>>>    release_lock(inode.re_org_in_progress)
>>> } else
>>>    send data to normal block                   // no real i/o needed
>>>
>>> // end of inserted write logic
>>
>> What exactly is this required for? Is it for some kind of metadata update?
>
> I assumed you needed to effectively "copy the file to a new
> destination, then delete the original file".
>
> To do that on a live file with minimal interference with i/o invoked
> from user space, I created a ghost version which initially had empty
> data blocks.
>
> Then I allowed normal file i/o to continue.  Look at the first chunk
> of code and see where I released the lock.  Whenever the lock is
> released normal user space file i/o is allowed to occur.
>
> Reads are easily handled by reading from the original file.
>
> Writes on the other hand have to update both the original file data
> blocks and the newly allocated data blocks.
>
> And as I look again at the pseudo code, I forgot to do the file delete
> of the original inode at the end.
>

The overall idea makes good sense, though I suspect it will be somewhat
complicated to handle; I am not sure yet.
It is similar to handling a failure in a mirrored environment.
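
To check that I have understood the scheme, here is a rough user-space
sketch of how I read it. It is only a simulation under assumptions of my
own: the struct and function names (reorg_file, reorg_write_block and so
on) are hypothetical, the blocks are plain in-memory arrays, and the lock
from your pseudo code is only marked in comments. It is not how this
would actually be wired into ext2.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16   /* tiny blocks keep the example readable */
#define NBLOCKS    4

/* Hypothetical stand-in for a file whose data blocks are being relocated. */
struct reorg_file {
    unsigned char orig[NBLOCKS][BLOCK_SIZE]; /* blocks of the original file */
    unsigned char dest[NBLOCKS][BLOCK_SIZE]; /* newly allocated blocks      */
    bool done[NBLOCKS];                      /* file_re_org_done_array      */
    bool in_progress;                        /* inode.re_org_in_progress    */
};

/* Start a re-org: destination blocks are "allocated" empty, nothing is done yet. */
static void reorg_begin(struct reorg_file *f)
{
    memset(f->dest, 0, sizeof f->dest);
    memset(f->done, 0, sizeof f->done);
    f->in_progress = true;
}

/* Write path: while a re-org is running, mirror the write to both copies. */
static void reorg_write_block(struct reorg_file *f, size_t blk, const void *data)
{
    assert(blk < NBLOCKS);
    /* lock(inode.re_org_in_progress) would be taken here */
    memcpy(f->orig[blk], data, BLOCK_SIZE);
    if (f->in_progress) {
        memcpy(f->dest[blk], data, BLOCK_SIZE);
        f->done[blk] = true;    /* the copy loop can now skip this block */
    }
    /* release_lock(inode.re_org_in_progress) */
}

/* One step of the copy loop; returns true if the block was actually copied. */
static bool reorg_copy_block(struct reorg_file *f, size_t blk)
{
    assert(blk < NBLOCKS);
    if (f->done[blk])
        return false;           /* already mirrored by a write */
    memcpy(f->dest[blk], f->orig[blk], BLOCK_SIZE);
    f->done[blk] = true;
    return true;
}

/* Copy every remaining block and end the re-org; returns how many blocks
 * the loop itself had to copy. */
static size_t reorg_finish(struct reorg_file *f)
{
    size_t copied = 0;
    for (size_t blk = 0; blk < NBLOCKS; blk++)
        if (reorg_copy_block(f, blk))
            copied++;
    f->in_progress = false;
    return copied;
}
```

Seen this way, the done array plays the same role as the bitmap a mirror
keeps while resyncing a member: a block that a write has already mirrored
does not need to be copied again, and reads can keep going to the
original until the final switch.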

> Greg

-- 
Regards,
Sandeep.

"To learn is to change. Education is a process that changes the learner."