Re: Copying Data Blocks

"Greg Freemyer" <greg.freemyer@xxxxxxxxx> · Thu, 8 Jan 2009 19:03:42 -0500

On Thu, Jan 8, 2009 at 12:55 PM, Sandeep K Sinha
<sandeepksinha@xxxxxxxxx> wrote:
> Hi Greg,
>
>> So if I understand your high level desire, you want to write a
>> filesystem re-org (or defrag or something) that works one file at a
>> time.
>>
>> You have to do it in the kernel because you want to control the inode
>> and data block allocation.
>>
>> Your current thought is to mount the entire filesystem readonly, do
>> the re-org, remount r/w.
>>
>> If this is just for yourself, it might be acceptable.  If this is for
>> the community, it is not (IMO).
>>
>> To be of value to the community, you need to be more aggressive and
>> get this to work on a running filesystem.
>>
>> My first attempt at high-level pseudo code would be:
>>
>> ===========
>> re_org_file()
>> {
>>  read_orig_inode()
>>  set inode.re_org_in_progress = true
>>
>>  lock(inode.re_org_in_progress)
>>  allocate destination inode                        // Do not initiate any real i/o
>>  allocate all destination indirect pointer blocks  // Do not initiate any real i/o
>>  allocate all destination data blocks              // Do not initiate any real i/o
>>
>
> Sorry, but I didn't understand what you mean by "Do not initiate any
> real i/o".

If you are handling this as a readonly filesystem and thus don't need
locking etc., you can ignore that comment.

If you are doing the locking, then this is a reminder that it is okay
to build your structures, add them to the block layer queues, etc.,
but you don't want to force actual disk i/o while you hold the lock.
Not that it would be the end of the world in this case, but in general
you want to avoid doing any disk i/o with locks held.  Disk i/o can
take a long time, and anyone else waiting for the lock is stalled
until it completes.

I believe you should be totally fine as long as you don't use a
barrier or similar to force the code to wait until your block updates
are on disk.
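A minimal user-space analogue of that rule, assuming a C11 `atomic_flag` spinlock stands in for the kernel lock and a plain array stands in for the disk. `queue_write()` and `flush_pending()` are hypothetical names for illustration only: the lock protects just the cheap bookkeeping of queuing a write, and the slow "disk" copy happens after the lock has been released.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE  16
#define MAX_PENDING 8

/* Hypothetical record of a write waiting to be submitted. */
struct pending_write {
    size_t block;
    char data[BLOCK_SIZE];
};

static atomic_flag queue_lock = ATOMIC_FLAG_INIT;
static struct pending_write pending[MAX_PENDING];
static size_t npending;

static void lock_queue(void)
{
    while (atomic_flag_test_and_set(&queue_lock))
        ;   /* spin until the holder releases the lock */
}

static void unlock_queue(void)
{
    atomic_flag_clear(&queue_lock);
}

/* Cheap work only under the lock: record the write (data must be BLOCK_SIZE bytes). */
int queue_write(size_t block, const char data[BLOCK_SIZE])
{
    int queued = 0;

    lock_queue();
    if (npending < MAX_PENDING) {
        pending[npending].block = block;
        memcpy(pending[npending].data, data, BLOCK_SIZE);
        npending++;
        queued = 1;
    }
    unlock_queue();
    return queued;
}

/* Grab the queued writes under the lock, then do the slow part without it. */
size_t flush_pending(char disk[][BLOCK_SIZE], size_t nblocks)
{
    struct pending_write batch[MAX_PENDING];
    size_t n;

    lock_queue();
    n = npending;
    memcpy(batch, pending, n * sizeof batch[0]);
    npending = 0;
    unlock_queue();

    /* Stands in for real, possibly blocking, disk i/o: no lock is held here. */
    for (size_t i = 0; i < n; i++) {
        assert(batch[i].block < nblocks);
        memcpy(disk[batch[i].block], batch[i].data, BLOCK_SIZE);
    }
    return n;
}
```

In this sketch `queue_write()` returns 0 when the queue is full, so a caller would flush and retry.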

>>  allocate_file_re_org_done_array   // one bit per data block
>>  memset (file_re_org_done_array, false)
>>  release_lock(inode.re_org_in_progress)
>>
>>  for each block in file_re_org_done_array[]  {
>>     if (not file_re_org_done_array[block]) {
>>          lock(inode.re_org_in_progress)
>>          copy_block()   // I know, your question is how to do this
>>          set file_re_org_done_array[block] = true
>>          release_lock(inode.re_org_in_progress)
>>     }
>>  }
>>
>>  lock(inode.re_org_in_progress)
>>  copy_inode_info()
>>  update_directory_entries()
>>  release_lock(inode.re_org_in_progress)
>>
>>  set inode.re_org_in_progress = false
>> }
>>
>> Then insert logic into the write() code that does:
>>
>> // inserted write logic
>> if inode.re_org_in_progress == true {
>>    lock(inode.re_org_in_progress)
>>    send data to orig block                   // no real i/o needed
>>    send data to new dest block           // no real i/o needed
> What do you mean by "no real i/o needed"?

Same as above.  Just don't force the code to pause until the data is
actually on disk.

>>    file_re_org_done_array[block] = true
>>    release_lock(inode.re_org_in_progress)
>> } else
>>    send data to normal block                   // no real i/o needed
>>
>> // end of inserted write logic
>> ====================================
>>
>
> Do we need to copy the blocks that have been written after the
> block relocation was initiated?
> I mean, the copy operation would be redundant.
> What do you say?
>

If you allow user-land writes to the file, then you have three high-level
time frames to consider:

1) Pre re-org
2) Post re-org
3) Re-org in progress

1) Before the re-org, you obviously only have to update the original file.

2) After the re-org, if all went well, you only have to update the
new re-organized file.

SHIT, I totally forgot about any user-space applications that have the
old file open.  You will need to ensure they read/write the new
re-organized file after all is said and done.  Do any lurkers out there
have ideas on how to do that?

For your first effort, if you decide to do the read-only mount, you
don't have to address this issue.  But if you are working with a live
file system, you will.

3) During the re-org itself, you have to keep the original file 100%
valid, so writes have to go to it.  You also have to update the new
re-organized file, or it will not have the update.  I am setting the done
flag for any blocks updated by user-space-initiated writes just to
reduce duplicate effort.  For example, if the main copy loop is at block
100 and a user-space write causes block 150 to be updated in both the
original and the new re-organized file, then there is no reason for
the main copy loop to copy that data block again.
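A user-space sketch of the scheme in the pseudocode above, assuming in-memory arrays stand in for the original and re-organized files. All names (`reorg_begin`, `reorg_copy_range`, `file_write`, `reorg_finish`) are hypothetical; the lock/release calls are left as comments where the pseudocode takes them, and the write path routes data according to the three time frames. One deliberate difference from the pseudocode: the loop tests the done bit after taking the lock rather than before it.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define NBLOCKS    256
#define BLOCK_SIZE 8

enum reorg_phase { PRE_REORG, REORG_IN_PROGRESS, POST_REORG };

/* Hypothetical in-memory stand-ins for the original and re-organized files. */
static char orig[NBLOCKS][BLOCK_SIZE];
static char dest[NBLOCKS][BLOCK_SIZE];
static unsigned char done[(NBLOCKS + 7) / 8];   /* file_re_org_done_array: one bit per block */
static enum reorg_phase phase = PRE_REORG;
static size_t loop_copies;                      /* blocks copied by the main loop */

static bool block_done(size_t b) { return (done[b / 8] >> (b % 8)) & 1u; }
static void mark_done(size_t b)  { done[b / 8] |= (unsigned char)(1u << (b % 8)); }

void reorg_begin(void)
{
    /* lock(inode.re_org_in_progress) */
    memset(done, 0, sizeof done);
    loop_copies = 0;
    phase = REORG_IN_PROGRESS;
    /* release_lock(inode.re_org_in_progress) */
}

/* Main copy loop over blocks [start, end). */
void reorg_copy_range(size_t start, size_t end)
{
    assert(start <= end && end <= NBLOCKS);
    for (size_t b = start; b < end; b++) {
        /* lock(inode.re_org_in_progress) */
        if (!block_done(b)) {                       /* skip blocks a write already duplicated */
            memcpy(dest[b], orig[b], BLOCK_SIZE);   /* copy_block() */
            mark_done(b);
            loop_copies++;
        }
        /* release_lock(inode.re_org_in_progress) */
    }
}

void reorg_finish(void)
{
    /* copy_inode_info() and update_directory_entries() would go here */
    phase = POST_REORG;
}

/* Write path, routed by phase. */
void file_write(size_t b, const char data[BLOCK_SIZE])
{
    assert(b < NBLOCKS);
    /* lock(inode.re_org_in_progress) while a re-org is running */
    switch (phase) {
    case PRE_REORG:
        memcpy(orig[b], data, BLOCK_SIZE);
        break;
    case REORG_IN_PROGRESS:
        memcpy(orig[b], data, BLOCK_SIZE);   /* keep the original 100% valid */
        memcpy(dest[b], data, BLOCK_SIZE);   /* and keep the new copy current */
        mark_done(b);                        /* main loop can now skip this block */
        break;
    case POST_REORG:
        memcpy(dest[b], data, BLOCK_SIZE);
        break;
    }
    /* release_lock(inode.re_org_in_progress) */
}
```

Calling `reorg_copy_range(0, 100)`, then `file_write(150, ...)`, then `reorg_copy_range(100, NBLOCKS)` reproduces the block 100 / block 150 example: the main loop copies NBLOCKS - 1 blocks, because block 150 was already duplicated by the write path.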

Greg
-- 
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com
