Re: Copying Data Blocks

Peter Teoh <htmldeveloper@xxxxxxxxx> · Wed, 7 Jan 2009 23:47:17 +0800



On Wed, Jan 7, 2009 at 11:33 PM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
> On Wed, Jan 7, 2009 at 5:48 AM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>> On Wed, Jan 7, 2009 at 12:44 PM, Manish Katiyar <mkatiyar@xxxxxxxxx> wrote:
>>> On Wed, Jan 7, 2009 at 12:17 PM, Sandeep K Sinha
>>> <sandeepksinha@xxxxxxxxx> wrote:
>>>> OK, let me rephrase exactly what Rohit is trying to ask.
>>>>
>>>> There is an inode X which has, say, N data blocks.
>>>> Now, through his own kernel module and some changes to the file system,
>>>> he wants to create a new inode Y in the FS and physically copy all the
>>>> data from the old inode to the new inode.
>>>
>>> Errr... I must be missing something. Why do you need to copy the
>>> data blocks for this? If you just copy the old inode to the new inode,
>>> you have already copied the direct and indirect block pointers, right?
>>> That will not take much time, and if you now free the old inode, you
>>> have virtually changed the ownership of the old blocks to the new inode.
>>>
>> The problem is not replacing the inode; I want to physically move the data.
>> That means if inode X and its data blocks are in block group 1, and the
>> new inode Y is in block group 100,
>> then I will allocate data blocks in block group 100 and copy the data
>> from inode X to inode Y.
>> That way I will be able to physically relocate a file, and change the
>> directory entry to point to inode Y.
>>
>>> The problem I can see with this approach is that if the new inode is
>>> not in the same block group as old inode, you have *kind of broken*
>>> the ext2's intelligence of allocating the blocks in the same block
>>> group.
>>>
>>> CMIIW. BTW, this thread is interesting :-)
>>
>> Yes, it's interesting. :-)
>>
>>>
>> I haven't actually broken ext2's intelligence completely; I have only put
>> restrictions on the allocation of inodes and data blocks.
>> And it works fine with the existing optimizations.
>>
>> The major issue is relocating files between different block group ranges.
>
> So if I understand your high level desire, you want to write a
> filesystem re-org (or defrag or something) that works one file at a
> time.
>
> You have to do it in the kernel because you want to control the inode
> and data block allocation.
>
> Your current thought is to mount the entire filesystem read-only, do
> the re-org, then remount it r/w.
>
> If this is just for yourself, it might be acceptable.  If this is for
> the community, it is not (IMO).
>
> To be of value to the community, you need to be more aggressive and
> get this to work on a running filesystem.
>
> My first attempt at high-level pseudo code would be:
>
> ===========
> re_org_file()
> {
>  read_orig_inode()
>  set inode.re_org_in_progress = true
>
>  lock(inode.re_org_in_progress)
>  // The three allocations below do not initiate any real i/o
>  allocate destination inode
>  allocate all destination indirect pointer blocks
>  allocate all destination data blocks
>
>  allocate_file_re_org_done_array   // one bit per data block
>  memset (file_re_org_done_array, false)
>  release_lock(inode.re_org_in_progress)
>
>  for each bit in file_re_org_done_array[]  {
>     if (not file_re_org_done_array[block]) {
>          lock(inode.re_org_in_progress)
>          copy_block()   // I know, your question is how to do this
>          set file_re_org_done_array[block] = true
>          release_lock(inode.re_org_in_progress)
>     }
>  }
>
>  lock(inode.re_org_in_progress)
>  copy_inode_info()
>  update_directory_entries()
>  release_lock(inode.re_org_in_progress)
>
>  set inode.re_org_in_progress = false
> }
>
> Then insert logic into the write() code that does:
>
> // inserted write logic
> if inode.re_org_in_progress == true {
>    lock(inode.re_org_in_progress)
>    send data to orig block                   // no real i/o needed
>    send data to new dest block           // no real i/o needed
>    file_re_org_done_array[block] = true
>    release_lock(inode.re_org_in_progress)
> } else
>    send data to normal block                   // no real i/o needed
>
> // end of inserted write logic
> ====================================
>
> Does that capture the essence of what you are trying to do?
>
> And I assume your first question is still, how do I write copy_block()?
>
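
Before getting to copy_block(): the bookkeeping in the pseudocode above
can be tried out in user space first. Below is a minimal single-threaded
sketch, assuming whole-block writes and a toy in-memory "file". All the
names in it (struct reorg_file, reorg_begin, reorg_write,
reorg_copy_pending, reorg_end, BLK_SIZE, NBLOCKS) are made up for
illustration and are not kernel APIs, and the lock is only marked with
comments.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define BLK_SIZE 8   /* toy block size, illustration only */
#define NBLOCKS  4   /* toy file length in blocks */

/* Hypothetical in-memory stand-in for one file being re-organised. */
struct reorg_file {
    bool in_progress;                       /* inode.re_org_in_progress */
    bool done[NBLOCKS];                     /* file_re_org_done_array   */
    unsigned char orig[NBLOCKS][BLK_SIZE];  /* old data blocks          */
    unsigned char dest[NBLOCKS][BLK_SIZE];  /* newly allocated blocks   */
};

/* Start a re-org: clear the done array and raise the flag. */
void reorg_begin(struct reorg_file *f)
{
    memset(f->done, 0, sizeof(f->done));
    f->in_progress = true;
}

/* Whole-block write path. During a re-org the data goes to both copies
 * and the block is marked done, so the copy loop will skip it. */
void reorg_write(struct reorg_file *f, int blk, const unsigned char *data)
{
    assert(blk >= 0 && blk < NBLOCKS);
    /* lock(inode.re_org_in_progress) would go here */
    memcpy(f->orig[blk], data, BLK_SIZE);
    if (f->in_progress) {
        memcpy(f->dest[blk], data, BLK_SIZE);
        f->done[blk] = true;
    }
    /* release_lock(inode.re_org_in_progress) would go here */
}

/* Copy every block not yet marked done; returns how many were copied. */
int reorg_copy_pending(struct reorg_file *f)
{
    int copied = 0;

    for (int blk = 0; blk < NBLOCKS; blk++) {
        /* per-block lock would be taken here */
        if (!f->done[blk]) {
            memcpy(f->dest[blk], f->orig[blk], BLK_SIZE); /* copy_block() */
            f->done[blk] = true;
            copied++;
        }
        /* per-block lock would be released here */
    }
    return copied;
}

/* Finish: real code would copy the inode info and switch the
 * directory entry before dropping the flag. */
void reorg_end(struct reorg_file *f)
{
    f->in_progress = false;
}
```

If a write lands on block 2 while the re-org is running, that block is
marked done and reorg_copy_pending() copies only the other three blocks.
That leaves copy_block() itself.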

For copy_block(), you could possibly use this function. It goes through
the block I/O API, so it is filesystem independent. Each data block read
into memory has its own buffer_head, and the function writes out one
dirty buffer per call:

fs/buffer.c:

/*
 * For a data-integrity writeout, we need to wait upon any in-progress I/O
 * and then start new I/O and then wait upon it.  The caller must have a ref on
 * the buffer_head.
 */
int sync_dirty_buffer(struct buffer_head *bh)
{
        int ret = 0;

        WARN_ON(atomic_read(&bh->b_count) < 1);
        lock_buffer(bh);
        if (test_clear_buffer_dirty(bh)) {
                get_bh(bh);
                bh->b_end_io = end_buffer_write_sync;
                ret = submit_bh(WRITE_SYNC, bh);
                wait_on_buffer(bh);
                if (buffer_eopnotsupp(bh)) {
                        clear_buffer_eopnotsupp(bh);
                        ret = -EOPNOTSUPP;
                }
                if (!ret && !buffer_uptodate(bh))
                        ret = -EIO;
        } else {
                unlock_buffer(bh);
        }
        return ret;
}
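
To make the read / copy / write / wait sequence concrete, here is a
user-space analog only, not kernel code. It copies one block to another
offset of the same file descriptor and then waits for the data to reach
storage. The names copy_block_fd and BLK_SIZE are made up, and the 4096
byte block size is an assumption. In the kernel I believe the matching
steps would be sb_bread() for the source block, sb_getblk() for the
destination, a memcpy() between the two b_data areas, mark_buffer_dirty()
on the destination, and then sync_dirty_buffer() as above, but I have not
tested that.

```c
#include <sys/types.h>
#include <unistd.h>

#define BLK_SIZE 4096   /* assumed block size, illustration only */

/*
 * Copy block number src to block number dst within the file behind fd,
 * then wait until the write has been pushed to storage.
 * Returns 0 on success, -1 on a short read/write or I/O error.
 */
int copy_block_fd(int fd, off_t src, off_t dst)
{
    unsigned char buf[BLK_SIZE];

    /* Read the whole source block into memory. */
    if (pread(fd, buf, BLK_SIZE, src * BLK_SIZE) != BLK_SIZE)
        return -1;

    /* Write it to the destination block. */
    if (pwrite(fd, buf, BLK_SIZE, dst * BLK_SIZE) != BLK_SIZE)
        return -1;

    /* Data-integrity step: do not return until the I/O has completed. */
    return fsync(fd) == 0 ? 0 : -1;
}
```

In the re-org loop this would be called once per block whose done bit is
still clear, with the lock held around the call.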

> I don't know the actual semantics for that, but maybe someone can take
> the above and either figure out a better way to accomplish the re-org
> or tell you how to implement copy_block() as needed in the above.
>
> Greg
> --
> Greg Freemyer
> Litigation Triage Solutions Specialist
> http://www.linkedin.com/in/gregfreemyer
> First 99 Days Litigation White Paper -
> http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
>
> The Norcross Group
> The Intersection of Evidence & Technology
> http://www.norcrossgroup.com
>
> --
> To unsubscribe from this list: send an email with
> "unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
> Please read the FAQ at http://kernelnewbies.org/FAQ
>
>


-- 
Regards,
Peter Teoh
