Re: [ext2/ext3] Re-allocation of blocks for an inode

Sandeep K Sinha <sandeepksinha@xxxxxxxxx> · Sat, 28 Mar 2009 23:57:00 +0530

On Sat, Mar 28, 2009 at 3:18 AM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
> Sandeep,
>
> I've looked at the code and made comments.  I suspect the issue is an
> extraneous call
>
>       dst_bhptr = sb_bread(ohsm_sb, dest_bh.b_blocknr);
>
> If that is actually causing a disk read operation of the uninitialized
> destination block, it is the culprit.
>

Yes, the sb_bread calls together are eating up about 98% of the total
relocation time for any file. Surprisingly, the cost is almost equal for
the source and the destination block.

So it is the same for allocated and unallocated blocks; I mean, the
difference is quite small.

> I would attack that first.  If it proves a false lead, then look at my
> other comments below.
>
Manish is also looking into this. I am not sure whether he has found
any clues recently.

> Greg
>
> On Thu, Mar 19, 2009 at 8:05 AM, Sandeep K Sinha
> <sandeepksinha@xxxxxxxxx> wrote:
>> Hi Greg,
>>
>> On Sun, Mar 15, 2009 at 9:00 AM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
>>> On Sat, Mar 14, 2009 at 3:37 PM, Vineet Agarwal
>>> <checkout.vineet@xxxxxxxxx> wrote:
>>>> Hello Greg,
>>>>
>>>> During relocation we are copying data block by block..
>>>>
>>> Vineet,
>>>
>>> 1) Be advised that most Linux mailing lists do not like it when you
>>> top post.  Answers should follow the questions.  Look up top posting
>>> at wikipedia if you don't know what I'm talking about.
>>>
>>> 2) Can you add some kprintf through the module such that they only
>>> print once.  Then enable timestamps on the kprintf's and verify where
>>> all the time is going.  It just does not make sense to me that we are
>>> now slower than cp.
>>>
>>> 3) Please post the exact kernel patch you are testing now for the full
>>> block copy and inode update.  I don't want to make assumptions about
>>> how you redid it.
>>>
>>
>> So, we are.
>>
>> I will not be sending the complete patch since it will confuse everyone more.
>> Rather, I have exported some of the ext2 functions and written a
>> kernel module to test the time for the re-allocation of blocks for a
>> file.
>>
>> The above code is not aware of any tier information and so on. This is
>> just re-allocation code for ext2. It is not even specific to OHSM, but
>> quite specific to ext2 as of now.
>>
>> I am just copying the realloc code here:
>>
>>
>> for (done = 0; done < nblocks; done++) {
>>                memset(&dest_bh, 0, sizeof(struct buffer_head));
>>                memset(&src_bh, 0, sizeof(struct buffer_head));
>>                err = ext2_get_block (src_ind, done, &src_bh, 0);
>
>>                if (err < 0) {
>>                        printk (KERN_DEBUG "\n OHSM error getting blocks ret_val = %d",err);
>>                        goto unlock;
>>                }
>>                if (!buffer_mapped(&src_bh)){
>>                        printk (KERN_DEBUG "\nHOLE ");
>>                        continue;
>>                }
>>
>>                dest_bh.b_state = 0;
>>                err = ext2_get_block (dest_ind, done, &dest_bh, 1);
>
> I think what you have is fine, but ...
>
> Have you looked at the block layout for a file copied via "cp" and one
> done via your patch?  Is the on-disk layout of the blocks used equally
> efficient?  If not, it could cause the slowdown.  And the fact that
> you are allocating one block at a time might cause such an inefficient
> layout.
>

cp is doing nothing different. But we still cannot completely rule out
the possibility that cp is getting some kind of optimization at the VFS
layer that we are not.
Recently, Manish was suggesting some VFS layer calls that we could
incorporate to get some optimizations.
Manish, can you describe again the logic you were suggesting last
week? I remember you talking about some VFS layer calls which involved
finding the filp in the kernel.

>>                if (err < 0) {
>>                        printk (KERN_DEBUG "\n OHSM error getting blocks ret_val = %d",err);
>>                        goto unlock;
>>                }
>>                src_bhptr = sb_bread(ohsm_sb, src_bh.b_blocknr);
>
> Does this allow the read ahead logic to work?  ie. Seems to me
> ext2_get_block may be too low level.
>
> Trouble is I don't know the ext2 and vfs code well enough to know
> where the read ahead logic is implemented.
>
> Does anyone know if sb_bread will leverage readahead?
>

I think not. We will have to do it in our own code.
Can someone kindly confirm this?

>>                if (!src_bhptr)
>>                        goto unlock;
>>                dst_bhptr = sb_bread(ohsm_sb, dest_bh.b_blocknr);
>
> Do you have to do this?  Seems like it might be causing the
> uninitialized block to be read from the physical disk.  If so, this is
> very time consuming.
>

The question is why the time taken is almost the same as for initialized blocks.
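
One way to reason about this, as a minimal userspace model and not kernel
code: as far as I understand, a bread-style call has to read from the device
whenever the cached buffer is not up to date, whether or not the filesystem
has ever written that block. A freshly allocated destination block would then
cost as much as a source block. A getblk-style call (the kernel has sb_getblk
for this) only hands back the buffer. All names below (model_dev, model_bread,
model_getblk, model_copy) are hypothetical and exist only for this
illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define MODEL_BLOCK_SIZE 512
#define MODEL_NBLOCKS    8

/* Hypothetical cached buffer; NOT the kernel's struct buffer_head. */
struct model_buffer {
    bool uptodate;                          /* does data[] hold valid contents? */
    unsigned char data[MODEL_BLOCK_SIZE];
};

/* Hypothetical device: first half holds the source, second half the destination. */
struct model_dev {
    unsigned char disk[2 * MODEL_NBLOCKS][MODEL_BLOCK_SIZE];
    struct model_buffer cache[2 * MODEL_NBLOCKS];
    unsigned long reads;                    /* simulated device reads so far */
};

static void model_init(struct model_dev *dev)
{
    memset(dev, 0, sizeof(*dev));
    for (unsigned i = 0; i < MODEL_NBLOCKS; i++)
        memset(dev->disk[i], (int)(i + 1), MODEL_BLOCK_SIZE);
}

/* getblk-style: return the cached buffer and never touch the device. */
static struct model_buffer *model_getblk(struct model_dev *dev, unsigned blk)
{
    assert(blk < 2 * MODEL_NBLOCKS);
    return &dev->cache[blk];
}

/* bread-style: read from the device unless the buffer is already valid. */
static struct model_buffer *model_bread(struct model_dev *dev, unsigned blk)
{
    struct model_buffer *bh = model_getblk(dev, blk);

    if (!bh->uptodate) {
        memcpy(bh->data, dev->disk[blk], MODEL_BLOCK_SIZE);
        bh->uptodate = true;
        dev->reads++;
    }
    return bh;
}

/* Copy every source block to its destination; return the device reads used. */
static unsigned long model_copy(struct model_dev *dev, bool read_dest)
{
    unsigned long before = dev->reads;

    for (unsigned i = 0; i < MODEL_NBLOCKS; i++) {
        struct model_buffer *src = model_bread(dev, i);
        struct model_buffer *dst = read_dest
            ? model_bread(dev, MODEL_NBLOCKS + i)
            : model_getblk(dev, MODEL_NBLOCKS + i);

        memcpy(dst->data, src->data, MODEL_BLOCK_SIZE);
        dst->uptodate = true;   /* fully overwritten, so the contents are valid */
    }
    return dev->reads - before;
}
```

In this model, model_copy(dev, true) costs two device reads per block and
model_copy(dev, false) costs one. In the module, the equivalent change would
be to get the destination buffer without reading it and mark it up to date
after the memcpy; I have not tested that here.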

>>                if (!dst_bhptr)
>>                        goto unlock;
>>                lock_buffer(dst_bhptr);
>>                memcpy(dst_bhptr->b_data,src_bhptr->b_data,src_bhptr->b_size);
>>                unlock_buffer(dst_bhptr);
>>
>>                mark_buffer_dirty(dst_bhptr);
>>                brelse(src_bhptr);
>>                brelse(dst_bhptr);
>>        }
>>
>>
>>
>> Now, the logs for a 512 MB test file.
>>
>> The loop takes 119897126320 ticks in total. Taking the loop time as 100%:
>>
>> ext2_sync_inode                  =      778430
>> memset (both instances included) =    15102500
>> memcpy                           =   693354060 = 00.57%
>> source sb_bread                  = 60658269700 = 50.59%
>> dest sb_bread                    = 57773094420 = 48.18%
>> source ext2_get_block            =   178310240 = 00.148%
>> dest ext2_get_block              =   391731590
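
The percentages in the list above are each tick count divided by the loop
total. A small sketch of that arithmetic follows; the tick values are the
ones posted, while tick_share, tick_sample and print_tick_shares are
hypothetical names introduced only for this example.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Share of the loop total, in percent. */
static double tick_share(uint64_t ticks, uint64_t total)
{
    assert(total > 0);
    return 100.0 * (double)ticks / (double)total;
}

struct tick_sample {
    const char *name;
    uint64_t ticks;
};

/* Tick counts exactly as posted in the log. */
static const uint64_t loop_ticks = 119897126320ULL;

static const struct tick_sample samples[] = {
    { "ext2_sync_inode",          778430ULL },
    { "memset (both)",          15102500ULL },
    { "memcpy",                693354060ULL },
    { "source sb_bread",     60658269700ULL },
    { "dest sb_bread",       57773094420ULL },
    { "source ext2_get_block", 178310240ULL },
    { "dest ext2_get_block",   391731590ULL },
};

/* Print one line per sample with its share of the loop time. */
static void print_tick_shares(void)
{
    for (size_t i = 0; i < sizeof(samples) / sizeof(samples[0]); i++)
        printf("%-24s %12llu  %7.3f%%\n", samples[i].name,
               (unsigned long long)samples[i].ticks,
               tick_share(samples[i].ticks, loop_ticks));
}
```

Calling print_tick_shares() from a small main() fills in the entries that
have no percentage in the list, and adding the two sb_bread rows gives the
combined share of the two reads.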
>>
>>
>> The output of the following command:
>> time ./insmod inum='some_value'
>>
>> real:  1m50.000s
>> user:  0m0.004s
>> sys:   0m19.437s
>>
>> Whereas a dd of that same file takes:
>>
>> [/mnt]
>> [17:30:51 sinhas]$ sudo dd if=/mnt/test of=/mnt/test1 count=1000000 (file size)
>> 1000000+0 records in
>> 1000000+0 records out
>> 512000000 bytes (512 MB) copied, 27.6875 s, 18.5 MB/s
>>
>> CP takes:
>>
>> [17:32:28 sinhas]$ sudo time cp ./test ./test2
>> 0.03user 2.13system 0:28.09elapsed 7%CPU (0avgtext+0avgdata 0maxresident)k
>> 1001048inputs+1001184outputs (0major+244minor)pagefaults 0swaps
>> [/mnt]
>> [17:33:13 sinhas]$
>>
>>
>> I have unmounted and remounted the FS in between all the operations.
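
To put the timings above side by side, here is a small sketch that converts
them to throughput. It assumes the module relocated the same 512000000-byte
file that dd copied, and it takes the "real" time of 1m50s as 110 seconds;
mb_per_sec and slowdown are hypothetical helper names.

```c
#include <assert.h>
#include <stdint.h>

/* Throughput in MB/s, with 1 MB = 1000000 bytes as in the dd output. */
static double mb_per_sec(uint64_t bytes, double seconds)
{
    assert(seconds > 0.0);
    return (double)bytes / 1e6 / seconds;
}

/* How many times longer the slow run took than the fast run. */
static double slowdown(double slow_seconds, double fast_seconds)
{
    assert(fast_seconds > 0.0);
    return slow_seconds / fast_seconds;
}
```

For example, mb_per_sec(512000000, 27.6875) reproduces the dd figure, and
mb_per_sec(512000000, 110.0) gives the rate for the module under the
assumption stated above.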
>>
>> Looking at the above stats:
>> sb_bread is eating up most of the time, I am looking into it.
>>
>>> Thanks
>>> Greg
>>> --
>>> Greg Freemyer
>>> Head of EDD Tape Extraction and Processing team
>>> Litigation Triage Solutions Specialist
>>> http://www.linkedin.com/in/gregfreemyer
>>> First 99 Days Litigation White Paper -
>>> http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
>>>
>>> The Norcross Group
>>> The Intersection of Evidence & Technology
>>> http://www.norcrossgroup.com
>>>
>>
>>
>>
>> --
>> Regards,
>> Sandeep.
>> “To learn is to change. Education is a process that changes the learner.”
>>

-- 
Regards,
Sandeep.

“To learn is to change. Education is a process that changes the learner.”

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ