Re: Copying Data Blocks

"Sandeep K Sinha" <sandeepksinha@xxxxxxxxx> · Thu, 15 Jan 2009 23:17:11 +0530

Hey,

On Thu, Jan 15, 2009 at 10:27 PM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
> On Thu, Jan 15, 2009 at 10:41 AM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>> On Thu, Jan 15, 2009 at 10:49 PM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
>>>
>>> I don't think the above paragraph is an issue with re-org as currently
>>> designed.  Neither for the ext4_defrag patchset that is under
>>> consideration for acceptance, nor the work the OHSM team is doing.
>>>
>>
>> well...it boils down to probability....the lower level the locks, the
>> more complex it gets....and Nick Piggin echoed this, to quote from
>> article:
>>
>> http://lwn.net/Articles/275185/:  (Toward better direct I/O scalability)
>>
>> "There are two common approaches to take when faced with this sort of
>> scalability problem. One is to go with more fine-grained locking,
>> where each lock covers a smaller part of the kernel. Splitting up
>> locks has been happening since the initial creation of the Big Kernel
>> Lock, which is the definitive example of coarse-grained locking. There
>> are limits to how much fine-grained locking can help, though, and the
>> addition of more locks comes at the cost of more complexity and more
>> opportunities to create deadlocks. "
>>
>>>
>>> Especially with rotational media, the call stack at the filesystem
>>
>> be aware of SSD....and they are coming down very fast in terms of
>> cost.   right now....IBM is testing 4TB SSD.......discussed in a
>> separate thread.   (not really sure about properties of SSD....but I
>> think physical contiguity of data may not matter any more, as there
>> are no moving heads to read the data?)
>
> I'm very aware of SSD.  I've been actively researching it for the last
> week or so.  That is why I was careful to say rotational media is slow.
>
> Third-generation SSDs spec their random I/O speed and their
> sequential I/O speed separately.
>
> The first couple generations tended to only spec. sequential because
> random was so bad they did not want to advertise it.
>
>>> layer is just so much faster than the drive, that blocking access to
>>> the write queue for a few milliseconds while some block level re-org
>>
>> how about doing it in-memory?  ie, reading the inode blocks (which can
>> be scattered all over the place) into memory as a contiguous chunk.
>> then allocate the inodes sequence...physically contiguously....and
>> then write to it in sequence.   so there exists COPY + PHYSICAL-REORG
>> at the same time.....partly through memory?   so while this is
>> happening, and the source blocks got modified....then the memory for
>> destination blocks will be updated immediately....no time delay.
>>
> Doing it in memory is what I think the goal should be.
>
> I don't think the ext4_defrag patchset accomplishes that, but maybe
> I'm missing something.
>
> I think I've said it before, but I would think the best real world
> implementation would be:
>
> ===
> pre-allocate destination data blocks
>
> For each block
>  prefetch source data block
>  lock inode
>  copy source data block to dest data block IN MEMORY ONLY and put in
> block queue for delivery to disk
>  release lock
> end
>
> perform_inode_level_block_pointer_swap
> ===
>
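To make sure I read the steps above correctly, here is a minimal
userspace sketch of them. Everything in it is an assumption made for
illustration: struct toy_inode, toy_relocate() and the pthread mutex
are hypothetical stand-ins for the real inode, its lock and the block
layer, and malloc'd memory stands in for disk blocks. It is not the
ext4_defrag or OHSM code.

```c
#include <pthread.h>
#include <stdlib.h>
#include <string.h>

#define BLOCK_SIZE 4096

/* Hypothetical stand-in for an inode: one lock plus an array of blocks. */
struct toy_inode {
    pthread_mutex_t lock;
    char **blocks;      /* blocks[i] points to BLOCK_SIZE bytes */
    size_t nblocks;
};

/* Free the first n blocks of an array, then the array itself. */
static void toy_free_blocks(char **blocks, size_t n)
{
    for (size_t i = 0; i < n; i++)
        free(blocks[i]);
    free(blocks);
}

/*
 * Move every block of ino into freshly allocated blocks.
 * Returns 0 on success, -1 on allocation failure (inode left unchanged).
 */
int toy_relocate(struct toy_inode *ino)
{
    size_t n = ino->nblocks;
    if (n == 0)
        return 0;

    /* Step 1: pre-allocate all destination blocks, no lock held. */
    char **dest = calloc(n, sizeof(*dest));
    if (!dest)
        return -1;
    for (size_t i = 0; i < n; i++) {
        dest[i] = malloc(BLOCK_SIZE);
        if (!dest[i]) {
            toy_free_blocks(dest, i);
            return -1;
        }
    }

    /* Step 2: hold the lock only for one in-memory block copy at a time. */
    for (size_t i = 0; i < n; i++) {
        pthread_mutex_lock(&ino->lock);
        memcpy(dest[i], ino->blocks[i], BLOCK_SIZE);
        pthread_mutex_unlock(&ino->lock);
    }

    /* Step 3: swap the block pointers at the inode level. */
    pthread_mutex_lock(&ino->lock);
    char **old = ino->blocks;
    ino->blocks = dest;
    pthread_mutex_unlock(&ino->lock);

    toy_free_blocks(old, n);
    return 0;
}
```

The sketch deliberately ignores two things: a writer that touches a
source block after it has been copied but before the swap, and a file
that grows after the destination blocks were pre-allocated.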

I would be more than happy if I am able to accomplish this. Greg,
the only problem that I see here is that somebody who already has the
file open may increase its size after I have preallocated the
destination data blocks.
I don't see a way to avoid that, but I am certainly looking for one.

I have seen many similar implementations and most of them suffer from
this issue. But surely there can be a way to optimize it, if not avoid
it.

> thus the lock is only held long enough to perform a memory copy of one block.
>
Well, as pointed out by Manish, we are not even copying the data to
the destination block.
In the source buffer there is a (char *) that points to the actual
data of the buffer; we set the pointer of the destination buffer to
that of the source and mark the destination buffer dirty.
So, any write to the new block will sync the block on its own.
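
Roughly, the idea looks like the sketch below. It is only a userspace
model: struct toy_buffer and toy_share_data() are hypothetical names
made up for this mail, and the dirty flag is a plain bool rather than
the kernel's buffer state handling.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of a buffer: a data pointer plus a dirty flag. */
struct toy_buffer {
    char *data;   /* points at the block's actual contents */
    bool dirty;   /* true means the block still has to be written out */
};

/*
 * Instead of copying the contents, point the destination buffer at the
 * source buffer's data and mark the destination dirty so that it gets
 * written to its new location.
 */
void toy_share_data(struct toy_buffer *dst, const struct toy_buffer *src)
{
    dst->data = src->data;
    dst->dirty = true;
}
```

Because both buffers see the same memory in this model, a change made
through the source pointer is visible through the destination at once.
The sketch says nothing about who owns or frees that memory.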

>
>>> Not to be snide, but if you truly feel a design that does use inode
>>> locking to get the job done is unacceptable, then you should post your
>>> objections on the ext4 list.
>>
>> sorry.....I am just a newbie....and I enjoy discussing all these with
>> those at my level.....for the ext4 list? well....they already know
>> that - and I quote from the same article above:
>>
>> http://lwn.net/Articles/275185/
>>
>> "The other approach is to do away with locking altogether; this has
>> been the preferred way of improving scalability in recent years. That
>> is, for example, what all of the work around read-copy-update has been
>> doing. And this is the direction Nick has chosen to improve
>> get_user_pages()."
>>
>> I will discuss in the list if i can understand 80% to 90% of this
>> article, which is still far from true :-(.
>>
>> Thanks.....
>>
>> --
>> Regards,
>> Peter Teoh
>
> Good luck, and I'm glad you're enjoying the discussion.
> Personally, I'm just very excited about the idea of an HSM in Linux
> that will allow SSDs to be more highly leveraged in a tiered storage
> environment.  As a Linux user I think that is one of the most
> interesting things I've seen discussed in a while.
>
> Greg

-- 
Regards,
Sandeep.

"To learn is to change. Education is a process that changes the learner."
