Re: Copying Data Blocks

Peter Teoh <htmldeveloper@xxxxxxxxx> · Wed, 14 Jan 2009 23:42:12 +0800

>>
>
> Well, Well, Well
> Firstly, the source inode remains intact; only the pointers in the
> source inode are updated.
> And we don't freeze the whole FS, we just take a lock on the inode.
> So, you can operate on all other inodes. We intend to reduce the
> granularity of locking from per inode to per block sometime in the near
> future for sure.
>

Inode-level or block-level locking is not good for performance, and I
suspect it is also highly susceptible to deadlock. Normally this kind of
operation should be done point-in-time, the way Oracle does it.

First, read this email:

https://kerneltrap.org/mailarchive/linux-fsdevel/2008/4/21/1523184/thread

(I elaborated on filesystem level integrity)

Then read the direct reply to me from an Oracle guy:

Peter Teoh wrote:

    invalidate the earlier fsck results.   This idea has an equivalent
    in the Oracle database world - the "online datafile backup" feature,
    where all transactions go to memory + journal logs (a physical file
    itself), and the datafile is frozen for writing, enabling it to be
    physically copied):

Sunil Mushran from Oracle replied:

Actually, no. The dbfile is not frozen. What happens is that the
redo is generated in such a way that fractured blocks can be fixed
on restore. You will notice the redo size increase when online
backup is enabled.
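
To make Sunil's point concrete, here is a toy model of how I understand it. This is not Oracle code. It assumes that while the backup is running every change logs a full image of the block to redo (which would explain the larger redo), and that the restore replays those images over the copied file. The names Datafile, fractured_backup and restore are made up for the illustration.

```python
# Toy model: a "datafile" is a list of blocks, each block has two halves,
# so a copy taken while a write is in flight can see a fractured block.

class Datafile:
    def __init__(self, nblocks):
        self.blocks = [[0, 0] for _ in range(nblocks)]
        self.redo = []            # (block_no, full_block_image) records
        self.backup_mode = False

    def write(self, block_no, value):
        # A write updates both halves of the block.
        self.blocks[block_no] = [value, value]
        if self.backup_mode:
            # Assumption: during online backup the whole block image is logged.
            self.redo.append((block_no, [value, value]))


def fractured_backup(df, racing_writes):
    """Copy the file half a block at a time without blocking writers.

    racing_writes maps a block number to a value that gets written after
    the first half of that block has already been copied.
    """
    df.backup_mode = True
    copy = []
    for n, block in enumerate(df.blocks):
        first_half = block[0]
        if n in racing_writes:
            df.write(n, racing_writes[n])   # the datafile is NOT frozen
        second_half = df.blocks[n][1]
        copy.append([first_half, second_half])
    df.backup_mode = False
    return copy


def restore(copy, redo):
    """Replay redo over the backup; full images repair fractured blocks."""
    restored = [list(b) for b in copy]
    for block_no, image in redo:
        restored[block_no] = list(image)
    return restored


df = Datafile(3)
df.write(1, 5)                         # happens before the backup starts
backup = fractured_backup(df, {1: 7})  # block 1 is written mid-copy
print(backup[1])                       # old half + new half
print(restore(backup, df.redo) == df.blocks)
```

The backup copy of block 1 is inconsistent on its own, yet the restored file matches the live one, and no writer ever had to wait.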

> Secondly, for files which are already open, if a process tries to do a
> read/write it sleeps, but doesn't break.
> The period of locking will depend on the file size.
> See, we take a lock, we read the source inode size, allocate the required
> number of blocks in the dest inode, exchange 15 block pointers, release
> the lock, mark the source inode dirty, and delete the dummy inode.
> As there is no copy of data, the time will not be much. For a 10GB
> file the time for relocation was in seconds.
>
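
Just so we are talking about the same sequence, this is how I read the steps above, as a userspace toy and not your actual patch. Inode, relocate and allocate_blocks are names I made up, and the sketch does not model how the data gets into the new blocks, since the mail does not say.

```python
import threading

N_PTRS = 15  # the mail talks about exchanging 15 block pointers


class Inode:
    def __init__(self, size=0):
        self.lock = threading.Lock()
        self.size = size
        self.ptrs = [0] * N_PTRS
        self.dirty = False


def relocate(src, allocate_blocks):
    """Swap src's block pointers with those of a freshly allocated dummy inode.

    allocate_blocks(size) is a hypothetical allocator that returns N_PTRS
    new block pointers. Returns the old pointers, which would be freed
    when the dummy inode is deleted.
    """
    with src.lock:                               # readers/writers sleep here
        dummy = Inode(src.size)                  # read the source inode size
        dummy.ptrs = allocate_blocks(src.size)   # allocate blocks in dest inode
        src.ptrs, dummy.ptrs = dummy.ptrs, src.ptrs   # exchange the pointers
    src.dirty = True                             # mark source inode dirty
    old_blocks = dummy.ptrs
    del dummy                                    # delete the dummy inode
    return old_blocks


src = Inode(size=3 * 4096)
src.ptrs = list(range(100, 100 + N_PTRS))
old = relocate(src, lambda size: list(range(200, 200 + N_PTRS)))
print(src.ptrs[0], old[0])
```

Note that between the allocation and the deletion of the dummy inode both sets of blocks are allocated at once, which is where the space question below comes from.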
>> If you choose second you might freeze the FS for a long time and if
>> you choose first then how do you plan to handle the below case.
>>
> The cost will be very high if I freeze the FS. The inode lock saves us
> to some extent here.
>
>
>> a) Application opens a file for writes (remember space checks are done
>> at this place and blocks are preallocated only in memory).
>> b) During relocation and before deletion of your destination inode you
>> are using 2X size of your inode.
>> c) Now if you unfreeze your FS, it might get ENOSPC in this window.
>>
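
The a) b) c) case above is easier to see with numbers. All figures here are invented for the illustration, and free_blocks is just a helper I made up.

```python
def free_blocks(total, used):
    """Free blocks on the filesystem."""
    return total - used


TOTAL = 1000        # filesystem size in blocks (made-up figure)
FILE_BLOCKS = 300   # size of the file being relocated
used = 600          # blocks in use, including the 300-block file

# a) A writer opens a file; the space check passes and 200 blocks are
#    reserved in memory only.
free_at_open = free_blocks(TOTAL, used)
reserved_in_memory = 200

# b) During relocation the file occupies 2x its size until the dummy
#    inode is deleted.
used_during_relocation = used + FILE_BLOCKS
free_in_window = free_blocks(TOTAL, used_during_relocation)

# c) If the writer flushes inside this window, the reservation no longer fits.
would_hit_enospc = reserved_in_memory > free_in_window

print(free_at_open, free_in_window, would_hit_enospc)
```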

The above is just one possible problem; there are many possible deadlock
scenarios as well.

But I am amazed by Oracle's strategy. It is really good for performance.

Check this product:

http://www.redbooks.ibm.com/abstracts/redp4065.html?Open

and this one:

http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246786.html?Open

and within the above document:

FlashCopy
The primary objective of FlashCopy is to very quickly create a
point-in-time copy of a source volume on a target volume. The benefits
of FlashCopy are that the point-in-time target copy is immediately
available for use for backups or testing and that the source volume is
immediately released so that applications can continue processing with
minimal application downtime. The target volume can be either a logical
or physical copy of the data, with the latter copying the data as a
background process. In a z/OS environment, FlashCopy can also operate at
a data set level.

So it offers the same feature as yours, but done point-in-time.
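
For what "point-in-time" can mean in practice, here is a minimal copy-on-write sketch. It is not how FlashCopy is implemented (I have not seen that), only one common way to get a logical copy that is available immediately while the source keeps taking writes. Volume and Snapshot are names I made up.

```python
class Snapshot:
    """Logical point-in-time copy: stores only blocks that changed since."""

    def __init__(self, source):
        self.source = source
        self.saved = {}          # block_no -> content at snapshot time

    def preserve(self, n):
        # Keep the old content the first time a block is overwritten.
        if n not in self.saved:
            self.saved[n] = self.source.blocks[n]

    def read(self, n):
        # Unchanged blocks are still read from the source volume.
        if n in self.saved:
            return self.saved[n]
        return self.source.blocks[n]


class Volume:
    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # Creating the snapshot copies no data, so it is nearly instant.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, n, data):
        # Copy-on-write: save the old block for every snapshot first.
        for snap in self.snapshots:
            snap.preserve(n)
        self.blocks[n] = data


vol = Volume(["a", "b", "c"])
snap = vol.snapshot()
vol.write(1, "B")                         # source is released immediately
print([snap.read(i) for i in range(3)])   # view as of snapshot time
print(vol.blocks)                         # live view
```

No lock is held on the source for the length of the copy; each write pays a small cost once per block instead.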

Comments?

-- 
Regards,
Peter Teoh
