>>
>
> Well, Well, Well
> Firstly, the source inode remains intact; only the pointers in the
> source inode are updated.
> And we don't freeze the whole FS, we just take a lock on the inode.
> So, you can operate on all other inodes. We intend to reduce the
> granularity of locking to per block from per inode, sometime in the near
> future for sure.
>

Inode-level or block-level locking is not good for performance, and I suspect it is also highly susceptible to deadlock scenarios. Normally it should be point in time, like that of Oracle.

First read this email:

https://kerneltrap.org/mailarchive/linux-fsdevel/2008/4/21/1523184/thread

(I elaborated on filesystem-level integrity.)

Then read the direct reply to me from an Oracle guy:

Peter Teoh wrote:
invalidate the earlier fsck results. This idea has its equivalent in the Oracle database world - the "online datafile backup" feature, where all transactions go to memory + journal logs (a physical file itself), and the datafile is frozen for writing, enabling it to be physically copied):

Sunil Mushran from Oracle==>

Actually, no. The dbfile is not frozen. What happens is that the redo is generated in such a way that fractured blocks can be fixed on restore. You will notice the redo size increase when online backup is enabled.

> Secondly, for files which are already opened, if it tries to do a
> read/write it sleeps, but doesn't break.
> The period of locking will depend on the file size.
> See, we take a lock, we read the source inode size, allocate the required
> number of blocks in the dest inode, and exchange 15 block pointers, release
> the lock, mark the source inode dirty, and delete the dummy inode.
> As there is no copy of data, the time will not be much. For a 10GB
> file the time for relocation was in seconds.
>
>> If you choose second you might freeze the FS for a long time and if
>> you choose first then how do you plan to handle the below case.
>>
> The cost will be very high if I freeze the FS. Inode lock saves us to
> some extent here.
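
The sequence quoted above (take the inode lock, read the size, allocate blocks in a destination inode, exchange the 15 block pointers, release the lock, mark the source dirty, delete the dummy inode) can be sketched roughly as below. This is only a toy user-space model in Python, not the filesystem code under discussion: the names Inode, relocate and allocate_blocks are made up for illustration, and no block data is modelled at all.

```python
import threading

N_BLOCK_PTRS = 15  # block-pointer slots per inode, as in "15 block pointers" above


class Inode:
    """Toy in-memory inode: a lock, a size, and a fixed array of block pointers."""

    def __init__(self, size=0, blocks=None):
        self.lock = threading.Lock()
        self.size = size
        self.blocks = list(blocks) if blocks is not None else [0] * N_BLOCK_PTRS
        self.dirty = False


def relocate(src, allocate_blocks):
    """Exchange src's block pointers with those of a freshly allocated dummy inode.

    allocate_blocks(size) is a hypothetical allocator that returns the block
    pointers for the destination. The function returns the old pointers,
    which the dummy inode carries away when it is deleted.
    """
    with src.lock:  # only this inode is locked; other inodes stay usable
        dummy = Inode(size=src.size, blocks=allocate_blocks(src.size))
        # Exchange the pointer arrays; no data blocks are copied here.
        src.blocks, dummy.blocks = dummy.blocks, src.blocks
    src.dirty = True  # marked dirty after the lock is released, as described
    old_blocks = dummy.blocks
    del dummy  # stands in for "delete dummy inode"
    return old_blocks
```

Any reader or writer that needs src.lock sleeps for the duration of the with block, which is the per-inode stall being debated here; the sketch says nothing about the ENOSPC window or about deadlocks between several such locks.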
>> a) Application opens a file for writes (remember space checks are done
>> at this place and blocks are preallocated only in memory).
>> b) During relocation and before deletion of your destination inode you
>> are using 2X size of your inode.
>> c) Now if you unfreeze your FS, it might get ENOSPC in this window.
>>

The above is just one possible problem; many deadlocks are possible as well.

But I am amazed by Oracle's strategy. It is really good for performance.

Check this product:

http://www.redbooks.ibm.com/abstracts/redp4065.html?Open

and this one:

http://www.redbooks.ibm.com/Redbooks.nsf/RedbookAbstracts/sg246786.html?Open

and within the above document:

FlashCopy

The primary objective of FlashCopy is to very quickly create a point-in-time copy of a source volume on a target volume. The benefits of FlashCopy are that the point-in-time target copy is immediately available for use for backups or testing and that the source volume is immediately released so that applications can continue processing with minimal application downtime. The target volume can be either a logical or physical copy of the data, with the latter copying the data as a background process. In a z/OS environment, FlashCopy can also operate at a data set level.

So it offers the same features as yours, but done point-in-time.

Comments?

-- 
Regards,
Peter Teoh
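
For readers who want something concrete, one common way to get a point-in-time copy while the source stays writable is copy-on-write. The sketch below is a minimal in-memory model of that idea in Python. It is an assumption chosen for illustration: the FlashCopy excerpt above does not say that the product is implemented this way, and the Volume and Snapshot classes are hypothetical names.

```python
class Snapshot:
    """Point-in-time view of a Volume, filled in lazily by copy-on-write."""

    def __init__(self, source):
        self.source = source
        self.saved = {}  # block index -> content at snapshot time

    def preserve(self, index):
        # Keep only the first (oldest) version seen after the snapshot was taken.
        if index not in self.saved:
            self.saved[index] = self.source.blocks[index]

    def read(self, index):
        # Blocks never overwritten since the snapshot are read from the source.
        if index in self.saved:
            return self.saved[index]
        return self.source.blocks[index]


class Volume:
    """Toy block device: a list of block contents plus its live snapshots."""

    def __init__(self, blocks):
        self.blocks = list(blocks)
        self.snapshots = []

    def snapshot(self):
        # Creating the snapshot copies no data, so the source is released at once.
        snap = Snapshot(self)
        self.snapshots.append(snap)
        return snap

    def write(self, index, data):
        # Save the old content into every snapshot before overwriting it.
        for snap in self.snapshots:
            snap.preserve(index)
        self.blocks[index] = data
```

Taking the snapshot costs the same regardless of volume size, and writers pay one extra block copy only the first time they touch a block afterwards. That is the trade being compared here against holding an inode lock for the whole relocation.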