Hey Greg,

On Thu, Jan 8, 2009 at 10:14 AM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
> On Wed, Jan 7, 2009 at 11:05 PM, Sandeep K Sinha
> <sandeepksinha@xxxxxxxxx> wrote:
>> Hi Greg,
>>
>> Just to give you a context of the problem :
>> refer:
>> http://code.google.com/p/fscops/
>>
>> reply inline.
>>
>> On Wed, Jan 7, 2009 at 9:03 PM, Greg Freemyer <greg.freemyer@xxxxxxxxx> wrote:
>>> On Wed, Jan 7, 2009 at 5:48 AM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>> On Wed, Jan 7, 2009 at 12:44 PM, Manish Katiyar <mkatiyar@xxxxxxxxx> wrote:
>>>>> On Wed, Jan 7, 2009 at 12:17 PM, Sandeep K Sinha
>>>>> <sandeepksinha@xxxxxxxxx> wrote:
>>>>>> Ok, Let me rephrase what rohit is exactly trying to question.
>>>>>>
>>>>>> There is an inode X which has say some N number of data blocks.
>>>>>> Now, through his own kernel module and some changes to the file system,
>>>>>> he wants to create a new inode Y in the FS and physically copy all the
>>>>>> data from the old inode to the new inode.
>>>>>
>>>>> Errr....... I must be missing something...... For this why do you need
>>>>> to copy the data blocks ? if you just copy the old inode to new inode,
>>>>> you have already copied the direct and indirect block pointers right ?
>>>>> That will not take much time, and now if you free the old inode, you
>>>>> have virtually changed the ownership of old blocks to the new inode.
>>>>>
>>>> The problem is not replacing the inode, i want to physically move the data.
>>>> That means if inode X and its data blocks are in block group 1, and
>>>> new inode is in block group 100
>>>> then i will allocate data blocks in block group 100 and copy the data
>>>> from inode X to inode Y.
>>>> So i will be able to physically relocate a file, and change the
>>>> directory entry to contain inode Y.
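
For readers following along, the "block group 1" versus "block group 100" distinction above comes down to simple arithmetic on the block and inode numbers. The sketch below is only an illustration: `struct sb_geometry` and the three helper names are hypothetical, and the struct is a cut-down stand-in for the corresponding ext2 superblock fields (first data block, blocks per group, inodes per group), not the kernel's own structure.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical, cut-down stand-in for the ext2 superblock fields used here. */
struct sb_geometry {
    uint32_t first_data_block;  /* 1 with 1 KiB blocks, 0 with larger blocks */
    uint32_t blocks_per_group;
    uint32_t inodes_per_group;
};

/* Block group that contains a given file system block number. */
static uint32_t block_to_group(const struct sb_geometry *g, uint32_t block)
{
    assert(block >= g->first_data_block);
    return (block - g->first_data_block) / g->blocks_per_group;
}

/* Block group that contains a given inode (inode numbers start at 1). */
static uint32_t inode_to_group(const struct sb_geometry *g, uint32_t ino)
{
    assert(ino >= 1);
    return (ino - 1) / g->inodes_per_group;
}

/* First block number belonging to a given block group. */
static uint32_t group_first_block(const struct sb_geometry *g, uint32_t group)
{
    return g->first_data_block + group * g->blocks_per_group;
}
```

With these helpers, relocating a file "to block group 100" means allocating destination blocks starting from `group_first_block(&g, 100)` and checking with `block_to_group()` that each one really lands in the intended group.
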
>>>>
>>>>> The problems i can see with this approach is that if the new inode is
>>>>> not in the same block group as old inode, you have *kind of broken*
>>>>> the ext2's intelligence of allocating the blocks in the same block
>>>>> group.
>>>>>
>>>>> CMIIW . btw this thread is interesting :-)
>>>>
>>>> Yes its interesting. :-)
>>>>
>>>>>
>>>> I haven't actually broken ext2's intelligence completely, i have only put
>>>> restrictions in allocation of inode and data blocks.
>>>> And it works fine with existing optimizations.
>>>>
>>>> And the major issue is relocating files between different block group range.
>>>
>>> So if I understand your high level desire, you want to write a
>>> filesystem re-org (or defrag or something) that works one file at a
>>> time.
>>>
>> Yes kind of, you can say that.
>>
>>> You have to do it in the kernel because you want to control the inode
>>> and data block allocation.
>>>
>>
>> Because I want to keep control the allocation of data blocks of a file
>> to a specific device underneath a LVM. And so the mapping of blocks
>> from FS->LVM->DEVICES resides inside the kernel only.
>>
>>> Your current thought is to mount the entire filesystem readonly, do
>>> the re-org, remount r/w.
>>>
>> Well, this should work for now but ya we will look for an alternative
>> for sure. something like freeze/thaw.
>>
>>> If this is just for yourself, it might be acceptable. If this is for
>>> the community, it is not (IMO).
>>>
>> Surely this is for our personal use.
>>
>>> To be of value to the community, you need to be more aggressive and
>>> get this to work on a running filesystem.
>>>
>>
>> Surely be a milestone, soon.
>>
>>> My first attempt at high-level pseudo code would be:
>>>
>>> ===========
>>> re_org_file()
>>> {
>>>     read_orig_inode()
>>>     set inode.re_org_in_progress = true
>>>
>>>     lock(inode.re_org_in_progress)
>>>     allocate destination inode                        // Do not initiate any real i/o
>>>     allocate all destination indirect pointer blocks  // Do not initiate any real i/o
>>>     allocate all destination data blocks              // Do not initiate any real i/o
>>>
>>>     allocate_file_re_org_done_array   // one bit per data block
>>>     memset (file_re_org_done_array, false)
>>>     release_lock(inode.re_org_in_progress)
>>>
>>>     for each bit in file_re_org_done_array[] {
>>>         if (not file_re_org_done_array[block]) {
>>>             lock(inode.re_org_in_progress)
>>>             copy_block()   // I know, your question is how to do this
>>>             set file_re_org_done_array[block] = true
>>>             release_lock(inode.re_org_in_progress)
>>>         }
>>>     }
>>>
>>>     lock(inode.re_org_in_progress)
>>>     copy_inode_info()
>>
>> Well, currently I don't intend to move the inode to a new location. I
>> would prefer to leave the original inode intact, just updating the new
>> data block pointers. This is still in debate, whether to relocate
>> inode or not.
>>
>>>     update_directory_entries()
>>>     release_lock(inode.re_org_in_progress)
>>>
>>>     set inode.re_org_in_progress = false
>>> }
>>>
>>> Then insert logic into the write() code that does:
>>>
>>> // inserted write logic
>>> if inode.re_org_in_progress == true {
>>>     lock(inode.re_org_in_progress)
>>>     send data to orig block       // no real i/o needed
>>>     send data to new dest block   // no real i/o needed
>>>     file_re_org_done_array[block] = true
>>>     release_lock(inode.re_org_in_progress)
>>> } else
>>>     send data to normal block     // no real i/o needed
>>>
>>> // end of inserted write logic
>>
>> What exactly is this required for ? Is this for any kind of metadata updates ?
>
> I assumed you needed to effectively "copy the file to a new
> destination, then delete the original file".
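
To make the quoted pseudo code a little more concrete, here is a rough user-space model of it in C. This is only a sketch under stated assumptions, not code from ext2 or from the fscops project: `struct reorg_file`, `reorg_begin()`, `reorg_write()` and `reorg_copy_all()` are hypothetical names, in-memory arrays stand in for on-disk data blocks, and a pthread mutex stands in for whatever per-inode lock the kernel code would really use. Unlike the pseudo code, the done bit is tested while holding the lock, so a concurrent write cannot slip in between the test and the copy.

```c
#include <assert.h>
#include <pthread.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define BLOCK_SIZE 16  /* tiny blocks keep the model readable */
#define NBLOCKS    4

/* Hypothetical in-memory stand-in for an inode that is being re-organized. */
struct reorg_file {
    pthread_mutex_t lock;            /* stands in for the per-inode lock */
    bool in_progress;                /* inode.re_org_in_progress */
    bool done[NBLOCKS];              /* file_re_org_done_array */
    char orig[NBLOCKS][BLOCK_SIZE];  /* original data blocks */
    char dest[NBLOCKS][BLOCK_SIZE];  /* newly allocated (ghost) data blocks */
};

/* Start a re-org: empty destination blocks, all done bits cleared. */
static void reorg_begin(struct reorg_file *f)
{
    pthread_mutex_init(&f->lock, NULL);
    pthread_mutex_lock(&f->lock);
    memset(f->done, 0, sizeof(f->done));
    memset(f->dest, 0, sizeof(f->dest));
    f->in_progress = true;
    pthread_mutex_unlock(&f->lock);
}

/* Inserted write logic: a whole-block write goes to both copies. */
static void reorg_write(struct reorg_file *f, size_t block, const char *data)
{
    assert(block < NBLOCKS);
    pthread_mutex_lock(&f->lock);
    memcpy(f->orig[block], data, BLOCK_SIZE);
    if (f->in_progress) {
        memcpy(f->dest[block], data, BLOCK_SIZE);
        f->done[block] = true;  /* the copy loop can skip this block */
    }
    pthread_mutex_unlock(&f->lock);
}

/* Copy loop: the lock is dropped between blocks so writes can interleave. */
static void reorg_copy_all(struct reorg_file *f)
{
    for (size_t b = 0; b < NBLOCKS; b++) {
        pthread_mutex_lock(&f->lock);
        if (!f->done[b]) {
            memcpy(f->dest[b], f->orig[b], BLOCK_SIZE);
            f->done[b] = true;
        }
        pthread_mutex_unlock(&f->lock);
    }
    pthread_mutex_lock(&f->lock);
    f->in_progress = false;  /* real code would switch the block pointers here */
    pthread_mutex_unlock(&f->lock);
}
```

One caveat the model makes visible: setting the done bit from the write path is only safe because `reorg_write()` always writes a whole block. A partial-block write would have to leave the bit clear (or copy the rest of the block first), otherwise the copy loop would skip a destination block that is only partly filled.
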
>
> To do that on a live file with minimal interference with user space
> invoked i/o I created a ghost version which initially had empty data
> blocks.
>
> Then I allowed normal file i/o to continue. Look at the first chunk
> of code and see where I released the lock. Whenever the lock is
> released normal user space file i/o is allowed to occur.
>
> Reads are easily handled by reading from the original file.
>
> Writes on the other hand have to update both the original file data
> blocks and the newly allocated data blocks.
>
> And as I look again at the pseudo code, I forgot to do the file delete
> of the original inode at the end.
>

The overall idea makes good sense, but I suspect it will be somewhat
complicated to handle; I am not sure yet. It is similar to handling a
failure in a mirrored environment.

> Greg
> --
> Greg Freemyer
> Litigation Triage Solutions Specialist
> http://www.linkedin.com/in/gregfreemyer
> First 99 Days Litigation White Paper -
> http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf
>
> The Norcross Group
> The Intersection of Evidence & Technology
> http://www.norcrossgroup.com
>

--
Regards,
Sandeep.

"To learn is to change. Education is a process that changes the learner."

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ