On Thu, Jan 8, 2009 at 12:55 PM, Sandeep K Sinha
<sandeepksinha@xxxxxxxxx> wrote:
> Hi Greg,
>
>> So if I understand your high level desire, you want to write a
>> filesystem re-org (or defrag or something) that works one file at a
>> time.
>>
>> You have to do it in the kernel because you want to control the inode
>> and data block allocation.
>>
>> Your current thought is to mount the entire filesystem readonly, do
>> the re-org, remount r/w.
>>
>> If this is just for yourself, it might be acceptable.  If this is for
>> the community, it is not (IMO).
>>
>> To be of value to the community, you need to be more aggressive and
>> get this to work on a running filesystem.
>>
>> My first attempt at high-level pseudo code would be:
>>
>> ===========
>> re_org_file()
>> {
>>   read_orig_inode()
>>   set inode.re_org_in_progress = true
>>
>>   lock(inode.re_org_in_progress)
>>   allocate destination inode                        // Do not initiate and real i/o
>>   allocate all destination indirect pointer blocks  // Do not initiate and real i/o
>>   allocate all destination data blocks              // Do not initiate and real i/o
>>
>
> Sorry, but i didn't understand what you mean by "Do not initiate and
> real i/o" ??

If you are handling this as a readonly filesystem and thus don't need
locking etc., you can ignore that comment.

If you are doing the locking, then this is a reminder that it is okay
to build your structures, add them to the block layer queues, etc. but
you don't want to force actual disk i/o while you have the lock held.

Not that it would be the end of the world in this case, but in general
you want to avoid doing any disk i/o with locks held.  Disk i/o can
obviously take a long time and it could interfere with other users
getting control of the lock.

I believe you should be totally fine as long as you don't use a barrier
or similar to force the code to wait until your block updates are on
disk.
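To make the "no real i/o with the lock held" point concrete, here is a
minimal user-space sketch in C.  It is not kernel code and not the block
layer API: `pending_queue`, `queue_block`, `flush_blocks` and
`slow_disk_write` are hypothetical names made up for illustration, and a
pthread mutex stands in for whatever lock the filesystem would really
use.  The idea is only that queueing is memory-only work done under the
lock, while the slow write happens after the lock is released.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define MAX_PENDING 16

/* Hypothetical in-memory queue standing in for the block layer queues. */
struct pending_queue {
    pthread_mutex_t lock;
    int blocks[MAX_PENDING];   /* block numbers waiting to be written */
    size_t count;
};

static void queue_init(struct pending_queue *q)
{
    pthread_mutex_init(&q->lock, NULL);
    q->count = 0;
}

/*
 * Queue a block while holding the lock.  This is memory-only work and
 * never waits on the disk.  Returns 1 on success, 0 if the queue is full.
 */
static int queue_block(struct pending_queue *q, int block)
{
    int ok = 0;

    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    if (q->count < MAX_PENDING) {
        q->blocks[q->count++] = block;
        ok = 1;
    }
    pthread_mutex_unlock(&q->lock);
    return ok;
}

/* Placeholder for a slow, blocking disk write (hypothetical). */
static void slow_disk_write(int block)
{
    (void)block;
}

/*
 * Detach the pending blocks under the lock, then do the slow writes
 * with the lock released.  Returns the number of blocks written.
 */
static size_t flush_blocks(struct pending_queue *q)
{
    int snapshot[MAX_PENDING];
    size_t n, i;

    assert(q != NULL);
    pthread_mutex_lock(&q->lock);
    n = q->count;
    for (i = 0; i < n; i++)
        snapshot[i] = q->blocks[i];
    q->count = 0;
    pthread_mutex_unlock(&q->lock);

    /* The slow part runs here, so other lock users are not held up. */
    for (i = 0; i < n; i++)
        slow_disk_write(snapshot[i]);
    return n;
}
```

Other threads can keep calling queue_block() while a flush is waiting on
the disk, because the lock is only held for the short copy of the
pending list.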
>>   allocate_file_re_org_done_array     // one bit per data block
>>   memset (file_re_org_done_array, false)
>>   release_lock(inode.re_org_in_progress)
>>
>>   for each bit in file_re_org_done_array[] {
>>     if (not file_re_org_done_array[block]) {
>>       lock(inode.re_org_in_progress)
>>       copy_block()     // I know, your question is how to do this
>>       set file_re_org_done_array[block] = true
>>       release_lock(inode.re_org_in_progress)
>>     }
>>   }
>>
>>   lock(inode.re_org_in_progress)
>>   copy_inode_info()
>>   update_directory_entries()
>>   release_lock(inode.re_org_in_progress)
>>
>>   set inode.re_org_in_progress = false
>> }
>>
>> Then insert logic into the write() code that does:
>>
>> // inserted write logic
>> if inode.re_org_in_progress == true {
>>   lock(inode.re_org_in_progress)
>>   send data to orig block      // no real i/o needed
>>   send data to new dest block  // no real i/o needed
> What do you mean by no real i/o needed ???

Same as above.  Just don't force the code to pause until the data is
actually on disk.

>>   file_re_org_done_array[block] = true
>>   release_lock(inode.re_org_in_progress)
>> } else
>>   send data to normal block    // no real i/o needed
>>
>> // end of inserted write logic
>> ====================================
>>
>
> Do we need to copy the blocks, which has been written after the
> initiation of this block relocation ?
> I mean the copy operation will be redundant.
> What do you say ?
>

If you allow user land writes to the file, then you have 3 high-level
time frames to consider:

1) Pre re-org
2) Post re-org
3) Re-org in progress

1) During pre re-org you only have to update the original file,
obviously.

2) During post re-org, if all went well, you only have to update the
new re-organized file.

SHIT, I totally forgot about any user space applications that have the
old file open.  You will need to ensure they read / write from the new
re-org file after all is said and done.  Any lurkers out there have any
ideas how to do that?

For your first effort, if you decide to do the read-only mount, you
don't have to address this issue.  But if you are working with a live
file system, you will.

3) During the re-org itself, you have to keep the original file 100%
valid, so writes have to go to it.  You also have to update the new
re-organized file, or it will not have the update.

I am setting the done flag for any blocks updated by writes initiated
from user space just to reduce duplicate effort.  I.e., if the main
duplicate loop is at block 100 and a user space write causes block 150
to be updated in both the original and the new re-organized file, then
there is no reason for the main duplicate loop to copy that data block
again.

Greg
--
Greg Freemyer
Litigation Triage Solutions Specialist
http://www.linkedin.com/in/gregfreemyer
First 99 Days Litigation White Paper -
http://www.norcrossgroup.com/forms/whitepapers/99%20Days%20whitepaper.pdf

The Norcross Group
The Intersection of Evidence & Technology
http://www.norcrossgroup.com

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ
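
The done-flag scheme from the pseudo code can be tried out in user space
before touching any kernel code.  The C sketch below is only a
single-threaded simulation under assumed names: `reorg_file`,
`reorg_begin`, `reorg_copy_range`, `file_write_block` and `reorg_finish`
are hypothetical, small arrays stand in for the original and destination
blocks, and the lock from the pseudo code is left out.  It also does not
model the post re-org switch of writers over to the new file.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NBLOCKS    8
#define BLOCK_SIZE 4

/* Hypothetical in-memory stand-in for a file being re-organized. */
struct reorg_file {
    char orig[NBLOCKS][BLOCK_SIZE];   /* original data blocks */
    char dest[NBLOCKS][BLOCK_SIZE];   /* newly allocated destination blocks */
    bool done[NBLOCKS];               /* one flag per data block */
    bool re_org_in_progress;
    int copies;                       /* blocks copied by the main loop */
};

static void reorg_begin(struct reorg_file *f)
{
    memset(f->dest, 0, sizeof f->dest);
    memset(f->done, 0, sizeof f->done);
    f->copies = 0;
    f->re_org_in_progress = true;
}

/* Main duplicate loop over [from, to): skip blocks already marked done. */
static void reorg_copy_range(struct reorg_file *f, int from, int to)
{
    assert(from >= 0 && to <= NBLOCKS);
    for (int block = from; block < to; block++) {
        if (!f->done[block]) {
            memcpy(f->dest[block], f->orig[block], BLOCK_SIZE);
            f->done[block] = true;
            f->copies++;
        }
    }
}

/*
 * Inserted write logic: the original always gets the write so it stays
 * 100% valid.  During the re-org the destination gets it too and the
 * block is marked done, so the main loop will not copy it again.
 */
static void file_write_block(struct reorg_file *f, int block, const char *data)
{
    assert(block >= 0 && block < NBLOCKS);
    memcpy(f->orig[block], data, BLOCK_SIZE);
    if (f->re_org_in_progress) {
        memcpy(f->dest[block], data, BLOCK_SIZE);
        f->done[block] = true;
    }
}

static void reorg_finish(struct reorg_file *f)
{
    f->re_org_in_progress = false;
}
```

For example, if the main loop has copied blocks 0 to 3 and a write to
block 6 arrives, that write lands in both copies and sets the done flag,
so the rest of the loop copies only blocks 4, 5 and 7.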