On Mon, Jan 12, 2009 at 11:48 PM, Manish Katiyar <mkatiyar@xxxxxxxxx> wrote:
> On Mon, Jan 12, 2009 at 11:31 PM, Sandeep K Sinha
> <sandeepksinha@xxxxxxxxx> wrote:
>> Hi Peter,
>>
>> On Mon, Jan 12, 2009 at 9:49 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>> On Mon, Jan 12, 2009 at 4:26 PM, Sandeep K Sinha
>>> <sandeepksinha@xxxxxxxxx> wrote:
>>>> Hi Peter,
>>>>
>>>> Don't you think that this will restrict it to a specific file system?
>>>> Shouldn't the VFS inode be used rather than the FS in-core inode?
>>>>
>>>
>>> The VFS has the APIs fsync_buffer_list() and invalidate_inode_buffers(),
>>> and these APIs use a spinlock for syncing:
>>>
>>> void invalidate_inode_buffers(struct inode *inode)
>>> {
>>>         if (inode_has_buffers(inode)) {
>>>                 struct address_space *mapping = &inode->i_data;
>>>                 struct list_head *list = &mapping->private_list;
>>>                 struct address_space *buffer_mapping = mapping->assoc_mapping;
>>>
>>>                 spin_lock(&buffer_mapping->private_lock);
>>>                 while (!list_empty(list))
>>>                         __remove_assoc_queue(BH_ENTRY(list->next)); /* <=== modify this to write out the data instead */
>>>                 spin_unlock(&buffer_mapping->private_lock);
>>>         }
>>> }
>>> EXPORT_SYMBOL(invalidate_inode_buffers);
>>>
>>>> The purpose is to put all the I/Os to sleep while we are updating the
>>>> i_data from the new inode to the old inode (the update of the data
>>>> blocks).
>>>>
>>>> I think i_alloc_sem should work here, but I could not find any
>>>> instance of its use in the code.
>>>
>>> For the case of ext3's block allocation, the lock seems to be
>>> truncate_mutex - read the remark:
>>>
>>> /*
>>>  * From here we block out all ext3_get_block() callers who want to
>>>  * modify the block allocation tree.
>>>  */
>>> mutex_lock(&ei->truncate_mutex);
>>>
>>> So while it is building the tree, the mutex keeps it locked.
>>>
>>> And the remarks for ext3_get_blocks_handle() are:
>>>
>>> /*
>>>  * Allocation strategy is simple: if we have to allocate something, we will
>>>  * have to go the whole way to leaf. So let's do it before attaching anything
>>>  * to tree, set linkage between the newborn blocks, write them if sync is
>>>  * required, recheck the path, free and repeat if check fails, otherwise
>>>  * set the last missing link (that will protect us from any truncate-generated
>>> ...
>>>
>>> Reading the source, go down to the mutex_lock() (where multiblock
>>> allocations are needed); after the lock, all the block allocation,
>>> merging, etc. is done:
>>>
>>> /* Next simple case - plain lookup or failed read of indirect block */
>>> if (!create || err == -EIO)
>>>         goto cleanup;
>>>
>>> mutex_lock(&ei->truncate_mutex);
>>> <snip>
>>> count = ext3_blks_to_allocate(partial, indirect_blks,
>>>                               maxblocks, blocks_to_boundary);
>>> <snip>
>>> err = ext3_alloc_branch(handle, inode, indirect_blks, &count, goal,
>>>                         offsets + (partial - chain), partial);
>>>
>>>> It's working fine currently with i_mutex, meaning if we hold an i_mutex
>>>
>>> As far as I know, i_mutex is used for modifying an inode's structural
>>> information: grep for i_mutex in fs/ext3/ioctl.c - every time there is
>>> a need to maintain the inode's structural info, the lock on i_mutex is
>>> taken.
>>>
>>>> lock on the inode while updating the i_data pointers, and then try to
>>>> perform I/O from user space, the I/Os are queued. The file was opened
>>>> in r/w mode prior to taking the lock inside the kernel.
>>>>
>>>> But I still feel i_alloc_sem would be the right option to go ahead
>>>> with.
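
To make the locking being discussed here concrete, the idea would look
roughly like this - a sketch only, untested. It assumes a 2.6.2x-era
struct inode (where i_alloc_sem still exists), ohsm_update_i_data() is
a made-up placeholder for the actual i_data update, and the
i_mutex -> i_alloc_sem ordering follows the generic write/direct-I/O
paths:

	mutex_lock(&old_inode->i_mutex);      /* serialize inode updates */
	down_write(&old_inode->i_alloc_sem);  /* block new block allocations
	                                         and in-flight direct I/O  */

	/* placeholder: move the block pointers from the new inode */
	ohsm_update_i_data(old_inode, new_inode);

	up_write(&old_inode->i_alloc_sem);
	mutex_unlock(&old_inode->i_mutex);
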
>>>>
>>>> On Mon, Jan 12, 2009 at 1:11 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>>> If you grep for spinlock, mutex, or "sem" in the fs/ext4 directory,
>>>>> you can find that all three types of lock are used - for different
>>>>> classes of object.
>>>>>
>>>>> For data blocks I guess it is the semaphore - read this in
>>>>> fs/ext4/inode.c:ext4_get_branch():
>>>>>
>>>>> /**
>>>>>  * ext4_get_branch - read the chain of indirect blocks leading to data
>>>>> <snip>
>>>>>  *
>>>>>  * Need to be called with
>>>>>  * down_read(&EXT4_I(inode)->i_data_sem)
>>>>>  */
>>>>>
>>>>> I guess you have no choice; as it is a semaphore, you have to follow
>>>>> the rest of the kernel for consistency - don't create your own
>>>>> semaphore :-).
>>>>>
>>>>> There also exists i_lock, a spinlock, which as far as I know is for
>>>>> i_blocks accounting purposes:
>>>>>
>>>>> spin_lock(&inode->i_lock);
>>>>> inode->i_blocks += tmp_inode->i_blocks;
>>>>> spin_unlock(&inode->i_lock);
>>>>> up_write(&EXT4_I(inode)->i_data_sem);
>>>>>
>>>>> But for data it should be i_data_sem. Is that correct?
>>>>>
>>>>> On Mon, Jan 12, 2009 at 2:18 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>>>> Hi,
>>>>>>
>>>>>> I am having some issues in locking the inode while copying data
>>>>>> blocks. We are trying to keep the file system live during this
>>>>>> operation, so both read and write operations should keep working.
>>>>>> In this case, what type of lock on the inode should be used:
>>>>>> semaphore, mutex, or spinlock?
>>>>>>
>>>>>> On Sun, Jan 11, 2009 at 8:45 PM, Peter Teoh <htmldeveloper@xxxxxxxxx> wrote:
>>>>>>> Sorry, some mistakes - a resend:
>>>>>>>
>>>>>>> Here are some tips on the block device API:
>>>>>>>
>>>>>>> http://lkml.org/lkml/2006/1/24/287
>>>>>>> http://linux.derkeiler.com/Mailing-Lists/Kernel/2006-01/msg09388.html
>>>>>>>
>>>>>>> As indicated, documentation is rather sparse in this area.
>>>>>>>
>>>>>>> Not sure if anyone else has a summary list of the block device API
>>>>>>> and its explanation?
>>>>>>>
>>>>>>> Also, with respect to the following "cleanup patch", I am not sure
>>>>>>> how the API will change:
>>>>>>>
>>>>>>> http://lwn.net/Articles/304485/
>>>>>>>
>>>>>>> Thanks.
>>>>>>>
>>>>>>> On Tue, Jan 6, 2009 at 6:36 PM, Rohit Sharma <imreckless@xxxxxxxxx> wrote:
>>>>>>>>
>>>>>>>> I want to read data blocks from one inode
>>>>>>>> and copy them to another inode.
>>>>>>>>
>>>>>>>> I mean to copy data from the data blocks associated with one inode
>>>>>>>> to the data blocks associated with another inode.
>>>>>>>>
>>>>>>>> Is that possible in kernel space?
>>>>>>>> --
>>>
>>> Comments?
>>
>> That's very right!
>>
>> So, finally we were able to perform the copy operation successfully.
>>
>> We did something like this, and we named it "OHSM's tricky copy".
>> Rohit will soon be uploading a new doc on the fscops page which will
>> detail it further.

Hi Manish,

> Thanks - let us know when the docs and the *source code* are
> available ;-)

We have been working on the other modules of OHSM; most of the modules
have been coded (the framework is ready). Once we finish with this,
I'll mail you the doc.

>>
>> 1. Read the source inode.
>> 2. Allocate a new ghost inode.
>> 3. Take a lock on the source inode. /* a mutex, because nr_blocks can
>>    change if a write comes in now from user space */
>> 4. Read the number of blocks.
>> 5. Allocate the same number of blocks for the dummy ghost inode. /* the
>>    chain will be created automatically */
>> 6. Read the source buffer heads of the blocks from the source inode,
>>    and the destination buffer heads of the blocks of the destination
>>    inode.
>> 7. dest_buffer->b_data = source_buffer->b_data; /* it's a char *, and
>>    this is where the trick is */
>> 8. Mark the destination buffer dirty.
>>
>> Perform 6, 7, 8 for all the blocks.
>>
>> 9. Swap src_inode->i_data[15] and dest_dummy_inode->i_data[15]. /* This
>>    helps us simply avoid copying the block numbers back from the
>>    destination dummy inode to the source inode */

> I don't know anything about LVM, so this might be a dumb question.
> Why is this required?

LVM is similar to MVFS; it allows a FS to span multiple volumes.

> Did you mean swapping all the block numbers rather than just the
> [15]?

Yes, we are swapping all the block numbers between src_inode and
dest_inode, and finally src_inode is updated with the new data blocks.

> Here src_inode is the VFS "struct inode" or the FS-specific struct
> FS_inode_info? I didn't get this completely; can you explain this
> point a bit more?

Here src_inode is the VFS inode; from it we can get the FS_inode_info
structure. For example, we can get ext2_inode_info from the VFS inode
like this:

ext2_inode = EXT2_I(vfs_inode)

> Thanks -
> Manish

>> /* Swapping also helps us simply destroy the dummy inode, which will
>>    eventually free all the old blocks, which otherwise we would have
>>    had to do separately */
>>
>> 9.1. Release the mutex on the source inode.
>>
>> 10. Set the I_FREEING bit in dest_inode->i_state.
>>
>> 11. Call FS_delete_inode(dest_inode).
>>
>> Any application which has already opened this file for read/write and
>> tries to read or write while the mutex lock is held will be queued.
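
To make steps 6-9 concrete, here is a rough sketch of the block loop -
untested, and not the actual OHSM code. It assumes ext2 (direct blocks
only, using ext2's private EXT2_I() helper), ohsm_nth_block() is a
made-up placeholder for whatever block-number lookup the real code
uses, and I have written the copy as a memcpy() of b_data rather than
the pointer assignment from step 7, which is the conservative variant:

#include <linux/fs.h>
#include <linux/buffer_head.h>
#include <linux/ext2_fs.h>   /* EXT2_N_BLOCKS */
#include "ext2.h"            /* EXT2_I() - ext2's private header */

/* Assumes the i_mutex from step 3 is already held. */
static int ohsm_tricky_copy(struct inode *src, struct inode *ghost)
{
	struct super_block *sb = src->i_sb;
	struct buffer_head *src_bh, *dst_bh;
	unsigned long n, nr_blocks;
	int i;

	nr_blocks = (i_size_read(src) + sb->s_blocksize - 1)
			>> sb->s_blocksize_bits;

	for (n = 0; n < nr_blocks; n++) {
		src_bh = sb_bread(sb, ohsm_nth_block(src, n));    /* step 6 */
		dst_bh = sb_bread(sb, ohsm_nth_block(ghost, n));
		if (!src_bh || !dst_bh) {
			brelse(src_bh);
			brelse(dst_bh);
			return -EIO;
		}
		memcpy(dst_bh->b_data, src_bh->b_data,            /* step 7 */
		       sb->s_blocksize);
		mark_buffer_dirty(dst_bh);                        /* step 8 */
		brelse(src_bh);
		brelse(dst_bh);
	}

	/* step 9: swap the block pointers, so src owns the new blocks and
	 * the ghost inode owns (and will later free) the old ones */
	for (i = 0; i < EXT2_N_BLOCKS; i++) {
		__le32 tmp = EXT2_I(src)->i_data[i];
		EXT2_I(src)->i_data[i] = EXT2_I(ghost)->i_data[i];
		EXT2_I(ghost)->i_data[i] = tmp;
	}

	mark_inode_dirty(src);
	return 0;
}
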
>>
>> Thanks a lot Greg, Manish, Peter, and all the others for all your
>> valuable inputs and help.
>>
>>> --
>>> Regards,
>>> Peter Teoh
>>
>> --
>> Regards,
>> Sandeep.
>>
>> "To learn is to change. Education is a process that changes the learner."

--
To unsubscribe from this list: send an email with
"unsubscribe kernelnewbies" to ecartis@xxxxxxxxxxxx
Please read the FAQ at http://kernelnewbies.org/FAQ