On Sat, 14 Mar 2015 13:24:25 +0100, Andreas Rohner wrote:
> Hi Ryusuke,
>
> Thank you very much for your detailed review and feedback. I agree with
> all of your points and I will start working on a rewrite immediately.
>
> On 2015-03-12 13:54, Ryusuke Konishi wrote:
>> Hi Andreas,
>>
>> On Tue, 10 Mar 2015 21:37:50 +0100, Andreas Rohner wrote:
>>> Hi Ryusuke,
>>>
>>> Thanks for your thorough review.
>>>
>>> On 2015-03-10 06:21, Ryusuke Konishi wrote:
>>>> Hi Andreas,
>>>>
>>>> I looked through all of the kernel patches and a part of the util
>>>> patches. Overall comments are as follows:
>>>>
>>>> [Algorithm]
>>>> As for the algorithm, it looks about OK except for the starvation
>>>> countermeasure. The starvation countermeasure looks ad hoc/hacky, but
>>>> it's good that it doesn't change the kernel/userland interface; we may
>>>> be able to replace it with a better approach in the future or in a
>>>> revised version of this patchset.
>>>>
>>>> (1) Drawback of the starvation countermeasure
>>>> Patch 9/9 looks like it makes the execution time of the chcp operation
>>>> worse, since it scans through the sufile to modify live block
>>>> counters. How much does it prolong the execution time?
>>>
>>> I'll do some tests, but I haven't noticed any significant performance
>>> drop. The GC basically does the same thing every time it selects
>>> segments to reclaim.
>>
>> GC is performed in the background by an independent process. What I
>> care about is that the NILFS_IOCTL_CHANGE_CPMODE ioctl is called from
>> the command line interface or an application. They differ in that
>> respect.
>>
>> Was a worst-case scenario considered in the test?
>>
>> For example:
>> 1. Fill a TB-class drive with data file(s), and make a snapshot on it.
>> 2. Run one pass of GC to update the snapshot block counts.
>> 3. Do "chcp cp".
>>
>> If we don't observe a noticeable delay on this class of drive, then I
>> think we can put the problem off.
>
> Yesterday I did a worst-case test as you suggested. I used an old 1 TB
> hard drive I had lying around. This was my setup:
>
> 1. Write an 850 GB file
> 2. Create a snapshot
> 3. Delete the file
> 4. Let GC run through all segments
> 5. Verify with lssu that the GC has updated all SUFILE entries
> 6. Drop the page cache
> 7. chcp cp
>
> The following results are with the page cache dropped immediately before
> each call:
>
> 1. chcp ss
> real 0m1.337s
> user 0m0.017s
> sys  0m0.030s
>
> 2. chcp cp
> real 0m6.377s
> user 0m0.023s
> sys  0m0.053s
>
> The following results are without the drop of the page cache:
>
> 1. chcp ss
> real 0m0.137s
> user 0m0.010s
> sys  0m0.000s
>
> 2. chcp cp
> real 0m0.016s
> user 0m0.010s
> sys  0m0.007s
>
> There are 119233 segments in my test, and each SUFILE entry uses 32
> bytes. So the worst case for 1 TB with 8 MB segments would be 3.57 MB of
> random reads and one 3.57 MB continuous write. You only get 6.377s
> because my hard drive is so slow; you wouldn't notice any difference on
> a modern SSD. Furthermore, the SUFILE is also scanned by the segment
> allocation algorithm and the GC, so it is very likely already in the
> page cache.

6.377s is too long, because nilfs_sufile_fix_starving_segs() locks the
sufile mi_sem and also lengthens the hold time of the following locks:

- cpfile mi_sem (held at nilfs_cpfile_clear_snapshot()).
- transaction lock (held at nilfs_ioctl_change_cpmode()).
- ns_snapshot_mount_mutex (held at nilfs_ioctl_change_cpmode()).

This leads to a freeze of all write operations, lssu, lscp, cleanerd,
snapshot mounts, and so on.
It would be preferable to move the function outside of those locks, and
to release and reacquire the transaction lock and the sufile mi_sem at
regular intervals in some way.
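For example, something along the following lines might work. This is
only a rough sketch to illustrate the idea, not code from the patchset
or from mainline: nilfs_sufile_fix_starving_segs_range() is an assumed,
resumable variant of nilfs_sufile_fix_starving_segs() that processes a
limited number of segments per call, advances the cursor, and takes the
sufile mi_sem internally like the other sufile operations; the batch
size is an arbitrary value. The caller would run on the chcp path after
cpfile mi_sem and ns_snapshot_mount_mutex have been released.

/*
 * Hedged sketch only, not a drop-in patch.  It relies on the
 * hypothetical helper nilfs_sufile_fix_starving_segs_range()
 * described above.
 */
#include <linux/fs.h>
#include <linux/sched.h>

#include "nilfs.h"
#include "mdt.h"
#include "sufile.h"
#include "segment.h"

#define FIX_STARVING_BATCH	1024	/* segments per transaction; tunable */

static int nilfs_fix_starving_segs_batched(struct super_block *sb,
					   struct inode *sufile)
{
	struct nilfs_transaction_info ti;
	unsigned long nsegs = nilfs_sufile_get_nsegments(sufile);
	__u64 segnum = 0;

	while (segnum < nsegs) {
		int err = nilfs_transaction_begin(sb, &ti, 0);

		if (err < 0)
			return err;

		/*
		 * Hypothetical resumable helper: fixes at most
		 * FIX_STARVING_BATCH segments starting at segnum and
		 * advances segnum; takes sufile's mi_sem internally.
		 */
		err = nilfs_sufile_fix_starving_segs_range(
			sufile, &segnum, FIX_STARVING_BATCH);
		if (err < 0) {
			nilfs_transaction_abort(sb);
			return err;
		}
		nilfs_transaction_commit(sb);	/* drops the transaction lock */

		cond_resched();	/* let writers, cleanerd, lssu, lscp run */
	}
	return 0;
}

With something like this, "chcp cp" would still take a few seconds on a
slow disk in the worst case, but writers, cleanerd, lssu/lscp, and
snapshot mounts would only be blocked for one batch at a time instead
of for the whole scan.

Regards,
Ryusuke Konishi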