Mikulas, Thank you for your reviewing. I will reply to polling issue first. For the rest, I will reply later. > Polling for state > ----------------- > > Some of the kernel threads that you spawn poll for data in one-second > interval - see migrate_proc, modulator_proc, recorder_proc, sync_proc. > > flush_proc correctly contains wait_event, but there is still some 100ms > polling timeout in flush_proc that shouldn't be there. > > If you set the polling interval too low, it wastes CPU cycle and it wastes > energy due to CPU wake-ups. If you set the polling interval too high, it > causes artifical delays. For example, current middle-class SSDs can do > writes at a rate 500MB/s. Now, think that the user uses 400MB partition > for the cache - the partition is filled up in 0.8s and the kernel waits > for additional 0.2s doing absolutely nothing, until your polling code > wakes up and notices that it should start flusing the cache. > > So - the polling code introduces artifical delays that can cause > performance degradation. You may think about lowering the polling interval > to lessen the performance impact, but if you lower the polling interval, > it increases CPU and power consumption when the driver is idle. Either > way, it is wrong. Just don't use polling. You dislike the fact that looping thread sleeping can delay to the newly coming jobs. I am afraid your understanding on the sleeping of flush daemon is wrong. Please see queue_flushing() it wakes up the daemon if the queue is empty. spin_lock_irqsave(&cache->flush_queue_lock, flags); empty = list_empty(&cache->flush_queue); list_add_tail(&job->flush_queue, &cache->flush_queue); spin_unlock_irqrestore(&cache->flush_queue_lock, flags); if (empty) wake_up_interruptible(&cache->flush_wait_queue); Even if this waking up lines are removed the flush daemon wakes up by its self 100ms at worst after the first 1MB written. > Don't do this. You can do polling in a piece of code that is executed > infrequently, such as driver unload, but general functionality must not > depend on polling. 100ms is the time the user must wait in termination as your claim. However, I think this is a good time to explain why I didn't choose to use workqueue. Keeping it a polling thread that consumes what in queue instead of replacing it with workqueue has these reasons: 1. CPU overhead is lower queue_work overhead everytime flush job is to queue is crucial for writeboost. queue_work is too CPU-consuming as far as I looked at the code. I have measured the theoretical maximum 4KB randwrite throughput of writeboost using fio benchmark. It was 1.5GB/sec with fast enough cache device. I don't think this extraordinary score can not be achieved with other caching softwares. In this benchmark, the flush daemon didn't sleep at all because the writes are ceaseless. Using workqueue will not only lose throughput but also wastes CPU cycles both regularly is the fact I dislike. 2. Ease of Error Handling Keeping it a single thread looping and self-managing the queue makes it easy to handle error. If we choose to use workqueue, we need a mechanism to * notify the occurrence of a error to other work items * play-back the work items in the same order since waiting until the system recovers for long time is prohibited with workqueue as we discussed in the previous posts. Segments must be flushed and then migrated in sequential order along its id. Leaving the control of the queue which should be in desired order to the workqueue subsystem is dangerous in this case. For migrate_proc, I see no reason this daemon polling is bad although it actually sleeps in leisure time. In using writeboost, we can obviously assume that the sequential write throughput of cache device is much higher than the random write throughput of the backing store so migrate daemon is unlikely to sleep since there are many dirty segments to migrate is the ordinary situation. Furthermore, since migration daemon is more one block off from the user (upper layer) than flush daemon the one second sleeping problem gets weaker or diluted. If you still dislike the delay of this daemon I think of forcefully waking up the daemon if it is sleeping like flush daemon. > First, you must design some event model - (a bunch of threads polling in 1 > second interval doesn't seem viable). For example, you can use workqueues > correctly this way: > > * You create a workqueue for migration, but you don't submit any work to > it. > * If you fill the cache device above the watermark, you submit a work item > to the migration workqueue (you can submit the same work item to the > same workqueue multiple times - if the work item is still queued and > hasn't started executing, the second submit is ignored; if the work item > has started executing, the second submit causes that it is executed once > more). > * The work item does a little bit of work, it finds data to be migrated, > submits the bios to do the migration and exits. (you can use dm-kcopyd > to do actual migration, it simplifies your code) > * If you need more migration, you just submit the work item again. > > If you design it this way, you avoid the polling timers (migration starts > as soon as it is needed) and also you avoid the problem with the looping > workqueue. The reason 2 in why I don't choose workqueue can be reused in designing migration daemon and I believe keeping it a single thread looping is the best if we have to care error handling. dm-kcopyd seems to be a good choice for simplifying the code. I remember that I discarded this option for some reason before but I don't remember the reason. I am thinking of reconsidering this issue and maybe I can eventually reach to using kcopyd because the design is changed a lot since that time. > to lessen the performance impact, but if you lower the polling interval, > it increases CPU and power consumption when the driver is idle. Either > way, it is wrong. Just don't use polling. One last thing. The power consumption in idle time is the only shortcoming of polling. But who cares the power consumption of waking up once a second and checking a queue is empty? Akira -- dm-devel mailing list dm-devel@xxxxxxxxxx https://www.redhat.com/mailman/listinfo/dm-devel