Xiao Guangrong <guangrong.xiao@xxxxxxxxx> wrote: > On 09/04/2018 12:38 AM, Juan Quintela wrote: >> guangrong.xiao@xxxxxxxxx wrote: >>> From: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxx> >>> >>> flush_compressed_data() needs to wait all compression threads to >>> finish their work, after that all threads are free until the >>> migration feeds new request to them, reducing its call can improve >>> the throughput and use CPU resource more effectively >>> We do not need to flush all threads at the end of iteration, the >>> data can be kept locally until the memory block is changed or >>> memory migration starts over in that case we will meet a dirtied >>> page which may still exists in compression threads's ring >>> >>> Signed-off-by: Xiao Guangrong <xiaoguangrong@xxxxxxxxxxx> >> >> I am not so sure about this patch. >> >> Right now, we warantee that after each iteration, all data is written >> before we start a new round. >> >> This patch changes to only "flush" the compression threads if we have >> "synched" with the kvm migration bitmap. Idea is good but as far as I >> can see: >> >> - we already call flush_compressed_data() inside firnd_dirty_block if we >> synchronize the bitmap. > > The one called in find_dirty_block is as followings: > > if (migrate_use_xbzrle()) { > /* If xbzrle is on, stop using the data compression at this > * point. In theory, xbzrle can do better than compression. > */ > flush_compressed_data(rs); > } Ouch. I didn't notice thet use_xbzrle() there. I think that the proper solution is just to call there unconditionally flush_compressed_data(). And then remove the call that I pointed at the end, no? > We will call it only if xbzrle is also enabled, at this case, we will > disable compression and xbzrle for the following pages, please refer > to save_page_use_compression(). So, it can not help us if we just enabled > compression separately. Point taken, but I still think that it should be unconditional. > >> So, at least, we need to update >> dirty_sync_count there. > > I understand this is a optimization that reduces flush_compressed_data > under the if both xbzrle and compression are both enabled. However, we > can avoid it by using save_page_use_compression() to replace > migrate_use_compression(). > > Furthermore, in our work which will be pushed it out after this patchset > (https://marc.info/?l=kvm&m=152810616708459&w=2), we will made > flush_compressed_data() really light if there is nothing to be flushed. > > So, how about just keep it at this patch and let's optimize it in our > next work? :) I still think that it is not needed, and that we can do better just removing the migrate_use_xbzrle() test and just removing the one in ram_save_iterate, but we can optimize it later. >> - queued pages are "interesting", but I am not sure if compression and >> postcopy work well together. >> > > Compression and postcopy can not both be enabled, please refer to the > code in migrate_caps_check() > > if (cap_list[MIGRATION_CAPABILITY_POSTCOPY_RAM]) { > if (cap_list[MIGRATION_CAPABILITY_COMPRESS]) { > /* The decompression threads asynchronously write into RAM > * rather than use the atomic copies needed to avoid > * userfaulting. It should be possible to fix the decompression > * threads for compatibility in future. > */ > error_setg(errp, "Postcopy is not currently compatible " > "with compression"); > return false; > } Ok. >> So, if we don't need to call flush_compressed_data() every round, then >> the one inside find_dirty_block() should be enough. Otherwise, I can't >> see why we need this other. >> >> >> Independent of this patch: >> - We always send data for every compression thread without testing if >> there is any there. >> > > Yes. That's because we will never doubly handle the page in the iteration. :) I mean something different here. Think about the case that we have 4 compression threads and only three pages left to send. We have to call flush_compressed_data() and it sends data for the four threads, even when one of them is empty. >>> @@ -3212,7 +3225,6 @@ static int ram_save_iterate(QEMUFile *f, void *opaque) >>> } >>> i++; >>> } >>> - flush_compressed_data(rs); >>> rcu_read_unlock(); >>> /* >> >> Why is not enough just to remove this call to flush_compressed_data? > > Consider this case, thanks Dave to point it out. :) > > =============== > > iteration one: > thread 1: Save compressed page 'n' > > iteration two: > thread 2: Save compressed page 'n' > > What guarantees that the version of page 'n' > from thread 2 reaches the destination first without > this flush? > =============== > > If we just remove it, we can not get this guarantee. :) Ok. I think that the propper fix is to remove the things that you had previously said, we will wait until you merge the other bits to optimize. I agree that there is no harm in the change. Later, Juan.