On Thu, Oct 22, 2015 at 12:37:44AM +0800, Lan Tianyu wrote:
> Migration relies on tracking dirty page to migrate memory.
> Hardware can't automatically mark a page as dirty after DMA
> memory access. VF descriptor rings and data buffers are modified
> by hardware when receive and transmit data. To track such dirty memory
> manually, do dummy writes(read a byte and write it back) during receive
> and transmit data.
>
> Signed-off-by: Lan Tianyu <tianyu.lan@xxxxxxxxx>
> ---
>  drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> index d22160f..ce7bd7a 100644
> --- a/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> +++ b/drivers/net/ethernet/intel/ixgbevf/ixgbevf_main.c
> @@ -414,6 +414,9 @@ static bool ixgbevf_clean_tx_irq(struct ixgbevf_q_vector *q_vector,
>  		if (!(eop_desc->wb.status & cpu_to_le32(IXGBE_TXD_STAT_DD)))
>  			break;
>
> +		/* write back status to mark page dirty */

Which page? the descriptor ring?  What does marking it dirty accomplish
though, given that we might migrate right before this happens?

It might be a good idea to just specify addresses of rings
to hypervisor, and have it send the ring pages after VM
and the VF are stopped.

> +		eop_desc->wb.status = eop_desc->wb.status;
> +

Compiler is likely to optimize this out.
You also probably need a wmb here ...

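
To illustrate the point about the self-assignment being optimized out,
here is a minimal user-space sketch of a dummy write the compiler has to
emit. touch_byte() and touch_u32() are hypothetical helper names, not
ixgbevf code, and the sketch says nothing about ordering against the
device.

```c
#include <stdint.h>

/*
 * Hypothetical helper (not part of ixgbevf): load one byte and store the
 * same value back through a volatile-qualified pointer. Because both
 * accesses are volatile, the compiler may not drop the load or the store,
 * unlike a plain "*addr = *addr;".
 */
static inline void touch_byte(uint8_t *addr)
{
	volatile uint8_t *p = addr;

	*p = *p;
}

/* Same idea for a 32-bit word such as a descriptor status field. */
static inline void touch_u32(uint32_t *addr)
{
	volatile uint32_t *p = addr;

	*p = *p;
}
```

In the kernel the closest analogue would presumably be
WRITE_ONCE(*p, READ_ONCE(*p)). Either form only keeps the compiler from
removing the store; any wmb() still has to be added separately.
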
>  		/* clear next_to_watch to prevent false hangs */
>  		tx_buffer->next_to_watch = NULL;
>  		tx_buffer->desc_num = 0;
> @@ -946,15 +949,17 @@ static struct sk_buff *ixgbevf_fetch_rx_buffer(struct ixgbevf_ring *rx_ring,
>  {
>  	struct ixgbevf_rx_buffer *rx_buffer;
>  	struct page *page;
> +	u8 *page_addr;
>
>  	rx_buffer = &rx_ring->rx_buffer_info[rx_ring->next_to_clean];
>  	page = rx_buffer->page;
>  	prefetchw(page);
>
> -	if (likely(!skb)) {
> -		void *page_addr = page_address(page) +
> -				  rx_buffer->page_offset;
> +	/* Mark page dirty */

Looks like there's a race condition here: VM could migrate
at this point. RX ring will indicate packet has
been received, but page data would be stale.

One solution I see is explicitly testing for this
condition and discarding the packet.
For example, hypervisor could increment some counter
in RAM during migration.

Then:

	x = read counter

	get packet from rx ring
	mark page dirty

	y = read counter

	if (x != y)
		discard packet

> +	page_addr = page_address(page) + rx_buffer->page_offset;
> +	*page_addr = *page_addr;

Compiler is likely to optimize this out.
You also probably need a wmb here ...

>
> +	if (likely(!skb)) {
>  		/* prefetch first cache line of first page */
>  		prefetch(page_addr);

prefetch makes no sense if you read it right here.

>  #if L1_CACHE_BYTES < 128
> @@ -1032,6 +1037,9 @@ static int ixgbevf_clean_rx_irq(struct ixgbevf_q_vector *q_vector,
>  		if (!ixgbevf_test_staterr(rx_desc, IXGBE_RXD_STAT_DD))
>  			break;
>
> +		/* Write back status to mark page dirty */
> +		rx_desc->wb.upper.status_error = rx_desc->wb.upper.status_error;
> +

same question as for tx.

>  		/* This memory barrier is needed to keep us from reading
>  		 * any other fields out of the rx_desc until we know the
>  		 * RXD_STAT_DD bit is set
> --
> 1.8.4.rc0.1.g8f6a3e5.dirty
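
To make the counter check suggested above for ixgbevf_fetch_rx_buffer a
bit more concrete, here is a rough user-space sketch. Everything in it
is an assumption for illustration: struct mig_counter, mig_begin(),
mig_changed(), mark_dirty() and rx_packet_usable() are hypothetical
names, and no such hypervisor interface exists in the patch.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical: a counter in guest RAM that the hypervisor would
 * increment during migration.
 */
struct mig_counter {
	volatile uint32_t value;
};

/* x = read counter */
static uint32_t mig_begin(const struct mig_counter *c)
{
	return c->value;
}

/* y = read counter; report whether it differs from the earlier sample */
static bool mig_changed(const struct mig_counter *c, uint32_t start)
{
	return c->value != start;
}

/* Dummy write through a volatile pointer so the store is really emitted. */
static void mark_dirty(uint8_t *addr)
{
	volatile uint8_t *p = addr;

	*p = *p;
}

/*
 * Returns true if the packet in buf may be used, false if it should be
 * discarded because the counter moved while the buffer was processed.
 * A real driver would also need memory barriers around the two counter
 * reads; they are left out of this sketch.
 */
static bool rx_packet_usable(const struct mig_counter *c, uint8_t *buf)
{
	uint32_t x = mig_begin(c);

	/* "get packet from rx ring" would happen here */
	mark_dirty(buf);

	return !mig_changed(c, x);
}
```

A caller would drop the packet whenever rx_packet_usable() returns
false, on the assumption that the page contents may be stale.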