On Fri, Jun 14, 2019 at 02:46:36PM +0200, Lorenzo Bianconi wrote:
> > > > > > ack, right. I think patch 2/3 and 3/3 can go directly in Felix's tree
> > > > > >
> > > > > +	int i, data_size;
> > > > >
> > > > > +	data_size = rounddown(SKB_WITH_OVERHEAD(q->buf_size),
> > > > > +			      dev->usb.in_ep[MT_EP_IN_PKT_RX].max_packet);
> > > > >  	for (i = 0; i < nsgs; i++) {
> > > > >  		struct page *page;
> > > > >  		void *data;
> > > > > @@ -302,7 +304,7 @@ mt76u_fill_rx_sg(struct mt76_dev *dev, struct mt76_queue *q, struct urb *urb,
> > > > >
> > > > >  		page = virt_to_head_page(data);
> > > > >  		offset = data - page_address(page);
> > > > > -		sg_set_page(&urb->sg[i], page, q->buf_size, offset);
> > > > > +		sg_set_page(&urb->sg[i], page, data_size, offset);
> > > > <snip>
> > > > > -	q->buf_size = dev->usb.sg_en ? MT_RX_BUF_SIZE : PAGE_SIZE;
> > > > >  	q->ndesc = MT_NUM_RX_ENTRIES;
> > > > > +	q->buf_size = PAGE_SIZE;
> > > > > +
> > > >
> > > > This should be associated with decrease of MT_SG_MAX_SIZE to value that
> > > > is actually needed and currently this is 2 for 4k AMSDU.
> > >
> > > MT_SG_MAX_SIZE is used even on tx side and I do not think we will end up with a
> > > huge difference here
> >
> > So use different value as argument for mt76u_fill_rx_sg() in
> > mt76u_rx_urb_alloc(). After changing buf_size to PAGE_SIZE we will
> > allocate 8 pages per rx queue entry, but only 2 pages will be used
> > (with data_size change, 1 without data_size change). Or I'm wrong?
>
> yes, it is right (we will use two pages with data_size change). Maybe better to
> use 4 pages for each rx queue entry? (otherwise we will probably change it in
> the future)

We should not allocate more than is required. If support for bigger rx
AMSDUs is added and announced to remote stations in the HT/VHT
capabilities, then the number of segments will need to be increased.

> > > > However I don't think allocating 2 pages to avoid ieee80211 header and SNAP
> > > > copy is worth to do.
> > > > For me best approach would be allocate 1 page for
> > > > 4k AMSDU, 2 for 8k and 3 for 12k (still using sg, but without data_size
> > > > change to avoid 32B copying).
> > >
> > > From my point of view it is better to avoid copying if it is possible. Are you
> > > sure there is no difference?
> >
> > I do not understand what you mean by difference here.
>
> tpt differences, not sure if there are any

I would not expect any measurable difference in tpt, nor in CPU usage,
either way. But if some AMSDU subframe is split across two fragments,
its data will most likely need to be linearised/copied at some point
before being passed to the application, which will outweigh any benefit
of avoiding the 802.11 header copy. Though I don't think any of this
will be visible in benchmarking.

Stanislaw