On Thu, Apr 18, 2024 at 11:43:46PM -0400, Steven Rostedt wrote:
> On Thu, 18 Apr 2024 09:55:55 +0300
> Mike Rapoport <rppt@xxxxxxxxxx> wrote:
> 
> Hi Mike,
> 
> Thanks for doing this review!
> 
> > > +/**
> > > + * struct trace_buffer_meta - Ring-buffer Meta-page description
> > > + * @meta_page_size: Size of this meta-page.
> > > + * @meta_struct_len: Size of this structure.
> > > + * @subbuf_size: Size of each sub-buffer.
> > > + * @nr_subbufs: Number of subbufs in the ring-buffer, including the reader.
> > > + * @reader.lost_events: Number of events lost at the time of the reader swap.
> > > + * @reader.id: subbuf ID of the current reader. ID range [0 : @nr_subbufs - 1]
> > > + * @reader.read: Number of bytes read on the reader subbuf.
> > > + * @flags: Placeholder for now, 0 until new features are supported.
> > > + * @entries: Number of entries in the ring-buffer.
> > > + * @overrun: Number of entries lost in the ring-buffer.
> > > + * @read: Number of entries that have been read.
> > > + * @Reserved1: Reserved for future use.
> > > + * @Reserved2: Reserved for future use.
> > > + */
> > > +struct trace_buffer_meta {
> > > +	__u32	meta_page_size;
> > > +	__u32	meta_struct_len;
> > > +
> > > +	__u32	subbuf_size;
> > > +	__u32	nr_subbufs;
> > > +
> > > +	struct {
> > > +		__u64	lost_events;
> > > +		__u32	id;
> > > +		__u32	read;
> > > +	} reader;
> > > +
> > > +	__u64	flags;
> > > +
> > > +	__u64	entries;
> > > +	__u64	overrun;
> > > +	__u64	read;
> > > +
> > > +	__u64	Reserved1;
> > > +	__u64	Reserved2;
> > 
> > Why do you need reserved fields? This structure always resides in the
> > beginning of a page and the rest of the page is essentially "reserved".
> 
> So this code is also going to be used in arm's pkvm hypervisor code,
> where it will be using these fields, but since we are looking at
> keeping the same interface between the two, we don't want these used by
> this interface.
> 
> We probably should add a comment about that.
> 
> > 
> > > +};
> > > +
> > > +#endif /* _TRACE_MMAP_H_ */
> > > diff --git a/kernel/trace/ring_buffer.c b/kernel/trace/ring_buffer.c
> > > index cc9ebe593571..793ecc454039 100644
> > > --- a/kernel/trace/ring_buffer.c
> > > +++ b/kernel/trace/ring_buffer.c
> > 
> > ...
> > 
> > > +static void rb_setup_ids_meta_page(struct ring_buffer_per_cpu *cpu_buffer,
> > > +				   unsigned long *subbuf_ids)
> > > +{
> > > +	struct trace_buffer_meta *meta = cpu_buffer->meta_page;
> > > +	unsigned int nr_subbufs = cpu_buffer->nr_pages + 1;
> > > +	struct buffer_page *first_subbuf, *subbuf;
> > > +	int id = 0;
> > > +
> > > +	subbuf_ids[id] = (unsigned long)cpu_buffer->reader_page->page;
> > > +	cpu_buffer->reader_page->id = id++;
> > > +
> > > +	first_subbuf = subbuf = rb_set_head_page(cpu_buffer);
> > > +	do {
> > > +		if (WARN_ON(id >= nr_subbufs))
> > > +			break;
> > > +
> > > +		subbuf_ids[id] = (unsigned long)subbuf->page;
> > > +		subbuf->id = id;
> > > +
> > > +		rb_inc_page(&subbuf);
> > > +		id++;
> > > +	} while (subbuf != first_subbuf);
> > > +
> > > +	/* install subbuf ID to kern VA translation */
> > > +	cpu_buffer->subbuf_ids = subbuf_ids;
> > > +
> > > +	/* __rb_map_vma() pads the meta-page to align it with the sub-buffers */
> > > +	meta->meta_page_size = PAGE_SIZE << cpu_buffer->buffer->subbuf_order;
> > 
> > Isn't this a single page?
> 
> One thing we are doing is to make sure that the subbuffers are aligned
> by their size. If a subbuffer is 3 pages, it should be aligned on 3
> page boundaries. This was something that Linus suggested.
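Just to illustrate what that padding buys (this is not part of the patch):
because the meta-page is padded out to meta_page_size and the sub-buffers
follow it in ID order, a user-space reader only needs the meta-page fields to
locate any sub-buffer in the mapping. A rough consumer sketch, assuming the
per-CPU file descriptor and the mmap layout the rest of the series sets up,
with error handling omitted:

	#include <sys/mman.h>
	#include <unistd.h>
	#include <linux/trace_mmap.h>	/* struct trace_buffer_meta from this patch */

	/*
	 * Hypothetical sketch, not part of the patch: map the per-CPU
	 * ring-buffer file and return the current reader sub-buffer.
	 * Assumes the meta-page (padded to meta_page_size) sits at offset 0
	 * of the mapping, followed by the sub-buffers in ID order.
	 */
	static void *map_reader_subbuf(int cpu_fd)
	{
		struct trace_buffer_meta *meta;
		size_t len;
		char *base;

		/* Map a single page first, just to read the geometry. */
		meta = mmap(NULL, getpagesize(), PROT_READ, MAP_SHARED, cpu_fd, 0);
		len = meta->meta_page_size +
		      (size_t)meta->nr_subbufs * meta->subbuf_size;
		munmap(meta, getpagesize());

		/* Map the meta-page plus every sub-buffer in one go. */
		base = mmap(NULL, len, PROT_READ, MAP_SHARED, cpu_fd, 0);
		meta = (struct trace_buffer_meta *)base;

		/* Sub-buffer N starts at meta_page_size + N * subbuf_size. */
		return base + meta->meta_page_size +
		       (size_t)meta->reader.id * meta->subbuf_size;
	}

With the meta-page padded like this, every sub-buffer ends up at a
sub-buffer-size-aligned offset within the mapping, which is the alignment
property described above.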
> 
> > 
> > > +	meta->meta_struct_len = sizeof(*meta);
> > > +	meta->nr_subbufs = nr_subbufs;
> > > +	meta->subbuf_size = cpu_buffer->buffer->subbuf_size + BUF_PAGE_HDR_SIZE;
> > > +
> > > +	rb_update_meta_page(cpu_buffer);
> > > +}
> > 
> > ...
> > 
> > > +#define subbuf_page(off, start) \
> > > +	virt_to_page((void *)((start) + ((off) << PAGE_SHIFT)))
> > > +
> > > +#define foreach_subbuf_page(sub_order, start, page)	\
> > 
> > Nit: usually iterators in kernel use for_each
> 
> Ah, good catch. Yeah, that should be changed. But then ...
> 
> > 
> > > +	page = subbuf_page(0, (start));			\
> > > +	for (int __off = 0; __off < (1 << (sub_order));	\
> > > +	     __off++, page = subbuf_page(__off, (start)))
> > 
> > The pages allocated with alloc_pages_node(.. subbuf_order) are
> > physically contiguous and struct pages for them are also contiguous, so
> > inside a subbuf_order allocation you can just do page++.
> > 
> 
> I'm wondering if we should just nuke the macro. It was there because
> the previous implementation did things twice. But now it's just done
> once here:
> 
> +	while (s < nr_subbufs && p < nr_pages) {
> +		struct page *page;
> +
> +		foreach_subbuf_page(subbuf_order, cpu_buffer->subbuf_ids[s], page) {
> +			if (p >= nr_pages)
> +				break;
> +
> +			pages[p++] = page;
> +		}
> +		s++;
> +	}
> 
> Perhaps that should just be:
> 
>	while (s < nr_subbufs && p < nr_pages) {
>		struct page *page;
>		int off;
>
>		page = subbuf_page(0, cpu_buffer->subbuf_ids[s]);
>		for (off = 0; off < (1 << subbuf_order); off++, page++, p++) {
>			if (p >= nr_pages)
>				break;
>
>			pages[p] = page;
>		}
>		s++;
>	}
> 
> ?

Yeah, I was hesitating to kill it with the last version. Happy to do it
for the next one.

> 
> -- Steve