perf_event_open.2: mmap ring buffer requirement for receiving overflow notifications

I have found that when a file descriptor returned by perf_event_open()
is monitored with poll(), an mmap ring buffer must be allocated for
overflow notifications to be received properly. Without one, poll()
returns POLLHUP immediately and repeatedly, even when no overflow
notification should be raised. Notably, this differs from receiving
overflow notifications by setting the O_ASYNC flag on the file
descriptor: in that case, no mmap ring buffer is needed to receive the
SIGIO signal delivered when the file descriptor becomes readable. I
have attached code demonstrating this behavior; the comments explain
how it works.
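For reference, the O_ASYNC path needs only two fcntl() calls on the perf file descriptor: F_SETOWN to choose the process that receives SIGIO, and F_SETFL to add O_ASYNC to the existing flags. A minimal sketch with error checking follows; `enable_sigio` is a hypothetical helper name, and the attached program performs the same calls inline without checking their return values.

```cpp
#include <fcntl.h>
#include <unistd.h>

// Hypothetical helper: ask the kernel to send SIGIO to this process when
// `fd` becomes readable. Returns false if any fcntl() call fails.
bool enable_sigio(int fd) {
    // Direct the signal to the calling process.
    if (fcntl(fd, F_SETOWN, getpid()) == -1)
        return false;

    // Read-modify-write the status flags so existing flags are preserved.
    int flags = fcntl(fd, F_GETFL);
    if (flags == -1)
        return false;
    return fcntl(fd, F_SETFL, flags | O_ASYNC) != -1;
}
```

The helper works on any descriptor that supports signal-driven I/O (a pipe, for example), so it can be exercised without perf privileges.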

This behavior by itself is not a problem. However, the current
perf_event_open man page does not document it, and in fact it contains
statements that seem to contradict my findings. The MMAP layout
section of the page includes this sentence:

    Before Linux 2.6.39, there is a bug that means you must allocate
    an mmap ring buffer when sampling even if you do not plan to
    access it.

Unless I'm misunderstanding it, either this statement is poorly
worded or the bug it describes has not actually been fixed. I would
not consider merely calling poll() on the file descriptor an intent
to access the ring buffer (if it is meant to be read that way, I find
it quite confusing). Additionally, I cannot find any change in Linux
2.6.39 that fits this description, although that is likely due to my
lack of experience searching through kernel changelogs and commits.
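The MMAP layout section of the same man page states that the mapping should be 1+2^n pages: one metadata page (struct perf_event_mmap_page) followed by a power-of-two number of data pages. The attached program maps 8192 + 1 pages, i.e. n = 13. A minimal sketch of that sizing, using the hypothetical helper names `ring_buffer_bytes` and `map_ring_buffer`:

```cpp
#include <linux/perf_event.h>
#include <sys/mman.h>
#include <unistd.h>
#include <cassert>
#include <cstddef>

// Size in bytes of a perf ring buffer mapping: one metadata page
// followed by 2^data_pages_log2 data pages.
std::size_t ring_buffer_bytes(std::size_t page_size, unsigned data_pages_log2) {
    assert(data_pages_log2 < 31);  // keep the shift well within range
    return page_size * (1 + (std::size_t{1} << data_pages_log2));
}

// Hypothetical helper: map a ring buffer for an already-open perf fd.
// Returns nullptr on failure (mmap sets errno).
struct perf_event_mmap_page *map_ring_buffer(int perf_fd, unsigned data_pages_log2) {
    std::size_t page = static_cast<std::size_t>(sysconf(_SC_PAGESIZE));
    std::size_t len = ring_buffer_bytes(page, data_pages_log2);
    void *base = mmap(nullptr, len, PROT_READ | PROT_WRITE, MAP_SHARED, perf_fd, 0);
    if (base == MAP_FAILED)
        return nullptr;
    // The first page of the mapping is the metadata page.
    return static_cast<struct perf_event_mmap_page *>(base);
}
```

With 4096-byte pages, `map_ring_buffer(perf_fd, 13)` requests the same 8193 pages as the attached program.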

I would like clarification on whether the current behavior of
perf_event_open is intentional and desired (which is why I have CC'd
linux-perf-users). If it is, I could create a patch to the man page
that lays out the requirements more clearly. In that case, it would
also help to clarify the sentence about the Linux 2.6.39 change;
however, I don't know whether I'm qualified to do that because, as
stated above, I have not been able to find the change that sentence
refers to.

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <iostream>
#include <unistd.h>
#include <signal.h>
#include <fcntl.h>
#include <cstdint>
#include <cstdlib>
#include <cerrno>
#include <poll.h>

// Modify the value of this constant to change the variant of the program
// that is run. The possible values are:
// 1: SIGIO without mmap, 2: SIGIO with mmap,
// 3: poll without mmap, 4: poll with mmap
// As stated in the email, variants 1, 2 and 4 deliver overflow
// notifications approximately every 1000000000 hardware instructions.
// With variant = 3, however, poll() continuously returns POLLHUP
// without waiting for an overflow.
//
// Before running any variant, set the kernel.perf_event_paranoid
// sysctl to -1 (for example: sudo sysctl kernel.perf_event_paranoid=-1).
const int variant = 1;

// glibc provides no wrapper for perf_event_open, so invoke it directly.
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                            int cpu, int group_fd, unsigned long flags) {
    return syscall(SYS_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}

volatile sig_atomic_t sigioOccurred = 0;
void sigioHandler(int signum) {
    sigioOccurred = 1;
}

uint64_t get_instructions_used(int perf_fd) {
    // With the default read_format, read() returns a single u64 count.
    // The value is unsigned, so it can never be negative.
    uint64_t result;
    ssize_t size = read(perf_fd, &result, sizeof(result));

    if (size != static_cast<ssize_t>(sizeof(result))) {
        std::cout << "read failed\n";
        exit(1);
    }

    return result;
}

int main() {
    struct sigaction sa;
    sa.sa_handler = sigioHandler; sa.sa_flags = 0; sigemptyset(&sa.sa_mask);
    sigaction(SIGIO, &sa, 0);

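    pid_t child = fork();
    if (child == -1) {
        std::cout << "fork err " << errno << "\n";
        return 1;
    }
    if (child == 0) {
        // Child: execute instructions forever. volatile keeps the compiler
        // from removing the loop; unsigned makes overflow well-defined.
        volatile unsigned num = 2;
        while (true) {
            num = num * 2;
        }
    }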

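    struct perf_event_attr attrs {};
    attrs.size = sizeof(attrs);
    attrs.type = PERF_TYPE_HARDWARE;
    attrs.config = PERF_COUNT_HW_INSTRUCTIONS;
    attrs.sample_period = 1000000000;  // overflow every 10^9 instructions
    attrs.wakeup_events = 1;           // notify after every overflow
    int perf_fd = perf_event_open(&attrs, child, -1, -1, 0);
    if (perf_fd == -1) {
        std::cout << "perf_event_open err " << errno << "\n";
        kill(child, SIGKILL);
        return 1;
    }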

    if (variant == 2 or variant == 4) {
        // Ring buffer: 1 metadata page followed by 8192 (2^13) data pages.
        void *base = mmap(NULL, getpagesize() * (8192 + 1),
                          PROT_READ | PROT_WRITE, MAP_SHARED, perf_fd, 0);
        if (base == MAP_FAILED) {
            std::cout << "mmap err " << errno << "\n";
            kill(child, SIGKILL);
            return 1;
        }
    }

    if(variant == 1 or variant == 2) {
        fcntl(perf_fd, F_SETOWN, getpid());
        fcntl(perf_fd, F_SETFL, (fcntl(perf_fd, F_GETFL, 0) | O_ASYNC));
    }

    while (true) {
        if (variant == 1 or variant == 2) {
            if (sigioOccurred) {
                std::cout << "SIGIO delivered, instructions used: "
                          << get_instructions_used(perf_fd) << "\n";
                sigioOccurred = 0;
            }
        }

        if (variant == 3 or variant == 4) {
            struct pollfd pfd = { .fd = perf_fd, .events = POLLIN };
            int res = poll(&pfd, 1, 1000000);
            if (res == -1) {
                if (errno == EINTR)
                    continue;
                std::cout << "poll err " << errno << "\n";
                return 1;
            }

            // revents is a bitmask, so test individual bits.
            std::cout << "Poll returned ";
            if (pfd.revents & POLLHUP)
                std::cout << "POLLHUP, instructions used: "
                          << get_instructions_used(perf_fd) << "\n";
            else if (pfd.revents & POLLIN)
                std::cout << "POLLIN, instructions used: "
                          << get_instructions_used(perf_fd) << "\n";
            else
                std::cout << pfd.revents << "\n";
        }
    }

    return 0;
}
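Because pollfd.revents is a bitmask, the kernel can report several conditions at once (for example POLLIN | POLLHUP), and the raw number printed by the final else branch hides which bits are set. A small sketch of a decoder; `describe_revents` is a hypothetical helper, not part of any library:

```cpp
#include <poll.h>
#include <string>

// Hypothetical helper: turn a pollfd.revents bitmask into readable text
// such as "POLLIN|POLLHUP". Returns "0" when no bits are set.
std::string describe_revents(short revents) {
    struct Flag { short bit; const char *name; };
    static const Flag flags[] = {
        {POLLIN, "POLLIN"},   {POLLPRI, "POLLPRI"}, {POLLOUT, "POLLOUT"},
        {POLLERR, "POLLERR"}, {POLLHUP, "POLLHUP"}, {POLLNVAL, "POLLNVAL"},
    };

    std::string out;
    for (const Flag &f : flags) {
        if (revents & f.bit) {
            if (!out.empty())
                out += '|';
            out += f.name;
        }
    }
    return out.empty() ? "0" : out;
}
```

For example, `describe_revents(POLLIN | POLLHUP)` yields "POLLIN|POLLHUP", so a combined result is not mistaken for an unexpected value.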
