I have found that when monitoring a file descriptor returned by perf_event_open() with poll(), it is required to allocate an mmap ring buffer to properly receive overflow notifications. If this is not done, poll() returns POLLHUP immediately on every call, even when no overflow notification should be raised.

Notably, this behavior is different from listening for overflow notifications by setting the O_ASYNC flag on the file descriptor. In that case, creating the mmap ring buffer is not required to receive the SIGIO signal delivered after the file descriptor becomes available for reading. I attach code showcasing this behavior (the functionality is explained in the comments).

This behavior by itself is not a problem. However, the current perf_event_open man page does not document it, and in fact it contains confusing statements that seem to contradict my findings. In the MMAP layout section of the page, you can find this sentence:

    Before Linux 2.6.39, there is a bug that means you must allocate an
    mmap ring buffer when sampling even if you do not plan to access it.

Unless I'm somehow misunderstanding it, either this statement is not well worded or the bug has not actually been fixed. I would not call simply using poll() on the file descriptor an intent to access the ring buffer (unless it's meant to be understood that way, in which case, in my opinion, it's quite confusing). Additionally, I cannot find any change in Linux 2.6.39 that would fit this description (however, that is likely just due to my lack of experience searching through the kernel changelogs and commits).

I would like to receive clarification on whether this current behavior of perf_event_open is intentional and desired (that is why I cc'd linux-perf-users). If it is, I could also create a patch to the man page that lays out the requirements more clearly. In that case, it would also be helpful to further clarify the wording of the sentence mentioning the Linux 2.6.39 change. However, I don't know if I'm qualified to do that, because, as I have previously stated, I am unable to find what changes that sentence actually refers to.

#include <linux/perf_event.h>
#include <sys/syscall.h>
#include <sys/mman.h>
#include <iostream>
#include <unistd.h>
#include <signal.h>
#include <fcntl.h>
#include <cerrno>
#include <cstdint>
#include <cstdlib>
#include <poll.h>

// Modify the value of this constant to change the variant of the program
// that is run. The possible values are:
// 1: SIGIO without mmap, 2: SIGIO with mmap,
// 3: poll without mmap, 4: poll with mmap
// As stated in the email, variants 1, 2 and 4 properly trigger overflow
// notifications approximately after each 1000000000 hardware instructions,
// however when the program is run with variant = 3, poll will just
// continuously return POLLHUP, without waiting for the overflow
//
// Also, before running any variant, make sure to set the
// kernel.perf_event_paranoid sysctl to -1
// (for example by running sudo sysctl kernel.perf_event_paranoid=-1)
const int variant = 1;

// glibc provides no wrapper for perf_event_open, so call it via syscall().
static long perf_event_open(struct perf_event_attr *hw_event, pid_t pid,
                            int cpu, int group_fd, unsigned long flags)
{
  return syscall(SYS_perf_event_open, hw_event, pid, cpu, group_fd, flags);
}

volatile sig_atomic_t sigioOccurred = 0;

void sigioHandler(int) {
  sigioOccurred = 1;
}

// Reads the current counter value from the perf file descriptor.
uint64_t get_instructions_used(int perf_fd) {
  uint64_t result;
  ssize_t size = read(perf_fd, &result, sizeof(result));
  if (size != static_cast<ssize_t>(sizeof(result))) {
    std::cout << "read failed\n";
    exit(1);
  }
  return result;
}

int main() {
  struct sigaction sa;
  sa.sa_handler = sigioHandler;
  sa.sa_flags = 0;
  sigemptyset(&sa.sa_mask);
  sigaction(SIGIO, &sa, 0);

  // The child spins forever so that it keeps retiring instructions.
  int child = fork(), num = 2;
  if(child == 0) {
    while(true) {
      num *= 2;
    }
  }

  struct perf_event_attr attrs {};
  attrs.size = sizeof(attrs);
  attrs.config = PERF_COUNT_HW_INSTRUCTIONS;
  attrs.type = PERF_TYPE_HARDWARE;
  attrs.sample_period = 1000000000;
  attrs.wakeup_events = 1;

  int perf_fd = perf_event_open(&attrs, child, -1, -1, 0);
  if (perf_fd == -1) {
    std::cout << "perf_event_open err " << errno << "\n";
    return -1;
  }

  if(variant == 2 or variant == 4) {
    // One metadata page plus 2^13 data pages.
    void *base = mmap(NULL, getpagesize() * (8192 + 1),
                      PROT_READ | PROT_WRITE, MAP_SHARED, perf_fd, 0);
    if (base == MAP_FAILED) {
      std::cout << "mmap err " << errno << "\n";
      return -1;
    }
  }

  if(variant == 1 or variant == 2) {
    fcntl(perf_fd, F_SETOWN, getpid());
    fcntl(perf_fd, F_SETFL, (fcntl(perf_fd, F_GETFL, 0) | O_ASYNC));
  }

  while(true) {
    if(variant == 1 or variant == 2) {
      if(sigioOccurred) {
        std::cout << "SIGIO delivered, instructions used: "
                  << get_instructions_used(perf_fd) << "\n";
        sigioOccurred = 0;
      }
    }
    if(variant == 3 or variant == 4) {
      struct pollfd pfd = { .fd = perf_fd, .events = POLLIN };
      int res = poll(&pfd, 1, 1000000);
      if (res < 0) {
        std::cout << "poll err " << errno << "\n";
        return -1;
      }
      std::cout << "Poll returned ";
      if(pfd.revents == POLLHUP)
        std::cout << "POLLHUP, instructions used: "
                  << get_instructions_used(perf_fd) << "\n";
      else if(pfd.revents == POLLIN)
        std::cout << "POLLIN, instructions used: "
                  << get_instructions_used(perf_fd) << "\n";
      else
        std::cout << pfd.revents << "\n";
    }
  }
  return 0;
}
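
One side note on reading the poll() result: revents is a bitmask, so the reproducer's equality comparisons print a raw number whenever more than one flag is set at once (for example POLLIN together with POLLHUP). Below is a minimal sketch of decoding the mask flag by flag. The helper name describe_revents is hypothetical and not part of the program above; it only uses the standard POLL* constants from <poll.h>.

```cpp
#include <poll.h>
#include <string>

// Hypothetical helper (not part of the reproducer): renders a poll()
// revents bitmask as text, testing each flag separately instead of
// comparing the whole mask with ==.
std::string describe_revents(short revents) {
    if (revents == 0)
        return "none";

    std::string out;
    // Appends the flag's name if that bit is set in revents.
    auto add = [&](short flag, const char *name) {
        if (revents & flag) {
            if (!out.empty())
                out += '|';
            out += name;
        }
    };

    add(POLLIN, "POLLIN");
    add(POLLHUP, "POLLHUP");
    add(POLLERR, "POLLERR");
    add(POLLNVAL, "POLLNVAL");

    // Some bit other than the four above was set (e.g. POLLPRI).
    if (out.empty())
        return "other";
    return out;
}
```

In the poll loop, the three-way if/else could then be replaced by printing describe_revents(pfd.revents), which would make it easier to tell whether variant 3 reports POLLHUP alone or in combination with other flags.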