On Tue 27-06-23 15:28:29, David Hildenbrand wrote:
> On 27.06.23 14:34, Michal Hocko wrote:
> > On Tue 27-06-23 13:22:16, David Hildenbrand wrote:
> > > Let's check for fatal signals only. That looks cleaner and still keeps
> > > the documented use case for manual user-space triggered memory offlining
> > > working. From Documentation/admin-guide/mm/memory-hotplug.rst:
> > >
> > >   % timeout $TIMEOUT offline_block | failure_handling
> > >
> > > In fact, we even document there: "the offlining context can be terminated
> > > by sending a fatal signal".
> >
> > We should be fixing documentation instead. This could break users who do
> > have a SIGALRM signal handler installed.
>
> You mean because timeout will send a SIGALRM, which is not considered fatal
> in case a signal handler is installed?

Correct.

> At least the "traditional" tools I am aware of don't set a timeout at all
> (crossing fingers that they never end up stuck):
> * chmem
> * QEMU guest agent
> * powerpc-utils
>
> libdaxctl also doesn't seem to implement an easy-to-spot timeout for memory
> offlining, but it also doesn't configure SIGALRM.
>
> Of course, that doesn't mean that there isn't somewhere a program that does
> that; I merely assume that it would be pretty unlikely to find such a
> program.
>
> But no strong opinion: we can also keep it like that, update the doc and add
> a comment why this one here is different than most other signal backoff
> checks.

Well, the existing signal handling approach is there for way too long to
be sure. I personally would prefer fatal_signal_pending as that reflects
more what we do elsewhere but here we are. Historical baggage...

-- 
Michal Hocko
SUSE Labs