On Wed, Apr 17, 2024 at 01:05:47AM -0500, Elizabeth Figura wrote:

> Here's a (slightly ad-hoc) simplification of the patch into text form inlined
> into this message; hopefully it's readable enough.

Thanks! Still needed:

  s/\`\`/"/g
  s/\.\.\ //g

But then it's readable

>
> ===================================
> NT synchronization primitive driver
> ===================================
>
> This page documents the user-space API for the ntsync driver.
>
> ntsync is a support driver for emulation of NT synchronization
> primitives by user-space NT emulators. It exists because implementation
> in user-space, using existing tools, cannot match Windows performance
> while offering accurate semantics. It is implemented entirely in
> software, and does not drive any hardware device.
>
> This interface is meant as a compatibility tool only, and should not
> be used for general synchronization. Instead use generic, versatile
> interfaces such as futex(2) and poll(2).
>
> Synchronization primitives
> ==========================
>
> The ntsync driver exposes three types of synchronization primitives:
> semaphores, mutexes, and events.
>
> A semaphore holds a single volatile 32-bit counter, and a static 32-bit
> integer denoting the maximum value. It is considered signaled when the
> counter is nonzero. The counter is decremented by one when a wait is
> satisfied. Both the initial and maximum count are established when the
> semaphore is created.
>
> A mutex holds a volatile 32-bit recursion count, and a volatile 32-bit
> identifier denoting its owner. A mutex is considered signaled when its
> owner is zero (indicating that it is not owned). The recursion count is
> incremented when a wait is satisfied, and ownership is set to the given
> identifier.

'signaled' is used twice now but not defined. For both Semaphore and
Mutex this seems to indicate uncontended?

Edit: seems to be needs-wakeup more than uncontended.
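
For what it's worth, here is how I read 'signaled' from the two
paragraphs above, as a toy user-space model. All toy_* names are made up
for illustration and none of this is the driver's code; it only encodes
"counter nonzero" and "owner zero" plus what a satisfied wait does.

```c
#include <stdbool.h>
#include <stdint.h>

/* Toy user-space model; these names are hypothetical, not the driver's. */
struct toy_sem   { uint32_t count, max; };
struct toy_mutex { uint32_t owner, count; };

/* A semaphore is signaled while its counter is nonzero. */
static bool toy_sem_signaled(const struct toy_sem *s)
{
	return s->count != 0;
}

/* A mutex is signaled while it has no owner (owner == 0). */
static bool toy_mutex_signaled(const struct toy_mutex *m)
{
	return m->owner == 0;
}

/* Satisfying a wait on a semaphore decrements the counter by one. */
static bool toy_sem_try_acquire(struct toy_sem *s)
{
	if (!toy_sem_signaled(s))
		return false;	/* a real wait would sleep here */
	s->count--;
	return true;
}

/* Satisfying a wait on a mutex sets the owner and bumps the recursion count. */
static bool toy_mutex_try_acquire(struct toy_mutex *m, uint32_t owner)
{
	if (!toy_mutex_signaled(m))
		return false;	/* a real wait would sleep here */
	m->owner = owner;
	m->count++;
	return true;
}
```

So 'signaled' reads as "a wait on this object can be satisfied right
now", if I have that right.
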
> A mutex also holds an internal flag denoting whether its previous owner
> has died; such a mutex is said to be abandoned. Owner death is not
> tracked automatically based on thread death, but rather must be
> communicated using NTSYNC_IOC_MUTEX_KILL. An abandoned mutex is
> inherently considered unowned.
>
> Except for the "unowned" semantics of zero, the actual value of the
> owner identifier is not interpreted by the ntsync driver at all. The
> intended use is to store a thread identifier; however, the ntsync
> driver does not actually validate that a calling thread provides
> consistent or unique identifiers.

Why not verify it? Seems simple enough to put in a TID check, esp. if NT
mandates the same.

> An event holds a volatile boolean state denoting whether it is signaled
> or not. There are two types of events, auto-reset and manual-reset. An
> auto-reset event is designaled when a wait is satisfied; a manual-reset
> event is not. The event type is specified when the event is created.

But what is an event? I'm familiar with semaphores and mutexes, but less
so with events.

> Unless specified otherwise, all operations on an object are atomic and
> totally ordered with respect to other operations on the same object.
>
> Objects are represented by files. When all file descriptors to an
> object are closed, that object is deleted.
>
> Char device
> ===========
>
> The ntsync driver creates a single char device /dev/ntsync. Each file
> description opened on the device represents a unique instance intended
> to back an individual NT virtual machine. Objects created by one ntsync
> instance may only be used with other objects created by the same
> instance.
>
> ioctl reference
> ===============
>
> All operations on the device are done through ioctls.
> There are four structures used in ioctl calls::
>
>    struct ntsync_sem_args {
>        __u32 sem;
>        __u32 count;
>        __u32 max;
>    };
>
>    struct ntsync_mutex_args {
>        __u32 mutex;
>        __u32 owner;
>        __u32 count;
>    };
>
>    struct ntsync_event_args {
>        __u32 event;
>        __u32 signaled;
>        __u32 manual;
>    };
>
>    struct ntsync_wait_args {
>        __u64 timeout;
>        __u64 objs;
>        __u32 count;
>        __u32 owner;
>        __u32 index;
>        __u32 alert;
>        __u32 flags;
>        __u32 pad;
>    };
>
> Depending on the ioctl, members of the structure may be used as input,
> output, or not at all. All ioctls return 0 on success.
>
> The ioctls on the device file are as follows:
>
> NTSYNC_IOC_CREATE_SEM
>
>   Create a semaphore object. Takes a pointer to struct ntsync_sem_args,
>   which is used as follows:
>
>   * sem: On output, contains a file descriptor to the created semaphore.
>   * count: Initial count of the semaphore.
>   * max: Maximum count of the semaphore.
>
>   Fails with EINVAL if `count` is greater than `max`.

So the implication is that @count and @max are input arguments and as
such should be set before calling the ioctl()?

It would not have been weird to have the ioctl() return the fd on
success I suppose, instead of mixing input and output arguments like
this, but whatever, this works.

> NTSYNC_IOC_CREATE_MUTEX
>
>   Create a mutex object. Takes a pointer to struct ntsync_mutex_args,
>   which is used as follows:
>
>   * mutex: On output, contains a file descriptor to the created mutex.
>   * count: Initial recursion count of the mutex.
>   * owner: Initial owner of the mutex.
>
>   If "owner" is nonzero and "count" is zero, or if "owner" is zero
>   and "count" is nonzero, the function fails with EINVAL.
>
> NTSYNC_IOC_CREATE_EVENT
>
>   Create an event object. Takes a pointer to struct ntsync_event_args,
>   which is used as follows:
>
>   * event: On output, contains a file descriptor to the created event.
>   * signaled: If nonzero, the event is initially signaled, otherwise
>     nonsignaled.
>   * manual: If nonzero, the event is a manual-reset event, otherwise
>     auto-reset.
>

Still mystified as to what event actually is, perhaps more clues below...

> The ioctls on the individual objects are as follows:
>
> NTSYNC_IOC_SEM_POST
>
>   Post to a semaphore object. Takes a pointer to a 32-bit integer,
>   which on input holds the count to be added to the semaphore, and on
>   output contains its previous count.
>
>   If adding to the semaphore's current count would raise the latter
>   past the semaphore's maximum count, the ioctl fails with
>   EOVERFLOW and the semaphore is not affected. If raising the
>   semaphore's count causes it to become signaled, eligible threads
>   waiting on this semaphore will be woken and the semaphore's count
>   decremented appropriately.

Urg, so this is the traditional V (vrijgeven per Dijkstra, release in
English), but now 'conveniently' called POST, such that it can be
readily confused with the P operation (passering, or passing) which it
is not.

Glorious :-/

You're of course going to tell me NT did this and you can't help this
naming foible.

> NTSYNC_IOC_MUTEX_UNLOCK
>
>   Release a mutex object. Takes a pointer to struct ntsync_mutex_args,
>   which is used as follows:
>
>   * mutex: Ignored.
>   * owner: Specifies the owner trying to release this mutex.
>   * count: On output, contains the previous recursion count.
>
>   If "owner" is zero, the ioctl fails with EINVAL. If "owner"
>   is not the current owner of the mutex, the ioctl fails with
>   EPERM.

ISTR you having written elsewhere that NT actually demands mutexes to be
strictly per thread, which for the above would mandate @owner to be
current, no?

>   The mutex's count will be decremented by one. If decrementing the
>   mutex's count causes it to become zero, the mutex is marked as
>   unowned and signaled, and eligible threads waiting on it will be
>   woken as appropriate.
>
> NTSYNC_IOC_SET_EVENT
>
>   Signal an event object.
>   Takes a pointer to a 32-bit integer, which on
>   output contains the previous state of the event.
>
>   Eligible threads will be woken, and auto-reset events will be
>   designaled appropriately.

Hmm, so the event thing is like a simple wait-wake scheme? Where the
'signaled' bit is used as the wakeup state?

> NTSYNC_IOC_RESET_EVENT
>
>   Designal an event object. Takes a pointer to a 32-bit integer, which
>   on output contains the previous state of the event.
>
> NTSYNC_IOC_PULSE_EVENT
>
>   Wake threads waiting on an event object while leaving it in an
>   unsignaled state. Takes a pointer to a 32-bit integer, which on
>   output contains the previous state of the event.
>
>   A pulse operation can be thought of as a set followed by a reset,
>   performed as a single atomic operation. If two threads are waiting on
>   an auto-reset event which is pulsed, only one will be woken. If two
>   threads are waiting a manual-reset event which is pulsed, both will
>   be woken. However, in both cases, the event will be unsignaled
>   afterwards, and a simultaneous read operation will always report the
>   event as unsignaled.

*groan*

> NTSYNC_IOC_READ_SEM
>
>   Read the current state of a semaphore object. Takes a pointer to
>   struct ntsync_sem_args, which is used as follows:
>
>   * sem: Ignored.
>   * count: On output, contains the current count of the semaphore.
>   * max: On output, contains the maximum count of the semaphore.

This seems inherently racy -- what is the intended purpose of this
interface? Specifically the moment a value is returned, either P or V
operations can change it, rendering the (as yet unused) return value
incorrect.

> NTSYNC_IOC_READ_MUTEX
>
>   Read the current state of a mutex object. Takes a pointer to struct
>   ntsync_mutex_args, which is used as follows:
>
>   * mutex: Ignored.
>   * owner: On output, contains the current owner of the mutex, or zero
>     if the mutex is not currently owned.
>   * count: On output, contains the current recursion count of the mutex.
>
>   If the mutex is marked as abandoned, the function fails with
>   EOWNERDEAD. In this case, "count" and "owner" are set to zero.

Another questionable interface. I suspect you're going to be telling me
NT has them so you have to have them, but urgh.

> NTSYNC_IOC_READ_EVENT
>
>   Read the current state of an event object. Takes a pointer to struct
>   ntsync_event_args, which is used as follows:
>
>   * event: Ignored.
>   * signaled: On output, contains the current state of the event.
>   * manual: On output, contains 1 if the event is a manual-reset event,
>     and 0 otherwise.

I can't help but notice all those @sem, @mutex, @event 'output' members
being unused except for create. Seems like a waste to have them.

> NTSYNC_IOC_KILL_OWNER
>
>   Mark a mutex as unowned and abandoned if it is owned by the given
>   owner. Takes an input-only pointer to a 32-bit integer denoting the
>   owner. If the owner is zero, the ioctl fails with EINVAL. If the
>   owner does not own the mutex, the function fails with EPERM.
>
>   Eligible threads waiting on the mutex will be woken as appropriate
>   (and such waits will fail with EOWNERDEAD, as described below).

Wine will use this when it detects a thread exit I suppose.

> NTSYNC_IOC_WAIT_ANY
>
>   Poll on any of a list of objects, atomically acquiring at most one.
>   Takes a pointer to struct ntsync_wait_args, which is used as follows:
>
>   * timeout: Absolute timeout in nanoseconds. If NTSYNC_WAIT_REALTIME
>     is set, the timeout is measured against the REALTIME
>     clock; otherwise it is measured against the MONOTONIC
>     clock. If the timeout is equal to or earlier than the
>     current time, the function returns immediately without
>     sleeping. If "timeout" is U64_MAX, the function will
>     sleep until an object is signaled, and will not fail
>     with ETIMEDOUT.
>
>   * objs: Pointer to an array of "count" file descriptors
>     (specified as an integer so that the structure has the
>     same size regardless of architecture).
>     If any object is
>     invalid, the function fails with EINVAL.
>
>   * count: Number of objects specified in the "objs" array. If
>     greater than NTSYNC_MAX_WAIT_COUNT, the function fails
>     with EINVAL.
>
>   * owner: Mutex owner identifier. If any object in "objs" is a
>     mutex, the ioctl will attempt to acquire that mutex on
>     behalf of "owner". If "owner" is zero, the ioctl
>     fails with EINVAL.

Again, should that not be current? That is, why not maintain the NT
invariant and mandate TIDs and avoid the arguments in both cases?

>   * index: On success, contains the index (into "objs") of the
>     object which was signaled. If "alert" was signaled
>     instead, this contains "count".

Could be the actual return value, no?

Edit: no it cannot be because -EOWNERDEAD case below.

>
>   * alert: Optional event object file descriptor. If nonzero, this
>     specifies an "alert" event object which, if signaled,
>     will terminate the wait. If nonzero, the identifier must
>     point to a valid event.
>
>   * flags: Zero or more flags. Currently the only flag is
>     NTSYNC_WAIT_REALTIME, which causes the timeout to be
>     measured against the REALTIME clock instead of
>     MONOTONIC.
>
>   * pad: Unused, must be set to zero.
>
>   This function attempts to acquire one of the given objects. If unable
>   to do so, it sleeps until an object becomes signaled, subsequently
>   acquiring it, or the timeout expires. In the latter case the ioctl
>   fails with ETIMEDOUT. The function only acquires one object, even if
>   multiple objects are signaled.

Any guarantee as to which will be acquired in case multiple are
available? [A]

>   A semaphore is considered to be signaled if its count is nonzero, and
>   is acquired by decrementing its count by one. A mutex is considered
>   to be signaled if it is unowned or if its owner matches the "owner"
>   argument, and is acquired by incrementing its recursion count by one
>   and setting its owner to the "owner" argument.
>   An auto-reset event
>   is acquired by designaling it; a manual-reset event is not affected
>   by acquisition.
>
>   Acquisition is atomic and totally ordered with respect to other
>   operations on the same object. If two wait operations (with different
>   "owner" identifiers) are queued on the same mutex, only one is
>   signaled. If two wait operations are queued on the same semaphore,
>   and a value of one is posted to it, only one is signaled. The order
>   in which threads are signaled is not specified.

Note that you do list the lack of guarantee here, but not above. I
suspect both cases are similar and guarantee nothing.

>   If an abandoned mutex is acquired, the ioctl fails with
>   EOWNERDEAD. Although this is a failure return, the function may
>   otherwise be considered successful. The mutex is marked as owned by
>   the given owner (with a recursion count of 1) and as no longer
>   abandoned, and "index" is still set to the index of the mutex.

Aaah, I see, this does indeed preclude @index from being the return
value.

>   The "alert" argument is an "extra" event which can terminate the
>   wait, independently of all other objects. If members of "objs" and
>   "alert" are both simultaneously signaled, a member of "objs" will
>   always be given priority and acquired first.
>
>   It is valid to pass the same object more than once, including by
>   passing the same event in the "objs" array and in "alert". If a
>   wakeup occurs due to that object being signaled, "index" is set to
>   the lowest index corresponding to that object.

Urgh, is this an actual guarantee? This almost seems to imply that at
[A] above we can indeed guarantee the lowest indexed object is acquired
first.

>   The function may fail with EINTR if a signal is received.

In which case @index must be disregarded since nothing will be acquired,
right?

So far nothing really weird, and I'm thinking futexes should be able to
do all this, no?

> NTSYNC_IOC_WAIT_ALL
>
>   Poll on a list of objects, atomically acquiring all of them.
>   Takes a
>   pointer to struct ntsync_wait_args, which is used identically to
>   NTSYNC_IOC_WAIT_ANY, except that "index" is always filled with zero
>   on success if not woken via alert.

Whee, and this is the one weird operation that you're all struggling to
emulate, right? The atomic multi-acquire is 'hard' to do with futexes.

>   This function attempts to simultaneously acquire all of the given
>   objects. If unable to do so, it sleeps until all objects become
>   simultaneously signaled, subsequently acquiring them, or the timeout
>   expires. In the latter case the ioctl fails with ETIMEDOUT and no
>   objects are modified.
>
>   Objects may become signaled and subsequently designaled (through
>   acquisition by other threads) while this thread is sleeping. Only
>   once all objects are simultaneously signaled does the ioctl acquire
>   them and return. The entire acquisition is atomic and totally ordered
>   with respect to other operations on any of the given objects.
>
>   If an abandoned mutex is acquired, the ioctl fails with
>   EOWNERDEAD. Similarly to NTSYNC_IOC_WAIT_ANY, all objects are
>   nevertheless marked as acquired. Note that if multiple mutex objects
>   are specified, there is no way to know which were marked as
>   abandoned.
>
>   As with "any" waits, the "alert" argument is an "extra" event which
>   can terminate the wait. Critically, however, an "all" wait will
>   succeed if all members in "objs" are signaled, *or* if "alert" is
>   signaled. In the latter case "index" will be set to "count". As
>   with "any" waits, if both conditions are filled, the former takes
>   priority, and objects in "objs" will be acquired.
>
>   Unlike NTSYNC_IOC_WAIT_ANY, it is not valid to pass the same
>   object more than once, nor is it valid to pass the same object in
>   "objs" and in "alert". If this is attempted, the function fails
>   with EINVAL.

OK, this all was helpful, I'll go stare at the code again.

Thanks!
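
P.S. To check my reading of the wait-all semantics, here is a
single-threaded toy model of one acquisition attempt. Every toy_* name
is invented for illustration and this is not the driver's code. It
assumes a nonzero owner, distinct objects, no alert, and returns -EAGAIN
where the real ioctl would sleep; the driver would of course need both
passes to happen atomically with respect to other operations.

```c
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Toy model only: names and layout are hypothetical, not the driver's. */
enum toy_type { TOY_SEM, TOY_MUTEX, TOY_EVENT };

struct toy_obj {
	enum toy_type type;
	uint32_t count;		/* semaphore count or mutex recursion count */
	uint32_t owner;		/* mutex owner, 0 if unowned */
	bool signaled;		/* event state */
	bool manual;		/* event: manual-reset? */
	bool abandoned;		/* mutex: previous owner was killed */
};

/* "Signaled" as described for waits: this object could be acquired now. */
static bool toy_signaled(const struct toy_obj *o, uint32_t owner)
{
	switch (o->type) {
	case TOY_SEM:
		return o->count != 0;
	case TOY_MUTEX:
		return o->owner == 0 || o->owner == owner;
	case TOY_EVENT:
		return o->signaled;
	}
	return false;
}

/*
 * One non-blocking wait-all attempt. Returns 0 on success, -EOWNERDEAD
 * if an abandoned mutex was among the (still acquired) objects, and
 * -EAGAIN, with nothing modified, if any object is not signaled.
 */
static int toy_try_wait_all(struct toy_obj **objs, size_t n, uint32_t owner)
{
	int ret = 0;
	size_t i;

	/* Pass 1: touch nothing unless every object is signaled. */
	for (i = 0; i < n; i++)
		if (!toy_signaled(objs[i], owner))
			return -EAGAIN;

	/* Pass 2: acquire all of them. */
	for (i = 0; i < n; i++) {
		struct toy_obj *o = objs[i];

		switch (o->type) {
		case TOY_SEM:
			o->count--;
			break;
		case TOY_MUTEX:
			if (o->abandoned) {
				o->abandoned = false;
				ret = -EOWNERDEAD;
			}
			o->owner = owner;
			o->count++;
			break;
		case TOY_EVENT:
			if (!o->manual)
				o->signaled = false;	/* auto-reset */
			break;
		}
	}
	return ret;
}
```

With a signaled semaphore and a mutex owned by someone else, the attempt
leaves the semaphore count untouched; once the mutex is abandoned, the
same call takes both and reports -EOWNERDEAD.
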