On Wed, Jan 22, 2025 at 10:02:49AM -0800, Nicolin Chen wrote:
> On Wed, Jan 22, 2025 at 12:05:27AM -0800, Nicolin Chen wrote:
> > On Tue, Jan 21, 2025 at 01:40:13PM -0800, Nicolin Chen wrote:
> > > On Tue, Jan 21, 2025 at 05:14:04PM -0400, Jason Gunthorpe wrote:
> > > > Since we don't hold the spinlock the whole time there is a race where
> > > > we could pull the overflow off and then another entry could be pushed
> > > > while we do the copy_to_user.
> > >
> > > I see. I'll be careful around that. I can imagine that one tricky
> > > thing can be to restore the overflow node back to the list when a
> > > copy_to_user fails..
> >
> > Hmm, it gets trickier because the overflow node is a preallocated
> > single node per vEVENTQ. We must not only check list_empty for its
> > restore, but also protect the overflow->header.sequence from races
> > between atomic_inc and copy_to_user. However, we can't use a mutex
> > because copy_to_user might DOS...
> >
> > A simple solution could be to just duplicate overflow nodes, each
> > of which contains a different sequence, like a regular vEVENT node.
> > This certainly changes the uAPI for read(). Though the duplication
> > of overflow nodes doesn't sound optimal, it's acceptable since the
> > duplicated nodes would have been regular vEVENT nodes if there was
> > no overflow (i.e. no extra overhead)?
>
> Ah, didn't think clearly last night.. We can't simply add overflow
> nodes either for rate/memory limit reason that you concerned about
> in the other email. On the other hand, though certainly not being
> ideal, indulging the race at the sequence index might not be that
> harmful, compared to the extreme of the other case..
>
> I'll give another thought today if there's some other way out.

I made a change to duplicate the overflow node inside fetch(), where it
is protected by the spinlock, and to pass that duplicate to copy_to_user.
The other routine that reports vEVENTs, also protected by the spinlock,
can then update the preallocated overflow node without racing.

Nicolin
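
A minimal sketch of the scheme described above, assuming made-up
sketch_* names and layout rather than the real iommufd structures or
the actual patch:

#include <linux/list.h>
#include <linux/spinlock.h>
#include <linux/types.h>
#include <linux/uaccess.h>

/* Hypothetical types for illustration only */
struct sketch_vevent {
	struct list_head node;
	struct {
		u32 flags;
		u32 sequence;
	} header;
};

struct sketch_veventq {
	spinlock_t lock;
	struct list_head list;
	struct sketch_vevent overflow;	/* preallocated single node */
};

/*
 * Reporting path: everything happens under the lock, so bumping the
 * sequence of the preallocated overflow node cannot race with fetch().
 * (overflow.node is assumed to be INIT_LIST_HEAD'ed at allocation.)
 */
static void sketch_report_overflow(struct sketch_veventq *q)
{
	spin_lock(&q->lock);
	q->overflow.header.sequence++;
	if (list_empty(&q->overflow.node))
		list_add_tail(&q->overflow.node, &q->list);
	spin_unlock(&q->lock);
}

/*
 * read() path: snapshot the head node under the lock, then drop the
 * lock before copy_to_user(). For the overflow node the snapshot fixes
 * header.sequence, so the reporting path may keep updating the
 * preallocated node while we sleep in copy_to_user(). (Freeing regular
 * nodes and restoring on -EFAULT are omitted here.)
 */
static ssize_t sketch_fetch(struct sketch_veventq *q, char __user *buf)
{
	struct sketch_vevent copy, *head;

	spin_lock(&q->lock);
	if (list_empty(&q->list)) {
		spin_unlock(&q->lock);
		return 0;
	}
	head = list_first_entry(&q->list, struct sketch_vevent, node);
	list_del_init(&head->node);
	copy = *head;		/* the duplicate used for copy_to_user */
	spin_unlock(&q->lock);

	if (copy_to_user(buf, &copy.header, sizeof(copy.header)))
		return -EFAULT;
	return sizeof(copy.header);
}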