On Mon, Mar 30, 2020 at 02:24:16PM +0200, Vitaly Kuznetsov wrote: > Michael Kelley <mikelley@xxxxxxxxxxxxx> writes: > > > From: Andrea Parri <parri.andrea@xxxxxxxxx> Sent: Saturday, March 28, 2020 10:09 AM > >> > >> > In case we believe that OFFER -> RESCINF sequence is always ordered > >> > by the host AND we don't care about other offers in the queue the > >> > suggested locking is OK: we're guaranteed to process RESCIND after we > >> > finished processing OFFER for the same channel. However, waiting for > >> > 'offer_in_progress == 0' looks fishy so I'd suggest we at least add a > >> > comment explaining that the wait is only needed to serialize us with > >> > possible OFFER for the same channel - and nothing else. I'd personally > >> > still slightly prefer the algorythm I suggested as it guarantees we take > >> > channel_mutex with offer_in_progress == 0 -- even if there are no issues > >> > we can think of today (not strongly though). > >> > >> Does it? offer_in_progress is incremented without channel_mutex... > >> > > No, it does not, you're right, by itself the change is insufficient. > > >> IAC, I have no objections to apply the changes you suggested. To avoid > >> misunderstandings: vmbus_bus_suspend() presents a similar usage... Are > >> you suggesting that I apply similar changes there? > >> > >> Alternatively: FWIW, the comment in vmbus_onoffer_rescind() does refer > >> to "The offer msg and the corresponding rescind msg...". I am all ears > >> if you have any concrete suggestions to improve these comments. > >> > > > > Given that waiting for 'offer_in_progress == 0' is the current code, I think > > there's an argument to made for not changing it if the change isn't strictly > > necessary. This patch set introduces enough change that *is* necessary. :-) > > > > Sure. I was thinking a bit more about this and it seems that over years > we've made the synchronization of channels code too complex (every time > for a good reason but still). Now (before this series) we have at least: > > vmbus_connection.channel_mutex > vmbus_connection.offer_in_progress > channel.probe_done > channel.rescind > Workqueues (vmbus_connection.work_queue, > queue_work_on(vmbus_connection.connect_cpu),...) > channel.lock spinlock (the least of the problems) > > Maybe there's room for improvement? Out of top of my head I'd suggest a > state machine for each channel (e.g something like > OFFERED->OPENING->OPEN->RESCIND_REQ->RESCINDED->CLOSED) + refcounting > (subchannels, open/rescind/... requests in progress, ...) + non-blocking > request handling like "Can we handle this rescind offer now? No, > refcount is too big. OK, rescheduling the work". Maybe not the best > design ever and I'd gladly support any other which improves the > readability of the code and makes all state changes and synchronization > between them more obvious. > > Note, VMBus channel handling driven my messages (unlike events for ring > buffer) is not performance critical, we just need to ensure completeness > (all requests are handled correctly) with forward progress guarantees > (no deadlocks). > > I understand the absence of 'hot' issues in the current code is what can > make the virtue of redesign questionable and sorry for hijacking the > series which doesn't seem to make things worse :-) (Back here... Sorry for the delay.) FWIW, what you wrote above makes sense to me; and *yes*, the series in question was not intended as "let us undertake such a redesign" (quite the opposite in fact...) With regard to this specific patch, it seems to me that we've reached a certain consensus or, at least, I don't see complaints ;-). Please let me know if I misunderstood. Thanks, Andrea