On Tue, 2017-08-01 at 11:21 -0500, Chien Tin Tung wrote:
> On Tue, Aug 01, 2017 at 03:20:11PM +0000, Bart Van Assche wrote:
> > What Leon wrote, namely that calls that send netlink data from kernel to user
> > space should be non-blocking makes sense to me.
>
> That's been addressed in so many emails I won't rehash. If _no_one_ should
> block (it is actually a one shot retry with a timeout) sending Netlink message
> from kernel to user, why don't Leon or you patch that code out? All of
> it, not just ibnl_unicast().
>
> > So please be more constructive than replying with "NAK".
>
> I've sent so many emails (some you were CC'd), so I'm not sure how
> much more constructive I can be. BTW, did you see the one with my
> attempt at world peace (https://www.spinics.net/lists/linux-rdma/msg50591.html)?
>
> Here are the relevant threads for people that are interested in participating
> in this discussion.
>
> https://patchwork.kernel.org/patch/9814367/
>
> https://patchwork.kernel.org/patch/9752855/

Hello Chien,

Yes, I had read these e-mails, but I do not agree with everything that was
written in them. I'm not sure whether you are aware of the original design
goal of the netlink mechanism? It was designed on purpose to be unreliable
such that sending information from the kernel to user space would never
block.

If sending data over a netlink socket can block, then analyzing whether or
not a deadlock can occur is only possible by examining kernel and user space
together. If sending data over a netlink socket never blocks, then only the
kernel has to be considered when analyzing whether or not a deadlock can
occur. We do not want user space to be able to cause the kernel to lock up.

So it's very strange to me to see that today the kernel has facilities for
both blocking and non-blocking sends over netlink sockets. My opinion here
is that any client that needs reliable communication from kernel to user
space should use a mechanism other than netlink sockets.
Additionally, since the kernel reports netlink socket buffer overflows to
user space through the ENOBUFS error code, what's so hard about detecting
that error code in user space and resynchronizing state if recv() fails with
ENOBUFS? Is your concern perhaps about the code in
iwpmd/iwarp_pm_server.c? From what I see in process_iwpm_netlink_msg() it
seems like all receive errors are treated equally and no state
resynchronization occurs if the kernel reports ENOBUFS?

Bart.
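
For illustration only, the user-space handling suggested above (detect
ENOBUFS on recv() and resynchronize state) could be sketched roughly as
follows. This is not the iwpmd code: the function name receive_or_resync()
and the resync callback are hypothetical, and what "resynchronizing" means
(for example, re-querying the kernel for its current state) is assumed
rather than taken from the thread.

```python
import errno


def receive_or_resync(sock, resync, bufsize=65536):
    """Receive one datagram; on ENOBUFS, resynchronize instead of failing.

    `sock` is any object with a recv(bufsize) method, for example a socket
    created with socket.AF_NETLINK on Linux. `resync` is a hypothetical
    callback that rebuilds whatever state may have been missed.

    Returns the received bytes, or None if the kernel reported an overflow
    and a resync was triggered.
    """
    try:
        return sock.recv(bufsize)
    except OSError as exc:
        if exc.errno == errno.ENOBUFS:
            # The socket receive buffer overflowed, so at least one message
            # from the kernel was dropped. Treat this differently from other
            # receive errors: recover the lost state and keep going.
            resync()
            return None
        # Any other receive error is passed on to the caller unchanged.
        raise
```

The point of the sketch is only that ENOBUFS is distinguishable from other
receive errors, so the receiver can recover from dropped messages without
the kernel ever having to block on the send side.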