On 29/11/10 13:39, Jozsef Kadlecsik wrote:
> On Mon, 29 Nov 2010, Pablo Neira Ayuso wrote:
>
>> On 27/11/10 21:42, Jozsef Kadlecsik wrote:
>>> On Sat, 27 Nov 2010, Jan Engelhardt wrote:
>>>
>>>> On Saturday 2010-11-27 18:04, Jozsef Kadlecsik wrote:
>>>>>
>>>>> AFAIK when the kernel dumps and the skb is full, it's not returned
>>>>> directly to the userspace but first enqueued.
>>>>
>>>> I don't recognize that inside the code however.
>>>>
>>>> In netlink_dump(), there is the cb->dump call. There are no loops
>>>> inside this function. Neither are there in the two parents,
>>>> netlink_dump_start() and netlink_recvmsg().
>>>
>>> In netlink_dump() after the call to cb->dump, you can see the call to
>>> skb_queue_tail. So the message is queued.
>>>
>>> Where the looping happens, I do not know. Some socket magic?
>>
>> 1) you send a NLM_F_DUMP request.
>> 2) the kernel fills one skb and enqueue it into the socket buffer.
>> 3) the process invokes recvmsg(), it gets the datagram, then go back to step
>> 2).
>>
>> Thus, the dump only consumes 1 memory page per recv() invocation. That's the
>> magic.
>
> So Jan has got right: if the process which initiated the dumping is
> suspended and locking is used, then the suspended process locks out all
> other processes.

We could also use an optimistic locking approach:

* We assume that there's an ID for every table.
* That ID is increased whenever you modify the rule-set of that table.
* That ID has to be included as an attribute in the dump.
* If the ID changes in the middle of a dump, you restart the dump of
  that table from the beginning.
* Once you start receiving information from a different table, you can
  consider the previous table fully dumped. For the last table, you can
  take the trailing NLMSG_DONE message as the end marker.

The user-space application has to keep the entries in a list until that
table has been fully dumped. If it notices that the ID has increased, it
releases the previous entries and collects the new ones.
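
To make the user-space side concrete, here is a rough sketch of the buffering logic, written in Python only for brevity. Everything in it is an assumption for illustration: the message shape (table name, ID, rule text), the use of None to stand for the end-of-dump message, and the function name collect_tables. A real client would read these fields from netlink attributes, and I am assuming the restarted entries for a table simply arrive with the new ID.

```python
from typing import Iterable, Iterator, List, Optional, Tuple

# Hypothetical message shape: (table name, table ID, rule text).
# None stands in for the end-of-dump message (NLMSG_DONE).
Message = Optional[Tuple[str, int, str]]


def collect_tables(messages: Iterable[Message]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (table, rules) only once a table has been fully dumped."""
    table: Optional[str] = None
    table_id = -1
    pending: List[str] = []

    for msg in messages:
        if msg is None:
            # End of dump: the last table is now complete.
            if table is not None:
                yield table, pending
            return
        msg_table, msg_id, rule = msg
        if msg_table != table:
            # A different table started, so the previous one is complete.
            if table is not None:
                yield table, pending
            table, table_id, pending = msg_table, msg_id, []
        elif msg_id != table_id:
            # The rule-set changed mid-dump: release the stale entries
            # and collect this table again from the beginning.
            table_id, pending = msg_id, []
        pending.append(rule)
    # Stream ended without the end marker: the last table is not emitted.
```

A caller would iterate over collect_tables() and write each table out only when it is yielded, so stale or duplicated entries never reach the output.
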
This means that a netlink-based iptables-save command does not write the
entries to disk as they arrive. Instead, it keeps the rules for a table
in the list until the dump of that table is finished, and only then
writes them to disk (so we make sure there are no duplicated entries).

Optimistic approaches have one problem: if the rule-set is modified very
often during the dump, the dump may keep restarting indefinitely.
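
One possible mitigation for the endless-restart case, which nobody in this thread has proposed, is to cap the number of attempts and report failure to the caller. The sketch below assumes a hypothetical dump_once callback that performs one dump attempt and returns the table ID seen at the start, the rules, and the table ID seen at the end. None of these names are a real netfilter API.

```python
from typing import Callable, List, Tuple

# Hypothetical callback: one dump attempt, returning
# (ID at start, rules, ID at end). Not a real netfilter API.
DumpOnce = Callable[[], Tuple[int, List[str], int]]


class DumpUnstable(Exception):
    """Raised when the table changed during every attempt."""


def dump_with_retry_limit(dump_once: DumpOnce, max_attempts: int = 5) -> List[str]:
    """Retry the dump until the ID is stable, at most max_attempts times."""
    for _ in range(max_attempts):
        start_id, rules, end_id = dump_once()
        if start_id == end_id:
            return rules  # the ID did not change: consistent snapshot
    raise DumpUnstable(
        f"table changed during {max_attempts} consecutive dump attempts"
    )
```

What the caller does on DumpUnstable (give up, or fall back to a locking dump) is a policy decision left open here.
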