On Tue, Nov 04, 2014 at 04:01:54PM -0200, Marcelo Ricardo Leitner wrote: > >>@@ -708,8 +707,9 @@ nfulnl_log_packet(struct net *net, > >> > >> inst->qlen++; > >> > >>- __build_packet_message(log, inst, skb, data_len, pf, > >>- hooknum, in, out, prefix, plen); > >>+ if (__build_packet_message(log, inst, skb, data_len, pf, > >>+ hooknum, in, out, prefix, plen)) > >>+ goto build_failure; > >> > >> if (inst->qlen >= qthreshold) > >> __nfulnl_flush(inst); > >>@@ -726,6 +726,15 @@ unlock_and_release: > >> instance_put(inst); > >> return; > >> > >>+build_failure: > >>+ /* If no other messages in it, we're good to free it. */ > >>+ if (!inst->skb->len) { > >>+ kfree_skb(inst->skb); > >>+ inst->skb = NULL; > >>+ } > >>+ > >>+ inst->qlen--; > > > >For each message that we put into the batch, we increase inst->qlen in > >one, so I think this decrement isn't enough to leave things in > >consistent state. If we at least have one message already in the > >batch, I think it's good to give it a try to deliver it to userspace: > > > > if (inst->skb) > > __nfulnl_flush(inst); > > The idea was to undo just the last attempt and leave the older ones > where they were... > > Currently we: > - allocate skb, if needed > - increment inst->qlen > - call __build_packet_message > - if (inst->qlen >= qthreshold) > we flush.. > > Considering __build_packet_message now will call nlmsg_cancel and > undo all its work on fails, I thought on just dec'ing inst->qlen and > releasing the skbuff, if it's empty. > > I see your point on attempting the flush, good idea. Otherwise we > could hit a situation that it would be stuck on undoing the last > message and this situation would be fixed only after inst->timer > expires. > > It could be like: > build_failure: > inst->qlen--; We can inst->qlen++ after build_packet_message, so you can skip this line above. > /* If no other messages in it, we're good to free it. */ > if (!inst->skb->len) { Now I'd suggest: if (inst->qlen == 0) { > kfree_skb(inst->skb); > inst->skb = NULL; > } > else { > __nfulnl_flush(inst); > } > > What do you think? > > >Then, the WARN_ON that Florian added recently should catch that we > >have size miscalculations from __nfulnl_send(). > > Good point. It may not catch it always. If the failed message was > bigger than the DONE message, it may go on unnoticed. Actually, even > if it warns, we would have no idea that a message was dropped a bit > earlier, as the stats aren't there yet. Maybe another WARN_ONCE on > build_failure label? That will catch possible miscalculations too, I don't like we're polluting this with many WARN_ONCE, but I don't see any better way to catch this size miscalculation bugs, so ahead add it. BTW, we should also signal the userspace when we fail to build the message via: nfnetlink_set_err(net, 0, group, -ENOBUFS); so it knows that we're losing log messages for whatever reason. Basically, userspace hits -ENOBUFS when calling recv(), which means netlink is losing messages. I don't think we really need the statistics. I can also see this line: nlh->nlmsg_len = inst->skb->tail - old_tail; that can be replaced by nlmsg_end(skb, nlh); It seems this code needs some care. -- To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in the body of a message to majordomo@xxxxxxxxxxxxxxx More majordomo info at http://vger.kernel.org/majordomo-info.html