Re: [PATCH v4 2/3] Do error handling if __build_packet_message fails

Pablo Neira Ayuso <pablo@xxxxxxxxxxxxx> · Tue, 4 Nov 2014 19:26:09 +0100

On Tue, Nov 04, 2014 at 04:01:54PM -0200, Marcelo Ricardo Leitner wrote:
> >>@@ -708,8 +707,9 @@ nfulnl_log_packet(struct net *net,
> >>
> >>  	inst->qlen++;
> >>
> >>-	__build_packet_message(log, inst, skb, data_len, pf,
> >>-				hooknum, in, out, prefix, plen);
> >>+	if (__build_packet_message(log, inst, skb, data_len, pf,
> >>+				   hooknum, in, out, prefix, plen))
> >>+		goto build_failure;
> >>
> >>  	if (inst->qlen >= qthreshold)
> >>  		__nfulnl_flush(inst);
> >>@@ -726,6 +726,15 @@ unlock_and_release:
> >>  	instance_put(inst);
> >>  	return;
> >>
> >>+build_failure:
> >>+	/* If no other messages in it, we're good to free it. */
> >>+	if (!inst->skb->len) {
> >>+		kfree_skb(inst->skb);
> >>+		inst->skb = NULL;
> >>+	}
> >>+
> >>+	inst->qlen--;
> >
> >For each message that we put into the batch, we increase inst->qlen in
> >one, so I think this decrement isn't enough to leave things in
> >consistent state. If we at least have one message already in the
> >batch, I think it's good to give it a try to deliver it to userspace:
> >
> >         if (inst->skb)
> >                 __nfulnl_flush(inst);
> 
> The idea was to undo just the last attempt and leave the older ones
> where they were...
>
> Currently we:
> - allocate skb, if needed
> - increment inst->qlen
> - call __build_packet_message
> - if (inst->qlen >= qthreshold)
>   we flush..
> 
> Considering __build_packet_message now will call nlmsg_cancel and
> undo all its work on fails, I thought on just dec'ing inst->qlen and
> releasing the skbuff, if it's empty.
>
> I see your point on attempting the flush, good idea. Otherwise we
> could hit a situation that it would be stuck on undoing the last
> message and this situation would be fixed only after inst->timer
> expires.
> 
> It could be like:
> build_failure:
>     inst->qlen--;

We can inst->qlen++ after build_packet_message, so you can skip this
line above.

>     /* If no other messages in it, we're good to free it. */
>     if (!inst->skb->len) {

Now I'd suggest:

      if (inst->qlen == 0) {

>         kfree_skb(inst->skb);
>         inst->skb = NULL;
>     }
>     else {
>         __nfulnl_flush(inst);
>     }
> 
> What do you think?
> 
> >Then, the WARN_ON that Florian added recently should catch that we
> >have size miscalculations from __nfulnl_send().
> 
> Good point. It may not catch it always. If the failed message was
> bigger than the DONE message, it may go on unnoticed. Actually, even
> if it warns, we would have no idea that a message was dropped a bit
> earlier, as the stats aren't there yet. Maybe another WARN_ONCE on
> build_failure label?

That will catch possible miscalculations too, I don't like we're
polluting this with many WARN_ONCE, but I don't see any better way to
catch this size miscalculation bugs, so ahead add it.

BTW, we should also signal the userspace when we fail to build the
message via:

nfnetlink_set_err(net, 0, group, -ENOBUFS);

so it knows that we're losing log messages for whatever reason.
Basically, userspace hits -ENOBUFS when calling recv(), which means
netlink is losing messages. I don't think we really need the
statistics.

I can also see this line:

nlh->nlmsg_len = inst->skb->tail - old_tail;

that can be replaced by nlmsg_end(skb, nlh);

It seems this code needs some care.
--
To unsubscribe from this list: send the line "unsubscribe netfilter-devel" in
the body of a message to majordomo@xxxxxxxxxxxxxxx
More majordomo info at  http://vger.kernel.org/majordomo-info.html