On Wed, Nov 23, 2022 at 09:39:06PM +0100, Devid Antonio Filoni wrote: > On Mon, 2022-11-21 at 06:19 +0100, Oleksij Rempel wrote: > > On Sun, Nov 20, 2022 at 08:18:32PM +0100, Devid Antonio Filoni wrote: > > > On Sun, 2022-11-20 at 09:45 +0100, Oleksij Rempel wrote: > > > > On Sun, Nov 20, 2022 at 01:11:52AM +0100, Devid Antonio Filoni wrote: > > > > > On Sat, 2022-11-19 at 11:12 +0100, Oleksij Rempel wrote: > > > > > > On Fri, Nov 18, 2022 at 04:12:40PM +0100, Devid Antonio Filoni wrote: > > > > > > > Hi Oleksij, > > > > > > > > > > > > > > honestly I would apply proposed patch because it is the easier solution > > > > > > > and makes the driver compliant with the standard for the following > > > > > > > reasons: > > > > > > > - on the first claim, the kernel will wait 250 ms as stated by the > > > > > > > standard > > > > > > > + on successive claims with the same name, the kernel will not wait > > > > > > > 250ms, this implies: > > > > > > > - it will not wait after sending the address-claimed message when the > > > > > > > claimed address has been spoofed, but the standard does not explicitly > > > > > > > states what to do in this case (see previous emails in this thread), so > > > > > > > it would be up to the application developer to decide how to manage the > > > > > > > conflict > > > > > > > - it will not wait after sending the address-claimed message when a > > > > > > > request for address-claimed message has been received as stated by the > > > > > > > standard > > > > > > > > > > > > Standard says: > > > > > > 1. No CF _shall_ begin, or resume, transmission on the network until 250 ms > > > > > > after it has successfully claimed an address (Figure 4). > > > > > > 2. This does not apply when responding to a request for address claimed. > > > > > > > > > > > > With current patch state: 1. is implemented and working as expected, 2. > > > > > > is not implemented. > > > > > > With this patch: 1. is partially broken and 2. is partially faking > > > > > > needed behavior. > > > > > > > > > > > > It will not wait if remote ECU which rebooted for some reasons. With this patch > > > > > > we are breaking one case of the standard in favor to fake compatibility to the > > > > > > other case. We should avoid waiting only based on presence of RfAC not based > > > > > > on the old_addr == new_addr. > > > > > > > > > > I'm sorry, I don't think I understood the point about reboot ("It will > > > > > not wait if remote ECU which rebooted for some reasons"). If another ECU > > > > > rebooted, then *it* will have to perform the claim procedure again > > > > > waiting 250 ms before beginning the transmission. Your ECU doesn't have > > > > > to check if the other ECUs respected the 250 ms wait. > > > > > > > > With proposed patch: > > > > - local application which is sending to the remote NAME, will start or continue > > > > communication with ECU which should stay silent. > > > > > > And this is not forbidden by the standard, the standard states that the > > > remote ECU shall not start or continue the communication but it can > > > *receive* messages. > > > For example, what would you do if: > > > - during the 250 ms wait, another ECU sends a request-for-address- > > > claimed message meant to the address you're claiming? > > > From "4.5.3 Other requirements for initialization": > > > A CF shall respond to a request-for-address-claimed message when the > > > destination address is the same as the CF's address and shall transmit > > > its response to the Global address (255). > > > - during the 250 ms wait another ECU sends a normal message (non > > > address-claim related) using the SA you're currently claiming? > > > > > > > - local application which was manually or automatically restarted (see > > > > application watchdogs), will bypass address claim procedure > > > > completion and start sending without 250ms delay. > > > > > > Then the application will be violating the standard, you're right, > > > however please note that, as per driver implementation, each time the > > > socket is closed and opened again (if bound with a name) you have to > > > send the address-claimed message again. > > > The standard also states how to treat this kind of violations on the > > > remote ECU side. > > > > > > > > > > > > Also, the ISO11783-5 standard, with "Figure 6 (Resolving address > > > > > contention between two self-configurable-address CF)" of "4.5.4.2 - > > > > > Address-claim prioritization", shows that: > > > > > - ECU1 claims the address (time: 0 ms) > > > > > - ECU2 claims the same address (time: 0+x ms) > > > > > - ECU1 NAME has the higher priority, so ECU1 sends again the address > > > > > claimed message as soon as it received the address-claim from ECU2 > > > > > (time: 0+x+y ms) > > > > > - ECU1 starts normal transmission (time: 250 ms) > > > > > With current implementation, the ECU1 would start the transmission at > > > > > time 0+x+y+250 ms, with proposed patch it would not. > > > > > > > > You are right, this should be fixed. > > > > But proposed patch closes one issues and opens another, with this patch it will > > > > be enough to send at least two address claimed messages to bypass the delay. > > > > > > No, because the timer associated with the first claim *is not stopped*. > > > > > > > > > > > > Same is showed in "Figure 7 (Resolving address contention between a non- > > > > > configurable address CF and a self-configurable address CF)", the ECU > > > > > waits again 250 ms only when claiming a different address. > > > > > > > > Ack > > > > > > > > > Also, as previously discussed in this thread, the standard states in > > > > > 4.4.4.3 - Address violation: > > > > > If a CF receives a message, other than the address-claimed message, > > > > > which uses the CF's own SA, > > > > > then the CF: > > > > > - shall send the address-claim message to the Global address; > > > > > - shall activate a diagnostic trouble code with SPN = 2000+SA and FMI = > > > > > 31 > > > > > It is not *explicitly* stated that you have to wait 250 ms after the > > > > > address-claim message has been sent. > > > > > > > > There is no need to explicitly state it. The requirement is clearly described > > > > in the 4.5.2.d part 1 with clearly defined exception in 4.5.2.d part 2. > > > > If something is not explicitly stated, the stated requirement has always > > > > priority. > > > > > > > > > Please note that the 250 ms wait is mentioned only in "4.5 - Network > > > > > initialization" > > > > > > > > OK, we need to refer to the wording used in a specifications, in > > > > general: > > > > Shall – Shall is used to designate a mandatory requirement. > > > > Should – Should is used for requirements that are considered good and are > > > > recommended, but are not absolutely mandatory. > > > > May – May is used to for requirements that are optional. > > > > > > > > If a requirement with strong wording as "shall" is not strong enough for > > > > you and you are suing words as ".. mentioned only in .." then even a > > > > statistical analysis of this spec will have no meaning. In all > > > > cases we can just invalidate all arguments by using: it is only X or Y. > > > > > > > > > while above statements come from "4.4 - Network-management procedures". > > > > > Also in this case, the proposed patch is still standard compliant. > > > > > > > > If we remove 4.5.2.d from the spec, then yes. > > > > > > > > > So I'm sorry but I have to disagree with you, there are many things > > > > > broken in the current implementation because it is forcing the 250 wait > > > > > to all cases but it should not. > > > > > > > > If we remove 4.5.2.d from the spec, then yes. Every construction is > > > > logical if we adopt input variables to the construction. > > > > > > From "4.4.4.3 - Address violation": > > > - *shall send the address-claim message* to the Global address > > > From "4.5.2 Address claim requirements": > > > - No CF shall begin, or resume, transmission on the network until 250 ms > > > after it has successfully *claimed an address*, except when responding > > > to a request for address-claimed. > > > > > > Do you see any difference? > > > With your interpretation of the standard, then above 4.5.2.d sentence > > > shall be: > > > - No CF shall begin, or resume, transmission on the network until 250 ms > > > after it has successfully *sent the address-claim message*, except when > > > responding to a request for address-claimed. > > > > > > I think "it has successfully claimed an address" is valid for the whole > > > claim procedure and not for the address-claimed message only. > > > > > > Please note that the ECU shall send the address-claim message also when > > > it receives a request for a matching NAME ("4.4.3.2 NAME management (NM) > > > message"). This does not mean that is claiming again the address. > > > > > > > > > > > > > Without words 2. part should be implemented without breaking 1. > > > > > > > > > > > > > Otherwise you will have to keep track of above cases and decide if the > > > > > > > wait is needed or not, but this is hard do accomplish because is the > > > > > > > application in charge of sending the address-claimed message, so you > > > > > > > would have to decide how much to keep track of the request for address- > > > > > > > claimed message thus adding more complexity to the code of the driver. > > > > > > > > > > > > Current kernel already tracks all claims on the bus and knows all registered > > > > > > NAMEs. I do not see increased complicity in this case. > > > > > > > > > > The kernel tracks the claims but it does *not track* incoming requests > > > > > for address-claimed message, it would have to and it would have to > > > > > > > > yes > > > > > > > > > allow the application to answer to it *within a defined time window*. > > > > > > > > yes. > > > > > > > > > But keep in mind that there are other cases when the 250 ms wait is wrong > > > > > or it is not explicitly stated by the standard. > > > > > > > > If it is not stated in the standard how can we decide if it is wrong? > > > And how can we decide if it is right? :) > > > > > > > And if strongly worded statements have no value just because it is > > > > stated only one time, how proper standard should look like? > > > See above. > > > > > > > > > > > > > IMHO, only missing part i a user space interface. Some thing like "ip n" > > > > > > will do. > > > > > > > > > > > > > Another solution is to let the driver send the address-claimed message > > > > > > > waiting or without waiting 250 ms for successive messages depending on > > > > > > > the case. > > > > > > > > > > > > You can send "address-claimed message" in any time you wont. Kernel will > > > > > > just not resolve the NAME to address until 1. part of the spec will > > > > > > apply. Do not forget, the NAME cache is used for local _and_ remote > > > > > > names. You can trick out local system, not remote. > > > > > > > > > > > > Even if you implement "smart" logic in user space and will know better > > > > > > then kernel, that this application is responding to RfAC. You will newer > > > > > > know if address-claimed message of remote system is a response to RfAC. > > > > > > > > > > > > From this perspective, I do not know, how allowing the user space break > > > > > > the rules will help to solve the problem? > > > > > > > > > > I think you did not understand this last proposal: since the driver is > > > > > already implementing part of the standard, then it might as well send > > > > > the address-claimed message when needed and wait 250 ms or not depending > > > > > on the case. > > > > > > > > Let's try following test: > > > > j1939acd -r 80 -c /tmp/1122334455667788.jacd 11223344556677 vcan0 & > > > > while(true); do testj1939 -s8 vcan0:0x80 :0x90,0x12300; done > > > > > > > > And start candump with delta time stamps: > > > > :~ candump -t d vcan0 > > > > (000.000000) vcan0 18EAFFFE [3] 00 EE 00 > > > > (000.002437) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF <---- no 250ms delay > > > > (000.011458) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.011964) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.011712) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012585) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012891) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012082) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012604) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012357) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012790) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012765) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012483) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012680) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012144) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > ... snip ... > > > > (000.012592) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012515) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.013183) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012653) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.011886) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012836) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.009494) vcan0 18EEFF80 [8] 77 66 55 44 33 22 11 00 <---- SA 0x80 address claimed > > > > (000.003362) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF <---- next packet from SA 0x80 3 usecs after previous. No 250ms delay. > > > > (000.012351) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012983) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012602) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012594) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.012348) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > (000.011922) vcan0 19239080 [8] 01 23 45 67 89 AB CD EF > > > > > > > > As you can see, the j1939 stack do not forcing application to use NAMEs and > > > > do not preventing sending any message withing 250ms delay. The only thing > > > > what has the 250 timer is NAME to address resolution which should be fixed in > > > > respect of 4.5.2.d without breaking every thing else. > > > > > > Yes this is clear, this is working because the socket used by testj1939 > > > is not bound to any name. > > > > > > Just to clarify: are you suggesting to applications developer to use one > > > socket (bound with the name) to manage the address-claim and another one > > > (bound without the name) for other transmissions? If so, then why that > > > code exists in the driver? > > > Honestly I would consider this proposal really bad since this would > > > allow to completely violate the standard. I really hope you agree with > > > me about this. > > > > Hm... you are right. > > > > Please add to your patch code comments with standard snippets and > > clarification why it should be so. Commit comment will be often > > overseen. > > > > Regards, > > Oleksij > > Would the following comment be acceptable? Isn't it too long? > > The ISO 11783-5 standard, in "4.5.2 - Address claim requirements", > states: > d) No CF shall begin, or resume, transmission on the network until 250 > ms after it has successfully claimed an address except when > responding to a request for address-claimed. > But "Figure 6" and "Figure 7" in "4.5.4.2 - Address-claim > prioritization" show that the CF begin the transmission after 250 ms > from the first AC (address-claimed) message even if it sends another AC > message during that time window to resolve the address contention with > another CF. > As stated in "4.4.2.3 - Address-claimed message": > In order to successfully claim an address, the CF sending an address > claimed message shall not receive a contending claim from another CF > for at least 250 ms. > As stated in "4.4.3.2 - NAME management (NM) message": > 1) A commanding CF can > d) request that a CF with a specified NAME transmit the address- > claimed message with its current NAME. > 2) A target CF shall > d) send an address-claimed message in response to a request for a > matching NAME > Taking the above arguments into account, the 250 ms wait is requested > only during network initialization. > Do not restart the timer on AC message if both the NAME and the address > match and therefore if the address has already been claimed (timer has > expired) or the AC message has been sent to resolve the contention with > another CF (timer is still running). Sounds good. Regards, Oleksij -- Pengutronix e.K. | | Steuerwalder Str. 21 | http://www.pengutronix.de/ | 31137 Hildesheim, Germany | Phone: +49-5121-206917-0 | Amtsgericht Hildesheim, HRA 2686 | Fax: +49-5121-206917-5555 |