Hi Guenter,

You can always provide help very quickly, thank you very much :-)

On 23 June 2015 at 23:21, Guenter Roeck <linux@xxxxxxxxxxxx> wrote:
> On Tue, Jun 23, 2015 at 09:26:35PM +0800, Fu Wei wrote:
>> Hi Guenter,
> [ ...]
>
>> >
>> >> + * When the first timeout occurs, WS0(SPI or LPI) is triggered,
>> >> + * the second timeout period(as long as the first timeout period) starts.
>> >
>> > no longer accurate if WOR is used for the second period.
>> >
>> >> + * In WS0 interrupt routine, panic() will be called for collecting
>> >> + * crashdown info.
>> >> + * If system can not recover from WS0 interrupt routine, then second
>> >> + * timeout occurs, WS1(reset or higher level interrupt) is triggered.
>> >> + * The two timeout period can be set by WOR(32bit).
>> >
>> > The second timeout period is determined by ...
>> >
>> >> + * WOR gives a maximum watch period of around 10s at the maximum
>> >> + * system counter frequency.
>> >> + * The System Counter shall run at maximum of 400MHz.
>> >
>> > "... at the maximum system counter frequency of 400 MHz.", and drop the
>> > last sentence.
>>
>> For the second timeout period, I have discussed with a kdump developers,
>> (1)10s maybe not good enough for all the case of panic + kdump, so
>> maybe we still need to use WCV in the second timeout period
>> (2)in the second timeout period, maybe we need to programme WCV for
>> two reason: a, trigger WS1 to reboot system ASAP; b, feed the watchdog
>> without cleanning WS0 flag.
>>
>> WHY we want to feed the watchdog (keepalive) without cleanning WS0 flag??
>> REASON:
>> (1)if the system context is large, we may need to feed the dog until
>> we get all the things backed up.
>> (2)if system goes wrong, WS0 triggered, then panic--> kdump. if we
>> feed the dog by WRR or programming WOR, WS0 flag will be cleaned. Once
>> system goes wrong again, then panic again.....
>> So this system will be in a panic--kdump--panic--kdump loop, have not
>> chance to reset.
>>
>> So if we are in the second timeout period, we may need to always programme WCV.
>>
> The crashdump kernel is supposed to reload the watchdog driver, which will ping
> the watchdog. If it isn't able to do that in 10 seconds, something is wrong.

Yes, but 10s may not be enough in all cases. When I tested kdump on arm64,
it sometimes took 20s. So I am thinking: can we make the maximum value of
pretimeout greater than 10s in this driver?

>
>> >> +
>> >> +     status = readl_relaxed(gwdt->control_base + SBSA_GWDT_WCS);
>> >> +     if (status & SBSA_GWDT_WCS_WS1) {
>> >> +             dev_warn(dev, "System reset by WDT(WCV: %llx)\n",
>> >> +                      sbsa_gwdt_get_wcv(wdd));
>> >
>> > WCV here only tells us how many clock cycles were executed since the
>> > system started (or something like that). So I still don't understand
>> > why it is valuable to print that number.
>>
>> this number provides the time of system reset, I thinks that may help
>> admin to analyse the system failure.
>>
> It doesn't mean anything to anyone but you since it is not in a well defined
> time scale.

Maybe I should convert it to seconds? I think the original value might be
better, though.

> Also, I would be somewhat surprised if WCV would retain its value
> on reset. Much more likely it is the time (in clock cycles) since reset.

Yes, this is mentioned in the SBSA:
---------------------
If WS0 is asserted and a timeout refresh occurs then the following must occur:
If the system is compliant to SBSA level 0 or level 1 then it is
IMPLEMENTATION DEFINED as to whether the compare value is loaded with the sum
of the zero-extended watchdog offset register and the current generic timer
system count value, or whether it retains its current value.
If the system is compliant to SBSA level 2 or higher the compare value must
retain its current value. This means that the compare value records the time
that WS1 is asserted.
---------------------
I hope I understand it correctly.
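
For reference, here is a small user-space sketch of the arithmetic behind the
numbers above: the "around 10s" limit of a 32-bit WOR at 400 MHz, and the
conversion of a raw counter value such as WCV to seconds. It is not driver
code. The helper names wor_max_period_ms() and counter_to_secs() are
hypothetical and exist only for illustration, and the sketch assumes the
counter frequency is known (for example from CNTFRQ).

```c
#include <assert.h>
#include <stdint.h>

/*
 * Hypothetical helper: longest period, in milliseconds, that a 32-bit
 * offset register (WOR) can express at the given counter frequency.
 */
static uint64_t wor_max_period_ms(uint64_t freq_hz)
{
	assert(freq_hz != 0);
	/* UINT32_MAX ticks is the largest offset a 32-bit WOR can hold. */
	return ((uint64_t)UINT32_MAX * 1000) / freq_hz;
}

/*
 * Hypothetical helper: convert a raw 64-bit counter value (such as WCV)
 * to whole seconds, truncating any fractional part.
 */
static uint64_t counter_to_secs(uint64_t ticks, uint64_t freq_hz)
{
	assert(freq_hz != 0);
	return ticks / freq_hz;
}
```

With a 400 MHz counter, wor_max_period_ms() gives 10737 ms, which is where
the "around 10s" limit comes from. A slower counter allows a proportionally
longer period, so a 20s kdump only exceeds the WOR range when the counter
runs faster than roughly 214 MHz.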
Please let me know if I misunderstand something, thanks.

>
> Guenter

--
Best regards,

Fu Wei
Software Engineer
Red Hat Software (Beijing) Co.,Ltd.Shanghai Branch
Ph: +86 21 61221326(direct)
Ph: +86 186 2020 4684 (mobile)
Room 1512, Regus One Corporate Avenue,Level 15,
One Corporate Avenue,222 Hubin Road,Huangpu District,
Shanghai,China 200021