RE: [PATCH v1] ptp: add ToD device driver for Intel FPGA cards

"Zhang, Tianfei" <tianfei.zhang@xxxxxxxxx> · Thu, 16 Mar 2023 15:03:55 +0000

> -----Original Message-----
> From: Zhang, Tianfei
> Sent: Wednesday, March 15, 2023 10:59 AM
> To: Marco Pagani <marpagan@xxxxxxxxxx>
> Cc: ilpo.jarvinen@xxxxxxxxxxxxxxx; andriy.shevchenko@xxxxxxxxxxxxxxx; Weight,
> Russell H <russell.h.weight@xxxxxxxxx>; matthew.gerlach@xxxxxxxxxxxxxxx; pierre-
> louis.bossart@xxxxxxxxxxxxxxx; Gomes, Vinicius <vinicius.gomes@xxxxxxxxx>;
> Khadatare, RaghavendraX Anand <raghavendrax.anand.khadatare@xxxxxxxxx>;
> netdev@xxxxxxxxxxxxxxx; linux-fpga@xxxxxxxxxxxxxxx; richardcochran@xxxxxxxxx
> Subject: RE: [PATCH v1] ptp: add ToD device driver for Intel FPGA cards
> 
> 
> 
> > -----Original Message-----
> > From: Marco Pagani <marpagan@xxxxxxxxxx>
> > Sent: Tuesday, March 14, 2023 1:50 AM
> > To: Zhang, Tianfei <tianfei.zhang@xxxxxxxxx>
> > Cc: ilpo.jarvinen@xxxxxxxxxxxxxxx; andriy.shevchenko@xxxxxxxxxxxxxxx;
> > Weight, Russell H <russell.h.weight@xxxxxxxxx>;
> > matthew.gerlach@xxxxxxxxxxxxxxx; pierre-
> > louis.bossart@xxxxxxxxxxxxxxx; Gomes, Vinicius
> > <vinicius.gomes@xxxxxxxxx>; Khadatare, RaghavendraX Anand
> > <raghavendrax.anand.khadatare@xxxxxxxxx>;
> > netdev@xxxxxxxxxxxxxxx; linux-fpga@xxxxxxxxxxxxxxx;
> > richardcochran@xxxxxxxxx
> > Subject: Re: [PATCH v1] ptp: add ToD device driver for Intel FPGA
> > cards
> >
> >
> >
> > On 2023-03-13 04:02, Tianfei Zhang wrote:
> > > Adding a DFL (Device Feature List) device driver of ToD device for
> > > Intel FPGA cards.
> > >
> > > The Intel FPGA Time of Day(ToD) IP within the FPGA DFL bus is
> > > exposed as PTP Hardware clock(PHC) device to the Linux PTP stack to
> > > synchronize the system clock to its ToD information using phc2sys
> > > utility of the Linux PTP stack. The DFL is a hardware List within
> > > FPGA, which defines a linked list of feature headers within the
> > > device MMIO space to provide an extensible way of adding subdevice features.
> > >
> > > Signed-off-by: Raghavendra Khadatare
> > > <raghavendrax.anand.khadatare@xxxxxxxxx>
> > > Signed-off-by: Tianfei Zhang <tianfei.zhang@xxxxxxxxx>
> > > ---
> > >  MAINTAINERS               |   7 +
> > >  drivers/ptp/Kconfig       |  13 ++
> > >  drivers/ptp/Makefile      |   1 +
> > >  drivers/ptp/ptp_dfl_tod.c | 334
> > > ++++++++++++++++++++++++++++++++++++++
> > >  4 files changed, 355 insertions(+)
> > >  create mode 100644 drivers/ptp/ptp_dfl_tod.c
> > >
> > > diff --git a/MAINTAINERS b/MAINTAINERS index
> > > ec57c42ed544..584979abbd92 100644
> > > --- a/MAINTAINERS
> > > +++ b/MAINTAINERS
> > > @@ -15623,6 +15623,13 @@ L:	netdev@xxxxxxxxxxxxxxx
> > >  S:	Maintained
> > >  F:	drivers/ptp/ptp_ocp.c
> > >
> > > +INTEL PTP DFL ToD DRIVER
> > > +M:	Tianfei Zhang <tianfei.zhang@xxxxxxxxx>
> > > +L:	linux-fpga@xxxxxxxxxxxxxxx
> > > +L:	netdev@xxxxxxxxxxxxxxx
> > > +S:	Maintained
> > > +F:	drivers/ptp/ptp_dfl_tod.c
> > > +
> > >  OPENCORES I2C BUS DRIVER
> > >  M:	Peter Korsgaard <peter@xxxxxxxxxxxxx>
> > >  M:	Andrew Lunn <andrew@xxxxxxx>
> > > diff --git a/drivers/ptp/Kconfig b/drivers/ptp/Kconfig index
> > > fe4971b65c64..e0d6f136ee46 100644
> > > --- a/drivers/ptp/Kconfig
> > > +++ b/drivers/ptp/Kconfig
> > > @@ -186,4 +186,17 @@ config PTP_1588_CLOCK_OCP
> > >
> > >  	  More information is available at http://www.timingcard.com/
> > >
> > > +config PTP_DFL_TOD
> > > +	tristate "FPGA DFL ToD Driver"
> > > +	depends on FPGA_DFL
> > > +	help
> > > +	  The DFL (Device Feature List) device driver for the Intel ToD
> > > +	  (Time-of-Day) device in FPGA card. The ToD IP within the FPGA
> > > +	  is exposed as PTP Hardware Clock (PHC) device to the Linux PTP
> > > +	  stack to synchronize the system clock to its ToD information
> > > +	  using phc2sys utility of the Linux PTP stack.
> > > +
> > > +	  To compile this driver as a module, choose M here: the module
> > > +	  will be called ptp_dfl_tod.
> > > +
> > >  endmenu
> > > diff --git a/drivers/ptp/Makefile b/drivers/ptp/Makefile index
> > > 28a6fe342d3e..553f18bf3c83 100644
> > > --- a/drivers/ptp/Makefile
> > > +++ b/drivers/ptp/Makefile
> > > @@ -18,3 +18,4 @@ obj-$(CONFIG_PTP_1588_CLOCK_IDTCM)	+=
> > ptp_clockmatrix.o
> > >  obj-$(CONFIG_PTP_1588_CLOCK_IDT82P33)	+= ptp_idt82p33.o
> > >  obj-$(CONFIG_PTP_1588_CLOCK_VMW)	+= ptp_vmw.o
> > >  obj-$(CONFIG_PTP_1588_CLOCK_OCP)	+= ptp_ocp.o
> > > +obj-$(CONFIG_PTP_DFL_TOD)		+= ptp_dfl_tod.o
> > > diff --git a/drivers/ptp/ptp_dfl_tod.c b/drivers/ptp/ptp_dfl_tod.c
> > > new file mode 100644 index 000000000000..d9fbdfc53bd8
> > > --- /dev/null
> > > +++ b/drivers/ptp/ptp_dfl_tod.c
> > > @@ -0,0 +1,334 @@
> > > +// SPDX-License-Identifier: GPL-2.0-only
> > > +/*
> > > + * DFL device driver for Time-of-Day (ToD) private feature
> > > + *
> > > + * Copyright (C) 2023 Intel Corporation  */
> > > +
> > > +#include <linux/bitfield.h>
> > > +#include <linux/delay.h>
> > > +#include <linux/dfl.h>
> > > +#include <linux/gcd.h>
> > > +#include <linux/iopoll.h>
> > > +#include <linux/module.h>
> > > +#include <linux/ptp_clock_kernel.h> #include <linux/spinlock.h>
> > > +#include <linux/units.h>
> > > +
> > > +#define FME_FEATURE_ID_TOD		0x22
> > > +
> > > +/* ToD clock register space. */
> > > +#define TOD_CLK_FREQ			0x038
> > > +
> > > +/*
> > > + * The read sequence of ToD timestamp registers: TOD_NANOSEC,
> > > +TOD_SECONDSL and
> > > + * TOD_SECONDSH, because there is a hardware snapshot whenever the
> > > +TOD_NANOSEC
> > > + * register is read.
> > > + *
> > > + * The ToD IP requires writing registers in the reverse order to the read
> sequence.
> > > + * The timestamp is corrected when the TOD_NANOSEC register is
> > > +written, so the
> > > + * sequence of write TOD registers: TOD_SECONDSH, TOD_SECONDSL and
> > TOD_NANOSEC.
> > > + */
> > > +#define TOD_SECONDSH			0x100
> > > +#define TOD_SECONDSL			0x104
> > > +#define TOD_NANOSEC			0x108
> > > +#define TOD_PERIOD			0x110
> > > +#define TOD_ADJUST_PERIOD		0x114
> > > +#define TOD_ADJUST_COUNT		0x118
> > > +#define TOD_DRIFT_ADJUST		0x11c
> > > +#define TOD_DRIFT_ADJUST_RATE		0x120
> > > +#define PERIOD_FRAC_OFFSET		16
> > > +#define SECONDS_MSB			GENMASK_ULL(47, 32)
> > > +#define SECONDS_LSB			GENMASK_ULL(31, 0)
> > > +#define TOD_SECONDSH_SEC_MSB		GENMASK_ULL(15, 0)
> > > +
> > > +#define CAL_SECONDS(m, l)		((FIELD_GET(TOD_SECONDSH_SEC_MSB,
> > (m)) << 32) | (l))
> > > +
> > > +#define TOD_PERIOD_MASK		GENMASK_ULL(19, 0)
> > > +#define TOD_PERIOD_MAX			FIELD_MAX(TOD_PERIOD_MASK)
> > > +#define TOD_PERIOD_MIN			0
> > > +#define TOD_DRIFT_ADJUST_MASK		GENMASK_ULL(15, 0)
> > > +#define TOD_DRIFT_ADJUST_FNS_MAX
> > 	FIELD_MAX(TOD_DRIFT_ADJUST_MASK)
> > > +#define TOD_DRIFT_ADJUST_RATE_MAX	TOD_DRIFT_ADJUST_FNS_MAX
> > > +#define TOD_ADJUST_COUNT_MASK		GENMASK_ULL(19, 0)
> > > +#define TOD_ADJUST_COUNT_MAX
> > 	FIELD_MAX(TOD_ADJUST_COUNT_MASK)
> > > +#define TOD_ADJUST_INTERVAL_US		1000
> > > +#define TOD_ADJUST_MS			\
> > > +		(((TOD_PERIOD_MAX >> 16) + 1) * (TOD_ADJUST_COUNT_MAX + 1))
> > > +#define TOD_ADJUST_MS_MAX		(TOD_ADJUST_MS / MICRO)
> > > +#define TOD_ADJUST_MAX_US		(TOD_ADJUST_MS_MAX *
> > USEC_PER_MSEC)
> > > +#define TOD_MAX_ADJ			(500 * MEGA)
> > > +
> > > +struct dfl_tod {
> > > +	struct ptp_clock_info ptp_clock_ops;
> > > +	struct device *dev;
> > > +	struct ptp_clock *ptp_clock;
> > > +
> > > +	/* ToD Clock address space */
> > > +	void __iomem *tod_ctrl;
> > > +
> > > +	/* ToD clock registers protection */
> > > +	spinlock_t tod_lock;
> > > +};
> > > +
> > > +/*
> > > + * A fine ToD HW clock offset adjustment. To perform the fine
> > > +offset adjustment, the
> > > + * adjust_period and adjust_count argument are used to update the
> > > +TOD_ADJUST_PERIOD
> > > + * and TOD_ADJUST_COUNT register for in hardware. The dt->tod_lock
> > > +spinlock must be
> > > + * held when calling this function.
> > > + */
> >
> > Calling this function while holding the dt->tod_lock spinlock might
> > cause an error since read_poll_timeout() can sleep. If it is required
> > to use a spinlock, there is the
> > readl_poll_timeout_atomic() function, which can be called from atomic context.
> > However, in this case, it is probably better to use a mutex instead of
> > a spinlock since the delay appears to be 1 ms.
> 
> To program the TOD registers needs to be done without interruption to ensure the
> correct values are programmed, So I think using readl_poll_timeout_atomic()
> function here is better. This delay should be very faster, the 10 us interval delay will
> be better.
> 
> >
> > > +static int fine_adjust_tod_clock(struct dfl_tod *dt, u32 adjust_period,
> > > +				 u32 adjust_count)
> > > +{
> > > +	void __iomem *base = dt->tod_ctrl;
> > > +	u32 val;
> > > +
> > > +	writel(adjust_period, base + TOD_ADJUST_PERIOD);
> > > +	writel(adjust_count, base + TOD_ADJUST_COUNT);
> > > +
> > > +	/* Wait for present offset adjustment update to complete */
> > > +	return readl_poll_timeout(base + TOD_ADJUST_COUNT, val, !val,
> > TOD_ADJUST_INTERVAL_US,
> > > +				  TOD_ADJUST_MAX_US);
> > > +}
> > > +
> > > +/*
> > > + * A coarse ToD HW clock offset adjustment.
> > > + * The coarse time adjustment performs by adding or subtracting the
> > > +delta value
> > > + * from the current ToD HW clock time.
> > > + */
> > > +static int coarse_adjust_tod_clock(struct dfl_tod *dt, s64 delta) {
> > > +	u32 seconds_msb, seconds_lsb, nanosec;
> > > +	void __iomem *base = dt->tod_ctrl;
> > > +	u64 seconds, now;
> > > +
> > > +	if (delta == 0)
> > > +		return 0;
> > > +
> > > +	nanosec = readl(base + TOD_NANOSEC);
> > > +	seconds_lsb = readl(base + TOD_SECONDSL);
> > > +	seconds_msb = readl(base + TOD_SECONDSH);
> > > +
> > > +	/* Calculate new time */
> > > +	seconds = CAL_SECONDS(seconds_msb, seconds_lsb);
> > > +	now = seconds * NSEC_PER_SEC + nanosec + delta;
> > > +
> > > +	seconds = div_u64_rem(now, NSEC_PER_SEC, &nanosec);
> > > +	seconds_msb = FIELD_GET(SECONDS_MSB, seconds);
> > > +	seconds_lsb = FIELD_GET(SECONDS_LSB, seconds);
> > > +
> > > +	writel(seconds_msb, base + TOD_SECONDSH);
> > > +	writel(seconds_lsb, base + TOD_SECONDSL);
> > > +	writel(nanosec, base + TOD_NANOSEC);
> > > +
> > > +	return 0;
> > > +}
> > > +
> > > +static int dfl_tod_adjust_fine(struct ptp_clock_info *ptp, long
> > > +scaled_ppm) {
> > > +	struct dfl_tod *dt = container_of(ptp, struct dfl_tod, ptp_clock_ops);
> > > +	u32 tod_period, tod_rem, tod_drift_adjust_fns, tod_drift_adjust_rate;
> > > +	void __iomem *base = dt->tod_ctrl;
> > > +	unsigned long flags, rate;
> > > +	u64 ppb;
> > > +
> > > +	/* Get the clock rate from clock frequency register offset */
> > > +	rate = readl(base + TOD_CLK_FREQ);
> > > +
> > > +	/* add GIGA as nominal ppb */
> > > +	ppb = scaled_ppm_to_ppb(scaled_ppm) + GIGA;
> > > +
> > > +	tod_period = div_u64_rem(ppb << PERIOD_FRAC_OFFSET, rate, &tod_rem);
> > > +	if (tod_period > TOD_PERIOD_MAX)
> > > +		return -ERANGE;
> > > +
> > > +	/*
> > > +	 * The drift of ToD adjusted periodically by adding a drift_adjust_fns
> > > +	 * correction value every drift_adjust_rate count of clock cycles.
> > > +	 */
> > > +	tod_drift_adjust_fns = tod_rem / gcd(tod_rem, rate);
> > > +	tod_drift_adjust_rate = rate / gcd(tod_rem, rate);
> > > +
> > > +	while ((tod_drift_adjust_fns > TOD_DRIFT_ADJUST_FNS_MAX) ||
> > > +	       (tod_drift_adjust_rate > TOD_DRIFT_ADJUST_RATE_MAX)) {
> > > +		tod_drift_adjust_fns >>= 1;
> > > +		tod_drift_adjust_rate >>= 1;
> > > +	}
> >
> > Why both tod_drift_adjust_fns and tod_drift_adjust_rate are
> > iteratively divided if one of the two exceeds the maximum value?
> > Wouldn't it be more accurate to set each of them to the max if they exceeded their
> respective maximum value?
> 
> Thanks your point out, calculate the tod_drift_adjust_fns and tod_drift_adjust_rate
> separately is better.
> 

Sorry, my last comment is not correct,  I like to add more detail information about this code.

This piece of code adjusts for the non-representable part of the fractional nanosecond part of the target ToD period that corresponds to the nominal or servo-steered frequency.  For example, the clock period for a 10G application (156.25MHz) is 6.4ns. The hexadecimal representation of the integer part (6ns) is 0x6, and the fractional nanosecond part is 0.4 fns (fractional nanosecond), whose 16-bit hexadecimal representation is:

Fractional nanosecond field is 16 bits wide: 0.4 fns = 0.4 * 2^16 = 26214.4 in decimal.
Converting to hexadecimal: 26214 + 0.4 = 0x6666 + 0x0000.4 = 0x6666.4 fns.  

The 16bits register can represent 0x6666, but not 0.4fns. This would cause a systematic drift in the clock equal to 953.6 ns every 1 second.  To compensate for that, the hardware ToD module uses two registers, namely TOD_DRIFT_ADJUST and TOD_DRIFT_ADJUST_RATE. The non-representable 0.4 fns per clock cycle equals to 2fns per 5 clock cycles, both of which are integer and representable. 

Hence is the TOD_DRIFT_ADJUST  and TOD_DRIFT_ADJUST_RATE registers as follow:

TOD_DRIFT_ADJUST = 0x02
TOD_DRIFT_ADJUST_RATE = 0x5.

The while loop divides the higher-precision result of the divisions by 2 in sequential steps, until both tod_drift_adjust_fns and tod_drift_adjust_rate are representable within their hardware register precision and dynamic range. The while loop has a deterministic upper bound. 

Here is the ToD specification:
https://www.intel.com/content/www/us/en/docs/programmable/683044/21-4/adjusting-tod-drift.html