Hi Eddie, > Save any FFDC provided by the OCC driver, and provide it to userspace > through a binary sysfs entry. Do some basic state management to > ensure that userspace can always collect the data if there was an > error. Notify polling userspace when there is an error too. Super! Some comments inline: > +enum sbe_error_state { > + SBE_ERROR_NONE = 0, > + SBE_ERROR_PENDING, > + SBE_ERROR_COLLECTED > +}; > + > struct p9_sbe_occ { > struct occ occ; > + int sbe_error; Use the enum here? > + void *ffdc; > + size_t ffdc_len; > + size_t ffdc_size; > + struct mutex sbe_error_lock; /* lock access to ffdc data */ > + u32 no_ffdc_magic; > struct device *sbe; > }; > > #define to_p9_sbe_occ(x) container_of((x), struct p9_sbe_occ, occ) > > +static ssize_t sbe_error_read(struct file *filp, struct kobject *kobj, > + struct bin_attribute *battr, char *buf, > + loff_t pos, size_t count) > +{ > + ssize_t rc = 0; > + struct occ *occ = dev_get_drvdata(kobj_to_dev(kobj)); > + struct p9_sbe_occ *ctx = to_p9_sbe_occ(occ); > + > + mutex_lock(&ctx->sbe_error_lock); > + if (ctx->sbe_error == SBE_ERROR_PENDING) { > + rc = memory_read_from_buffer(buf, count, &pos, ctx->ffdc, > + ctx->ffdc_len); > + ctx->sbe_error = SBE_ERROR_COLLECTED; > + } > + mutex_unlock(&ctx->sbe_error_lock); > + > + return rc; > +} So any read from this file will clear out the FFDC data, making partial reads impossible. As a least-intrusive change, could we set SBE_ERROR_COLLECTED on write instead? Or is there a better interface (a pipe?) that allows multiple FFDC captures, destroyed on full consume, without odd read/write side effects? > rc = fsi_occ_submit(ctx->sbe, cmd, len, resp, &resp_len); > - if (rc < 0) > + if (rc < 0) { > + if (resp_len) { > + bool notify = false; > + > + mutex_lock(&ctx->sbe_error_lock); > + if (ctx->sbe_error != SBE_ERROR_PENDING) > + notify = true; > + ctx->sbe_error = SBE_ERROR_PENDING; [...] > + ctx->ffdc_len = resp_len; > + memcpy(ctx->ffdc, resp, resp_len); This will clear out the previous error it if hasn't been collected by userspace. Is that really what you want for *first* fail data capture? :) Cheers, Jeremy