Hello Al, Thank you for your extensive review. On Monday 11 November 2019 21:28:52 CET Al Viro wrote: > On Mon, Nov 11, 2019 at 01:51:33PM +0000, Jules Irenge wrote: > > > > > NAK. force-cast (and it's not a gcc extension, BTW - it's sparse) is basically > > > "I know better; the code is right, so STFU already". *IF* counters.count_... > > > is really little-endian 32bit, then why isn't it declared that way? And if > > > it's host-endian, you've just papered over a real bug here. > > > > > > As a general rule "fix" doesn't mean "tell it to shut up"... > > > > > > > Thanks for the comments, I have updated but I have a mixed mind on the > > __le32. I have to read more about it. > > > > I would appreciate if you can comment again on the update. > > From the look at the driver, it seems that all these counters are a part of > structure that is read from the hardware and only used as little-endian. > So just declare all fields in struct hif_mib_extended_count_table as > __le32; easy enough. Looking a bit further, the same goes for > struct hif_mib_bcn_filter_table ->num_of_info_elmts > struct hif_mib_keep_alive_period ->keep_alive_period (__le16) > struct hif_mib_template_frame ->frame_length (__le16) > struct hif_mib_set_association_mode ->basic_rate_set (__le32) > struct hif_req_update_ie ->num_i_es (__le16) > struct hif_req_write_mib ->mib_id, ->length (__le16 both) > struct hif_req_read_mib ->mib_id (__le16) > struct hif_req_configuration ->length (__le16) Indeed, structs declared in hif_api* are shared with the hardware and should use __le16/__le32. However, as you noticed below, these structs are sometime used in other parts of the code that are not related to the hardware. I have in my local queue a set of patches that improve the situation. Objective is to limit usage of hif structs to hif_tx.c, hif_tx_mib.c and hif_rx.c (which are correct places to handle hardware communication). I hope to be able to submit these patches in 2 weeks. [...] > and that's where the real bugs start to show up; leaving the misbegotten > forest of macros in misbegotten tracing shite aside, we have this: > > static const struct ieee80211_supported_band wfx_band_2ghz = { > .channels = wfx_2ghz_chantable, > .n_channels = ARRAY_SIZE(wfx_2ghz_chantable), > .bitrates = wfx_rates, > .n_bitrates = ARRAY_SIZE(wfx_rates), > .ht_cap = { > // Receive caps > .cap = IEEE80211_HT_CAP_GRN_FLD | IEEE80211_HT_CAP_SGI_20 | > IEEE80211_HT_CAP_MAX_AMSDU | (1 << IEEE80211_HT_CAP_RX_STBC_SHIFT), > .ht_supported = 1, > .ampdu_factor = IEEE80211_HT_MAX_AMPDU_16K, > .ampdu_density = IEEE80211_HT_MPDU_DENSITY_NONE, > .mcs = { > .rx_mask = { 0xFF }, // MCS0 to MCS7 > .rx_highest = 65, > drivers/staging/wfx/main.c:108:39: refering to this initializer. > Sparse say that it expects rx_highest to be __le16. And that's > not a driver-specific structure; it's generic ieee80211 one. Which > says > struct ieee80211_mcs_info { > u8 rx_mask[IEEE80211_HT_MCS_MASK_LEN]; > __le16 rx_highest; > u8 tx_params; > u8 reserved[3]; > } __packed; > and grepping for rx_highest through the tree shows that everything else > is treating it as little-endian 16bit. > > Almost certainly a bug on big-endian hosts; should be .rx_highest = cpu_to_le16(65), > instead. Agree. > Looking for more low-hanging fruits, we have > static int indirect_read32_locked(struct wfx_dev *wdev, int reg, u32 addr, u32 *val) > { > int ret; > __le32 *tmp = kmalloc(sizeof(u32), GFP_KERNEL); > > if (!tmp) > return -ENOMEM; > wdev->hwbus_ops->lock(wdev->hwbus_priv); > ret = indirect_read(wdev, reg, addr, tmp, sizeof(u32)); > *val = cpu_to_le32(*tmp); > _trace_io_ind_read32(reg, addr, *val); > wdev->hwbus_ops->unlock(wdev->hwbus_priv); > kfree(tmp); > return ret; > } > with warnings about val = cpu_to_le32(*tmp); fair enough, since *val is > host-endian (u32) and *tmp - little-endian. Trivial misannotation - > it should've been le32_to_cpu(), not cpu_to_le32(). Same mapping on > all CPUs we are ever likely to support, so it's just a misannotation, > not a bug per se. Agree. > drivers/staging/wfx/hif_tx_mib.h:34:38: warning: incorrect type in initializer (different base types) > drivers/staging/wfx/hif_tx_mib.h:34:38: expected unsigned char [usertype] wakeup_period_max > drivers/staging/wfx/hif_tx_mib.h:34:38: got restricted __le16 [usertype] > > is about > static inline int hif_set_beacon_wakeup_period(struct wfx_vif *wvif, > unsigned int dtim_interval, > unsigned int listen_interval) > { > struct hif_mib_beacon_wake_up_period val = { > .wakeup_period_min = dtim_interval, > .receive_dtim = 0, > .wakeup_period_max = cpu_to_le16(listen_interval), > }; > and struct hif_mib_beacon_wake_up_period has wakeup_period_max declared > as uint8_t. We are shoving a le16 value into it. Almost certain bug - > that will result in the listen_interval % 256 on litte-endian host and > listen_interval / 256 on big-endian one. Looking at the callers to > see what's actually passed as listen_interval shows only > hif_set_beacon_wakeup_period(wvif, wvif->dtim_period, wvif->dtim_period); > and dtim_period in *wvif (struct wfx_vif) can be assigned 0, 1 or > values coming from struct ieee80211_tim_ie ->dtim_period or > struct ieee80211_bss_conf ->dtim_period, 8bit in either structure. > > In other words, the value stored in val.wakeup_period_max will be > always zero on big-endian hosts. Definitely bogus, should just > store that (8bit) value as-is; cpu_to_le16() is wrong here. Absolutely agree. > Next piece of fun: > static inline int hif_beacon_filter_control(struct wfx_vif *wvif, > int enable, int beacon_count) > { > struct hif_mib_bcn_filter_enable arg = { > .enable = cpu_to_le32(enable), > .bcn_count = cpu_to_le32(beacon_count), > }; > return hif_write_mib(wvif->wdev, wvif->id, > HIF_MIB_ID_BEACON_FILTER_ENABLE, &arg, sizeof(arg)); > } > Sounds like ->enable and ->bcn_count should both be __le32, which makes > sense since the structs passed to hardware appear to be fixed-endian on > that thing. However, annotating them as such adds warnigns: > drivers/staging/wfx/sta.c:246:35: warning: incorrect type in assignment (different base types) > drivers/staging/wfx/sta.c:246:35: expected restricted __le32 [assigned] [usertype] bcn_count > drivers/staging/wfx/sta.c:246:35: got int > drivers/staging/wfx/sta.c:249:32: warning: incorrect type in assignment (different base types) > drivers/staging/wfx/sta.c:249:32: expected restricted __le32 [assigned] [usertype] enable > drivers/staging/wfx/sta.c:249:32: got int > drivers/staging/wfx/sta.c:253:32: warning: incorrect type in assignment (different base types) > drivers/staging/wfx/sta.c:253:32: expected restricted __le32 [assigned] [usertype] enable > drivers/staging/wfx/sta.c:253:32: got int > drivers/staging/wfx/sta.c:262:62: warning: incorrect type in argument 2 (different base types) > drivers/staging/wfx/sta.c:262:62: expected int enable > drivers/staging/wfx/sta.c:262:62: got restricted __le32 [assigned] [usertype] enable > drivers/staging/wfx/sta.c:262:78: warning: incorrect type in argument 3 (different base types) > drivers/staging/wfx/sta.c:262:78: expected int beacon_count > drivers/staging/wfx/sta.c:262:78: got restricted __le32 [assigned] [usertype] bcn_count > > All in the same function (wfx_update_filtering()) and we really do store > host-endian values in those (first 3 places). In the last one we pass > them to hif_beacon_filter_control(), which does expect host-endian. > And that's the only thing we do to the instance of hif_mib_bcn_filter_enable > in there... > > Possible solutions: > 1) store them little-endian there, pass to hif_beacon_filter_control() > already l-e, get rid of cpu_to_le32() in the latter. > 2) store them little-endian, pass the entire pointer to struct > instead of forming it again in hif_beacon_filter_control() > 3) don't pretend that the objects in hif_beacon_filter_control() > and in wfx_update_filtering() are of the same type (different layouts on > big-endian) and replace the one in the caller with two local variables. > My preference would be (3), as in [...] > but that's a matter of taste. Yes, this is one of the difficult parts. I work on it (I opted for solution 3). > Next is bx.c warning about __le32; that's about num_tx_count being fed to cpu_to_le32(). > grepping for that thing results in > drivers/staging/wfx/bh.c:106: release_count = le32_to_cpu(((struct hif_cnf_multi_transmit *)hif->body)->num_tx_confs); > drivers/staging/wfx/hif_api_cmd.h:316: uint32_t num_tx_confs; > drivers/staging/wfx/hif_rx.c:78: int count = body->num_tx_confs; > which is troubling - the first line (in rx_helper()) expects to find > a little-endian value in that field, while the last (in hif_multi_tx_confirm()) > - a host-endian, with nothing in sight that might account for conversion > from one to another. > > Let's look at the call chains: hif_multi_tx_confirm() is called only as > hif_handlers[...]->handler(), which happens in in wfx_handle_rx(). > The call is > hif_handlers[i].handler(wdev, hif, hif->body); > and hif has come from > struct hif_msg *hif = (struct hif_msg *) skb->data; > wfx_handle_rx() is called by the same rx_helper()... skb is created by > rx_helper() and apparently filled by the call > if (wfx_data_read(wdev, skb->data, alloc_len)) > goto err; > right next to the allocation... and prior to the > release_count = le32_to_cpu(((struct hif_cnf_multi_transmit *)hif->body)->num_tx_confs); > where we expect little-endian, with nothing to modify the skb contents > between that and the call of wfx_handle_rx(). hif in rx_helper() points > to the same place - skb->data. OK, we almost certainly have a bug here. > > That thing allocates a packet and fills it with incoming data. Then > it parses the damn thing, apparently treating the same field of the > incoming as little-endian in one place and host-endian in another. > In principle it's possible that the rest of the packet determines > which one it is, but by the look of that code both places are > hit if and only if hif->id is equal to HIF_CNF_ID_MULTI_TRANSMIT. > It *can't* be correct on big-endian. Not even theoretically. > > And since it's over-the-wire data, I would expect it to be fixed-endian. > That needs to be confirmed with the driver's authors and/or direct > experiment on big-endian host, but I strongly suspect that the right > fix is to have > int count = le32_to_cpu(body->num_tx_confs); > in hif_multi_tx_confirm() (and num_tx_confs annotated as __le32). Indeed, num_tx_confs is always a le32 value. > HOWEVER, that opens another nasty can of worms. We have > struct hif_cnf_tx *buf_loc = (struct hif_cnf_tx *) &body->tx_conf_payload; > ... > for (i = 0; i < count; ++i) { > wfx_tx_confirm_cb(wvif, buf_loc); > buf_loc++; > } > with count derived from the packet and body pointing into the packet. And no > visible checks that would make sure the loop won't run out of the data we'd > actually got. > > The check in rx_helper() verifies that hif->len matches the amount we'd > received; the check for ->num_tx_confs in there doesn't look like what > we'd needed (that would be offset of body.tx_conf_payload in packet + > num_tx_confs * sizeof(struct hif_cnf_multi_transmit) compared to > actual size). > > So it smells like a remote buffer overrun, little-endian or not. > And at that point I would start looking for driver original authors with > some rather pointed questions about the validation of data and lack > thereof. There are not so much checks done on data retrieved from the hardware. I think we can find other similar issues in the driver. In this particular case, indeed, a little check on length of received data could be a good idea. > BTW, if incoming packets are fixed-endian, I would expect more bugs on > big-endian hosts - wfx_tx_confirm_cb() does things like > tx_info->status.tx_time = > arg->media_delay - arg->tx_queue_delay; > with media_delay and tx_queue_delay both being 32bit fields in the > incoming packet. So another question to the authors (or documentation, > or direct experiments) is what endianness do various fields in the incoming > data have. We can try and guess, but... Fortunately, answer is simple enough: everything from hardware is little endian :). Jules, do you want to take care of fixing theses issues (except the one about wfx_update_filtering())? -- Jérôme Pouiller _______________________________________________ devel mailing list devel@xxxxxxxxxxxxxxxxxxxxxx http://driverdev.linuxdriverproject.org/mailman/listinfo/driverdev-devel