Debugging nRF BLE Connection Issues With Zephyr RTOS

Overview

MAB Labs Embedded Solutions recently helped a client with a very strange BLE problem. Luckily, our background in wireless communication systems not only helped us solve the problem but also gave us some insight into why the solution worked. In this blog post, we’ll share the problem, the solution, and the explanation.

The client has a product based on the nRF52840 (https://www.nordicsemi.com/products/nrf52840) running an application built on the Zephyr Project RTOS (https://zephyrproject.org/). The problem they encountered was that the Bluetooth connection from their device to a PC or smartphone was unreliable: the connection would frequently drop. One clue was that the problem didn’t occur on the nRF52840 development kit; it only appeared on their own hardware. When we reviewed the BOM and schematic of their actual product, we noticed that they used a Laird BL654 (https://www.lairdconnect.com/wireless-modules/bluetooth-modules/bluetooth-5-modules/bl654-series-bluetooth-module-nfc), which is essentially an nRF52840 packaged with other critical components into a System on Module (SOM).

Hardware Review

After reviewing the datasheet of the Laird BL654 (https://connectivity-staging.s3.us-east-2.amazonaws.com/2019-03/CS-DS-BL654%20v1_8.pdf), we found the following important piece of information:

We knew from previous experience that the nRF52840 development kit (https://infocenter.nordicsemi.com/pdf/nRF52840_PS_v1.0.pdf) uses a much “better” oscillator (where “better” means an oscillator with a lower PPM rating):

Software Review

We also knew that Zephyr’s default clock accuracy setting is +/- 50 PPM when the RC oscillator option is not selected, since that is the accuracy of the clocks commonly used on development kits (https://docs.zephyrproject.org/3.0.0/reference/kconfig/choice_232.html#choice-232):

When we changed the Zephyr configuration to include CONFIG_CLOCK_CONTROL_NRF_K32SRC_RC (which in turn sets the default clock accuracy to +/- 500 PPM), we saw much better behavior on our client’s product. We no longer saw any connection drops!

Details

Let’s dig into the Zephyr source code to see how this parameter is used and why an accurate PPM setting affects connectivity. If we search for CLOCK_CONTROL_NRF_K32SRC_500PPM, we find it in the source file zephyr/include/zephyr/drivers/clock_control/nrf_clock_control.h in the following context:

/* Define 32KHz clock accuracy */
#ifdef CONFIG_CLOCK_CONTROL_NRF_K32SRC_500PPM
#define CLOCK_CONTROL_NRF_K32SRC_ACCURACY 0
#endif
#ifdef CONFIG_CLOCK_CONTROL_NRF_K32SRC_250PPM
#define CLOCK_CONTROL_NRF_K32SRC_ACCURACY 1
#endif
#ifdef CONFIG_CLOCK_CONTROL_NRF_K32SRC_150PPM
#define CLOCK_CONTROL_NRF_K32SRC_ACCURACY 2
#endif
#ifdef CONFIG_CLOCK_CONTROL_NRF_K32SRC_100PPM
#define CLOCK_CONTROL_NRF_K32SRC_ACCURACY 3
#endif
#ifdef CONFIG_CLOCK_CONTROL_NRF_K32SRC_75PPM
#define CLOCK_CONTROL_NRF_K32SRC_ACCURACY 4
#endif
#ifdef CONFIG_CLOCK_CONTROL_NRF_K32SRC_50PPM
#define CLOCK_CONTROL_NRF_K32SRC_ACCURACY 5
#endif
#ifdef CONFIG_CLOCK_CONTROL_NRF_K32SRC_30PPM
#define CLOCK_CONTROL_NRF_K32SRC_ACCURACY 6
#endif
#ifdef CONFIG_CLOCK_CONTROL_NRF_K32SRC_20PPM
#define CLOCK_CONTROL_NRF_K32SRC_ACCURACY 7
#endif

If we search for CLOCK_CONTROL_NRF_K32SRC_ACCURACY, we can see that it’s used in the following functions in the file zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll_clock.c:

uint8_t lll_clock_sca_local_get(void)
{
        return CLOCK_CONTROL_NRF_K32SRC_ACCURACY;
}

uint32_t lll_clock_ppm_local_get(void)
{
        return sca_ppm_lut[CLOCK_CONTROL_NRF_K32SRC_ACCURACY];
}

We’re going to focus on the function lll_clock_ppm_local_get (in a future blog post, we’ll see where the other function is used).
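To make the index-to-PPM step concrete, here is a minimal, self-contained sketch of such a lookup. The table values follow the sleep clock accuracy ranges defined by the Bluetooth Core Specification (index 0 is the 251 to 500 ppm bucket, index 7 is the 0 to 20 ppm bucket), which matches the ordering of the Kconfig options above. The names sca_ppm_table and sca_to_ppm are hypothetical and are not taken from the Zephyr sources.

```c
#include <assert.h>
#include <stdint.h>

/* Worst-case PPM for each sleep clock accuracy index (0..7).
 * Assumption: each entry is the upper bound of the matching
 * Bluetooth sleep clock accuracy range.
 */
static const uint16_t sca_ppm_table[] = {500, 250, 150, 100, 75, 50, 30, 20};

/* Hypothetical helper: translate an accuracy index into a PPM value. */
static uint32_t sca_to_ppm(uint8_t sca)
{
        /* Guard against an index outside the eight defined buckets. */
        assert(sca < sizeof(sca_ppm_table) / sizeof(sca_ppm_table[0]));
        return sca_ppm_table[sca];
}
```

With this table, the 50 PPM option (index 5) yields 50 and the 500 PPM option (index 0) yields 500.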

If we search for where lll_clock_ppm_local_get is used, the call we want to focus on is in the source file zephyr/subsys/bluetooth/controller/ll_sw/ull_sync.c in the following function:

void ull_sync_setup(struct ll_scan_set *scan, struct ll_scan_aux_set *aux,
                                      struct node_rx_hdr *node_rx, struct pdu_adv_sync_info *si)
{
.
.
.
        lll->window_widening_periodic_us =
                ceiling_fraction(((lll_clock_ppm_local_get() +
                                   lll_clock_ppm_get(sca)) *
                                  interval_us), USEC_PER_SEC);
.
.
}
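The arithmetic in that expression is easy to try out in isolation. The sketch below reimplements it as a standalone function, assuming the result is the combined PPM of both clocks multiplied by the interval and divided by one million, rounded up. The function name window_widening_us and the sample values (a 30 ms interval and a 50 PPM peer) are ours, chosen only for illustration.

```c
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Hypothetical stand-in for the expression in ull_sync_setup:
 * ceil((local_ppm + peer_ppm) * interval_us / 1,000,000).
 * A 64-bit intermediate avoids overflow for long intervals.
 */
static uint32_t window_widening_us(uint32_t local_ppm, uint32_t peer_ppm,
                                   uint32_t interval_us)
{
        uint64_t scaled = (uint64_t)(local_ppm + peer_ppm) * interval_us;

        return (uint32_t)((scaled + USEC_PER_SEC - 1ULL) / USEC_PER_SEC);
}
```

For a 30,000 us interval and a 50 PPM peer, window_widening_us(50, 50, 30000) gives 3 us, while window_widening_us(500, 50, 30000) gives 17 us. The local accuracy setting therefore directly scales how much slack is added per interval.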

If we search for where window_widening_periodic_us is used, we can see that it appears in the source file zephyr/subsys/bluetooth/controller/ll_sw/nordic/lll/lll_peripheral.c, in the function lll_periph_prepare. From there, the value ultimately feeds into hcto in the function prepare_cb, and hcto is then passed to the function radio_tmr_hcto_configure:

static int prepare_cb(struct lll_prepare_param *p)
{
.
.
.
        /* current window widening */
        lll->periph.window_widening_event_us +=
                lll->periph.window_widening_prepare_us;
        lll->periph.window_widening_prepare_us = 0;
        if (lll->periph.window_widening_event_us >
            lll->periph.window_widening_max_us) {
                lll->periph.window_widening_event_us =
                        lll->periph.window_widening_max_us;
        }
.
.
.
        hcto = remainder_us +
               ((EVENT_JITTER_US + EVENT_TICKER_RES_MARGIN_US +
                 lll->periph.window_widening_event_us) << 1) +
               lll->periph.window_size_event_us;
.
.
.
        radio_tmr_hcto_configure(hcto);
.
.
.
}

This function is near the start of the call stack that establishes the appropriate parameters for the peripheral. The radio_tmr_hcto_configure function is defined in the zephyr/subsys/bluetooth/controller/ll_sw/nordic/hal/nrf5/radio/radio.c source file. The other functions in this source file show that it handles low-level operations for the BLE software stack. Specifically, functions in this file are used to determine when a timeout has occurred.
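To see how the window widening value moves the timeout, here is a small standalone sketch of the clamp and the hcto expression from prepare_cb. The two constants are placeholder values that we assume purely for illustration; the real EVENT_JITTER_US and EVENT_TICKER_RES_MARGIN_US should be read from the controller sources. The function names are hypothetical as well.

```c
#include <stdint.h>

/* Assumption: placeholder values for illustration only. */
#define SKETCH_JITTER_US            16U
#define SKETCH_TICKER_RES_MARGIN_US 31U

/* Limit the accumulated widening to the configured maximum. */
static uint32_t clamp_widening_us(uint32_t event_us, uint32_t max_us)
{
        return (event_us > max_us) ? max_us : event_us;
}

/* Mirror of the hcto expression: the margins and the widening are
 * doubled (shifted left by one) before the window size is added.
 */
static uint32_t sketch_hcto_us(uint32_t remainder_us,
                               uint32_t widening_event_us,
                               uint32_t window_size_event_us)
{
        return remainder_us +
               ((SKETCH_JITTER_US + SKETCH_TICKER_RES_MARGIN_US +
                 widening_event_us) << 1) +
               window_size_event_us;
}
```

With the remainder and window size set to zero, a widening of 3 us gives a timeout of 100 us and a widening of 17 us gives 128 us. Each extra microsecond of widening adds two microseconds to the timeout because the term is doubled.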

If we take a step back, we can see that the clock accuracy setting determines how much additional time the BLE software stack waits before declaring a timeout. If we specify a clock accuracy of +/- 50 PPM but the actual hardware clock is much worse (with an accuracy of +/- 500 PPM), the clocks can drift apart by more than the stack has allowed for, so the stack times out well before it should. When we update the Zephyr configuration to specify a clock accuracy of +/- 500 PPM, we are instructing the software stack to wait an additional amount of time before timing out during BLE communication, to account for the drift that the clock will introduce.
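The mismatch can be illustrated with a short worked example. The sketch below compares the slack derived from the configured accuracy with the worst-case drift the real oscillator could produce over the same span. It is a simplified model under our own assumptions (a 50 PPM peer and one second between packets), and the function names are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define USEC_PER_SEC 1000000ULL

/* Worst-case drift between two clocks over elapsed_us, rounded up. */
static uint32_t worst_case_drift_us(uint32_t ppm_a, uint32_t ppm_b,
                                    uint32_t elapsed_us)
{
        uint64_t scaled = (uint64_t)(ppm_a + ppm_b) * elapsed_us;

        return (uint32_t)((scaled + USEC_PER_SEC - 1ULL) / USEC_PER_SEC);
}

/* True when the slack computed from the configured accuracy is at
 * least as large as the drift the actual oscillator can produce.
 */
static bool widening_covers_drift(uint32_t configured_ppm,
                                  uint32_t actual_ppm, uint32_t peer_ppm,
                                  uint32_t elapsed_us)
{
        return worst_case_drift_us(configured_ppm, peer_ppm, elapsed_us) >=
               worst_case_drift_us(actual_ppm, peer_ppm, elapsed_us);
}
```

Over one second, a configured 50 PPM yields 100 us of slack while a 500 PPM oscillator can drift by up to 550 us, so widening_covers_drift(50, 500, 50, 1000000) is false. Configuring 500 PPM makes the same check pass.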

Summary

In summary, MAB Labs Embedded Solutions was able to leverage its background in wireless communication systems to help our client with a tricky problem in their BLE-based system. With a simple change to the Zephyr Project RTOS configuration options, we were able to quickly resolve their issue of excessive dropped connections between their device and standard BLE devices. If you have any needs or issues with BLE or an nRF-based product, feel free to reach out to us at mab@mab-labs.com, or by sending us a message here: https://mab-labs.com/#contact.