EtherCAT® EC-Master stack on Raspberry Pi 4
We are proud to announce that the acontis EtherCAT Master Software EC-Master now supports the BROADCOM® System-on-Chip (SoC) BCM2711 as used on the Raspberry Pi Compute Module 4 (CM4). In addition, a new optimized Ethernet controller driver (acontis Real-time Driver) has been developed for the on-board Ethernet included in the BCM2711. This new Real-time Driver (BcmGenet) supports Linux running on ARM 32-bit and ARM 64-bit architectures, with further operating system (OS) support coming soon.
Due to the smart architecture of the EC-Master stack and our in-house hardware driver knowledge it is not that difficult to implement support for an Ethernet Media Access Controller (MAC). To be independent from the Linux drivers and to gain the best possible performance, throughput and lowest jitter, we decided to implement it as an acontis Real-time Driver.
Reminder: This kind of driver has exclusive access to the Ethernet hardware, whereas non-exclusive access is possible with the Linux Network Driver. It is therefore important to understand the advantages and disadvantages of the two approaches.
If you are interested in testing our new Real-time Driver for the BROADCOM® Ethernet controller yourself, please contact us for an evaluation package. With the package you can reproduce our measurements on your own system. The steps to do so are described below.
Setting it up
Evaluation can be done with the Raspberry Pi CM4 or a similar system. We recommend the CM4 as it will work out-of-the-box. Other Systems on Module (SoMs) might need further adaptations, for example, in the Linux device tree.
The first step to set up a working Real-time Driver is to customize the Linux device tree. The goal of the customization is to tell the Linux kernel to use the acontis atemsys kernel module instead of the standard BROADCOM® Genet driver to communicate with the Ethernet MAC of the BCM2711. Using the atemsys module isolates the BCM2711 Ethernet MAC from the Linux kernel drivers. As a result, the EtherCAT communication is isolated from the operating system Ethernet communication, for example, ARP or TCP/IP, which is still possible on other Ethernet ports.
After the customization of the device tree the atemsys module can be downloaded, compiled and installed.
To check that the EC-Master is working on your CM4 with the BcmGenet Real-time Driver, connect your EtherCAT devices to the CM4 Ethernet port (BROADCOM® BCM54210 Gigabit Ethernet Transceiver) and start the pre-compiled example application included with the evaluation package (EcMasterDemo) with the following command line parameters:
./EcMasterDemo -bcmgenet 1 1
Snippet 1 EC-Master Demo call with BcmGenet Real-time Driver in polling mode
The Broadcom Real-time driver takes two arguments: the first specifies the Ethernet controller instance to use, and the second specifies the mode of operation, either polling or interrupt. In our case, we specify the first instance (1) and polling mode (1). The command line parameters and associated arguments can be checked with the -help argument when running the example demo program (as shown below), and also in our online documentation.
-bcmgenet Link layer = Broadcom BcmGenet
Instance Device instance (1=first), e.g. 1
Mode Interrupt (0) or Polling (1) mode
The acontis EC-Master software has a built-in performance measurement capability that can also be used with the EcMasterDemo example application. The performance measurement is enabled with additional command line parameters as shown below.
./EcMasterDemo -bcmgenet 1 1 -v 3 -perf
Snippet 3 EC-Master Demo call with BcmGenet in polling mode and performance measurement enabled
Depending on the specific measurement to perform you may also choose to start the application with an EtherCAT Network Information (ENI) file.
./EcMasterDemo -bcmgenet 1 1 -f eni.xml -v 3 -perf
Snippet 4 EC-Master Demo call using an ENI file, BcmGenet in polling mode and performance measurement enabled
If the performance measurement is enabled, the example application measures the execution times of the job functions that are called within the cyclic part of the application, as well as the total computing time consumed by the cyclic task itself. The example application uses the included APIs, such as ecatPerfMeasEnd(), for high-precision time measurement.
The resulting measurement values are recorded every few seconds, printed to the console and to the log file, in the following format:
PerfMsmt 'Cycle Time ' (min/avg/max) [usec]: 989.4/1000.0/1011.4
PerfMsmt 'Task Duration (JOB_Total + App)' (min/avg/max) [usec]: 28.5/ 29.6/ 45.6
PerfMsmt 'JOB_Total ' (min/avg/max) [usec]: 24.6/ 25.8/ 44.8
PerfMsmt 'JOB_ProcessAllRxFrames Duration' (min/avg/max) [usec]: 10.0/ 10.6/ 20.1
PerfMsmt 'JOB_SendAllCycFrames Duration ' (min/avg/max) [usec]: 8.2/ 8.8/ 13.5
PerfMsmt 'JOB_MasterTimer Duration ' (min/avg/max) [usec]: 4.6/ 5.3/ 10.0
PerfMsmt 'JOB_SendAcycFrames Duration ' (min/avg/max) [usec]: 0.9/ 1.0/ 6.7
PerfMsmt 'myAppWorkPd ' (min/avg/max) [usec]: 0.8/ 0.9/ 1.3
Listing 1 EC-Master performance measurement results (BcmGenet OSOpti)
The user can see the durations of the important tasks executed by the EC-Master and even the duration of a custom application. Further important information that can be gained from these measurements is the share of each task in the overall cycle time. If we take, for example, the processing of all received frames (JOB_ProcessAllRxFrames), this represents 1.06% (10.6 microseconds) of the cycle time, while the sending of the cyclic frames takes 0.88% (8.8 microseconds) of the 1 ms cycle on average.
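The ratio calculation can be reproduced directly from the log output. The following is a minimal Python sketch, assuming the PerfMsmt line format shown in Listing 1. The helper names parse_perf_line and share_of_cycle are hypothetical and not part of the EC-Master API.

```python
import re

# Matches log lines such as:
# PerfMsmt 'JOB_Total ' (min/avg/max) [usec]: 24.6/ 25.8/ 44.8
PERF_LINE = re.compile(
    r"PerfMsmt\s+'(?P<name>[^']*)'\s+\(min/avg/max\)\s+\[usec\]:\s*"
    r"(?P<min>[\d.]+)\s*/\s*(?P<avg>[\d.]+)\s*/\s*(?P<max>[\d.]+)"
)

def parse_perf_line(line):
    """Return (name, (min, avg, max)) in microseconds, or None if no match."""
    m = PERF_LINE.search(line)
    if m is None:
        return None
    values = (float(m.group("min")), float(m.group("avg")), float(m.group("max")))
    return m.group("name").strip(), values

def share_of_cycle(duration_usec, cycle_usec):
    """Share of the cycle time consumed by a job, in percent."""
    return 100.0 * duration_usec / cycle_usec

# Three lines copied from Listing 1
LOG = """\
PerfMsmt 'Cycle Time ' (min/avg/max) [usec]: 989.4/1000.0/1011.4
PerfMsmt 'JOB_ProcessAllRxFrames Duration' (min/avg/max) [usec]: 10.0/ 10.6/ 20.1
PerfMsmt 'JOB_SendAllCycFrames Duration ' (min/avg/max) [usec]: 8.2/ 8.8/ 13.5
"""

results = dict(filter(None, map(parse_perf_line, LOG.splitlines())))
cycle_avg = results["Cycle Time"][1]

for name, (_, avg, _) in results.items():
    print(f"{name}: {share_of_cycle(avg, cycle_avg):.2f} % of the cycle")
```

Using the measured average cycle time as the denominator, rather than a hard-coded 1000 microseconds, keeps the sketch usable for other cycle times.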
We collected these values in different scenarios and compiled them into a table for comparison, as you can see in the following sections. But first we want to show some additional measurements we did and the setup we used for them.
We wanted to correlate the above-mentioned EC-Master internal time measurements with an even more reliable external time measurement of the frames on the wire. This does not mean that the measurements within the EC-Master software are inaccurate, but they measure the timing on the processor and not the timing that is relevant to the devices on the network, which is the timing on the wire. We initially thought of measuring with an oscilloscope, but found it easier to measure, and also to reproduce, with a standard probe. Therefore, we used a Beckhoff® ET2000 to measure the frame timing between the Master and the devices on the network.
The ET2000 time stamps the Ethernet frames in hardware and transfers the frames along with the timestamp to the measurement PC. On the measurement PC we simply ran Wireshark to filter just the cyclic frames and to get the hardware timestamp deltas (hint: "Enable dissector" in Wireshark).
Figure 3 Wireshark configuration
All those numbers were collected in Microsoft® Excel to generate the following tables and figures.
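The same evaluation can also be scripted instead of done in a spreadsheet. The following is a minimal Python sketch, assuming the hardware timestamps of the cyclic frames have already been exported as integer nanoseconds. The export format, the function names and the sample values are assumptions made for illustration and are not measured data.

```python
def frame_deltas_usec(timestamps_ns):
    """Time between consecutive frames, in microseconds."""
    return [(b - a) / 1000.0 for a, b in zip(timestamps_ns, timestamps_ns[1:])]

def min_avg_max(values):
    """Summarize a list of deltas in the same form as the PerfMsmt output."""
    return min(values), sum(values) / len(values), max(values)

# Made-up timestamps for illustration only (nanoseconds)
timestamps_ns = [0, 1_000_200, 1_999_900, 3_000_000]

deltas = frame_deltas_usec(timestamps_ns)
lo, avg, hi = min_avg_max(deltas)
print(f"Frame delta (min/avg/max) [usec]: {lo:.1f}/{avg:.1f}/{hi:.1f}")
```

Working with integer nanoseconds avoids rounding errors that can appear when large timestamps in seconds are subtracted as floating-point values.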
The EC-Master is running on Ubuntu 20.04 ARM 64-bit installed on the CM4. The EcMasterDemo binary is therefore compiled for this system and controls 7 EtherCAT slave devices connected to it in a 1 ms cycle.
Wireshark must be started before the EcMasterDemo to ensure the startup of the network is captured and contributes to the statistics. In our measurements the EcMasterDemo ran for 1 minute (-t <time> command line parameter).
We concentrated on four different scenarios, prioritizing the comparison of the new BcmGenet Real-time Driver to the existing BcmSockRaw generic socket driver, both with and without some OS optimizations.
Hint: BcmSockRaw is defined in this article as the raw socket (SockRaw) driver using the standard Linux networking driver for the BCM2711.
Figure 4 The four measurement scenarios
|BcmGenet NoOSOpti|EC-Master Demo using the BcmGenet Real-time driver and non-volatile memory (NVM) (32GB SD card)|
|BcmGenet OSOpti|EC-Master Demo using the BcmGenet Real-time driver, a RAMdisk and an isolated CPU core|
|BcmSockRaw NoOSOpti|EC-Master Demo using the generic Linux BcmSockRaw driver and NVM|
|BcmSockRaw OSOpti|EC-Master Demo using the generic Linux BcmSockRaw driver, a RAMdisk and an isolated CPU core|
On the CPU
The following chart shows the measurements from the four above mentioned scenarios as measured on the CPU using the internal EC-Master performance measurement API:
Table 1 EC-Master internal timing measurement
As you can see, there is some more green in the left 2 columns of the min, avg, and max sections. This is expected due to the use of the Real-time Driver in these 2 scenarios. The BcmGenet Real-time driver offers better results in almost all cycle time and job duration indicators. There is one exception which is the processing of all received frames which seems to take a bit longer when using the Real-time Driver. This is an interesting result and we will be looking into this further. Sending of all cyclic frames is more than 4 times faster with the BcmGenet driver. In summary, the BcmGenet Real-time driver cuts total task duration in half compared to the BcmSockRaw generic socket driver.
Furthermore, the BcmGenet Real-time driver offers less jitter than the BcmSockRaw generic socket driver. In the best cases for each, the jitter was around ±1.1% with the BcmGenet Real-time driver and around ±3.2% with the BcmSockRaw generic socket driver.
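As a rough cross-check, the jitter figure for the BcmGenet OSOpti scenario can be derived from the 'Cycle Time' line in Listing 1. The sketch below assumes that jitter is expressed as the deviation of the minimum and maximum cycle time from the average, relative to the average. The article does not state the exact definition used, and the function name is hypothetical.

```python
def jitter_percent(min_usec, avg_usec, max_usec):
    """Relative deviation of min and max from the average, in percent.

    Assumed definition: (value - avg) / avg * 100.
    """
    lower = 100.0 * (min_usec - avg_usec) / avg_usec
    upper = 100.0 * (max_usec - avg_usec) / avg_usec
    return lower, upper

# 'Cycle Time' values from Listing 1 (BcmGenet OSOpti)
lower, upper = jitter_percent(989.4, 1000.0, 1011.4)
print(f"Cycle time jitter: {lower:+.2f} % / {upper:+.2f} %")
```

Under this assumption both deviations come out close to 1.1% in magnitude, which is in line with the value quoted above.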
However, perhaps the most important result to highlight – and this can’t be seen in the table – EC-Master with the BcmGenet Real-time driver reports no frame loss, whereas there was noticeable frame loss with the BcmSockRaw generic socket driver.
On the wire
If we look at the physical timing on the wire we get a similar picture. In fact, the time from the software running in main memory to the physical wire is confirmed to be very short. The min, avg, and max values of the EC-Master internal time measurement can be confirmed and correspond to the timing on the wire. The EC-Master software timing values are trustworthy!
In each of the four scenarios, the first 30,000 cyclic frames of the Wireshark recordings are taken for evaluation. With a 1 millisecond cycle time, or 1 kHz frequency, the first 30,000 cyclic frames correspond to the first 30 seconds of each test measurement. The histogram of these first 30 seconds is compared in each test scenario with 200 microsecond wide bins. The two outermost bins represent the boundaries of 975 µs and 1025 µs, so these bins collect all measurements that exceed these boundaries.
As with the CPU timing previously, the BcmGenet Real-time driver shows the best (lowest) jitter behavior. Both with and without the applied OS optimizations we observed nice narrow peaks in the middle and almost no frames in the extreme areas. With the OS optimizations the values are even better.
The BcmSockRaw generic socket driver shows some deterministic jitter around ±2.5 Hz, ±16.2 Hz and ±18.7 Hz of the main cycle frequency if the OS optimization is applied. These frequencies were observed in repeated tests. The high middle peak around the cycle time is totally missing in this scenario. We believe that kernel thread synchronization leads to this behavior. Without the OS optimizations there is a clear peak around the cycle time and some smaller peaks around ±2.5 Hz of the main frequency, but not in the outer areas.
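The binning described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the evaluation we ran in Excel: the function name and the sample deltas are made up, and the bin width of 5 microseconds is chosen only to keep the example readable. The outermost bins act as overflow bins for everything outside the two boundaries.

```python
def histogram_with_overflow(deltas_usec, low, high, bin_width, max_frames=30_000):
    """Count frame deltas per bin; first and last bins collect outliers.

    counts[0]    : deltas below `low`
    counts[1:-1] : half-open bins [low + i*w, low + (i+1)*w)
    counts[-1]   : deltas at or above `high`
    """
    n_inner = round((high - low) / bin_width)
    counts = [0] * (n_inner + 2)
    for value in deltas_usec[:max_frames]:  # only the first max_frames frames
        if value < low:
            counts[0] += 1
        elif value >= high:
            counts[-1] += 1
        else:
            # min() guards against floating-point rounding at the upper edge
            index = min(int((value - low) / bin_width), n_inner - 1)
            counts[1 + index] += 1
    return counts

# Made-up frame deltas in microseconds, for illustration only
sample = [970.0, 975.0, 999.9, 1000.0, 1000.1, 1024.9, 1025.0, 1030.0]
counts = histogram_with_overflow(sample, low=975.0, high=1025.0, bin_width=5.0)
print(counts)
```

A narrow peak in the middle bins with empty overflow bins corresponds to low jitter, while filled overflow bins indicate frames that arrived far too early or too late.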
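To relate these frequency offsets to the time axis of the histograms, they can be converted into frame-to-frame periods. The sketch below assumes the offsets are deviations from the nominal 1 kHz cycle frequency. The function name is hypothetical.

```python
NOMINAL_HZ = 1000.0  # 1 ms cycle time

def period_usec(frequency_hz):
    """Period in microseconds for a given frequency in hertz."""
    return 1_000_000.0 / frequency_hz

for offset_hz in (2.5, 16.2, 18.7):
    shorter = period_usec(NOMINAL_HZ + offset_hz)  # higher frequency, shorter period
    longer = period_usec(NOMINAL_HZ - offset_hz)   # lower frequency, longer period
    print(f"±{offset_hz} Hz -> {shorter:.1f} us / {longer:.1f} us")
```

Under this assumption an offset of ±2.5 Hz corresponds to periods roughly 2.5 microseconds away from the 1000 microsecond cycle, and the larger offsets to deviations of roughly 16 to 19 microseconds, all of which lie inside the 975 to 1025 microsecond window.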
Besides the measurements mentioned above, we also did some further measurements, e.g. with the well-known and well-proven Real-time drivers for the Intel Gigabit Ethernet Adapter family on the same platform. What we can already say is that the BcmGenet Real-time driver is definitely in the same league.
With the given results it is possible to state that the new BcmGenet Real-time driver is more deterministic than the BcmSockRaw generic socket driver in a variety of different system configurations. Depending on the OS configuration, EtherCAT frame loss caused by communication on other parallel Ethernet ports is eliminated purely by the design of the Real-time drivers.
Doing all these measurements also showed that without the correct OS and OS configuration you are not able to get good results. If the application's send request is not executed on time and with low jitter, the driver is not able to correct this. With the wrong configuration, a loaded system behaves very differently from an unloaded one. If you are interested in more measurement results or in how you can tweak your system to get better performance, please don’t hesitate to contact us.
We hope this testing and analysis helps you decide what to use in which scenario. We also hope the description and examples given here are enough for you to reproduce the measurements, if necessary, on your own hardware, architecture and Linux distribution with your actual desired network configuration.
If you are interested in histograms like those shown above for the performance measurement values of the EC-Master stack, check out the new Performance analysis feature of EC-Engineer V3.7. With this new feature you can view linear or logarithmic histograms of all measured values, such as JOB_SendAllCycFrames Duration, within EC-Engineer V3.7.