Etherate FAQs


100% Traffic Loss
Alternate MACs
Exit Codes
How does Etherate measure delay?
What time source is used for delay measurements?
Why is / isn't QinQ working?

100% Traffic Loss

Some people experience 100% (or major) packet loss when they first try to use Etherate, unless the frame size is made smaller or the data rate lower. This is usually resolved by disabling flow control on the NIC:

# Check if flow control is enabled
sudo ethtool -a eth3

# Disable flow control
sudo ethtool -A eth3 autoneg off rx off tx off

# If possible "disable" interrupt coalescing too
sudo ethtool -C eth3 rx-usecs 0
sudo ethtool -C eth3 tx-usecs 0
sudo ethtool -c eth3


Alternate MACs

By default the TX Etherate host will use a source MAC address of 00:00:5E:00:00:01 and the RX host will use 00:00:5E:00:00:02, not their burnt-in addresses.

IANA Unassigned Addresses
Etherate has CLI options to use any source and destination MAC address on each of the TX and RX hosts. By default the hosts use addresses from a block that is currently unassigned by IANA, which one wouldn't expect to find on any adapter. The idea is to make it easy to isolate Etherate frames in packet captures and filters, to remove the need to pipe out MAC tables or logs, and to create easy MAC/VLAN ACLs. The -d and -s options can be used to specify any MAC address:

// MAC ADDRESS ASSIGNMENT
// At the time of writing these are in an unassigned block:
// http://www.iana.org/assignments/ethernet-numbers
// RFC 5342
// unsigned char sourceMAC[6] = {0x00, 0x00, 0x5E, 0x00, 0x00, 0x01};
// unsigned char destMAC[6] = {0x00, 0x00, 0x5E, 0x00, 0x00, 0x02};


Exit Codes

Etherate uses the following exit codes:

#include <stdlib.h>          // EXIT_SUCCESS = 0  Program completed/returned without issue
#include <sysexits.h>        // EX_USAGE    = 64  CLI usage error
                             // EX_NOPERM   = 77  Permission denied
                             // EX_SOFTWARE = 70  Internal software error
                             // EX_PROTOCOL = 76  Remote error in protocol


How does Etherate measure delay?

Long version starts here:

To measure delay between the TX and RX hosts with any sort of accuracy, two criteria need to be met. Firstly, a method is needed that doesn't compare the system clocks of the two hosts. Secondly, the testing application needs to spend a minimal amount of time processing the request on both hosts, so that the calculated delay is representative of the network delay only, and not the network delay plus the application processing delay.

An example method: TX sends a timestamp (its current system clock time) to RX, and RX sends that same value back to TX. The system clock time at which TX receives its original timestamp back, minus the timestamp value, gives a difference which is roughly the round-trip delay across the network. There are three main problems with this.

Firstly, it doesn't mitigate my sloppy programming skills. If the network delay is actually 1ms, it takes 2ms for the data to go from TX to RX and then from RX back to TX again (the network RTT). Suppose it takes the TX application 3ms to get its current time value and hand it to the network for transmission towards RX, RX takes another 3ms to receive and process it and send it back towards TX, and TX takes a final 3ms to process the returned timestamp value. The delay will show as 3 x 3ms for three lots of application processing + 2 x 1ms for two lots of network transmission = 11ms total; but most of that 11ms was application processing time, not network encoding, queuing, scheduling and transmission time.

Secondly, suppose TX sends RX its current clock time and, whilst RX is processing that request, the TX host OS processes an NTP update and shifts its system clock either forwards or backwards. The original TX timestamp value that RX sends back is then no longer relative to the TX clock.

Thirdly, it is preferable to measure the delay from TX to RX (unidirectional) only, in case of an asymmetric traffic path. It is not safe to assume that the TX and RX paths are symmetrical (even on Ethernet). If a bidirectional measurement is wanted, we can run the RX host in TX mode and the TX host in RX mode and test back in the other direction.

Short version starts here:

Etherate gets the current uptime value from the TX host and sends this to the RX host (a value which is independent of NTP updates or system clock changes, and is recorded to an accuracy of microseconds). It then repeats the process, sampling a fresh uptime value rather than sending the same value twice. Some amount of time will have passed since TX got the first uptime value, because it packed that value into an Ethernet frame and handed it to the Linux kernel for transmission across the network towards RX, so when TX gets its uptime value for the second time it should be larger (the TX host uptime has increased). The RX host records the time at which it receives each of the two uptime values from TX.

At this point the RX host has four values:

TX uptime value == $TX_UPTIME_1
The time RX received that value == $RX_UPTIME_1
TX 2nd uptime value == $TX_UPTIME_2
The time RX received that 2nd value == $RX_UPTIME_2

RX can now perform a simple calculation:

$TX_APPLICATION_DELAY = $TX_UPTIME_2 - $TX_UPTIME_1

$TX_UPTIME_2 should be in the future compared to $TX_UPTIME_1, and thus larger, so it's safe to make the subtraction. The difference between those two values ($TX_APPLICATION_DELAY) is roughly the time the TX host spent on application processing: TX copied its current uptime into a buffer, passed that buffer to the kernel to be sent over the network, then copied the uptime again (for the 2nd transmission).

$RX_NETWORK_DELAY = $RX_UPTIME_2 - $RX_UPTIME_1

The difference between the times at which RX received the two uptime values from TX is roughly the network delay plus a small amount of application delay (it's a fairly small amount of code being run on the TX host). So finally RX can calculate:

$RX_NETWORK_DELAY - $TX_APPLICATION_DELAY

That gives a more accurate figure of the network delay, not including the application delay at the TX side.

This whole process is then repeated 10,000 times, in an attempt to circumvent any interrupt coalescing that may be present, or driver buffers causing the TX frames to be queued for a longer time than the application takes to process them.


What time source is used for delay measurements?

CLOCK_MONOTONIC_RAW is available in Linux from kernel 2.6.28 onwards. It's used here for accurate delay calculation that is exempt from NTP adjustments (however, it can still be subject to hardware clock drift). There is a good Stack Overflow post about this here: http://stackoverflow.com/questions/3657289/linux-clock-gettimeclock-monotonic-strange-non-monotonic-behavior


Why is / isn't QinQ working?

QinQ is currently facing a few 'issues' within newer Linux versions, as per this link and this link. It should be working in Etherate on recent kernels by assuming the outermost tag was present; however, if one needs to be certain that a frame was received with multiple VLAN tags, a sniffer/packet capture should be used to verify.