Date created: Tuesday, May 7, 2013 11:46:29 AM. Last modified: Friday, January 19, 2024 4:34:56 PM

Linux Network Tuning

PHY/NIC/Ethtool

# Check if flow control is enabled
sudo ethtool -a eth3

# Disable flow control
sudo ethtool -A eth3 autoneg off rx off tx off 

# Check the RX and TX queue sizes
sudo ethtool -g eth3

# Set the RX ring size to 4096 descriptors (not bytes; must not exceed the pre-set maximum shown by -g)
sudo ethtool -G eth3 rx 4096

# Check for hardware offload settings per NIC
sudo ethtool -k eth3

# Disable RX and TX checksum offload
sudo ethtool -K eth3 rx off
sudo ethtool -K eth3 tx off

# Show the RX or TX queue stats per NIC
watch -n 1 'sudo ethtool -S eth3 | grep -E " rx_"'

# On a VM the tx kicks counter can be high, and this may be OK. The virtio vring is shared between the VM and the host: the VM side places requests on the vring and kicks the virtqueue to notify the host, while the host side places responses on the vring and interrupts the VM.
$ sudo ethtool -S eth0
NIC statistics:
rx_queue_0_packets: 4282561
rx_queue_0_bytes: 2987856497
rx_queue_0_drops: 0
rx_queue_0_xdp_packets: 0
rx_queue_0_xdp_tx: 0
rx_queue_0_xdp_redirects: 0
rx_queue_0_xdp_drops: 0
rx_queue_0_kicks: 84
tx_queue_0_packets: 13479797
tx_queue_0_bytes: 1816396339
tx_queue_0_xdp_tx: 0
tx_queue_0_xdp_tx_drops: 0
tx_queue_0_kicks: 13426617
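
To put the kick counters in proportion, it can help to compare kicks to packets per queue. A minimal sketch, assuming the virtio_net counter names shown above; `kick_ratio` is a hypothetical helper name, not part of ethtool, and it reads `ethtool -S`-style text on stdin.

```shell
# kick_ratio QUEUE: print kicks per packet for one queue, reading
# `ethtool -S`-style text on stdin. Hypothetical helper; assumes the
# virtio_net counter names (<queue>_packets, <queue>_kicks).
kick_ratio() {
    awk -F': *' -v q="$1" '
        { sub(/^[ \t]+/, "", $1) }              # strip ethtool indentation
        $1 == (q "_packets") { packets = $2 }
        $1 == (q "_kicks")   { kicks = $2 }
        END {
            if (packets > 0) printf "%.3f\n", kicks / packets
            else print "n/a"
        }'
}

# Sample counters copied from the output above.
kick_ratio tx_queue_0 <<'EOF'
tx_queue_0_packets: 13479797
tx_queue_0_kicks: 13426617
EOF
```

For the sample this prints 0.996, i.e. nearly one kick per transmitted packet. On a live system the input would come from something like `sudo ethtool -S eth0 | kick_ratio tx_queue_0`.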

 

$ watch -n 1 "column -t /proc/net/dev | grep -E 'Inter|face|eth0|eth1'"
Every 1.0s: column -t /proc/net/dev | grep -E 'Inter|face|eth0|eth1'

Inter-|           Receive     |         Transmit
face              |bytes      packets   errs      drop  fifo  frame  compressed  multicast|bytes       packets     errs  drop  fifo  colls  carrier  compressed
eth0:             2990014135  4288231   0         0     0     0      0           0         1818541402  13494586    0     0     0      0     0        0
eth1:             4106015341  16686322  0         0     0     0      0           0         1420117908  7433373     0     0     0      0     0        0
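
When only the drop columns matter, the wide /proc/net/dev table can be reduced to two numbers. A minimal sketch, assuming the column order shown in the header above; `iface_drops` is a hypothetical helper name.

```shell
# iface_drops IFACE: print "rx_drop tx_drop" for one interface, reading
# /proc/net/dev-style text on stdin. Hypothetical helper for illustration.
iface_drops() {
    # Turn "eth0:" into "eth0 " so the name is always its own field,
    # then pick the 4th receive column and the 4th transmit column.
    sed 's/:/ /' | awk -v ifc="$1" '$1 == ifc { print $5, $13 }'
}

# Sample row copied from the eth0 line above.
iface_drops eth0 <<'EOF'
eth0:             2990014135  4288231   0         0     0     0      0           0         1818541402  13494586    0     0     0      0     0        0
EOF
```

On a live system the same function could be fed with `iface_drops eth0 < /proc/net/dev`.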

 

$ ip -s link  show dev eth0
2: eth0: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc fq_codel state UP mode DEFAULT group default qlen 1000
link/ether 11:11:11:22:22:22 brd ff:ff:ff:ff:ff:ff
RX: bytes packets errors dropped missed mcast
2993449026 4297303 0 0 0 0
TX: bytes packets errors dropped carrier collsns
1822537655 13525417 0 0 0 0
altname enp0s18
altname ens18

 

IP/TCP/UDP

$ cat /proc/sys/net/core/wmem_default 
212992
$ cat /proc/sys/net/core/wmem_max
212992
$ cat /proc/sys/net/core/rmem_default
212992
$ cat /proc/sys/net/core/rmem_max
212992
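
The four reads above can be folded into one loop. A minimal sketch; `show_core_bufs` is a hypothetical helper name, and the optional directory argument exists only so the function can be pointed at a copy of the files.

```shell
# show_core_bufs [DIR]: print the four core socket-buffer sysctls.
# DIR defaults to /proc/sys/net/core.
show_core_bufs() {
    dir="${1:-/proc/sys/net/core}"
    for name in rmem_default rmem_max wmem_default wmem_max; do
        if [ -r "$dir/$name" ]; then
            printf '%s=%s\n' "$name" "$(cat "$dir/$name")"
        else
            # File missing or unreadable (e.g. restricted container).
            printf '%s=unavailable\n' "$name"
        fi
    done
}

show_core_bufs
```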

$ watch -n 1 "sudo ss -nupmeO | tee -a ~/ss.log"
Every 1.0s: sudo ss -unpmeO

Recv-Q Send-Q Local Address:Port Peer Address:PortProcess
0 0 10.10.101.150:43041 8.8.4.4:53 users:(("snmpget",pid=3941568,fd=3)) uid:2113 ino:23550662 sk:1197 cgroup:/system.slice/observium-poller.service <-> skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0 0 10.10.101.150:59706 8.8.4.4:53 users:(("snmpget",pid=3941557,fd=3)) uid:2113 ino:23550641 sk:1194 cgroup:/system.slice/observium-poller.service <-> skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0 0 10.10.101.150:33668 8.8.4.4:53 users:(("snmpget",pid=3941576,fd=3)) uid:2113 ino:23550681 sk:1198 cgroup:/system.slice/observium-poller.service <-> skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0 0 10.10.101.150:55150 8.8.4.4:53 users:(("snmpget",pid=3941583,fd=3)) uid:2113 ino:23552059 sk:1199 cgroup:/system.slice/observium-poller.service <-> skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0 0 10.10.101.150:39359 8.8.4.4:53 users:(("snmpget",pid=3941589,fd=3)) uid:2113 ino:23552066 sk:119a cgroup:/system.slice/observium-poller.service <-> skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0 0 10.10.101.150:56464 8.8.4.4:53 users:(("snmpbulkwalk",pid=3941588,fd=3)) uid:2113 ino:23553139 sk:119b cgroup:/system.slice/observium-poller.service <-> skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0 0 10.10.101.150:58603 8.8.4.4:53 users:(("snmpget",pid=3941591,fd=3)) uid:2113 ino:23550706 sk:119c cgroup:/system.slice/observium-poller.service <-> skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
0 0 [2001:2001:0:6::1b]:59740 [2001:2001:2001::1111]:53 users:(("snmpget",pid=3941596,fd=3)) uid:2113 ino:23551224 sk:119d cgroup:/system.slice/observium-poller.service <-> skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)
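
In the skmem:(...) field, rb and tb are the receive and send buffer sizes (matching rmem_default and wmem_default above) and d counts packets dropped on the socket. A minimal sketch for pulling one field out of logged `ss -m` output; `skmem_get` is a hypothetical helper name.

```shell
# skmem_get KEY: print the value of one skmem field (r, rb, t, tb, f, w,
# o, bl or d) for each `ss -m` line on stdin. Hypothetical helper.
skmem_get() {
    # Isolate the skmem:(...) group first so that look-alike text
    # elsewhere on the line (e.g. "fd=3") cannot match.
    grep -oE 'skmem:\([^)]*\)' | grep -oE "[(,]$1[0-9]+" | sed -E "s/^[(,]$1//"
}

# Sample skmem field copied from the output above.
echo 'skmem:(r0,rb212992,t0,tb212992,f0,w0,o0,bl0,d0)' | skmem_get rb
```

Against the log written by the watch command, something like `skmem_get d < ~/ss.log | sort -n | tail -1` would show the largest drop count seen.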

 

Sysctl / Tunable Kernel Parameters for IP Performance:

This is a great reference: https://sysctl-explorer.net/net/

sysctl -w net.core.rmem_default=524287

/proc/sys/net/core/rmem_default - default socket receive buffer size, in bytes
   (default=124928), suggested change to 524287

/proc/sys/net/core/wmem_default - default socket send buffer size, in bytes
   (default=124928), suggested change to 524287

/proc/sys/net/core/rmem_max - maximum socket receive buffer size, in bytes
   (default=131071), suggested change to 524287

/proc/sys/net/core/wmem_max - maximum socket send buffer size, in bytes
   (default=131071), suggested change to 524287

/proc/sys/net/core/optmem_max - maximum ancillary (option) buffer size allowed per socket, in bytes
   (default=20480), suggested change to 524287

/proc/sys/net/core/netdev_max_backlog - number of unprocessed input packets before kernel starts dropping them
   (default=1000), suggested change to 300000

/proc/sys/net/ipv4/tcp_rmem - memory reserved for TCP rcv buffers (min default max)
   (defaults 4096    87380   4194304), suggested change to 10000000 10000000 10000000

/proc/sys/net/ipv4/tcp_wmem - memory reserved for TCP snd buffers (min default max)
   (defaults 4096    16384   4194304), suggested change to 10000000 10000000 10000000

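/proc/sys/net/ipv4/tcp_mem - total memory TCP may use across all sockets, measured in pages rather than bytes (low pressure high)
   (defaults 193152  257536  386304, computed at boot from available memory), suggested change to 10000000 10000000 10000000
   (note: 10000000 pages is roughly 38 GiB with 4 KiB pages, so scale this to the RAM actually installed)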
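
Values set with `sysctl -w` or by writing to /proc are lost at reboot. A minimal sketch that prints a subset of the suggestions above in sysctl.conf syntax so they can be kept in a file; `emit_tuning_conf` is a hypothetical helper name.

```shell
# emit_tuning_conf: print some of the suggested net.core values from the
# list above in sysctl.conf syntax (key = value). Hypothetical helper.
emit_tuning_conf() {
    for key in rmem_default wmem_default rmem_max wmem_max; do
        printf 'net.core.%s = %s\n' "$key" 524287
    done
    printf 'net.core.netdev_max_backlog = %s\n' 300000
}

emit_tuning_conf
```

One possible use, assuming a distribution that reads /etc/sysctl.d/ and a made-up file name: `emit_tuning_conf | sudo tee /etc/sysctl.d/90-net-tuning.conf`, followed by `sudo sysctl --system` to load it.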

ip_forward - (Boolean; default: disabled; since Linux 1.2) Enable IP forwarding with a boolean flag. IP forwarding can also be set on a per-interface basis.

echo 1 > /proc/sys/net/ipv4/ip_forward
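
The `sysctl -w` form used earlier and the `echo ... > /proc/sys/...` form used here address the same kernel values; only the naming differs. A minimal sketch of the mapping; `sysctl_path` is a hypothetical helper name, and eth3 is just the interface used in the ethtool examples.

```shell
# sysctl_path NAME: map a dotted sysctl name to its /proc/sys path.
# Hypothetical helper; it assumes no component contains a dot itself
# (a VLAN interface such as eth0.100 would break this simple mapping).
sysctl_path() {
    printf '/proc/sys/%s\n' "$(printf '%s' "$1" | tr . /)"
}

sysctl_path net.ipv4.ip_forward              # global switch
sysctl_path net.ipv4.conf.eth3.forwarding    # per-interface switch
```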

 

ip_local_port_range - (Two integers, low and high bound, default 1024 to 4999 or 32768 to 61000; since Linux 2.2) - The ephemeral port range. Allocation starts with the first number and ends with the second number. Note that these should not conflict with the ports used by masquerading (although the case is handled). Also, arbitrary choices may cause problems with some firewall packet filters that make assumptions about the local ports in use. The first number should be greater than 1024, or better, greater than 4096, to avoid clashes with well-known ports and to minimize firewall problems.

echo "10000 65000" > /proc/sys/net/ipv4/ip_local_port_range
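
The size of the range caps how many simultaneous outbound connections one local address can hold to a single remote address and port. A minimal sketch of the arithmetic; `port_range_size` is a hypothetical helper name.

```shell
# port_range_size: read "LOW HIGH" (the ip_local_port_range format) on
# stdin and print how many ports the range contains, bounds included.
port_range_size() {
    read -r low high
    echo $(( high - low + 1 ))
}

echo "10000 65000" | port_range_size
```

For the range set above this prints 55001. The current setting can be checked with `port_range_size < /proc/sys/net/ipv4/ip_local_port_range`.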

 

ip_no_pmtu_disc - (Boolean; default: disabled; since Linux 2.2) If enabled, don't do Path MTU Discovery for TCP sockets by default. Path MTU Discovery may fail if misconfigured firewalls (that drop all ICMP packets) or misconfigured interfaces (e.g., a point-to-point link where both ends don't agree on the MTU) are on the path. It is better to fix the broken routers on the path than to turn off Path MTU Discovery globally, because not doing it incurs a high cost to the network.

echo 0 > /proc/sys/net/ipv4/ip_no_pmtu_disc

 

ip_nonlocal_bind - (Boolean, default disabled) If set, allows processes to bind(2) to nonlocal IP addresses, which can be quite useful, but may break some applications. 

echo 1 > /proc/sys/net/ipv4/ip_nonlocal_bind

 

tcp_fin_timeout (integer; default: 60; since Linux 2.2) This specifies how many seconds to wait for a final FIN packet before the socket is forcibly closed. This is strictly a violation of the TCP specification, but required to prevent denial-of-service attacks. In Linux 2.2, the default value was 180.

echo 30 > /proc/sys/net/ipv4/tcp_fin_timeout

 

tcp_keepalive_time - (Integer, seconds, default 7200; since Linux 2.2) The interval between the last data packet sent (simple ACKs are not considered data) and the first keepalive probe; after the connection is marked to need keepalive, this counter is not used any further.

echo 600 > /proc/sys/net/ipv4/tcp_keepalive_time

 

tcp_keepalive_intvl - (Integer, seconds, default 75; since Linux 2.4) The interval between subsequent keepalive probes, regardless of what the connection has exchanged in the meantime.

echo 60 > /proc/sys/net/ipv4/tcp_keepalive_intvl

 

tcp_keepalive_probes - (Integer, number of probes, default 9; since Linux 2.2) The number of unacknowledged probes to send before considering the connection dead and notifying the application layer.

echo 15 > /proc/sys/net/ipv4/tcp_keepalive_probes
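
Taken together, the three keepalive settings decide how long an idle connection to a dead peer survives: the idle time before the first probe, plus one interval per unanswered probe. A minimal sketch of that arithmetic; `keepalive_detect_secs` is a hypothetical helper name, and the result only applies to sockets that enable SO_KEEPALIVE.

```shell
# keepalive_detect_secs TIME INTVL PROBES: approximate seconds from the
# last data packet until an unresponsive peer is declared dead.
keepalive_detect_secs() {
    echo $(( $1 + $2 * $3 ))
}

keepalive_detect_secs 7200 75 9     # kernel defaults listed above
keepalive_detect_secs 600 60 15     # values used in the examples above
```

This prints 7875 (a little over two hours) for the defaults and 1500 (25 minutes) for the tuned values.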

 

tcp_retries2 - (integer; default: 15; since Linux 2.2) The maximum number of times a TCP packet is retransmitted in established state before giving up. The default value is 15, which corresponds to a duration of approximately 13 to 30 minutes, depending on the retransmission timeout. The RFC 1122 specified minimum limit of 100 seconds is typically deemed too short.

echo 5 > /proc/sys/net/ipv4/tcp_retries2
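
How a retry count turns into a duration follows from exponential backoff of the retransmission timeout (RTO). A minimal sketch under stated assumptions: the RTO starts at 200 ms, doubles after each retransmission and is capped at 120 s. Real connections start from an RTO derived from the measured round-trip time, so actual durations are longer. `retries2_timeout_ms` is a hypothetical helper name.

```shell
# retries2_timeout_ms N: hypothetical total time in milliseconds before an
# established connection gives up after N retransmissions.
# Assumptions: initial RTO 200 ms, doubling each time, capped at 120 s.
retries2_timeout_ms() {
    n=$1 rto=200 rto_max=120000 total=0 i=0
    # One wait after the original transmission plus one per retransmission.
    while [ "$i" -le "$n" ]; do
        total=$(( total + rto ))
        rto=$(( rto * 2 ))
        [ "$rto" -gt "$rto_max" ] && rto=$rto_max
        i=$(( i + 1 ))
    done
    echo "$total"
}

retries2_timeout_ms 15    # default
retries2_timeout_ms 5     # value set above
```

Under these assumptions the default gives 924600 ms (about 15.4 minutes, inside the 13 to 30 minute range quoted above), while a value of 5 gives 12600 ms, so lowering it makes connections give up after only a few seconds of packet loss.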

 

tcp_tw_recycle - (Boolean; default: disabled; Linux 2.4 to 4.11) - Enable fast recycling of TIME_WAIT sockets. Enabling this option is not recommended since it causes problems when working with NAT (Network Address Translation). The option was removed in Linux 4.12.

tcp_tw_reuse - (Boolean; default: disabled; since Linux 2.4.19/2.6) Allow reuse of TIME_WAIT sockets for new connections when it is safe from the protocol viewpoint. It should not be changed without advice/request of technical experts.

echo 1 > /proc/sys/net/ipv4/tcp_tw_reuse

 

References:
man 7 ip
man 7 tcp
http://www.faqs.org/docs/securing/chap6sec70.html
http://man7.org/linux/man-pages/man7/ip.7.html
http://tldp.org/HOWTO/TCP-Keepalive-HOWTO/usingkeepalive.html
http://lartc.org/howto/lartc.kernel.obscure.html 

 

 

ICMP Limiting:

 

icmp_destunreach_rate - (Integer, 1/100ths of a second; Linux 2.2 to 2.4.9) Maximum rate to send ICMP Destination Unreachable packets. This limits the rate at which packets are sent to any individual route or destination. The limit does not affect sending of ICMP_FRAG_NEEDED packets needed for path MTU discovery.

 

icmp_echoreply_rate - (Integer, 1/100ths of a second; Linux 2.2 to 2.4.9) Maximum rate for sending ICMP_ECHOREPLY packets in response to ICMP_ECHOREQUEST packets.

 

icmp_paramprob_rate - (Integer, 1/100ths of a second; Linux 2.2 to 2.4.9) Maximum rate for sending ICMP_PARAMETERPROB packets. These packets are sent when a packet arrives with an invalid IP header.

 

icmp_timeexceed_rate - (Integer, 1/100ths of a second; Linux 2.2 to 2.4.9) Maximum rate for sending ICMP_TIME_EXCEEDED packets. These packets are sent to prevent loops when a packet has crossed too many hops.

 

icmp_ratelimit - (integer; default: 1000; since Linux 2.4.10) Limit the maximum rates for sending ICMP packets whose type matches icmp_ratemask (see below) to specific targets. 0 to disable any limiting, otherwise the minimum space between responses in milliseconds.


icmp_ratemask - (integer; default: see below; since Linux 2.4.10) Mask made of ICMP types for which rates are being limited.

              Significant bits: IHGFEDCBA9876543210
              Default mask:     0000001100000011000 (0x1818)
              Bit definitions (see the kernel source file include/linux/icmp.h):
              0 Echo Reply
              3 Destination Unreachable *
              4 Source Quench *
              5 Redirect
              8 Echo Request
              B Time Exceeded *
              C Parameter Problem *
              D Timestamp Request
              E Timestamp Reply
              F Info Request
              G Info Reply
              H Address Mask Request
              I Address Mask Reply

              The bits marked with an asterisk are rate limited by default (see the default mask above).
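
The mask is simply one bit per ICMP type number, with the letters in the table read as digits above 9 (A is 10, B is 11, and so on). A minimal sketch that builds a mask from a list of type numbers; `icmp_ratemask` as a function name is a hypothetical helper.

```shell
# icmp_ratemask BIT...: build the mask value from a list of ICMP type
# numbers (decimal) and print it in hex. Hypothetical helper.
icmp_ratemask() {
    mask=0
    for bit in "$@"; do
        mask=$(( mask | (1 << bit) ))
    done
    printf '0x%x\n' "$mask"
}

# Destination Unreachable (3), Source Quench (4),
# Time Exceeded (11, "B") and Parameter Problem (12, "C").
icmp_ratemask 3 4 11 12
```

This reproduces the default mask 0x1818 (6168 in decimal). Adding Echo Reply would be `icmp_ratemask 0 3 4 11 12`.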

 

References:
man 7 icmp