Date created: 03/07/16 08:45:33. Last modified: 04/17/16 17:04:12

Core, Processes, Threads, Interrupts

References:
https://www.centos.org/docs/5/html/5.1/Deployment_Guide/s2-proc-loadavg.html
https://doc.opensuse.org/documentation/html/openSUSE_121/opensuse-tuning/cha.tuning.taskscheduler.html
https://access.redhat.com/documentation/en-US/Red_Hat_Enterprise_Linux/6/html/Performance_Tuning_Guide/s-cpu-irq.html

Reference on Linux cores, processes, threads, interrupts

 

Single Core Systems and Parallel Processing

On a single core system the Linux kernel can give the appearance of running multiple processes in parallel. Assuming there is more than one runnable process on a single core system (or, on a multi-core system, more runnable processes than CPU cores), each process gets (for example) 1/100th of a second of CPU time. After that 1/100th of a second has passed, a timer interrupt fires, the process is suspended, and the CPU spends the next 1/100th of a second on the second runnable process, and so on, looping round all runnable processes. This time-sharing provides multitasking even though only one process is executing on the core at any instant.

Switching between processes is managed by the "process scheduler" or "task scheduler", which decides which process in the queue will run next. The forced suspension of a running process is called pre-emption.

The amount of time a running process receives on the processor is called a timeslice. The task scheduler decides how long the timeslice is and which processes have higher priority than others. Because task scheduling is managed within the kernel, no single process should be able to dominate resources under normal operating circumstances.

The O(1) scheduler was introduced in Linux 2.6 and remained the default scheduler up to kernel version 2.6.22. Its name reflects that it takes a constant amount of time to choose the next task, no matter how many runnable processes are active on the system. Timeslices that are too long make the system less interactive and responsive, while timeslices that are too short make the processor waste a lot of time on the overhead of switching between processes too frequently. The default timeslice is usually rather low, for example 20ms. The scheduler determines the timeslice based on the priority of a process, which allows processes with higher priority to run more often and for longer. A process does not have to use its whole timeslice at once.

In Linux kernel version 2.6.23 the Completely Fair Scheduler (CFS) became the default Linux kernel scheduler. The scheduler environment was divided into several parts. CFS tries to guarantee a fair approach to each runnable task. When a task enters the run queue, the scheduler records the current time. While the process waits for processor time, its “wait” value is incremented by an amount derived from the total number of tasks currently in the run queue and the process priority. As soon as the processor runs the task, its “wait” value is decremented. If the value drops below a certain level, the task is pre-empted by the scheduler and other tasks get closer to the processor. With this algorithm, CFS tries to reach the ideal state where the “wait” value is always zero.

# The first three columns show the system load average (tasks that are runnable or waiting on I/O) over the last one, five, and fifteen minutes. The fourth column shows the number of currently runnable tasks and the total number of tasks on the system. The last column displays the most recently used process ID.

# cat /proc/loadavg
0.00 0.03 0.05 1/111 2635
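
The five fields can be pulled apart in a script. The following is a minimal sketch, assuming a POSIX shell; `parse_loadavg` and the field labels it prints are hypothetical names chosen for illustration, not part of any standard tool.

```shell
# Sketch: split one /proc/loadavg line into labelled fields.
# parse_loadavg is a hypothetical helper, not a standard command.
parse_loadavg() {
    # $1 is a whole line such as "0.00 0.03 0.05 1/111 2635".
    # Unquoted on purpose so the shell splits it into five words.
    set -- $1
    # The fourth word is "runnable/total"; split it on the slash.
    printf 'load1=%s load5=%s load15=%s runnable=%s total=%s last_pid=%s\n' \
        "$1" "$2" "$3" "${4%/*}" "${4#*/}" "$5"
}

# Sample line taken from the session above
parse_loadavg "0.00 0.03 0.05 1/111 2635"
```

On a live Linux system the same function could be called as `parse_loadavg "$(cat /proc/loadavg)"`.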


# chrt can be used to set or change the real-time scheduling attributes of a process

# sleep 1000 &
# pgrep sleep
3764

# chrt -p 3764
pid 3764's current scheduling policy: SCHED_OTHER
pid 3764's current scheduling priority: 0
# ps -lp 3764
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 S     0  3764  2526  0  80   0 - 26973 hrtime pts/0    00:00:00 sleep

# chrt -p -f 1 3764
# chrt -p 3764
pid 3764's current scheduling policy: SCHED_FIFO
pid 3764's current scheduling priority: 1
# ps -lp 3764
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
0 S     0  3764  2526  0  58   - - 26973 hrtime pts/0    00:00:00 sleep



# nice and renice can be used to set and adjust the priority of a process

# nice -n -10 sleep 1000 &
# pgrep sleep
3978
# ps -lp 3978
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S     0  3978  2526  0  70 -10 - 26973 hrtime pts/0    00:00:00 sleep
# renice -n 10 3978
3978 (process ID) old priority -10, new priority 10
# ps -lp 3978
F S   UID   PID  PPID  C PRI  NI ADDR SZ WCHAN  TTY          TIME CMD
4 S     0  3978  2526  0  90  10 - 26973 hrtime pts/0    00:00:00 sleep
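
In the `ps -l` output above, the PRI column tracks the NI column: nice -10 shows PRI 70 and nice 10 shows PRI 90. The sketch below assumes the relationship PRI = 80 + NI for ordinary (SCHED_OTHER) processes, which is inferred from this output and may differ with other `ps` versions or output formats. `ps_pri_for_nice` is a hypothetical helper name.

```shell
# Sketch: the PRI value "ps -l" is expected to display for a SCHED_OTHER
# process, assuming PRI = 80 + NI as seen in the output above.
# ps_pri_for_nice is a hypothetical helper, not a standard command.
ps_pri_for_nice() {
    ni=$1
    # Valid nice values on Linux range from -20 to 19.
    if [ "$ni" -lt -20 ] || [ "$ni" -gt 19 ]; then
        echo "nice value must be between -20 and 19" >&2
        return 1
    fi
    echo $(( 80 + ni ))
}

ps_pri_for_nice -10
ps_pri_for_nice 10
```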

 

Multi-Threaded Applications & Multi-Core Systems

Processes can start multiple threads. The Linux kernel views the threads of a multi-threaded application as processes with shared resources, so it schedules each thread (even multiple threads of the same process) as if it were an individual single-threaded process. This means the threads of a multi-threaded process can be spread over different logical cores.

Each physical machine may have multiple physical processors, each processor may have multiple cores, and the processor might provide SMT (simultaneous multithreading) features like Intel's Hyper-Threading (HT), which doubles the number of logical cores. On a processor with a single logical core, running two processes means they are context-switched periodically by the Linux task scheduler. With HT enabled, the processor presents a single physical core as two logical cores. The Linux kernel can now assign one logical core to each of the two processes, and the physical core shares its execution resources between them in hardware instead of the kernel having to switch between them.

Some processors also support out-of-order execution: whilst one instruction is waiting for a memory read or write to complete or return, the core can execute other, independent instructions rather than wasting CPU cycles.

# Checking CPU count on a VM guest; `lscpu` also shows the NUMA layout (a single NUMA node here)

# grep "processor" /proc/cpuinfo
processor       : 0
processor       : 1
# lscpu
Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                2
On-line CPU(s) list:   0,1
Thread(s) per core:    1
Core(s) per socket:    2
Socket(s):             1
NUMA node(s):          1
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 69
Model name:            Intel(R) Core(TM) i5-4200U CPU @ 1.60GHz
Stepping:              1
CPU MHz:               2294.651
BogoMIPS:              4589.30
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              3072K
NUMA node0 CPU(s):     0,1

CPU Affinity

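# taskset can be used to set the CPU affinity of a process, for example to start an application pinned to logical CPU 1

$ taskset -c 1 ./application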

 

Interrupts

An interrupt request (IRQ) is a request for service, sent at the hardware level. Interrupts can be sent either over a dedicated hardware line (out of band) or across a hardware bus as an information packet (in band), known as a Message Signalled Interrupt (MSI). MSI-X is a newer revision which allows more granular tying of interrupts to specific processors.

Receipt of an IRQ prompts a switch to interrupt context: the CPU suspends whatever it was executing so that the interrupt can be handled. Kernel interrupt dispatch code retrieves the IRQ number and its associated list of registered Interrupt Service Routines (ISRs), and calls each ISR in turn. The ISR acknowledges the interrupt and ignores redundant interrupts from the same IRQ, then queues a deferred handler to finish processing the interrupt and stop the ISR from ignoring future interrupts.

The /proc/interrupts file lists the number of interrupts per CPU per I/O device. It displays the IRQ number, the number of times that interrupt has been handled by each CPU core, the interrupt type, and a comma-delimited list of drivers that are registered to receive that interrupt.
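
As a rough illustration of that layout, the per-CPU counters on one line can be added up to see how busy an IRQ is overall. This is a minimal sketch, assuming a POSIX shell with awk; `irq_total` is a hypothetical helper, and the caller is assumed to pass the number of CPU columns present in the file.

```shell
# Sketch: total the per-CPU counters on /proc/interrupts style lines.
# irq_total is a hypothetical helper; $1 is the number of CPU columns.
irq_total() {
    awk -v ncpu="$1" '{
        total = 0
        # Fields 2 .. ncpu+1 hold the per-CPU interrupt counts.
        for (i = 2; i <= ncpu + 1; i++) total += $i
        # Field 1 is the IRQ number followed by a colon; drop the colon.
        sub(/:$/, "", $1)
        print $1, total
    }'
}

# Sample line in the /proc/interrupts format for a four-CPU system
echo "32: 0 140 45 850264 PCI-MSI-edge eth0" | irq_total 4
# prints: 32 850449
```

On a live system something like `grep eth0 /proc/interrupts | irq_total "$(nproc)"` could be used, assuming `nproc` matches the number of CPU columns.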

IRQs have an associated "affinity" property, smp_affinity, which defines the CPU cores that are allowed to execute the ISR for that IRQ. This property can be used to improve application performance by assigning both interrupt affinity and the application's thread affinity to one or more specific CPU cores. This allows cache line sharing between the specified interrupt and application threads. The interrupt affinity value for a particular IRQ number is stored in the associated "/proc/irq/IRQ_NUMBER/smp_affinity" file. The value stored in this file is a hexadecimal bit-mask representing all CPU cores in the system.
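
To make the bit-mask concrete: bit N of the mask corresponds to CPU N, so CPU 0 is 1, CPU 1 is 2, CPUs 0 to 3 together are f. The following is a minimal sketch, assuming a POSIX shell; `cpus_to_mask` is a hypothetical helper and it deliberately handles only CPUs 0 to 31 (a single 32-bit group).

```shell
# Sketch: build an smp_affinity style hex mask from a list of CPU numbers.
# cpus_to_mask is a hypothetical helper; it only handles CPUs 0-31.
cpus_to_mask() {
    mask=0
    for cpu in "$@"; do
        if [ "$cpu" -lt 0 ] || [ "$cpu" -gt 31 ]; then
            echo "CPU number out of range: $cpu" >&2
            return 1
        fi
        # Set the bit that corresponds to this CPU.
        mask=$(( mask | (1 << cpu) ))
    done
    printf '%x\n' "$mask"
}

cpus_to_mask 0          # prints 1 (CPU 0 only)
cpus_to_mask 0 1 2 3    # prints f (CPUs 0 to 3)
cpus_to_mask 2 3        # prints c (CPUs 2 and 3)
```

The printed value is in the form that, as root, could be written to /proc/irq/IRQ_NUMBER/smp_affinity.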

Example to set the interrupt affinity for the Ethernet driver on a server with four CPU cores.

First determine the IRQ number associated with the Ethernet driver:

# grep eth0 /proc/interrupts
32:   0     140      45     850264      PCI-MSI-edge     eth0

Use the IRQ number to locate the appropriate smp_affinity file:

# cat /proc/irq/32/smp_affinity
f

The default value for smp_affinity is f, meaning that the IRQ can be serviced on any of the CPUs in the system. Setting this value to 1, as follows, means that only CPU 0 can service this interrupt:

# echo 1 >/proc/irq/32/smp_affinity
# cat /proc/irq/32/smp_affinity
1

Commas can be used to delimit smp_affinity values for discrete 32-bit groups. This is required on systems with more than 32 cores. For example, the following shows that IRQ 40 is serviced on all cores of a 64-core system:

# cat /proc/irq/40/smp_affinity
ffffffff,ffffffff

To service IRQ 40 on only the upper 32 cores of a 64-core system, you would do the following:

# echo 0xffffffff,00000000 > /proc/irq/40/smp_affinity
# cat /proc/irq/40/smp_affinity
ffffffff,00000000

Note: On systems that support interrupt steering, modifying the smp_affinity of an IRQ sets up the hardware so that the decision to service an interrupt with a particular CPU is made at the hardware level, with no intervention from the kernel.