System monitoring

Available from firmware version 2025.0

System monitoring is a library that consists of two components: the system watchdog and the system monitor. The system monitor firmware component was released with firmware version 2025.0 and extends the system watchdog with diagnostic options. It monitors system-critical parameters such as RAM and CPU usage, as well as the processes of the PLCnext firmware, and emits notifications when certain events are triggered, for example when a threshold is exceeded or a process terminates unexpectedly.
The system watchdog subscribes to the fatal thresholds of the system-critical parameters of the system monitor and also adds some warning thresholds for RAM and CPU load. In addition, it triggers and supervises the shutdown of the firmware and the reset of the controller in case of an emergency reset.

Configuration of the system monitoring

The default configuration of the system watchdog is stored in the file /etc/plcnext/device/System/Monitoring/Default.wdg.config. This file defines the limit values above which a warning is issued or an action is executed.

To change the configuration or to use your own configuration, place your own configuration file in /opt/plcnext/config/System/Monitoring/ and adjust the values. You can copy the Default.wdg.config file, save it under a new name (make sure that the name ends with .wdg.config), and customize the parameters. The file Default.wdg.config and your *.wdg.config files are then merged. If you use the same name for an event in your configuration file, such as Full.Load.Warn, the values from the default configuration file are overridden by your customized configuration.

Default XML configuration file:
<?xml version="1.0" encoding="UTF-8"?>
<SystemWatchdogConfigDocument schemaVersion="1.0.0.0">
 <SystemWatchdogConfig>
  <MonitorEventSubscriptions>
    <CpuLoadEventSubscriptions>
        <CpuLoadEventSub name="Full.Load.Warn"     cpu="0"  priolevel="0" upperThresh="95" lowerThresh="70" upperDur="10s" lowerDur="60s" reaction="Warning"  diagSetName="disabled" />
        <CpuLoadEventSub name="Disturb.Load.Warn"  cpu="0"  priolevel="1" upperThresh="95" lowerThresh="70" upperDur="2s"  lowerDur="60s" reaction="Warning"  diagSetName="std_cpu" />
        <CpuLoadEventSub name="Disturb.Load.Swd"   cpu="0"  priolevel="1" upperThresh="95" lowerThresh="70" upperDur="15s" lowerDur="60s" reaction="Watchdog" diagSetName="ext_cpu" />
        <CpuLoadEventSub name="Critical.Load.Warn" cpu="0"  priolevel="2" upperThresh="95" lowerThresh="70" upperDur="1s"  lowerDur="60s" reaction="Warning"  diagSetName="std_cpu" />
        <CpuLoadEventSub name="Critical.Load.Swd"  cpu="0"  priolevel="2" upperThresh="95" lowerThresh="70" upperDur="3s"  lowerDur="60s" reaction="Watchdog" diagSetName="ext_cpu" />
    </CpuLoadEventSubscriptions>
    <SystemRamLoadEventSubscriptions>
        <SystemRamLoadEventSub name="RamLoad.Warn" upperThresh="90" lowerThresh="85" upperDur="1000ms" lowerDur="60s" reaction="Warning"  diagSetName="std_ram"/>
        <SystemRamLoadEventSub name="RamLoad.Swd"  upperThresh="95" lowerThresh="90" upperDur="1000ms" lowerDur="60s" reaction="Watchdog" diagSetName="std_ram"/>
    </SystemRamLoadEventSubscriptions>
  </MonitorEventSubscriptions>
  <DiagnosticSets>
    <DiagnosticSet name="disabled" collectLogs="false" ramLoads="false" cpuLoads="false" lttng="false" />
    <DiagnosticSet name="std_cpu"  collectLogs="false" ramLoads="false" cpuLoads="true"  lttng="false" />
    <DiagnosticSet name="ext_cpu"  collectLogs="true"  ramLoads="false" cpuLoads="true"  lttng="true" />
    <DiagnosticSet name="std_ram"  collectLogs="true"  ramLoads="true"  cpuLoads="false" lttng="false" />
    <DiagnosticSet name="std_prc"  collectLogs="true"  ramLoads="false" cpuLoads="false" lttng="true"  /> <!-- this is a internal used DiagnosticSet for process-monitoring -->
  </DiagnosticSets>
 </SystemWatchdogConfig>
</SystemWatchdogConfigDocument>
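
The override mechanism described above can be sketched with a small custom file that changes only the Full.Load.Warn event. The file name Custom.wdg.config and the changed threshold are illustrative assumptions. It is also an assumption that a file containing only the overridden entries is accepted. If in doubt, start from a full copy of Default.wdg.config.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical file: /opt/plcnext/config/System/Monitoring/Custom.wdg.config -->
<SystemWatchdogConfigDocument schemaVersion="1.0.0.0">
 <SystemWatchdogConfig>
  <MonitorEventSubscriptions>
    <CpuLoadEventSubscriptions>
        <!-- Same name as in Default.wdg.config, so these values replace the default values. -->
        <!-- Illustrative change: upperThresh lowered from 95 to 90; all other values as in the default. -->
        <CpuLoadEventSub name="Full.Load.Warn" cpu="0" priolevel="0" upperThresh="90" lowerThresh="70" upperDur="10s" lowerDur="60s" reaction="Warning" diagSetName="disabled" />
    </CpuLoadEventSubscriptions>
  </MonitorEventSubscriptions>
 </SystemWatchdogConfig>
</SystemWatchdogConfigDocument>
```

Because the event name matches an entry in the default file, the merged configuration should contain a single Full.Load.Warn subscription with the customized values.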

The <SystemWatchdogConfigDocument> schema contains the following configurable XML elements:

  • <MonitorEventSubscriptions> consisting of <CpuLoadEventSubscriptions> and <SystemRamLoadEventSubscriptions>
  • <DiagnosticSets>

<CpuLoadEventSubscriptions>

The CpuLoadEventSubscriptions element defines the monitoring of the CPU load and the corresponding reactions to certain events.

| Attribute | Description |
| --- | --- |
| CpuLoadEventSub name | Unique name of the CpuLoadEventSub. If a new name is used, a new subscription is created with it. |
| cpu | Defines which CPU is to be analyzed. With cpu="0", the whole system is analyzed. To analyze a specific core, enter its number, for example cpu="1" or cpu="2". |
| priolevel | Specifies the priority with which the CPU load event is monitored. |
| priolevel="0" | Full load event: 100 % CPU usage, but all tasks are still being processed. |
| priolevel="1" | Disturb load event: low-priority tasks cannot find a time slot for execution. The system does not work as expected. |
| priolevel="2" | Critical load event: the process is not controllable and the situation is very critical. |
| upperThresh | Upper threshold for the CPU load (for example upperThresh="95" for 95 %). If the CPU load exceeds this value, the specified reaction (for example a warning) is triggered after the duration specified in upperDur. |
| lowerThresh | Lower threshold for the CPU load (for example lowerThresh="70" for 70 %). If the CPU load falls below this value, a notification is issued after the duration specified in lowerDur (only for reaction="Warning"). |
| upperDur | The CPU load must exceed the upper threshold (upperThresh) for the time defined in upperDur (for example upperDur="10s" for 10 seconds) before the reaction is triggered. |
| lowerDur | The CPU load must remain below the lower threshold (lowerThresh) for the time defined in lowerDur (for example lowerDur="60s" for 60 seconds). |
| reaction | Specifies the action to be taken when the specified threshold is crossed. With reaction="Warning", a warning is issued. With reaction="Watchdog", the system watchdog is triggered. |
| diagSetName | Defines which DiagnosticSet is used for the CpuLoadEventSub. A DiagnosticSet defines which log files are collected for the subscription. It is specified in the <DiagnosticSets> element. If you do not want to collect log files, enter diagSetName="disabled". In that case, only warning notifications are displayed. See also the section Diagnostic folders. |
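
To illustrate how these attributes combine, here is a minimal sketch of an additional subscription that watches a single core. The name Core1.Load.Warn and all values are hypothetical and not part of the default configuration. Whether core 1 exists depends on the controller. The fragment would be placed inside <MonitorEventSubscriptions> in a custom *.wdg.config file.

```xml
<CpuLoadEventSubscriptions>
    <!-- Hypothetical subscription: a new name creates a new subscription. -->
    <!-- cpu="1" analyzes core 1 only instead of the whole system (cpu="0"). -->
    <!-- A warning is issued once the load has exceeded 90 % for 5 s; -->
    <!-- the follow-up notification is issued once the load has stayed below 60 % for 30 s. -->
    <CpuLoadEventSub name="Core1.Load.Warn" cpu="1" priolevel="0" upperThresh="90" lowerThresh="60" upperDur="5s" lowerDur="30s" reaction="Warning" diagSetName="std_cpu" />
</CpuLoadEventSubscriptions>
```

The gap between upperThresh and lowerThresh acts as a hysteresis band: a load that fluctuates between 60 % and 90 % neither raises a new warning nor clears an existing one.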

<SystemRamLoadEventSubscriptions>

The element SystemRamLoadEventSubscriptions defines the monitoring of RAM utilization and the corresponding reactions to certain events.

| Attribute | Description |
| --- | --- |
| SystemRamLoadEventSub name | Unique name of the SystemRamLoadEventSub. If a new name is used, a new subscription is created with it. |
| upperThresh | Upper threshold for the RAM load (for example upperThresh="95" for 95 %). If the RAM load exceeds this value, the specified reaction (for example a warning) is triggered after the duration specified in upperDur. |
| lowerThresh | Lower threshold for the RAM load (for example lowerThresh="70" for 70 %). If the RAM load falls below this value, a notification is issued after the duration specified in lowerDur (only for reaction="Warning"). |
| upperDur | The RAM load must exceed the upper threshold (upperThresh) for the time defined in upperDur (for example upperDur="10s" for 10 seconds) before the reaction is triggered. |
| lowerDur | The RAM load must remain below the lower threshold (lowerThresh) for the time defined in lowerDur (for example lowerDur="60s" for 60 seconds). |
| reaction | Specifies the action to be taken when the specified threshold is crossed. With reaction="Warning", a warning is issued. With reaction="Watchdog", the system watchdog is triggered. |
| diagSetName | Defines which DiagnosticSet is used for the SystemRamLoadEventSub. A DiagnosticSet defines which log files are collected for the subscription. It is specified in the <DiagnosticSets> element. If you do not want to collect log files, enter diagSetName="disabled". In that case, only warning notifications are displayed. See also the section Diagnostic folders. |
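
As a rough sketch, an additional early warning for RAM utilization could be configured as follows. The name RamLoad.Early.Warn and the thresholds are hypothetical. The default subscriptions RamLoad.Warn and RamLoad.Swd are assumed to remain active after the files are merged.

```xml
<SystemRamLoadEventSubscriptions>
    <!-- Hypothetical subscription: warns earlier than the default RamLoad.Warn (90 %). -->
    <!-- The warning is issued once RAM load has exceeded 80 % for 5 s; -->
    <!-- the follow-up notification is issued once it has stayed below 75 % for 60 s. -->
    <!-- diagSetName="disabled": no log files are collected, only the notification is issued. -->
    <SystemRamLoadEventSub name="RamLoad.Early.Warn" upperThresh="80" lowerThresh="75" upperDur="5s" lowerDur="60s" reaction="Warning" diagSetName="disabled"/>
</SystemRamLoadEventSubscriptions>
```

Unlike CpuLoadEventSub, this element has no cpu or priolevel attribute, as can be seen in the default configuration file.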

<DiagnosticSets>

The DiagnosticSets element defines various diagnostic sets that can be activated for certain events.

| Attribute | Description |
| --- | --- |
| DiagnosticSet name | Unique name of the DiagnosticSet |
| collectLogs | true: Log files are collected. false: No log files are collected. |
| ramLoads | true: RAM load is monitored. false: RAM load is not monitored. |
| cpuLoads | true: CPU load is monitored. false: CPU load is not monitored. |
| lttng | true: lttng tracing is activated. false: lttng tracing is deactivated. |
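
The link between a subscription and a diagnostic set can be sketched as follows. The set name logs_only is hypothetical, and it is an assumption that user-defined sets can be added in a custom *.wdg.config file in the same way as subscriptions. The subscription values are taken from the default RamLoad.Warn entry. Only diagSetName is changed.

```xml
<SystemWatchdogConfig>
  <MonitorEventSubscriptions>
    <SystemRamLoadEventSubscriptions>
        <!-- diagSetName refers to the DiagnosticSet defined below by its name. -->
        <SystemRamLoadEventSub name="RamLoad.Warn" upperThresh="90" lowerThresh="85" upperDur="1000ms" lowerDur="60s" reaction="Warning" diagSetName="logs_only"/>
    </SystemRamLoadEventSubscriptions>
  </MonitorEventSubscriptions>
  <DiagnosticSets>
    <!-- Hypothetical set: collects log files only, without RAM load, CPU load, or lttng data. -->
    <DiagnosticSet name="logs_only" collectLogs="true" ramLoads="false" cpuLoads="false" lttng="false" />
  </DiagnosticSets>
</SystemWatchdogConfig>
```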

Notifications

The warnings issued by the System Monitoring component are displayed as notifications, for example in the WBM of the device (Diagnostics - Notifications). System Monitoring is displayed as the sender of the notification. See also Notifications of PLCnext Runtime.

Diagnostic folders 

For each warning or emergency exit, a diagnostic folder is created in /opt/plcnext/logs/Monitoring/Watchdog. The folder name is composed of the timestamp and the event name. A maximum of 10 directories is kept. When this limit is reached, only the most recent ones are retained.

Each folder contains log sets relevant to the event that occurred.


• Published/reviewed: 2025-07-04  ✿  Revision 081 •