Auditd
What is Audit Daemon (auditd)?
auditd is the userspace component to the Linux Auditing System. It's responsible for writing audit records to the disk. Viewing the logs is done with the ausearch or aureport utilities. Configuring the audit system or loading rules is done with the auditctl utility. During startup, the rules in /etc/audit/audit.rules are read by auditctl and loaded into the kernel. Alternately, there is also an augenrules program that reads rules located in /etc/audit/rules.d/ and compiles them into an audit.rules file. The audit daemon itself has some configuration options that the admin may wish to customize. They are found in the auditd.conf file.
EC2 instances with audit daemon running will stop automatically if auditd is unable to write the log files
Why would the audit daemon stop my instance if it cannot write logs?
This is mainly a security response. If the system is unable to log actions or movements on the system, then if a compromise happens there would be no way to account for the actions of nefarious actors. Simply put, if auditd can't log anything to disk, no one should be on the system.
To facilitate these actions, there are configurable parameters. In /etc/audit/auditd.conf there are a few options that control how the system responds, some of which can cause a shutdown. The parameters are:
space_left
space_left_action
admin_space_left
admin_space_left_action
disk_full_action
disk_error_action
Below are the definitions for each of the above items according to man 5 auditd.conf:
space_left
This is a numeric value in megabytes that tells the audit daemon when to perform a configurable action because the system is starting to run low on disk space.
space_left_action
This parameter tells the system what action to take when the system has detected that it is starting to get low on disk space. Valid values are ignore, syslog, rotate, email, exec, suspend, single, and halt. If set to ignore, the audit daemon does nothing. syslog means that it will issue a warning to syslog. rotate will rotate logs, losing the oldest to free up space. email means that it will send a warning to the email account specified in action_mail_acct as well as sending the message to syslog. exec /path-to-script will execute the script. You cannot pass parameters to the script. The script is also responsible for telling the auditd daemon to resume logging once it has completed its action. This can be done by adding service auditd resume to the script. suspend will cause the audit daemon to stop writing records to the disk. The daemon will still be alive. The single option will cause the audit daemon to put the computer system in single user mode. The halt option will cause the audit daemon to shut down the computer system.
admin_space_left
This is a numeric value in megabytes that tells the audit daemon when to perform a configurable action because the system is running low on disk space. This should be considered the last chance to do something before running out of disk space. The numeric value for this parameter should be lower than the number for space_left.
admin_space_left_action
This parameter tells the system what action to take when the system has detected that it is low on disk space. Valid values are ignore, syslog, rotate, email, exec, suspend, single, and halt. If set to ignore, the audit daemon does nothing. syslog means that it will issue a warning to syslog. rotate will rotate logs, losing the oldest to free up space. email means that it will send a warning to the email account specified in action_mail_acct as well as sending the message to syslog. exec /path-to-script will execute the script. You cannot pass parameters to the script. The script is also responsible for telling the auditd daemon to resume logging once it has completed its action. This can be done by adding service auditd resume to the script. suspend will cause the audit daemon to stop writing records to the disk. The daemon will still be alive. The single option will cause the audit daemon to put the computer system in single user mode. The halt option will cause the audit daemon to shut down the computer system.
disk_full_action
This parameter tells the system what action to take when the system has detected that the partition to which log files are written has become full. Valid values are ignore, syslog, rotate, exec, suspend, single, and halt. If set to ignore, the audit daemon will issue a syslog message but no other action is taken. syslog means that it will issue a warning to syslog. rotate will rotate logs, losing the oldest to free up space. exec /path-to-script will execute the script. You cannot pass parameters to the script. The script is also responsible for telling the auditd daemon to resume logging once it has completed its action. This can be done by adding service auditd resume to the script. suspend will cause the audit daemon to stop writing records to the disk. The daemon will still be alive. The single option will cause the audit daemon to put the computer system in single user mode. The halt option will cause the audit daemon to shut down the computer system.
disk_error_action
This parameter tells the system what action to take whenever there is an error detected when writing audit events to disk or rotating logs. Valid values are ignore, syslog, exec, suspend, single, and halt. If set to ignore, the audit daemon will not take any action. syslog means that it will issue no more than 5 consecutive warnings to syslog. exec /path-to-script will execute the script. You cannot pass parameters to the script. suspend will cause the audit daemon to stop writing records to the disk. The daemon will still be alive. The single option will cause the audit daemon to put the computer system in single user mode. The halt option will cause the audit daemon to shut down the computer system.
By default on Amazon Linux, if the disk has an error or is full, the configured action is SUSPEND. Below are the unmodified parameters from an Amazon Linux AMI (ALAMI) 2017.09 instance:
disk_error_action = SUSPEND
disk_full_action = SUSPEND
admin_space_left_action = SUSPEND
admin_space_left = 50
space_left_action = SYSLOG
space_left = 75
As you can see, space_left_action is configured to warn via syslog. You can use email as the value instead; however, that value depends on action_mail_acct, which is detailed below:
action_mail_acct
This option should contain a valid email address or alias. The default address is root. If the email address is not local to the machine, you must make sure you have email properly configured on your machine and network. Also, this option requires that /usr/lib/sendmail exists on the machine.
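To see which of these settings are in effect on a given instance, a small sketch like the one below may help. It assumes the thresholds are plain megabyte numbers, as described above, and the helper names get_opt and check_space_thresholds are made up for illustration. It only reads the file.

```shell
#!/bin/sh
# Sketch: print the disk-space settings from an auditd.conf-style file and
# check that admin_space_left is lower than space_left.
# get_opt and check_space_thresholds are hypothetical helper names.

# get_opt FILE KEY -> prints the value of the first "KEY = value" line
get_opt() {
    awk -F= -v key="$2" '
        /^[ \t]*#/ { next }                       # skip comment lines
        {
            k = $1; v = $2
            gsub(/[ \t]/, "", k)                  # strip blanks around the key
            gsub(/^[ \t]+|[ \t]+$/, "", v)        # trim the value
            if (tolower(k) == tolower(key)) { print v; exit }
        }' "$1"
}

# check_space_thresholds FILE -> succeeds if admin_space_left < space_left
# (assumes both values are plain numbers in megabytes)
check_space_thresholds() {
    sl=$(get_opt "$1" space_left)
    asl=$(get_opt "$1" admin_space_left)
    [ -n "$sl" ] && [ -n "$asl" ] && [ "$asl" -lt "$sl" ] 2>/dev/null
}

conf=/etc/audit/auditd.conf
if [ -r "$conf" ]; then
    for key in space_left space_left_action admin_space_left \
               admin_space_left_action disk_full_action disk_error_action; do
        printf '%s = %s\n' "$key" "$(get_opt "$conf" "$key")"
    done
    check_space_thresholds "$conf" ||
        echo "warning: admin_space_left should be lower than space_left"
fi
```

On most systems the file is readable only by root, so the sketch prints nothing unless it is run with sufficient privileges.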
Additional Information regarding disk actions:
Although the man page says that setting disk_full_action and disk_error_action to SUSPEND in auditd.conf keeps the daemon alive and only stops it writing to disk, all indications are that the suspend action does more than that and puts the computer into a sleep state, as seen in this message:
[ 16.872478] ACPI: Preparing to enter system sleep state S5
Further messages may also be visible in /var/log/messages regarding the action auditd has taken:
grep auditd /var/log/messages | grep -i "space"
While you can change this behavior in /etc/audit/auditd.conf and ignore the disk full or disk error condition, it is not best practice. Best practice is to set up log rotation and log file size limits to help manage the space in /var/log/audit. On RHEL machines that use LVM, /var/log/audit is usually only given 5GB. If you are using SELinux, this can fill up rather quickly due to the constant AVC denial messages if SELinux is not properly configured or used.
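As a rough way to see how close the audit log partition is to the thresholds, the sketch below reports free space in megabytes. The function name free_mb is hypothetical, and the 75 MB figure is simply the space_left value from the sample configuration above.

```shell
#!/bin/sh
# Sketch: compare free space on the audit log partition with space_left.
# free_mb is a hypothetical helper name.

# free_mb PATH -> free space, in whole megabytes, on the filesystem holding PATH
free_mb() {
    # df -P prints one header line, then: filesystem, 1K-blocks, used, available, ...
    df -Pk "$1" | awk 'NR == 2 { print int($4 / 1024) }'
}

log_dir=/var/log/audit      # location named in the text
space_left=75               # MB, taken from the sample configuration

if [ -d "$log_dir" ]; then
    free=$(free_mb "$log_dir")
    if [ "$free" -le "$space_left" ]; then
        echo "low: ${free} MB free on $log_dir (space_left is ${space_left} MB)"
    else
        echo "ok: ${free} MB free on $log_dir"
    fi
fi
```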
Possible Resolutions:
There are a few things you can do:
Rotate logs and limit log size:
auditd can rotate its own logs, but not compress them. Red Hat offers the following information regarding the rotation and compression of such log files. The same may be applied to CentOS and ALAMI.
By default, auditd in all versions of Red Hat Enterprise Linux rotates its own log files automatically when they reach a certain size, as determined by the max_log_file setting in auditd.conf (which defaults to 6 megabytes).
Replacing auto-rotation based on size with auto-rotation based on time
1. Disable rotation in /etc/audit/auditd.conf so that:
max_log_file_action = ignore
2. Tell auditd to reconfigure itself (applying your changes) by doing one of the following:
kill -HUP $(pidof auditd) (Any version)
systemctl reload auditd (RHEL7)
service auditd reload (RHEL6 and earlier)
3. To manually trigger auditd to rotate, it needs to receive a USR1 signal
Simple solution for daily rotation: copy auditd.cron to cron.daily
~]# cp /usr/share/doc/audit-*/auditd.cron /etc/cron.daily
~]# chmod +x /etc/cron.daily/auditd.cron
~]# cat /etc/cron.daily/auditd.cron
#!/bin/sh

##########
# This script can be installed to get a daily log rotation
# based on a cron job.
##########

/sbin/service auditd rotate
EXITVALUE=$?
if [ $EXITVALUE != 0 ]; then
    /usr/bin/logger -t auditd "ALERT exited abnormally with [$EXITVALUE]"
fi
exit 0
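Step 1 above (setting max_log_file_action = ignore) can also be scripted. The following is only a sketch: set_conf_option is a made-up helper, it assumes keys and values contain no characters that are special to sed, and the demonstration runs against a scratch file rather than the live /etc/audit/auditd.conf.

```shell
#!/bin/sh
# Sketch: set "KEY = VALUE" in an auditd.conf-style file.
# set_conf_option is a hypothetical helper name.

# set_conf_option FILE KEY VALUE
# Rewrites the existing "KEY = ..." line, or appends one if the key is absent.
set_conf_option() (
    tmp=$(mktemp) || exit 1
    if grep -Eq "^[[:space:]]*$2[[:space:]]*=" "$1"; then
        sed -E "s|^[[:space:]]*$2[[:space:]]*=.*|$2 = $3|" "$1" > "$tmp"
    else
        { cat "$1"; printf '%s = %s\n' "$2" "$3"; } > "$tmp"
    fi
    cat "$tmp" > "$1"       # overwrite in place, keeping the file's owner and mode
    rm -f "$tmp"
)

# Demonstration on a scratch file.
demo=$(mktemp)
printf 'max_log_file = 6\nmax_log_file_action = ROTATE\n' > "$demo"
set_conf_option "$demo" max_log_file_action ignore
cat "$demo"
rm -f "$demo"
```

After changing the real file, auditd still has to be told to reconfigure itself as described in step 2.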
Implementing log compression
auditd does not support log compression; however, it is trivial to update the above script to rename old audit.log.n files and compress them. A working example is provided for demonstration purposes.
1. Follow the steps above to disable auto-rotation based on size
2. Replace the previously-created script with the following code:
#!/bin/bash
export PATH=/sbin:/bin:/usr/sbin:/usr/bin

FORMAT="%F_%T"  # Customize timestamp format as desired, per `man date`
                # %F_%T will lead to files like: audit.log.2015-02-26_15:43:46
COMPRESS=gzip   # Change to bzip2 or xz as desired
KEEP=5          # Number of compressed log files to keep

rename_and_compress_old_logs() {
    for file in $(find /var/log/audit/ -name 'audit.log.[0-9]'); do
        timestamp=$(ls -l --time-style="+${FORMAT}" ${file} | awk '{print $6}')
        newfile=${file%.[0-9]}.${timestamp}
        # Optional: remove "-v" verbose flag from next 2 lines to hide output
        mv -v ${file} ${newfile}
        ${COMPRESS} -v ${newfile}
    done
}

delete_old_compressed_logs() {
    # Optional: remove "-v" verbose flag to hide output
    rm -v $(find /var/log/audit/ -regextype posix-extended -regex '.*audit\.log\..*(xz|gz|bz2)$' | sort -n | head -n -${KEEP})
}

rename_and_compress_old_logs
service auditd rotate
rename_and_compress_old_logs
delete_old_compressed_logs
3. Modify the declarations of FORMAT, COMPRESS, and KEEP as desired
4. Ensure the script is marked executable and set it to be called by cron at desired times (either via a normal cron job or by putting it in cron.daily as demonstrated above)
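Before letting a script delete anything, it can be useful to preview which compressed logs a given KEEP value would remove. The sketch below is a dry run only; list_prunable is a hypothetical name, and it uses awk to drop the newest KEEP names instead of the `head -n -KEEP` form used in the script above, purely so that it does not depend on GNU head. It relies on the timestamped names sorting in chronological order, which holds for the %F_%T format.

```shell
#!/bin/sh
# Sketch: list compressed audit logs that a KEEP setting would delete.
# list_prunable is a hypothetical helper name; nothing is removed.

# list_prunable DIR KEEP -> prints every compressed log except the KEEP newest
list_prunable() (
    cd "$1" || exit 1
    ls -1 2>/dev/null |
        grep -E '^audit\.log\..*\.(xz|gz|bz2)$' |
        sort |
        awk -v keep="$2" '{ name[NR] = $0 }
             END { for (i = 1; i <= NR - keep; i++) print name[i] }'
)

# Demonstration with empty placeholder files in a scratch directory.
demo_dir=$(mktemp -d)
for ts in 2015-02-24_03:10:01 2015-02-25_03:10:01 2015-02-26_03:10:01; do
    : > "$demo_dir/audit.log.$ts.gz"
done
list_prunable "$demo_dir" 2     # prints audit.log.2015-02-24_03:10:01.gz
rm -rf "$demo_dir"
```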
audit: backlog limit exceeded
AWS method
https://repost.aws/knowledge-center/troubleshoot-audit-backlog-errors-ec2
Short description
The audit backlog buffer in a Linux system is a kernel level socket buffer queue that the operating system uses to maintain or log audit events. When a new audit event triggers, the system logs the event and adds it to the audit backlog buffer queue.
The backlog_limit parameter value is the number of audit backlog buffers. The parameter is set to 320 by default, as shown in the following example:
# auditctl -s
enabled 1
failure 1
pid 2264
rate_limit 0
backlog_limit 320
lost 0
backlog 0
Audit events logged beyond the default number of 320 cause the following errors on the instance:
audit: audit_backlog=321 > audit_backlog_limit=320
audit: audit_lost=44393 audit_rate_limit=0 audit_backlog_limit=320
audit: backlog limit exceeded
-or-
audit_printk_skb: 153 callbacks suppressed
audit_printk_skb: 114 callbacks suppressed
An audit buffer queue at or exceeding capacity might also cause the instance to freeze or remain in an unresponsive state.
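To pull individual counters such as backlog_limit or lost out of the status output, something like the following could be used. It is a sketch that assumes the space-separated "name value" format shown in the auditctl -s example above; status_field is a hypothetical helper name, and the sample values are copied from that example.

```shell
#!/bin/sh
# Sketch: extract one field from `auditctl -s` style output.
# status_field is a hypothetical helper name.

# status_field NAME -> reads status output on stdin and prints NAME's value.
# Works whether the pairs arrive on one line or one pair per line.
status_field() {
    awk -v want="$1" '{
        for (i = 1; i < NF; i++)
            if ($i == want) { print $(i + 1); exit }
    }'
}

sample='enabled 1
failure 1
pid 2264
rate_limit 0
backlog_limit 320
lost 0
backlog 0'

echo "backlog_limit: $(echo "$sample" | status_field backlog_limit)"
echo "lost: $(echo "$sample" | status_field lost)"
echo "backlog: $(echo "$sample" | status_field backlog)"
```

On a live instance the input would come from the real command, for example `sudo auditctl -s | status_field lost`. A non-zero lost value, or a backlog close to backlog_limit, suggests the buffer is too small for the event rate.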
To avoid backlog limit exceeded errors, increase the backlog_limit parameter value. Large servers have a larger number of audit logs triggered, so increasing buffer space helps avoid error messages.
Note: Increasing the audit buffer consumes more of the instance's memory. How large you make the backlog_limit parameter depends on the total memory of the instance. If the system has enough memory, you can try doubling the existing backlog_limit parameter value.
The following is a calculation of the memory required for the auditd backlog. Use this calculation to determine how large you can make the backlog queue without causing memory stress on your instance.
One audit buffer = 8970 Bytes
Default number of audit buffers (backlog_limit parameter) = 320
320 * 8970 = 2870400 Bytes, or 2.7 MiB
The size of the audit buffer is defined by the MAX_AUDIT_MESSAGE_LENGTH parameter. For more information, see MAX_AUDIT_MESSAGE_LENGTH in the Linux audit library on github.com.
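The same arithmetic can be repeated for any candidate backlog_limit. The sketch below uses the 8970-byte buffer size quoted above; the function names are made up for illustration.

```shell
#!/bin/sh
# Sketch: memory needed for a given backlog_limit.
# backlog_bytes and backlog_mib are hypothetical helper names.

AUDIT_BUFFER_BYTES=8970     # size of one audit buffer, from the text above

# backlog_bytes LIMIT -> bytes needed for LIMIT audit buffers
backlog_bytes() {
    echo $(( $1 * AUDIT_BUFFER_BYTES ))
}

# backlog_mib LIMIT -> the same figure in MiB, to one decimal place
backlog_mib() {
    awk -v b="$(backlog_bytes "$1")" 'BEGIN { printf "%.1f\n", b / 1048576 }'
}

for limit in 320 8192 16384; do
    echo "backlog_limit $limit: $(backlog_bytes "$limit") bytes, about $(backlog_mib "$limit") MiB"
done
```

For 320 buffers this gives 2870400 bytes (about 2.7 MiB), matching the calculation above.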
Note: If your instance is inaccessible and you see backlog limit exceeded messages in the system log, stop and start the instance. Then, perform the following steps to change the audit buffer value.
Resolution
Note: In this example, we're changing the backlog_limit parameter value to 8192 buffers. 8192 buffers equals 70 MiB of memory based on the preceding calculation. You can use any value based on your memory calculation.
Access the instance using SSH.
Verify the current audit buffer size.
Note: The backlog_limit parameter is listed as -b. For more information, see the auditctl(8) man page.
Amazon Linux 1 and other operating systems that don't have systemd:
$ sudo cat /etc/audit/audit.rules
# This file contains the auditctl rules that are loaded
# whenever the audit daemon is started via the initscripts.
# The rules are simply the parameters that would be passed
# to auditctl.

# First rule - delete all
-D

# Increase the buffers to survive stress events.
# Make this bigger for busy systems
-b 320

# Disable system call auditing.
# Remove the following line if you need the auditing.
-a never,task

# Feel free to add below this line. See auditctl man page
Amazon Linux 2 and other operating systems that use systemd:
$ sudo cat /etc/audit/audit.rules
# This file is automatically generated from /etc/audit/rules.d
-D
-b 320
-f 1
Access the audit.rules file using an editor, such as the vi editor:
Amazon Linux 1 and other operating systems that don't use systemd:
$ sudo vi /etc/audit/audit.rules
Amazon Linux 2 and other operating systems that use systemd:
$ sudo vi /etc/audit/rules.d/audit.rules
Edit the -b parameter to a larger value. The following example changes the -b value to 8192.
$ sudo cat /etc/audit/audit.rules
# This file contains the auditctl rules that are loaded
# whenever the audit daemon is started via the initscripts.
# The rules are simply the parameters that would be passed
# to auditctl.

# First rule - delete all
-D

# Increase the buffers to survive stress events.
# Make this bigger for busy systems
-b 8192

# Disable system call auditing.
# Remove the following line if you need the auditing.
-a never,task

# Feel free to add below this line. See auditctl man page

$ sudo auditctl -s
enabled 1
failure 1
pid 2264
rate_limit 0
backlog_limit 320
lost 0
backlog 0
Restart the auditd service. The new backlog_limit value takes effect. The value also updates in auditctl -s, as shown in the following example:
# sudo service auditd stop
Stopping auditd: [ OK ]
# sudo service auditd start
Starting auditd: [ OK ]
# auditctl -s
enabled 1
failure 1
pid 26823
rate_limit 0
backlog_limit 8192
lost 0
backlog 0
Note: If your instance is inaccessible and you see backlog limit exceeded messages in the system log, stop and start the instance. Then, perform the preceding steps to change the audit buffer value.
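Because the file to edit differs between the two cases above, a helper along the following lines could pick it and report the current -b value. This is a sketch under a stated assumption: it treats the presence of /etc/audit/rules.d/audit.rules as the sign that rules are generated from rules.d. The function names and the ROOT prefix argument are hypothetical; the prefix exists only so the sketch can be tried against a scratch directory tree.

```shell
#!/bin/sh
# Sketch: decide which rules file to edit and read its current -b value.
# rules_file and current_backlog_rule are hypothetical helper names.

# rules_file ROOT -> path of the rules file to edit (pass "" for the real system)
rules_file() {
    if [ -f "$1/etc/audit/rules.d/audit.rules" ]; then
        echo "$1/etc/audit/rules.d/audit.rules"    # rules generated from rules.d
    else
        echo "$1/etc/audit/audit.rules"            # rules loaded directly
    fi
}

# current_backlog_rule FILE -> value of the last "-b N" line, if any
current_backlog_rule() {
    awk '$1 == "-b" { v = $2 } END { if (v != "") print v }' "$1"
}

target=$(rules_file "")
if [ -r "$target" ]; then
    echo "$target: -b $(current_backlog_rule "$target")"
else
    echo "$target is not readable (try as root)"
fi
```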
The other method if auditd is enabled via the GRUB kernel parameter
In my experience imaging a RHEL 8.6 system, GRUB had the following kernel parameters set:
audit=1 audit_backlog_limit=8192
You can disable auditing by pressing 'e' in the GRUB menu to edit the kernel parameter line. Modify the line like so:
audit=0
Or you can modify the audit_backlog_limit like so (however, keep in mind how much memory your system has):
audit=1 audit_backlog_limit=16384
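To confirm which values the running kernel was actually booted with, the kernel command line can be inspected. The sketch below reads /proc/cmdline by default; cmdline_value is a hypothetical helper name, and the `ro quiet` words in the sample string are filler added for illustration.

```shell
#!/bin/sh
# Sketch: read audit-related parameters from the kernel command line.
# cmdline_value is a hypothetical helper name.

# cmdline_value KEY [CMDLINE] -> prints the value of KEY=... and succeeds,
# or prints nothing and fails. If KEY is repeated, the last occurrence is reported.
cmdline_value() {
    cmdline="${2:-$(cat /proc/cmdline 2>/dev/null)}"
    val=
    found=1
    for word in $cmdline; do
        case "$word" in
            "$1"=*) val=${word#*=}; found=0 ;;
        esac
    done
    [ "$found" -eq 0 ] && echo "$val"
}

sample='ro quiet audit=1 audit_backlog_limit=8192'
echo "audit=$(cmdline_value audit "$sample")"
echo "audit_backlog_limit=$(cmdline_value audit_backlog_limit "$sample")"
```

Calling `cmdline_value audit_backlog_limit` with no second argument checks the running system instead of the sample string.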
References:
https://access.redhat.com/solutions/4353521
Misc.
Additional information:
There is a known issue in which the kernel can panic because of the audit option "-f" in /etc/audit/audit.rules. This would not cause the system to stop automatically and may be a different issue.
Example:
cat /etc/audit/audit.rules
# This file is automatically generated from /etc/audit/rules.d
-D
-b 8192
-f 1
The -f flag sets the action that is performed when a critical error is detected:
0 -- Silent
1 -- The error is handled by the kernel log subsystem (printk, print a failure message)
2 -- Kernel panic in case of critical error
Example conditions where this flag is consulted include: transmission errors to the user-space audit daemon, backlog limit exceeded, and rate limit exceeded.
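A quick way to check which failure mode a rules file requests is sketched below. failure_mode is a hypothetical helper name, the demonstration uses a scratch file holding the example rules above, and the sketch only looks at "-f N" lines in the file it is given.

```shell
#!/bin/sh
# Sketch: report the -f failure mode set in an audit rules file.
# failure_mode is a hypothetical helper name.

# failure_mode FILE -> prints the meaning of the last "-f N" line in FILE
failure_mode() {
    flag=$(awk '$1 == "-f" { v = $2 } END { print v }' "$1")
    case "$flag" in
        0) echo "0: silent" ;;
        1) echo "1: printk (failure message in the kernel log)" ;;
        2) echo "2: panic (kernel panic on critical error)" ;;
        *) echo "no -f rule found" ;;
    esac
}

# Demonstration on a scratch copy of the example rules.
demo=$(mktemp)
printf '%s\n' '-D' '-b 8192' '-f 1' > "$demo"
failure_mode "$demo"
rm -f "$demo"
```

If this reports mode 2, a backlog overflow or similar critical error would panic the kernel rather than just log a message.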
Notes:
For my customer's issue, we fixed it by:
Removing auditd.service from /usr/lib/systemd/system so that the service cannot be started at boot.
To be able to start the service again, the <>.service file must be back in that directory and you have to run "systemctl daemon-reload":
$ sudo systemctl start firewalld.service
Failed to start firewalld.service: Unit not found.
$ sudo systemctl daemon-reload
$ sudo systemctl start firewalld.service
References:
https://superuser.com/questions/513159/how-to-remove-systemd-services