Filtering syslog at source for Microsoft Sentinel

For Microsoft Sentinel, a 'syslog forwarder' acts as a centralisation point for Linux systems, and the Azure Monitor Agent (AMA) forwards the messages it receives to a designated Log Analytics Workspace.  AMA provides the ability to filter logs using KQL queries at source, potentially reducing the cost of ingesting a large amount of noise.

AMA does have a catch that's in the fine print of its billing:

https://azure.microsoft.com/en-au/pricing/details/monitor/

"If a transformation reduces the ingested data by more than 50%, you'll be charged for the amount of filtered data above 50%."

In other words, the maximum amount of noise that can be filtered from a raw log (from a cost reduction perspective) is 50%.  This is a big problem for managing Sentinel's syslog ingestion costs, as multiple systems write to the same categories (called facilities) and we might want to collect different severities of data from each of them depending on their role and technology.  It's highly likely that we would like to filter out much more than 50% of the syslog noise.
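To make the arithmetic concrete, here is a small shell sketch of how we read that rule.  The function name and the GB figures are made up for illustration, it works in whole GB only, and it says nothing about the actual price per GB.

```shell
# Sketch of the billing rule quoted above, in whole GB.
# Usage: billable_filtered_gb RAW_GB KEPT_GB
#   RAW_GB  = data arriving at the transformation
#   KEPT_GB = data left after the transformation
billable_filtered_gb() {
    raw=$1
    kept=$2
    filtered=$((raw - kept))
    allowance=$((raw / 2))    # the first 50% of filtered data is not charged
    if [ "$filtered" -gt "$allowance" ]; then
        echo $((filtered - allowance))
    else
        echo 0
    fi
}

billable_filtered_gb 100 60    # 40 GB filtered, under the allowance: prints 0
billable_filtered_gb 100 20    # 80 GB filtered, 30 GB over the allowance: prints 30
```

So dropping 80% of a 100 GB feed still leaves 30 GB of filtered data on the bill, on top of the 20 GB that is actually ingested.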

AMA does let us filter by severity level on facilities, but without the additional granularity to group and manage incoming logs by system we are forced to either accept unnecessarily high costs or potentially not collect event information that may be important for high-value systems.

This problem can be overcome because rsyslog allows us to filter incoming messages at source and direct them to dedicated, application-specific logs that can be ingested into Sentinel as JSON logs, rather than relying on the "SYSLOG" stream type.

On the syslog forwarder, you can create an application-specific configuration to be used by rsyslog.  To cover different facilities, just duplicate the inner if statement in the template further down and adjust the severity levels accordingly.

laurie@forwarder:~# vi /etc/rsyslog.d/myapp.conf
laurie@forwarder:~# service rsyslog restart

After the configuration has been created, the rsyslog service needs to be restarted for it to become active.
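A broken file in /etc/rsyslog.d can stop rsyslog from starting, so it can be worth checking the configuration before the restart.  Below is a minimal sketch, assuming rsyslogd is on the path and you are running as root.  The function name validate_and_restart is just for illustration; rsyslogd -N1 runs a configuration check without starting the daemon.

```shell
# Only restart rsyslog if the configuration check passes.
validate_and_restart() {
    if rsyslogd -N1; then
        service rsyslog restart
    else
        echo "rsyslog configuration check failed; not restarting" >&2
        return 1
    fi
}
```

Run validate_and_restart in place of the plain restart.  If the check fails, rsyslog prints the offending file and line, and the running service is left alone.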

The template below can be altered to specify the facility and server(s) you want writing to a particular log.  Note the need to change the log location in the omfile action.

Sentinel uses different column names (mostly differing in case) than the syslog property names.  The template adjusts for this.

#------------------
# Purpose
#------------------
# Allows for filtering syslog messages by facility and severity 
# Messages are written to a dedicated log file that can be uploaded to Sentinel
#

#------------------
# Templates
#------------------

# Use a template for constructing a UTC date time format for the
# originating message

template(name="Syslog_DateFormat" type="list") {
    property(name="timestamp" dateformat="year" date.inUTC="on")
      constant(value="-")
    property(name="timestamp" dateformat="month" date.inUTC="on")
      constant(value="-")
    property(name="timestamp" dateformat="day" date.inUTC="on")
      constant(value="T")
    property(name="timestamp" dateformat="hour" date.inUTC="on")
      constant(value=":")
    property(name="timestamp" dateformat="minute" date.inUTC="on")
      constant(value=":")
    property(name="timestamp" dateformat="second" date.inUTC="on")
      constant(value=".")
    property(name="timestamp" dateformat="subseconds" date.inUTC="on")
      constant(value="Z")
}


    # This template formats standard syslog properties as JSON lines
    template(name="SentinelSyslogFormat" type="list" option.jsonf="on") {
        property(outname="EventTime" name="$!sntdate" format="jsonf")
        property(outname="HostName" name="hostname" format="jsonf")
        property(outname="ProcessID" name="procid" format="jsonf")
        property(outname="ProcessName" name="syslogtag" format="jsonf")
        property(outname="Facility" name="syslogfacility-text" format="jsonf")
        property(outname="SeverityLevel" name="syslogseverity-text" format="jsonf")
        property(outname="SyslogMessage" name="msg" format="jsonf")
    }


    # Construct the UTC timestamp for the message using the Syslog_DateFormat template
    set $!sntdate = exec_template("Syslog_DateFormat");

    # Choose the machine(s) we want logs from.  The if statement
    # can be extended with 'or' for multiple servers that are part of the same system,
    # e.g. ($hostname contains 'server1234') or ($hostname contains 'server5678')
    if ($hostname contains 'server1234') then {

        # Exclude Common Event Format & ASA logs from being written due to their specific formatting
        if not (($rawmsg contains "CEF:") or ($rawmsg contains "%ASA-")) then {

          # Filter by facility and severity here.
          # Logs are written in a format that the Azure Monitor Agent can accept.
          # Change the output file location after 'file' below.
          # Duplicate this if block for each additional facility/severity pair.

          if ( ($syslogfacility-text == 'auth')  and ($syslogseverity-text == 'alert') ) then {
            action(type="omfile" file="/var/log/myappname-messages.log" template="SentinelSyslogFormat")
          }



       }
    }
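
Once messages start arriving, a quick check that every line carries the fields defined in the SentinelSyslogFormat template can save some head scratching later.  The sketch below is a hypothetical helper (check_sentinel_log is not part of rsyslog or AMA), and it assumes the jsonf output writes each field as "Name": followed by its value, one message per line.

```shell
# Field names taken from the SentinelSyslogFormat template.
FIELDS="EventTime HostName ProcessID ProcessName Facility SeverityLevel SyslogMessage"

# Report any line in the given file that is missing one of the fields.
# Returns 0 if every line has all fields, 1 otherwise.
check_sentinel_log() {
    file=$1
    status=0
    while IFS= read -r line; do
        for field in $FIELDS; do
            case $line in
                *"\"$field\":"*) ;;    # field present, nothing to do
                *) echo "missing $field: $line"; status=1 ;;
            esac
        done
    done < "$file"
    return $status
}
```

For example, check_sentinel_log /var/log/myappname-messages.log && echo "all fields present".  This only checks that the field names are there; it does not validate the JSON itself.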

Don't forget to set up log rotation (logrotate) on a production system so you don't fill up a drive.  Also consider writing logs to a different mount.

The AMA transform used on the collected data is below.  

source | extend d=todynamic(RawData) | project TimeGenerated, EventTime=todatetime(d.EventTime), HostName=tostring(d.HostName), ProcessID=toint(d.ProcessID), ProcessName=tostring(d.ProcessName), Facility=tostring(d.Facility), SeverityLevel=tostring(d.SeverityLevel), SyslogMessage=tostring(d.SyslogMessage)