Sending Windows DNS Server logs to Azure Data Explorer

ADX ASIM DNS logs

DNS logs are a critical resource for a SOC team.  They provide forensics for understanding what has been done within an environment and they can be an early warning system for identifying malware and bad actors in the environment.

The downside of collecting DNS resolution events is that they are extremely noisy and, as a result, the costs can be astronomical.  Sentinel's standard method for DNS record collection is the Azure Monitor Agent (AMA).  In theory, you can use transformation rules to drop a lot of the noise, but there is an overlooked advisory about how Microsoft charges for them...

AMA Transformation Cost

As DNS resolution events are enormous anyway, knowing that you will be charged for 50% of that data no matter what your filtering rules are makes it difficult to justify the cost of collecting the events.

The answer for collecting and storing large amounts of data is always Azure Data Explorer (ADX).

Technology limitations of collecting Windows DNS resolution logs

I have been surprised at the difficulty of collecting what Security Operations teams consider to be a critical asset.  I've been forced to the realisation that unless SecOps teams are using Splunk, DNS logs probably aren't being collected at all.

We can't enable Windows DNS Server trace logging to file and hope to use file parsing for obtaining these records.  While trace logging is running the log file cannot be read, and the amount of disk I/O used in streaming to file would be problematic with DNS servers running on Domain Controllers.  These lookups have to be collected directly from the Microsoft-Windows-DNSServer Event Tracing for Windows (ETW) provider "{EB79061A-A566-4698-9119-3ED2807060E7}".

Microsoft's approach

Microsoft only supports the collection of DNS logs using a custom extension with the AMA and sending that data to Sentinel.  This uses Event Tracing for Windows (ETW).  Filtering DNS lookups through transformation still leaves an absolute charging floor of 50% of the raw data.  The AMA's "direct to storage" capability for sending logs to Event Hubs does not interoperate with the AMA DNS extension.  The volume of DNS events and the cost of Sentinel ingestion ($10 a GB in Australia) make this an impossible consideration.
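
To put rough numbers on that floor, here is a minimal Go sketch.  It takes the 50% figure and the $10 per GB price quoted above at face value; the 100 GB per day of raw DNS events is a purely hypothetical volume, and MinimumDailyCost is an illustrative name rather than anything from Microsoft's tooling.

```go
package main

import "fmt"

// MinimumDailyCost returns the lowest possible daily charge when a fixed
// fraction of the raw volume is always billable, however aggressively the
// transformation rules filter. The 0.5 floor and the $10/GB price come from
// the article; the raw volume is a hypothetical input.
func MinimumDailyCost(rawGBPerDay, pricePerGB, floorFraction float64) float64 {
	return rawGBPerDay * floorFraction * pricePerGB
}

func main() {
	rawGBPerDay := 100.0 // hypothetical volume of raw DNS events
	daily := MinimumDailyCost(rawGBPerDay, 10.0, 0.5)
	fmt.Printf("minimum daily cost:  $%.2f\n", daily)
	fmt.Printf("minimum 30-day cost: $%.2f\n", daily*30)
}
```

Even if the filtering rules discarded every single event, the bill in this scenario would not drop below the figure printed.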

Logstash option

Logstash is a widely used option for getting any type of logs.  @Koos Goossens has written extensively on Azure Data Explorer and using Logstash as a data collector: https://koosg.medium.com/ingest-your-logs-into-azure-data-explorer-with-logstash-5434611e599f.

I have concerns with Logstash as my event collector.  The installer for a Windows Logstash client is nearly 700 MB and is Java-based.  From a Security perspective, I don't like the enormous attack surface that comes from having a dependency on the Java Runtime, and I have a long memory of trying to get dozens of vulnerable JRE versions out of the enterprise environment.  From an Operations perspective, I don't like the heavy resource footprint Java has on machines, especially domain controllers.  I'm also aware of the potential difficulties in trying to keep Java current.  I just don't think that Java and Ruby are good technology options for event monitoring.

OpenTelemetry collector option

For dealing with such high volume events, I feel that we have better log client technology choices with Golang.  

  • Golang applications have a much smaller memory and CPU footprint than most other development technologies.  
  • Golang's goroutines excel at handling high-volume DNS traffic with minimal overhead.  
  • Golang compiles to a single, self-contained executable with no external dependencies.
  • The Go compiler refuses to build code with unused imports or local variables, and the linker drops unreachable functions from the executable, which helps keep the attack surface low.
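
As a rough illustration of the goroutine point, the sketch below fans events out to a small pool of workers over buffered channels.  It is not taken from the collector: DNSEvent, ProcessEvents and the "normalise the query name" work are all hypothetical stand-ins.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// DNSEvent is a hypothetical, trimmed-down event used only for this sketch.
type DNSEvent struct {
	Query string
}

// ProcessEvents spreads events across a fixed number of worker goroutines
// and returns the normalised query names. Result order is not guaranteed.
func ProcessEvents(events []DNSEvent, workers int) []string {
	if workers < 1 {
		workers = 1
	}
	in := make(chan DNSEvent, 1024) // buffered so the producer rarely blocks
	out := make(chan string, 1024)

	var wg sync.WaitGroup
	for i := 0; i < workers; i++ {
		wg.Add(1)
		go func() {
			defer wg.Done()
			for ev := range in {
				// Stand-in for real work: lowercase and drop the trailing dot.
				out <- strings.ToLower(strings.TrimSuffix(ev.Query, "."))
			}
		}()
	}

	// Producer: feed the workers, then signal that no more events are coming.
	go func() {
		for _, ev := range events {
			in <- ev
		}
		close(in)
	}()

	// Close the output channel once every worker has finished.
	go func() {
		wg.Wait()
		close(out)
	}()

	var results []string
	for q := range out {
		results = append(results, q)
	}
	return results
}

func main() {
	events := []DNSEvent{{Query: "Example.COM."}, {Query: "contoso.com"}}
	fmt.Println(len(ProcessEvents(events, 4)), "events processed")
}
```

Each goroutine starts with a stack of only a few kilobytes, so the worker count can be tuned to the event rate without the per-thread cost of a JVM-style thread pool.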

The OpenTelemetry Collector is open source and written in Golang, allowing custom collectors to be developed for the enterprise.

ASIM DNS Collector

Although I have not used this in production, I'd like to share an OpenTelemetry based DNS collector for Windows servers.  The source code can be found at:

https://github.com/LaurieRhodes/asim-dns-collector

This source code produces the compiled 'bin\asim-dns-collector' file, which needs to be renamed with an ".exe" extension.  Detailed documentation is available within the project.

Technical Architecture

The ASIM DNS Collector is built on a multi-layered architecture designed for enterprise deployment:

  1. Core Collection Layer: Interfaces directly with Windows ETW subsystem to capture DNS Server events from provider GUID {EB79061A-A566-4698-9119-3ED2807060E7} without the overhead of intermediate APIs or agents.
  2. Transformation Layer: Implements comprehensive field mapping from Windows-proprietary event format to standardized Microsoft ASIM schema with full type conversion and normalisation.
  3. Filtering Engine: Employs a pipeline-based filtering architecture with multiple stages (event type filtering, domain pattern matching, query deduplication, and query type filtering) to achieve high-throughput event reduction.
  4. Transport Layer: Utilises the Kafka protocol for secure, authenticated transmission to Azure Event Hubs with configurable retry policies and TLS enforcement.

This architecture achieves significant technical advantages over AMA-based collection:

  • Memory Utilisation: Peak memory consumption remains under 50MB even under high event load (10,000+ events/second) due to Golang's efficient memory management and zero-allocation event processing
  • CPU Footprint: Typically less than 2% CPU utilisation on a standard domain controller
  • I/O Performance: Non-blocking I/O operations ensure minimal impact on DNS server disk and network performance
  • Deployment Flexibility: Single binary deployment eliminates dependency management challenges in production environments

Implementation Details

The collector employs several advanced implementations worth noting:

ETW Session Management

Rather than using higher-level Windows APIs, the collector interfaces directly with ETW session management APIs. This approach allows for precise control over buffer configurations, minimizing the risk of event loss during high-volume periods. The implementation includes automatic session cleanup on abnormal termination, preventing orphaned ETW sessions that commonly plague other collectors.

ASIM Schema Compliance

The transformation layer implements the complete ASIM DNS schema specification, including proper handling of complex fields like:

  • DNS Flags: Full parsing and normalization of DNS protocol flags (RD, CD, AA, AD) into both individual boolean fields and standardized flag strings
  • Event Correlation: Generation of deterministic session IDs to enable cross-event correlation in downstream analytics
  • Time Normalization: Proper conversion between Windows high-precision timestamps and ISO-8601/RFC3339 formats required by ASIM
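
The three conversions above might look roughly like the following in Go.  The bit masks are the standard DNS header flag positions and the FILETIME offset is the well-known gap between 1601 and 1970; everything else is an assumption for illustration, including the function names, the flag string format, and the choice of fields hashed into the session ID.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
	"strings"
	"time"
)

// Bit positions within the 16-bit flags word of the DNS header.
const (
	flagAA = 0x0400 // Authoritative Answer
	flagRD = 0x0100 // Recursion Desired
	flagAD = 0x0020 // Authenticated Data
	flagCD = 0x0010 // Checking Disabled
)

// DNSFlags holds the individual boolean fields.
type DNSFlags struct {
	Authoritative    bool
	RecursionDesired bool
	Authenticated    bool
	CheckingDisabled bool
}

// ParseFlags returns the booleans plus a space-separated flag string.
func ParseFlags(word uint16) (DNSFlags, string) {
	f := DNSFlags{
		Authoritative:    word&flagAA != 0,
		RecursionDesired: word&flagRD != 0,
		Authenticated:    word&flagAD != 0,
		CheckingDisabled: word&flagCD != 0,
	}
	var names []string
	if f.Authoritative {
		names = append(names, "AA")
	}
	if f.RecursionDesired {
		names = append(names, "RD")
	}
	if f.Authenticated {
		names = append(names, "AD")
	}
	if f.CheckingDisabled {
		names = append(names, "CD")
	}
	return f, strings.Join(names, " ")
}

// Number of 100 ns ticks between 1601-01-01 and 1970-01-01 (UTC).
const filetimeUnixOffset int64 = 116444736000000000

// FiletimeToRFC3339 converts a Windows FILETIME value to an RFC3339 string.
func FiletimeToRFC3339(ft uint64) string {
	ticks := int64(ft) - filetimeUnixOffset
	t := time.Unix(ticks/10_000_000, (ticks%10_000_000)*100).UTC()
	return t.Format(time.RFC3339Nano)
}

// SessionID derives a repeatable ID from fields that stay constant across
// the events of one lookup. The field choice here is hypothetical.
func SessionID(srcIP string, srcPort uint16, query string) string {
	key := fmt.Sprintf("%s|%d|%s", srcIP, srcPort, strings.ToLower(query))
	sum := sha256.Sum256([]byte(key))
	return hex.EncodeToString(sum[:16])
}

func main() {
	_, flags := ParseFlags(0x0110)
	fmt.Println(flags)
	fmt.Println(FiletimeToRFC3339(116444736000000000))
	fmt.Println(SessionID("10.0.0.5", 53124, "example.com."))
}
```

Because the session ID is a pure function of its inputs, a query event and its matching response event hash to the same value without the collector keeping any state.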

Production-Grade Filtering

The filtering engine compiles its patterns once at start-up and is designed for O(1) lookup performance.  In practice this means that checking whether a domain should be filtered takes roughly the same amount of time whether you're filtering 10 domains or 10,000 domains.

Domain pattern matching uses a trie-based implementation for efficient wildcard handling, allowing filtering of millions of events with negligible CPU impact.

Azure Data Explorer Setup

An example of a base ADX project to support ASIM DNS events can be found at: https://github.com/LaurieRhodes/PUBLIC-adx-basic

An example of how to enable event integration with ADX can be found in the blog post "Adding data streams to Azure Data Explorer".  The KQL required to expand incoming event messages is shown below.

  


.create-or-alter table ASimDnsActivityLogsRaw ingestion json mapping 'ASimDnsActivityLogsRawMapping' '[{"column":"records","Properties":{"path":"$"}}]'

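.create-or-alter function ASimDnsActivityLogsExpand() {
ASimDnsActivityLogsRaw
// Each raw row holds one OTLP JSON payload in the 'records' column
| extend parsedRecord = parse_json(records)  
// Walk down the OTLP envelope: resourceLogs -> scopeLogs -> logRecords -> attributes
| mv-expand resourceLog = parsedRecord.resourceLogs
| mv-expand scopeLog = resourceLog.scopeLogs
| mv-expand logRecord = scopeLog.logRecords
| mv-expand attributes = logRecord.attributes
// Each attribute is a {key, value} object; unpack it into 'key' and 'value' columns
| evaluate bag_unpack(attributes)
// OTLP wraps each value by type (intValue / boolValue / stringValue), so pick the right wrapper per key
| extend 
    AttributeName = tostring(key),
    AttributeValue = case(
        key == "EventCount" or key == "DnsQueryType" or key == "DstPortNumber", tostring(value.intValue),
        key == "DnsFlagsRecursionDesired" or key == "DnsFlagsCheckingDisabled", tostring(value.boolValue),
        tostring(value.stringValue)
    )
// Fold the attributes back into one bag per record.
// Note: grouping by timestamp alone merges records that share the same timeUnixNano value.
| summarize AttributeBag = make_bag(pack(AttributeName, AttributeValue)) by TimeGenerated = unixtime_microseconds_todatetime(tolong(logRecord.timeUnixNano) / 1000)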
| project
    Tenantid=tostring('00000000-0000-0000-0000-000000000000'),
    TimeGenerated,
    EventCount = toint(AttributeBag.EventCount),    
    EventType = tostring(AttributeBag.EventType),
    EventSubType = tostring(AttributeBag.EventSubType),
    EventResult = tostring(AttributeBag.EventResult),
    EventResultDetails = tostring(AttributeBag.EventResultDetails),
    EventOriginalType = tostring(AttributeBag.EventOriginalType),
    EventProduct = tostring(AttributeBag.EventProduct),
    EventVendor = tostring(AttributeBag.EventVendor),
    DvcIpAddr = tostring(AttributeBag.DvcIpAddr),
    DvcHostname = tostring(AttributeBag.DvcHostname),
    DvcDomain = tostring(AttributeBag.DvcDomain),
    DvcDomainType = tostring(AttributeBag.DvcDomainType),
    DvcOs = tostring(AttributeBag.DvcOs),
    DvcOsVersion = tostring(AttributeBag.DvcOsVersion),
    AdditionalFields = todynamic(AttributeBag.AdditionalFields),
    SrcIpAddr = tostring(AttributeBag.SrcIpAddr),
    SrcPortNumber = toint(AttributeBag.SrcPortNumber),    
    SrcGeoCountry = tostring(''),   
    SrcGeoRegion = tostring(''),   
    SrcGeoCity = tostring(''),   
    SrcGeoLatitude = toreal(''),   
    SrcGeoLongitude = toreal(''),   
    DstIpAddr =tostring(''),
    DstGeoCountry = tostring(''),   
    DstGeoRegion = tostring(''),   
    DstGeoCity = tostring(''),  
    DstGeoLatitude = toreal(''),   
    DstGeoLongitude = toreal(''),            
    DnsQuery = tostring(AttributeBag.DnsQuery),
    DnsQueryType = toint(AttributeBag.DnsQueryType),
    DnsQueryTypeName = tostring(AttributeBag.DnsQueryTypeName),
    DnsResponseCode = toint(''),
    DnsResponseName = tostring(''), 
    TransactionIdHex = tostring(''), 
    DstDescription = tostring(''), 
    DstDvcScope = tostring(''), 
    DstOriginalRiskLevel = tostring(''), 
    DstRiskLevel = toint(''), 
    DvcDescription = tostring(''), 
    DvcInterface = tostring(''), 
    DvcOriginalAction = tostring(''), 
    DvcScope = tostring(''), 
    DvcScopeId = tostring(''), 
    EventOriginalSeverity = tostring(''), 
    NetworkProtocolVersion = tostring(''), 
    RuleName = tostring(''),
    RuleNumber = toint(''), 
    DnsResponseIpCountry = tostring(''), 
    DnsResponseIpLatitude = toreal(''), 
    DnsResponseIpLongitude = toreal(''), 
    NetworkProtocol = tostring(AttributeBag.NetworkProtocol),
    DnsQueryClass = toint(''), 
    DnsQueryClassName = tostring(''), 
    DnsNetworkDuration = toint(''), 
    DnsFlagsAuthenticated = tobool(''), 
    DnsFlagsAuthoritative = tobool(''), 
    DnsFlagsRecursionDesired = tobool(AttributeBag.DnsFlagsRecursionDesired), 
    DnsSessionId = tostring(AttributeBag.DnsSessionId), 
    SrcDescription = tostring(''), 
    SrcDvcScope = tostring(''), 
    SrcDvcScopeId = tostring(''), 
    SrcOriginalRiskLevel = tostring(''), 
    SrcUserScope = tostring(''), 
    SrcUserScopeId = tostring(''), 
    SrcUserSessionId = tostring(''), 
    ThreatId = tostring(''), 
    ThreatIpAddr = tostring(''), 
    ThreatField = tostring(''), 
    UrlCategory = tostring(''), 
    ThreatCategory = tostring(''), 
    ThreatName = tostring(''), 
    ThreatConfidence = toint(''), 
    ThreatOriginalConfidence = tostring(''), 
    ThreatRiskLevel = toint(''), 
    ThreatOriginalRiskLevel_s = tostring(''), 
    ThreatOriginalRiskLevel = toint(''), 
    ThreatIsActive = tobool(''), 
    ThreatFirstReportedTime = tostring(''), 
    ThreatFirstReportedTime_d = todatetime(''), 
    ThreatLastReportedTime = tostring(''), 
    ThreatLastReportedTime_d = todatetime(''), 
    EventStartTime = todatetime(''), 
    EventEndTime = todatetime(''), 
    EventMessage = tostring(''), 
    EventOriginalUid = tostring(''), 
    EventReportUrl = tostring(''), 
    EventSchemaVersion = tostring(''), 
    Dvc = tostring(AttributeBag.Dvc),
    DvcFQDN = tostring(''), 
    DvcId = tostring(''), 
    DvcIdType = tostring(''), 
    DvcMacAddr = tostring(''), 
    DvcZone = tostring(''), 
    DnsResponseIpCity = tostring(''), 
    DnsResponseIpRegion = tostring(''), 
    EventOwner = tostring(''), 
    EventProductVersion = tostring(''), 
    EventSeverity = tostring(''), 
    Src = tostring(''), 
    SrcHostname = tostring(''), 
    SrcDomain = tostring(''), 
    SrcDomainType = tostring(''), 
    SrcFQDN = tostring(''), 
    SrcDvcId = tostring(''), 
    SrcDvcIdType = tostring(''), 
    SrcDeviceType = tostring(''), 
    SrcRiskLevel = toint(''), 
    SrcUserId = tostring(''), 
    SrcUserIdType = tostring(''), 
    SrcUsername = tostring(''), 
    SrcUsernameType = tostring(''), 
    SrcUserType = tostring(''), 
    SrcOriginalUserType = tostring(''), 
    SrcProcessName = tostring(''), 
    SrcProcessId = tostring(AttributeBag.SrcProcessId),
    SrcProcessGuid = tostring(''), 
    Dst = tostring(''), 
    DstPortNumber = toint(AttributeBag.DstPortNumber),
    DstHostname = tostring(''), 
    DstDomain = tostring(''), 
    DstDomainType = tostring(''), 
    DstFQDN = tostring(''), 
    DstDvcId = tostring(''), 
    DstDvcScopeId = tostring(''), 
    DstDvcIdType = tostring(''), 
    DstDeviceType = tostring(''), 
    DvcAction = tostring(''), 
    DnsFlags = tostring(AttributeBag.DnsFlags),
    DnsFlagsCheckingDisabled = tobool(AttributeBag.DnsFlagsCheckingDisabled),
    DnsFlagsRecursionAvailable = tobool(''), 
    DnsFlagsTruncated = tobool(''), 
    DnsFlagsZ = tobool(''), 
    SourceSystem = tostring(''), 
    Type = tostring(''), 
    _ItemId = tostring(''), 
    _ResourceId = tostring(''), 
    _SubscriptionId = tostring(''), 
    _TimeReceived = todatetime(now())
}

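// Registers the update policy but leaves it disabled, so no rows flow into ASimDnsActivityLogs yet.
.alter table ASimDnsActivityLogs policy update @'[{"Source": "ASimDnsActivityLogsRaw", "Query": "ASimDnsActivityLogsExpand()", "IsEnabled": false, "IsTransactional": true}]'

// The same policy with IsEnabled set to true: rows landing in ASimDnsActivityLogsRaw are now expanded into ASimDnsActivityLogs.
.alter table ASimDnsActivityLogs policy update @'[{"Source": "ASimDnsActivityLogsRaw", "Query": "ASimDnsActivityLogsExpand()", "IsEnabled": true, "IsTransactional": true}]'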