How to write Sigma rules for SOC analysts
What you'll need before starting
- pySigma toolchain (Python 3.8+): pip install pysigma pysigma-backend-splunk. The legacy sigmac compiler (deprecated but still widely used) lives in the tools/ directory at github.com/SigmaHQ/sigma.
- Access to SigmaHQ rule repository: Clone github.com/SigmaHQ/sigma to study 3,000+ community rules. You'll copy structural patterns from existing rules rather than memorizing the spec.
- Basic understanding of your log source schema: Know the field names your SIEM uses—does your Windows Security log call it EventID or event_id? Does your firewall log use src_ip or source.address?
In our HSR Layout lab we provision each SOC batch with a pre-configured Elastic stack containing 14 days of simulated enterprise traffic (benign + MITRE ATT&CK techniques). Students write 8–12 Sigma rules during the detection engineering module and test them against this dataset before deploying to our partner SOCs during the 4-month paid internship.
Step-by-step: Writing your first Sigma rule to detect suspicious PowerShell execution
1. Create a new file named powershell_encoded_network.yml in your working directory.
2. Add the title and ID block at the top:
title: Suspicious PowerShell Encoded Command with Network Activity
id: a3f8b2c1-4d5e-6f7a-8b9c-0d1e2f3a4b5c
Generate a random UUID at uuidgenerator.net—never reuse IDs from other rules.
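If you'd rather not rely on a website, any Python interpreter can generate the UUID for you (a convenience sketch; the standard library uuid module is all you need):

```python
import uuid

# Generate a random (version 4) UUID for the rule's id field.
rule_id = str(uuid.uuid4())
print(f"id: {rule_id}")
```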
3. Add status and description:
status: experimental
description: Detects PowerShell execution with Base64-encoded commands and outbound network connections, often used by malware droppers and C2 frameworks.
Status values: stable (production-ready), test (needs validation), experimental (new rule), deprecated.
4. Specify references to threat intel:
references:
- https://attack.mitre.org/techniques/T1059/001/
- https://www.microsoft.com/security/blog/2020/03/threat-hunting-powershell
5. Add author and date:
author: Networkers Home SOC Lab
date: 2025-01-15
modified: 2025-01-15
6. Map to MITRE ATT&CK:
tags:
- attack.execution
- attack.t1059.001
- attack.command_and_control
- attack.t1071.001
7. Define the log source:
logsource:
  category: process_creation
  product: windows
Common categories: process_creation, network_connection, registry_event, file_event, dns_query.
8. Write the detection selection block:
detection:
  selection_img:
    Image|endswith: '\\powershell.exe'
  selection_encoded:
    CommandLine|contains:
      - ' -enc '
      - ' -EncodedCommand '
      - ' -e '
  selection_network:
    CommandLine|contains:
      - 'Net.WebClient'
      - 'DownloadString'
      - 'Invoke-WebRequest'
      - 'iwr '
      - 'curl '
The pipe | applies field modifiers: endswith, contains, startswith, all, base64offset.
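Conceptually, each modifier is a simple string predicate OR-ed across the listed values (different fields within one selection are AND-ed). A rough pure-Python illustration of the matching semantics (a hypothetical helper, not how pySigma is actually implemented):

```python
# Rough illustration of how a backend evaluates Sigma field modifiers
# against a single event (values in a list are OR-ed together).
def matches(event, field, modifier, values):
    actual = event.get(field, "")
    if modifier == "endswith":
        return any(actual.endswith(v) for v in values)
    if modifier == "startswith":
        return any(actual.startswith(v) for v in values)
    if modifier == "contains":
        return any(v in actual for v in values)
    raise ValueError(f"unsupported modifier: {modifier}")

event = {
    "Image": "C:\\Windows\\System32\\powershell.exe",
    "CommandLine": "powershell -enc SQBFAFgA...",
}
print(matches(event, "Image", "endswith", ["\\powershell.exe"]))  # True
print(matches(event, "CommandLine", "contains", [" -enc "]))      # True
```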
9. Add a filter to reduce false positives:
  filter_legit:
    ParentImage|endswith:
      - '\\explorer.exe'
      - '\\cmd.exe'
    CommandLine|contains: 'WindowsUpdate'
10. Write the condition logic:
condition: selection_img and selection_encoded and selection_network and not filter_legit
Condition operators: and, or, not, 1 of selection_*, all of filter_*.
11. Add false positive notes:
falsepositives:
- Legitimate admin scripts using encoded commands for patch management
- SCCM or Intune deployment scripts
level: high
Levels: informational, low, medium, high, critical.
12. Save and validate with pySigma:
sigma convert -t splunk -p sysmon powershell_encoded_network.yml
This outputs a Splunk SPL query you can paste into your SIEM search bar.
Your complete rule is now 40 lines of YAML that will convert to queries for 15+ SIEM platforms. In our lab we've seen this exact rule catch Emotet droppers, Cobalt Strike stagers, and credential-dumping scripts in partner SOC environments.
How to verify your Sigma rule works correctly
Step 1: Validate the rule syntax
sigma check powershell_encoded_network.yml
This catches YAML indentation errors, invalid field modifiers, and malformed conditions. Common errors: mixing tabs and spaces (use spaces only), missing colon after field names, unquoted strings containing special characters.
Step 2: Convert to your SIEM's query language
sigma convert -t splunk -p sysmon powershell_encoded_network.yml -o test_query.spl
Replace splunk with your backend: elasticsearch, qradar, azure-sentinel, chronicle. The -p sysmon flag applies Sysmon field mappings (Event ID 1 = process creation). Open test_query.spl and verify the output looks correct—field names should match your SIEM schema.
Step 3: Run against known-good and known-bad samples
In Splunk:
index=windows earliest=-7d [paste your converted query here]
| table _time ComputerName Image CommandLine ParentImage
You should see:
- True positives: Malicious PowerShell from your red team exercise or malware sandbox
- Zero false positives: Legitimate admin scripts should NOT appear (if they do, tighten your filters)
In Elastic:
GET windows-logs-*/_search
{
"query": { [paste converted query] },
"size": 100,
"sort": [{"@timestamp": "desc"}]
}
Step 4: Calculate detection rate
If you have 10 known-malicious samples and your rule catches 8, your detection rate is 80%. Industry standard for production rules is ≥95% detection with <2% false positive rate. In our HSR Layout lab we require students to achieve 90%+ detection on the MITRE ATT&CK evaluation dataset before marking a rule production-ready.
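This arithmetic is worth scripting as you iterate on a rule—a small sketch with hypothetical counts:

```python
def detection_metrics(tp, fn, fp, benign_total):
    """Return (detection rate, false positive rate) as percentages."""
    detection_rate = 100 * tp / (tp + fn)
    fp_rate = 100 * fp / benign_total
    return detection_rate, fp_rate

# 10 known-malicious samples, rule caught 8; 3 false alerts out of 500 benign events.
rate, fpr = detection_metrics(tp=8, fn=2, fp=3, benign_total=500)
print(f"detection: {rate:.0f}%  false positives: {fpr:.1f}%")  # detection: 80%  false positives: 0.6%
```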
Step 5: Peer review with SigmaHQ validator
Submit your rule to the online validator at uncoder.io or run:
sigma check --validation-config sigmahq powershell_encoded_network.yml
This enforces SigmaHQ quality standards: mandatory fields present, MITRE tags valid, references reachable, no hardcoded IP addresses or usernames.
Common errors and how to fix them
Error 1: ValueError: Unknown modifier 'regex'
Cause: You wrote CommandLine|regex: '.*malware.*' but the regex modifier doesn't exist in Sigma spec.
Fix: Use contains, startswith, endswith, or re (full regex) instead:
CommandLine|re: '^.*malware.*$'
Or better, avoid regex entirely—contains: 'malware' is faster and works across all backends.
Error 2: Rule triggers on every event (100% false positive rate)
Symptom: Your rule matches thousands of benign events per hour.
Cause: Your condition is too broad—likely using or when you meant and, or missing a critical selection field.
Fix: Add specificity. Instead of:
detection:
  selection:
    CommandLine|contains: '.exe'
  condition: selection
Write:
detection:
  selection_process:
    Image|endswith: '\\powershell.exe'
  selection_suspicious:
    CommandLine|contains:
      - 'Invoke-Mimikatz'
      - 'Invoke-ReflectivePEInjection'
  condition: selection_process and selection_suspicious
Error 3: "Field 'EventID' not found in log source"
Symptom: Converted query returns zero results even though events exist.
Cause: Field name mismatch between your rule and SIEM schema. Windows Event Logs use EventID in Sysmon but event.code in Winlogbeat, EventCode in Splunk Windows TA.
Fix: Use Sigma's field mapping pipelines. Specify the correct pipeline in conversion:
sigma convert -t elasticsearch -p ecs_windows powershell_encoded_network.yml
The -p ecs_windows flag maps generic Sigma fields to Elastic Common Schema field names.
Error 4: Rule never triggers despite malicious activity
Symptom: You know malware executed but your rule shows zero hits.
Cause: Your log source isn't generating the events you're trying to detect, or logging verbosity is too low.
Fix: Verify the log source first:
index=windows EventID=1 earliest=-1h | stats count by Image
If you see zero Sysmon Event ID 1 (process creation) events, your endpoint isn't forwarding Sysmon logs. Check your log shipper config (Winlogbeat, Splunk UF, NXLog). In our lab we've debugged 30+ "rule doesn't work" tickets that were actually log ingestion failures.
Error 5: Conversion produces syntactically invalid SIEM query
Symptom: Pasted query throws syntax error in Splunk/Elastic.
Cause: Backend converter bug or unsupported field modifier for that platform.
Fix: Manually adjust the converted query. Example—pySigma converts all modifier to Splunk mvcount() which breaks on single-value fields. Replace:
| where mvcount(CommandLine) > 0
With:
| where isnotnull(CommandLine)
Report converter bugs to the pySigma GitHub repo with your rule and error message.
How SOC analysts at our internship hosts use Sigma in production
- Detection specificity: Is the logic narrow enough to be meaningful? (Reviewers reject catch-all CommandLine|contains: 'a' type patterns.)
- MITRE mapping accuracy: Is T1112 (Modify Registry) the correct technique or should it be T1547.001 (Boot/Logon Autostart)?
At Akamai India's Bengaluru SOC, the peer review SLA is 48 hours. Rules that pass move to Stage 3.
Stage 3: Canary deployment (Week 3–4)
The rule deploys to a "canary" SIEM instance monitoring 5% of production traffic. Analysts watch for:
- Alert volume: Are we generating 2 alerts/day (good) or 200/hour (false positive storm)?
- Triage time: Can a Tier 1 analyst determine true/false positive in under 5 minutes with the context provided?
- Detection latency: Time from malicious event to alert—target is <3 minutes for critical rules.
One of our 2024 batch students caught a Qakbot infection during canary testing at HCL's Noida SOC that had evaded the legacy SIEM rules for 6 days. The Sigma rule detected the malware's use of regsvr32.exe with network callbacks—a pattern the old rules missed because they only checked for .dll file extensions, not the process behavior.
Stage 4: Production rollout and tuning (Week 4+)
After 7 days of canary monitoring with <1% false positive rate, the rule promotes to production across all customer tenants. But the work isn't done—analysts tune the rule monthly based on:
- New false positive patterns: A software vendor starts using similar PowerShell patterns for legitimate updates—add them to the filter block.
- Evasion attempts: Attackers change their tooling to avoid detection—expand the selection criteria.
- Performance optimization: The query times out during peak hours—add index-time field extraction or move expensive regex to post-processing.
In our internship program, each analyst maintains a "rule portfolio" of 15–25 Sigma rules they authored or significantly tuned. This portfolio becomes the centerpiece of their job interviews—hiring managers at Cisco India, Wipro Cybersecurity, and IBM X-Force ask candidates to walk through one rule end-to-end, explaining threat model, detection logic, false positive handling, and production metrics.
Understanding Sigma rule structure: The five mandatory sections
Section 1: Rule identification (title, id, status, description)
The id field must be a UUID that never changes—even if you rename the rule, the ID stays constant so SIEM platforms can track rule versions across updates. The status field tells automation whether to deploy the rule: stable rules auto-deploy to production, experimental rules stay in dev/test environments, deprecated rules trigger removal workflows.
Example:
title: Credential Dumping via Mimikatz
id: 7f8b3c2a-1d4e-5f6a-9b8c-2d3e4f5a6b7c
status: stable
description: Detects execution of Mimikatz credential dumping tool based on command-line patterns and in-memory PE injection indicators.
Section 2: Threat intelligence context (references, author, date, tags)
This section maps your rule to external knowledge bases. The references array should include:
- MITRE ATT&CK technique URL (mandatory for enterprise SOCs)
- Original threat intel report or malware analysis
- Vendor security advisory if detecting a specific CVE
The tags array uses a controlled vocabulary: attack.{tactic} for MITRE tactics (execution, persistence, defense_evasion, etc.) and attack.t{id} for technique IDs. Many SIEM platforms auto-populate dashboards and threat hunting workflows from these tags.
Example:
references:
- https://attack.mitre.org/techniques/T1003/001/
- https://www.cert-in.org.in/PDF/CIAD-2023-0045.pdf
author: Networkers Home SOC Lab
date: 2025-01-10
tags:
- attack.credential_access
- attack.t1003.001
Section 3: Log source specification (logsource)
The logsource block tells Sigma converters which log type contains the events you're hunting. It has three sub-fields:
- category: Generic event type (process_creation, network_connection, file_event, registry_event, dns_query, web_proxy)
- product: Vendor/OS (windows, linux, azure, aws, gcp, cisco, palo_alto)
- service: Specific log source (sysmon, security, powershell, firewall, cloudtrail)
Example:
logsource:
  category: process_creation
  product: windows
  service: sysmon
This tells converters: "Look for Sysmon Event ID 1 (process creation) in Windows logs." The converter applies field mappings—Sysmon's Image field becomes process.executable in ECS, process_name in Splunk CIM.
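The mapping itself is essentially a lookup table applied at conversion time. An abbreviated illustration (the two mapped entries come from the sentence above; a real pySigma pipeline covers many more fields and transformations):

```python
# Generic Sigma field name -> platform schema name (abbreviated illustration).
FIELD_MAPS = {
    "ecs": {"Image": "process.executable", "CommandLine": "process.command_line"},
    "splunk_cim": {"Image": "process_name"},
}

def map_field(field, target):
    # Fall through to the generic name if the pipeline has no mapping for it.
    return FIELD_MAPS[target].get(field, field)

print(map_field("Image", "ecs"))         # process.executable
print(map_field("Image", "splunk_cim"))  # process_name
print(map_field("EventID", "ecs"))       # EventID (unmapped)
```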
Section 4: Detection logic (detection)
The detection block is where you define what suspicious looks like. It contains:
- Selection blocks: Named groups of field/value pairs that must match. Use descriptive names like selection_mimikatz_cli, selection_network_callback, selection_registry_persistence.
- Filter blocks: Named groups that exclude known-good patterns. Prefix with filter_ by convention.
- Condition statement: Boolean logic combining selections and filters.
Example:
detection:
  selection_process:
    Image|endswith: '\\lsass.exe'
  selection_access:
    GrantedAccess: '0x1010'
  filter_system:
    User|startswith: 'NT AUTHORITY\\'
  condition: selection_process and selection_access and not filter_system
The condition selection_process and selection_access and not filter_system means: "Alert if a process accesses lsass.exe with 0x1010 permissions UNLESS the user is a system account."
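You can prototype that boolean logic in plain Python before committing it to YAML—a toy evaluator over sample events (field names taken from the example above; this is not a real Sigma engine):

```python
def lsass_rule(event):
    """Toy check: selection_process and selection_access and not filter_system."""
    selection_process = event.get("Image", "").endswith("\\lsass.exe")
    selection_access = event.get("GrantedAccess") == "0x1010"
    filter_system = event.get("User", "").startswith("NT AUTHORITY\\")
    return selection_process and selection_access and not filter_system

malicious = {"Image": "C:\\Windows\\System32\\lsass.exe",
             "GrantedAccess": "0x1010",
             "User": "CORP\\jsmith"}
benign = dict(malicious, User="NT AUTHORITY\\SYSTEM")
print(lsass_rule(malicious), lsass_rule(benign))  # True False
```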
Section 5: Operational metadata (falsepositives, level)
The falsepositives array documents known benign scenarios that trigger the rule. This guides Tier 1 analysts during triage—if the alert matches a documented false positive, they can close it without escalation. The level field sets alert severity: critical (active breach, page on-call), high (investigate within 1 hour), medium (investigate within 8 hours), low (investigate within 24 hours), informational (log only, no alert).
Example:
falsepositives:
- Backup software accessing lsass.exe for VSS snapshots
- Antivirus performing memory scans
- Windows Defender credential guard initialization
level: high
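The level-to-SLA mapping described above is straightforward to encode for automated alert routing—a small sketch (the SLA values are the ones quoted in this guide, not part of the Sigma spec):

```python
# Map Sigma `level` to a triage SLA in hours; None = log only, no alert.
# These SLAs mirror the ones described in this guide, not a Sigma standard.
TRIAGE_SLA_HOURS = {
    "critical": 0,           # active breach: page on-call immediately
    "high": 1,
    "medium": 8,
    "low": 24,
    "informational": None,
}

def route_alert(level):
    sla = TRIAGE_SLA_HOURS[level]
    if sla is None:
        return "log only"
    if sla == 0:
        return "page on-call"
    return f"queue for triage within {sla}h"

print(route_alert("high"))  # queue for triage within 1h
```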
In our HSR Layout lab we teach students to write the falsepositives array BEFORE writing the detection logic—this forces you to think about edge cases upfront rather than discovering them in production at 2 AM.
Advanced Sigma techniques: Correlation rules and aggregation conditions
Technique 1: Count aggregation (threshold detection)
Detect event volume crossing a threshold. Example—brute-force login attempts:
detection:
  selection:
    EventID: 4625
    Status: '0xC000006D'
  condition: selection | count(TargetUserName) by SourceIP > 10
timeframe: 5m
This triggers if a single source IP generates 10+ failed login attempts (Event ID 4625) within 5 minutes. The timeframe field accepts: 15s, 5m, 1h, 24h.
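To see what the converter has to build for you, the same threshold logic can be sketched in plain Python over (timestamp, source_ip) pairs (an illustration of the semantics, not production code):

```python
from collections import defaultdict

def brute_force_ips(events, threshold=10, window=300):
    """Flag source IPs with > threshold failed logins inside a sliding window.
    events: iterable of (timestamp_seconds, source_ip), sorted by time."""
    recent = defaultdict(list)  # source_ip -> timestamps still inside the window
    flagged = set()
    for ts, ip in events:
        recent[ip] = [t for t in recent[ip] if ts - t < window] + [ts]
        if len(recent[ip]) > threshold:
            flagged.add(ip)
    return flagged

# 12 failed logins from one IP in under 5 minutes, plus a stray failure elsewhere.
events = [(i * 20, "10.0.0.5") for i in range(12)] + [(100, "10.0.0.9")]
events.sort()
print(brute_force_ips(events))  # {'10.0.0.5'}
```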
Technique 2: Near aggregation (events within time proximity)
Detect when two different event types occur close together. Example—detect privilege escalation followed by lateral movement:
detection:
  selection_privesc:
    EventID: 4672
    PrivilegeList|contains: 'SeDebugPrivilege'
  selection_lateral:
    EventID: 4648
    TargetServerName|contains: '\\\\'
  condition: selection_privesc | near selection_lateral
timeframe: 2m
This triggers if a user gains SeDebugPrivilege (often via exploit) and then initiates a remote logon within 2 minutes—classic pass-the-hash behavior.
Technique 3: Ordered event sequences
Detect multi-stage attacks where events must occur in a specific order. Example—ransomware kill chain:
detection:
  stage1_recon:
    Image|endswith: '\\net.exe'
    CommandLine|contains: 'group "Domain Admins"'
  stage2_disable_defense:
    Image|endswith: '\\powershell.exe'
    CommandLine|contains: 'Set-MpPreference -DisableRealtimeMonitoring'
  stage3_encrypt:
    Image|endswith: '\\vssadmin.exe'
    CommandLine|contains: 'delete shadows'
  condition: stage1_recon followed by stage2_disable_defense followed by stage3_encrypt
timeframe: 30m
The followed by operator enforces temporal ordering—stage2 must occur after stage1, stage3 after stage2, all within 30 minutes.
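The ordering semantics can likewise be sketched in plain Python—a toy sequence matcher that requires each stage to fire after the previous one inside the window (a deliberate simplification: it gives up once the window expires rather than retrying later candidate sequences):

```python
def sequence_detected(events, stages, window=1800):
    """Check whether stage predicates fire in order within `window` seconds.
    events: list of (timestamp, event_dict) sorted by time; stages: predicates."""
    stage_idx, start_ts = 0, None
    for ts, ev in events:
        if start_ts is not None and ts - start_ts > window:
            return False  # window expired before all stages matched
        if stages[stage_idx](ev):
            start_ts = ts if start_ts is None else start_ts
            stage_idx += 1
            if stage_idx == len(stages):
                return True
    return False

# Simplified predicates for the three ransomware kill-chain stages above.
stages = [
    lambda e: e.get("Image", "").endswith("\\net.exe"),
    lambda e: "Set-MpPreference -DisableRealtimeMonitoring" in e.get("CommandLine", ""),
    lambda e: "delete shadows" in e.get("CommandLine", ""),
]
events = [
    (0,   {"Image": "C:\\Windows\\System32\\net.exe",
           "CommandLine": 'net group "Domain Admins"'}),
    (300, {"Image": "powershell.exe",
           "CommandLine": "Set-MpPreference -DisableRealtimeMonitoring $true"}),
    (900, {"Image": "vssadmin.exe",
           "CommandLine": "vssadmin delete shadows /all"}),
]
print(sequence_detected(events, stages))  # True
```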
Backend support limitations
Not all SIEM platforms support these advanced features—and note that the pipe-aggregation, near, and followed by syntax shown above comes from the legacy Sigma v1 specification, which the current specification replaces with dedicated correlation rules:
- Splunk: Full support via stats, transaction, and streamstats commands
- Elastic: Requires EQL (Event Query Language) backend, not standard Lucene
- QRadar: Supports via AQL FOLLOWEDBY operator
- Microsoft Sentinel: Requires KQL make-series or join operators
- Graylog: Limited—only basic count aggregation
When writing correlation rules, test conversion to your target platform early. In our lab we've seen students write beautiful 5-stage correlation rules that convert to 200-line Splunk queries with 45-second execution time—unusable in production. The rule was technically correct but operationally infeasible.
Performance optimization for correlation rules
Correlation rules are expensive—they hold event state in memory and perform joins across time windows. Three techniques keep them fast:
1. Pre-filter aggressively: Add a selection block that reduces the event set by 90%+ before aggregation. Instead of condition: selection | count() > 10, write condition: selection_rare_event and selection_suspicious | count() > 10.
2. Use narrow time windows: A 5-minute window scans 1/12 the data of a 1-hour window. Only use wide windows when the attack pattern genuinely spans hours.
3. Limit cardinality in by clauses: count() by SourceIP is fast if you have 5,000 unique IPs. count() by CommandLine is slow if you have 500,000 unique command lines. Group by low-cardinality fields (IP, username, hostname) not high-cardinality fields (full command line, URL, file hash).
At Akamai India's SOC, correlation rules must complete in <10 seconds on 24 hours of data or they're rejected. Students learn to benchmark rules with the | timechart span=1s count pattern in Splunk to visualize matching event volume, alongside the Job Inspector for actual execution time.
Deploying Sigma rules in enterprise SOC workflows: CI/CD and version control
A typical detection-as-code repository is organized like this:
sigma-rules/
├── rules/
│ ├── windows/
│ │ ├── process_creation/
│ │ │ ├── mimikatz_detection.yml
│ │ │ ├── powershell_encoded_command.yml
│ │ ├── registry/
│ │ ├── network/
│ ├── linux/
│ ├── cloud/
├── pipelines/
│ ├── sysmon_windows.yml
│ ├── ecs_linux.yml
├── tests/
│ ├── test_mimikatz_detection.py
│ ├── sample_logs/
├── .gitlab-ci.yml
├── README.md
Rules organize by OS and log category. The pipelines/ directory contains field mapping configs for different log sources. The tests/ directory holds unit tests and sample logs for validation.
CI/CD pipeline stages
When an analyst commits a new rule to the dev branch, GitLab CI runs four stages:
Stage 1: Syntax validation
validate:
  script:
    - pip install pysigma
    - sigma check rules/**/*.yml
This catches YAML syntax errors, missing mandatory fields, invalid MITRE tags.
Stage 2: Conversion test
convert:
  script:
    - sigma convert -t splunk -p sysmon rules/windows/**/*.yml -o converted/splunk/
    - sigma convert -t elasticsearch -p ecs_windows rules/windows/**/*.yml -o converted/elastic/
This verifies the rule converts successfully to all target SIEM platforms. If conversion fails, the pipeline stops.
Stage 3: Unit test against sample logs
test:
  script:
    - python tests/test_mimikatz_detection.py
The test script loads sample logs (benign + malicious), runs the converted query, and asserts:
- All known-malicious samples trigger the rule (true positives)
- Zero known-benign samples trigger the rule (false positives)
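A minimal shape for such a test script, with the converted query stood in for by a Python predicate and the sample logs inlined (a real harness would load tests/sample_logs/ and run the converted query via the SIEM's API—both hypothetical details here):

```python
# Sketch of a rule unit test: inline sample events and a Python predicate
# standing in for the converted SIEM query.
def rule_matches(event):
    return (event.get("Image", "").endswith("\\powershell.exe")
            and " -enc " in event.get("CommandLine", ""))

malicious_samples = [
    {"Image": "C:\\Windows\\System32\\powershell.exe",
     "CommandLine": "powershell -enc SQBFAFgA..."},
]
benign_samples = [
    {"Image": "C:\\Windows\\System32\\powershell.exe",
     "CommandLine": "powershell -File C:\\Scripts\\backup.ps1"},
]

# Every known-bad sample must trigger; no known-good sample may trigger.
assert all(rule_matches(e) for e in malicious_samples), "missed a true positive"
assert not any(rule_matches(e) for e in benign_samples), "false positive"
print("rule unit tests passed")
```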
Stage 4: Deploy to canary SIEM
deploy_canary:
  script:
    - ./scripts/deploy_to_splunk.sh converted/splunk/ canary-siem.internal
  only:
    - dev
This pushes the rule to a non-production SIEM instance monitoring 5% of traffic. The rule runs for 7 days while analysts monitor alert volume and false positive rate.
Production promotion
After canary validation, the analyst opens a merge request from dev to main. Senior analysts review:
- Detection logic correctness
- False positive documentation completeness
- Performance impact (query execution time)
- Alignment with threat intel
Once approved and merged to main, the production pipeline deploys the rule to all customer SIEM instances within 15 minutes.
Version control and rollback
Every rule change is a Git commit with a descriptive message:
feat(windows): Add detection for CVE-2024-1234 exploit
Detects exploitation of Windows Print Spooler vulnerability
per CERT-In advisory CIAD-2024-0012.
Tested against 50 exploit samples from VirusTotal.
False positive rate: 0.2% (legitimate print jobs to network printers).
MITRE: T1068 (Exploitation for Privilege Escalation)
If a rule causes a false positive storm in production, the on-call analyst runs:
git revert abc123
git push origin main
The CI/CD pipeline automatically removes the problematic rule from all SIEM instances within 5 minutes.
In our HSR Layout lab we simulate this entire workflow—students use GitLab (hosted in our datacenter), write rules in feature branches, submit merge requests, and deploy to our Elastic cluster. By internship time they're already familiar with the tooling and process that Cisco India, HCL, and Barracuda teams use daily.
In our HSR Layout lab we maintain a production-grade Elastic stack with 14 days of rolling simulated traffic (benign + MITRE ATT&CK techniques) where students write and test Sigma rules against realistic enterprise logs. During the 4-month paid internship at our Network Security Operations Division partners—Akamai India SOC, HCL Cybersecurity, Barracuda MSS teams—freshers deploy their Sigma rules to production SIEM environments monitoring real customer traffic, with one 2024 batch student's rule catching a Qakbot infection that had evaded legacy detections for 6 days. For teams building telemetry foundations to feed Sigma rules at scale without enterprise SIEM costs, Networkers Home's founder Vikas Swami ships 24Observe—source-available, MIT-licensed uptime, ping, TCP, SSL, and keyword monitoring with AI-assisted anomaly detection at one-tenth the Datadog bill.
Frequently asked questions
Can I write Sigma rules for cloud platforms like AWS CloudTrail or Azure Activity Logs?
Yes—set the logsource product field to the cloud provider. For AWS CloudTrail use product: aws and service: cloudtrail, then reference CloudTrail field names like eventName, userIdentity.principalId, requestParameters. For Azure use product: azure and service: activitylogs with fields like operationName, identity.claims, properties. The SigmaHQ repository contains 200+ cloud detection rules you can use as templates. The main challenge is field name inconsistency—AWS uses sourceIPAddress while Azure uses callerIpAddress for the same concept, so you'll need separate rules or complex field mapping pipelines for multi-cloud environments.
How do I test a Sigma rule if I don't have access to a SIEM?
For Windows Event Logs, convert the rule to a PowerShell query and run Get-WinEvent -FilterXml [your query] against exported EVTX files. For JSON logs (Sysmon, CloudTrail), convert to jq syntax and run cat logs.json | jq '[your filter]'. Alternatively, spin up a free Elastic stack in Docker (takes 10 minutes), ingest sample logs via Filebeat, and test your converted Elasticsearch query. In our HSR Layout lab we provide students with a pre-configured Elastic instance containing 14 days of simulated traffic specifically for Sigma rule testing without needing production SIEM access.
What's the difference between Sigma rules and YARA rules?
Sigma describes patterns in log events (process creations, network connections, authentications) and compiles to SIEM queries; YARA describes byte and string patterns in files and memory and runs in scanners and sandboxes. They complement each other: YARA identifies a malicious binary on disk, while Sigma detects that binary's behavior in your logs.
Why does my Sigma rule work in Splunk but return zero results in Elastic?
Almost always a field mapping mismatch. Splunk's Common Information Model calls the process field process_name, while Elastic uses Elastic Common Schema (ECS) where it's process.name. When you convert a Sigma rule, you must specify the correct field mapping pipeline with the -p flag: sigma convert -t elasticsearch -p ecs_windows for ECS or sigma convert -t splunk -p splunk_windows for CIM. If you're using a custom log format (non-standard field names), you'll need to create a custom pipeline YAML file that maps Sigma's generic field names to your schema. The SigmaHQ repo includes pipelines for Sysmon, Windows Security logs, Zeek, Suricata, and major cloud providers.
How do I handle Sigma rules that detect legitimate admin tools like PsExec or PowerShell?
Don't disable the rule—add narrowly scoped filter blocks instead, e.g. filter_legit: ParentImage|endswith: '\\AdminToolkit\\PsExec.exe' AND User|startswith: 'DOMAIN\\admin_'. For PowerShell, filter by script path (signed corporate scripts in C:\\Scripts\\), parent process (launched by SCCM or Intune agents), or command-line patterns (scripts containing your company's internal function names). The key is to make filters specific enough that attackers can't trivially evade them—don't just filter User: 'admin' because attackers create accounts named 'admin'. In our internship program we teach students to collect 30 days of benign admin activity before writing the rule, then craft filters that allow 95%+ of legitimate use while still catching malicious abuse.