PROGRAM | Security Analytics | 2019 -> 2022
Security Ops: User Behavior Analytics & Anomaly Detection
Large-scale telemetry -> behavioral baselines -> explainable anomaly signals for SOC workflows.
PySpark | Python | Unsupervised learning | Feature engineering | Log pipelines
- Processed enterprise telemetry streams (directory, proxy, endpoint) at scale.
- Applied unsupervised detection to surface meaningful anomalies at a controllable false-positive rate.
- Iterated with analysts in the loop and wrote methodology docs to keep the system auditable.
Context
- Security teams need behavioral signals they can trust, not black-box alerts.
- Data volume is high; latency and cost constraints matter.
- Explainability is mandatory: an alert must be debuggable.
What we built
- Ingestion and normalization pipelines for multi-source logs.
- Feature extraction for user/session/device baselines and temporal patterns.
- Unsupervised scoring and thresholds designed for SOC triage and investigation.
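As a rough illustration of how per-user baselines, unsupervised scoring, and triage thresholds can fit together, here is a minimal sketch in plain Python (not Spark, to keep it self-contained). The robust z-score (median and MAD), the alert-budget threshold, and all names such as `robust_baseline`, `score_event`, `logins_per_hour`, and `bytes_out_mb` are assumptions chosen for the example; the project's actual features and scoring method are not described here.

```python
from statistics import median


def robust_baseline(values):
    """Return (median, MAD) for one user's historical values of a feature."""
    med = median(values)
    mad = median(abs(v - med) for v in values)
    return med, mad


def robust_z(value, med, mad, eps=1e-9):
    # 1.4826 makes MAD comparable to a standard deviation for normal data;
    # eps avoids division by zero when the history is constant.
    return (value - med) / (1.4826 * mad + eps)


def score_event(event, baselines):
    """Score one observation against per-feature baselines.

    Returns (score, top_feature, contributions) so an analyst can see
    which feature drove the alert, not only that one fired.
    """
    contributions = {
        name: abs(robust_z(event[name], med, mad))
        for name, (med, mad) in baselines.items()
    }
    top_feature = max(contributions, key=contributions.get)
    return contributions[top_feature], top_feature, contributions


def threshold_for_budget(historical_scores, alert_rate):
    """Pick a cutoff so roughly `alert_rate` of historical scores reach it."""
    ranked = sorted(historical_scores, reverse=True)
    k = max(1, int(len(ranked) * alert_rate))
    return ranked[k - 1]


# Hypothetical history for a single user (illustrative values only).
history = {
    "logins_per_hour": [3, 4, 5, 4, 3, 5, 4],
    "bytes_out_mb": [10, 12, 11, 9, 10, 13, 11],
}
baselines = {name: robust_baseline(vals) for name, vals in history.items()}
score, top_feature, contributions = score_event(
    {"logins_per_hour": 5, "bytes_out_mb": 250}, baselines
)
```

Setting the threshold from an alert budget rather than a fixed score is one way to keep false positives controllable: the SOC decides how many alerts it can review, and the cutoff follows from that. Returning per-feature contributions alongside the score is what makes an alert debuggable.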
Engineering choices
- Spark-first design to keep throughput stable under growth.
- Signals designed to degrade gracefully when data is sparse or partially missing.
- Documentation treated as a deliverable (how signals behave, why they trigger).
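One way a signal can degrade gracefully, sketched under assumptions: shrink a user's baseline toward a population baseline when the user has little history, and score only the features that are actually present while reporting coverage. The function names, `prior_weight`, and `min_coverage` below are hypothetical and do not come from the project itself.

```python
def shrunk_baseline(user_values, population_mean, prior_weight=10):
    """Blend a user's mean with the population mean.

    With no history the population mean is used; as history grows,
    the user's own mean dominates. prior_weight is an assumed knob.
    """
    n = len(user_values)
    if n == 0:
        return population_mean
    user_mean = sum(user_values) / n
    return (n * user_mean + prior_weight * population_mean) / (n + prior_weight)


def score_with_coverage(event, baselines, scales, min_coverage=0.5):
    """Score only the features present in `event` (baselines must be non-empty).

    Returns (score, coverage, deviations). The score is None when too few
    features are available, so a sparse record yields "no signal" rather
    than a misleading number.
    """
    present = [f for f in baselines if event.get(f) is not None]
    coverage = len(present) / len(baselines)
    if coverage < min_coverage:
        return None, coverage, {}
    deviations = {f: abs(event[f] - baselines[f]) / scales[f] for f in present}
    score = sum(deviations.values()) / len(deviations)
    return score, coverage, deviations
```

For example, `shrunk_baseline([20.0] * 10, 10.0)` returns `15.0`, halfway between the user and population means, because ten observations carry the same weight as the default prior. Exposing `coverage` next to the score lets the documentation state plainly how much data backed a given alert.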