
How to use DataPrime to isolate and shape logs for deeper analysis

Goal

By the end of this guide, you should be able to use filter, block, choose, and create to isolate relevant log data, transform it, and prepare it for further analysis.

Why it matters

When debugging issues or investigating anomalies, you’ll rarely get what you need from a single filter. Real investigations require peeling back layers: filtering what matters, cutting what doesn’t, shaping the remaining data, and adding context for further questions. This guide shows you how to combine multiple DataPrime commands into a focused, intermediate-level workflow.


Filter logs based on key conditions

Use filter to include only logs that match your criteria. This is your first pass: tightening the lens so you're looking only at relevant logs.

filter ipInSubnet(ip_address, '10.8.0.0/16')

This example keeps only logs where the ip_address belongs to a private subnet. Under the hood, ipInSubnet checks whether a string-form IP falls within a given CIDR range. It's especially useful when you're interested in internal service communication or suspicious traffic in reserved blocks.

Filters are boolean expressions: if the result is true, the log stays. You can combine conditions with && and ||, or nest them, for precise control.
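For instance, you can match a subnet and an error class in a single pass. This sketch follows the same syntax as the example above; the 5xx condition is illustrative, not part of the original query:

filter ipInSubnet(ip_address, '10.8.0.0/16') && status_code.startsWith('5')

Only logs satisfying both conditions survive, which saves you a second filtering step later.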


Use block to remove unwanted noise

Where filter includes logs that match, block excludes them. This is your second pass: trim what’s common or unhelpful.

block status_code.startsWith('2')

This removes logs with 2xx status codes—usually successful requests—so you can focus on errors or edge cases. block is great for ignoring heartbeat events, noisy health checks, or other “happy path” scenarios that dilute your dataset.

Internally, it works by evaluating a boolean expression per log and discarding any for which the expression is true.
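The same combining operators work here, so you can drop several noise sources at once. In this sketch, the health-check path and the == comparison are assumptions for illustration, not taken from this guide:

block status_code.startsWith('2') || path == '/healthz'

Any log matching either condition is discarded, leaving only the traffic worth inspecting.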


Use choose to reduce and standardize the shape

Most logs have far more fields than you need. Use choose to extract just the fields you care about—and rename or transform them in the process.

choose firstNonNull(user_id, userId, user_identifier) as canonical_user_id, path, status_code

Here, choose keeps only a minimal set of fields: a standardized user ID, the request path, and the status. If your logs come from multiple sources with inconsistent naming, firstNonNull helps you pick the first non-null variant and project it as a unified field.

This is especially helpful before exporting or aggregating logs: the reduced documents are smaller, faster to analyze, and easier to visualize.


Use create to add computed or contextual fields

Once your data is clean and consistent, you can add derived fields to enrich it. create lets you generate new fields based on expressions, lookups, or constants.

create is_internal from ipInSubnet(ip_address, '10.0.0.0/8')

This adds a boolean field that marks whether a log came from internal IP space. You can later filter or group by this field without repeating the expression.
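Downstream stages can then reference the derived field directly instead of repeating the subnet check. A sketch, assuming a boolean comparison is written with ==:

create is_internal from ipInSubnet(ip_address, '10.0.0.0/8')
| filter is_internal == true

The expression is evaluated once, and every later command sees the result as an ordinary field.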

You can also use create to tag logs with constants or randomly generated values, like so:

create analysis_batch_id from randomUuid()

That’s useful when preparing logs for downstream export, tagging batches, or linking records during an investigation.


Putting it all together

Each of these commands does one thing well:

  • filter gets you the logs you care about
  • block removes the ones you don’t
  • choose gives you a clean, minimal shape
  • create adds context, calculations, or structure

Here’s how they come together in a full workflow:

filter ipInSubnet(ip_address, '10.8.0.0/16')
| block status_code.startsWith('2')
| choose firstNonNull(user_id, userId, user_identifier) as canonical_user_id, path, status_code
| create is_internal from ipInSubnet(ip_address, '10.0.0.0/8')
| create analysis_batch_id from randomUuid()

This query gives you just the logs you need: internal traffic with errors, reduced to essential fields and tagged with metadata to track your analysis. It's fast, expressive, and purpose-built.


Expected output

Your logs should now look something like this:

{
  "canonical_user_id": "dave-123",
  "path": "/api/checkout",
  "status_code": "500",
  "is_internal": true,
  "analysis_batch_id": "f08a7a5e-83b7-42bd-9a1c-1098441a4c6a"
}

That’s a tight, structured document—ideal for grouping, exporting, or visualizing.
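As a next step, a shape like this lends itself to aggregation, for example counting errors per path. This is a sketch assuming DataPrime's groupby/aggregate syntax, which this guide does not cover:

... | groupby path aggregate count() as error_count

Because choose already reduced each document to a handful of fields, a grouping like this runs over small, uniform records.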