Commands

Commands are the fundamental building blocks of DataPrime. Each command performs one operation on your data, such as refining, restructuring, or analyzing it. Commands can be composed, so the data is transformed step by step until you reach the desired result.
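As a loose illustration of composition, the following Python sketch models each command as a function over a list of documents and chains the functions in order. It is only an analogy for the `|` separator in a query: the names `older_than_30`, `sort_by_age_desc`, and `pipeline` are hypothetical, and nothing here reflects how DataPrime is implemented.

```python
from functools import reduce

# Hypothetical stand-ins for commands: each takes a list of documents
# and returns a new list, leaving its input untouched.
def older_than_30(docs):
    return [d for d in docs if d["age"] > 30]

def sort_by_age_desc(docs):
    return sorted(docs, key=lambda d: d["age"], reverse=True)

def pipeline(*stages):
    """Chain stages left to right, feeding each stage's output to the next."""
    return lambda docs: reduce(lambda acc, stage: stage(acc), stages, docs)

docs = [
    {"name": "john", "age": 48},
    {"name": "jane", "age": 20},
    {"name": "sophia", "age": 70},
]

result = pipeline(older_than_30, sort_by_age_desc)(docs)
print([d["name"] for d in result])  # ['sophia', 'john']
```

Each stage sees only the output of the previous one, which is the property that lets commands be stacked in any number.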

Key types of operations

Data/Structure Manipulations: Modify or adjust the structure of your dataset to suit your analysis needs.
Filtering and Searching: Refine your data by including or excluding specific entries based on defined criteria.
Aggregations: Summarize your data by calculating metrics such as sums, averages, and counts.
Data Selection/Projection: Choose specific fields or columns to focus on from your dataset.
Sorting: Order your data in a specific manner, based on defined criteria like ascending or descending order.
Joins/Unions: Combine data from different sources or datasets, merging or linking them based on common keys.
Datatype Conversions: Convert data from one type to another, ensuring compatibility for further processing.
Data Extraction (semi-structured to structured): Transform semi-structured data into a fully structured format, making it easier to work with and analyze.

Please refer to the Command Language Reference for a detailed list of available commands and their use cases.

Example command queries

Commands cover a wide range of functionality. The examples below show three of them: `filter`, `create`, and `redact`.

Simple filtering using `filter`

Consider the following documents:

{ "name": "john", "age": 48, "country": "il" }
{ "name": "jane", "age": 20 }
{ "name": "sophia", "age": 70, "country": "us", "city": "San Francisco" }
{ "name": "chris", "age": 30, "country": "uk", "city": "Manchester" }

These documents represent the names, ages, countries, and cities of some users. We can filter this data in a number of different ways. For example, by age:

source logs | filter age < 30

This returns only the following document: sophia and john are over 30, and chris is exactly 30, so none of them satisfies `age < 30`.

{ "name": "jane", "age": 20 }

`filter` accepts any expression that returns a boolean value, so much more complex calculations can serve as the predicate.
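The idea of filtering on an arbitrary boolean predicate can be modelled in a few lines of Python. This is a sketch of the semantics only, not DataPrime code: `filter_docs` is a hypothetical helper, and treating a missing field as a non-match (via `dict.get`) is an assumption of this sketch rather than documented DataPrime behavior.

```python
docs = [
    {"name": "john", "age": 48, "country": "il"},
    {"name": "jane", "age": 20},
    {"name": "sophia", "age": 70, "country": "us", "city": "San Francisco"},
    {"name": "chris", "age": 30, "country": "uk", "city": "Manchester"},
]

def filter_docs(docs, predicate):
    """Keep only the documents for which the predicate returns True."""
    return [d for d in docs if predicate(d)]

# The simple predicate from the example: age < 30.
under_30 = filter_docs(docs, lambda d: d["age"] < 30)

# A compound predicate: any boolean expression works.
# Assumption: a document without a "country" field simply does not match.
older_in_us = filter_docs(
    docs, lambda d: d["age"] >= 30 and d.get("country") == "us"
)

print([d["name"] for d in under_30])     # ['jane']
print([d["name"] for d in older_in_us])  # ['sophia']
```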

Creating a new field using `create`

Consider the following documents:

{ "name": "john", "age": 48, "country": "il" }
{ "name": "jane", "age": 20 }
{ "name": "sophia", "age": 70, "country": "us", "city": "San Francisco" }

Suppose we need to visualize each individual's age in days. We can approximate this crudely by multiplying age by 365:

source logs | create age_days from age * 365

This will result in the following documents:

{ "name": "john", "age": 48, "age_days": 17520, "country": "il" }
{ "name": "jane", "age": 20, "age_days": 7300 }
{ "name": "sophia", "age": 70, "age_days": 25550, "country": "us", "city": "San Francisco" }
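The arithmetic behind these results can be checked with a short Python model of the same operation. `create_field` is a hypothetical helper written for this sketch; it only mirrors the effect described above (a new field computed from an expression, with all existing fields kept) and makes no claim about how DataPrime evaluates `create`.

```python
docs = [
    {"name": "john", "age": 48, "country": "il"},
    {"name": "jane", "age": 20},
    {"name": "sophia", "age": 70, "country": "us", "city": "San Francisco"},
]

def create_field(docs, field, expression):
    """Return copies of the documents with `field` set to expression(doc)."""
    return [{**d, field: expression(d)} for d in docs]

# age_days = age * 365, the same crude approximation as in the query.
with_days = create_field(docs, "age_days", lambda d: d["age"] * 365)

for d in with_days:
    print(d["name"], d["age_days"])
# john 17520
# jane 7300
# sophia 25550
```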

Redacting sensitive information using `redact`

Consider the following documents:

{ "name": "john", "msg": "John's email is john@acme.com"}
{ "name": "jane", "msg": "Jane's email is jane@acme.com"}
{ "name": "sophia", "msg": "Sophia's email is sophia@acme.com"}

To redact the email address from the msg field of each document, we can use the `redact` command:

source logs | redact msg matching /[a-z0-9A-Z]+@acme.com/ to 'REDACTED'

This will result in the following documents:

{ "name": "john", "msg": "John's email is REDACTED"}
{ "name": "jane", "msg": "Jane's email is REDACTED"}
{ "name": "sophia", "msg": "Sophia's email is REDACTED"}
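The same substitution can be sketched in Python with the standard `re` module, which is a convenient way to try a pattern before putting it in a query. This is an analogy rather than DataPrime's implementation: `redact` here is a hypothetical helper, replacing every match in the field is an assumption of the sketch, and the dot is escaped (`\.`) so that it matches only a literal `.` instead of any character.

```python
import re

docs = [
    {"name": "john", "msg": "John's email is john@acme.com"},
    {"name": "jane", "msg": "Jane's email is jane@acme.com"},
    {"name": "sophia", "msg": "Sophia's email is sophia@acme.com"},
]

# Same character class as in the query; "\." matches a literal dot only.
EMAIL = re.compile(r"[a-z0-9A-Z]+@acme\.com")

def redact(docs, field, pattern, replacement):
    """Return copies of the documents with every match in `field` replaced."""
    return [{**d, field: pattern.sub(replacement, d[field])} for d in docs]

redacted = redact(docs, "msg", EMAIL, "REDACTED")

for d in redacted:
    print(d["msg"])
# John's email is REDACTED
# Jane's email is REDACTED
# Sophia's email is REDACTED
```

Only the text matched by the pattern is replaced, so the rest of each message is left intact.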