summarize Operator
The summarize operator aggregates data with optional grouping, windowing, and emit clauses.
Syntax
| summarize { <aggregations> } [by <grouping>] [over <window>]
| summarize { <aggregations> } [by <grouping>] emit <emit_clause>
Description
The summarize operator groups documents by the specified fields and applies aggregation functions to each group. It supports time-based and count-based windowing, along with several emit strategies that control when results are output.
Parameters
Aggregations
{ <aggregation_mappings> }
Aggregation Mappings:
field: function(expression) - Apply aggregation function to expression
field - Shorthand for field: function(field)
...* - Spread all fields from source
...expression - Spread fields from expression result
-field - Exclude field from output
Grouping
by field1, field2, ...
Groups documents by the specified fields before aggregation.
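To make the grouping step concrete, here is a minimal Python sketch of what grouping before aggregation typically means. It is an illustration only, not the operator's implementation; the helper name summarize_by and the sample readings are hypothetical.

```python
from collections import defaultdict

def summarize_by(docs, keys, aggregations):
    """Group docs by `keys`, then apply each named aggregation to every group."""
    groups = defaultdict(list)
    for doc in docs:
        # The group key is the tuple of values of the grouping fields.
        groups[tuple(doc[k] for k in keys)].append(doc)

    results = []
    for group_key, members in groups.items():
        row = dict(zip(keys, group_key))      # grouping fields come first
        for name, fn in aggregations.items():
            row[name] = fn(members)           # one output field per aggregation
        results.append(row)
    return results

# Hypothetical sample data, mirroring "... by sensor_id".
readings = [
    {"sensor_id": "a", "temperature": 20.0},
    {"sensor_id": "a", "temperature": 22.0},
    {"sensor_id": "b", "temperature": 30.0},
]

rows = summarize_by(
    readings,
    ["sensor_id"],
    {
        "count": len,
        "avg_temp": lambda ds: sum(d["temperature"] for d in ds) / len(ds),
    },
)
```

Each distinct combination of grouping values yields one output row, which is why adding grouping fields tends to increase the number of rows produced.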
Windowing
over <window_definition>
Window Types:
hopping_window(interval, hop) - Hopping time window
tumbling_window(interval) - Tumbling time window
sliding_window(count) - Sliding count window
session_window(timeout) - Session-based window
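The four window types follow common stream-processing definitions. The Python sketch below shows one conventional interpretation of each, using integer timestamps in seconds. The function names are hypothetical, and the boundary rules (half-open windows, a gap equal to the timeout staying in the same session) are assumptions rather than documented behavior of this operator.

```python
from collections import deque

def tumbling_window_start(ts, interval):
    """Each timestamp falls into exactly one fixed, non-overlapping window."""
    return ts - ts % interval

def hopping_window_starts(ts, interval, hop):
    """Windows of length `interval` start every `hop`; a timestamp can
    belong to several overlapping windows [start, start + interval)."""
    starts = []
    start = ts - ts % hop
    while start > ts - interval:
        starts.append(start)
        start -= hop
    return sorted(starts)

def session_windows(timestamps, timeout):
    """Start a new session whenever the gap to the previous event
    exceeds `timeout` (assumed rule)."""
    sessions = []
    for ts in sorted(timestamps):
        if sessions and ts - sessions[-1][-1] <= timeout:
            sessions[-1].append(ts)
        else:
            sessions.append([ts])
    return sessions

def sliding_count_avg(values, count):
    """Average over the most recent `count` values after each arrival."""
    window = deque(maxlen=count)
    averages = []
    for value in values:
        window.append(value)
        averages.append(sum(window) / len(window))
    return averages
```

With a 5-minute interval and a 1-minute hop, for example, hopping_window_starts(430, 300, 60) places the event in five overlapping windows, whereas a tumbling window places it in exactly one.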
Emit Clauses
emit <emit_strategy>
Emit Strategies:
every <interval> [using <expression>] - Emit at regular intervals
when <condition> - Emit when condition is true
on change <expression> - Emit when expression value changes
on group change - Emit when group changes
on update - Emit on every update
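As a rough illustration of how "on change" and "when" differ from "on update", the Python sketch below filters a stream of successive aggregate values. The function names are hypothetical, and treating the first value as a change is an assumption, not documented behavior.

```python
def emit_on_update(values):
    """Emit every updated aggregate value."""
    yield from values

def emit_on_change(values):
    """Emit only when the value differs from the last emitted one.
    The first value is treated as a change (assumption)."""
    unset = object()
    last = unset
    for value in values:
        if last is unset or value != last:
            yield value
            last = value

def emit_when(values, condition):
    """Emit only the values for which the condition holds."""
    for value in values:
        if condition(value):
            yield value

# Hypothetical running maximum temperature after each incoming document.
running_max = [25, 25, 31, 31, 31, 35]

on_update = list(emit_on_update(running_max))
on_change = list(emit_on_change(running_max))
over_30 = list(emit_when(running_max, lambda t: t > 30))
```

In this sketch "on update" produces six outputs, "on change" three, and the condition t > 30 four, which shows how the emit strategy alone can change output volume.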
Examples
Basic Aggregation
| summarize { count: count(), avg_temp: avg(temperature) }
| summarize { count(), max(value), min(value) }
| summarize { total: sum(amount), count: count() }
Grouped Aggregation
| summarize { count: count(), avg_temp: avg(temperature) } by sensor_id
| summarize { total_sales: sum(amount) } by product_id, region
| summarize { error_count: count() } by error_type, severity
Field Spreading and Exclusion
| summarize { ...*, count: count() } by user_id
| summarize { ...*, -internal_field, total: sum(value) } by category
| summarize { ...user, login_count: count() } by user.id
Time-Based Windowing
| summarize { count: count() }
    over hopping_window(5m, 1m)
| summarize { avg_temp: avg(temperature) }
    over tumbling_window(1h)
| summarize { max_value: max(value) }
    over session_window(30m)
Count-Based Windowing
| summarize { avg_value: avg(value) }
    over sliding_window(100)
Emit Strategies
| summarize { count: count() }
    emit every 1m using count()
| summarize { max_temp: max(temperature) }
    emit when max_temp > 30
| summarize { current_count: count() }
    emit on change current_count
| summarize { group_total: sum(value) }
    by category
    emit on group change
Complex Aggregations
| summarize {
    count: count(),
    unique_users: count_distinct(user_id),
    total_revenue: sum(amount),
    avg_order: avg(amount),
    max_order: max(amount),
    min_order: min(amount)
  } by product_category
Multi-Level Grouping
| summarize {
    daily_sales: sum(amount),
    order_count: count()
  } by date, region, product_type
Conditional Aggregations
| summarize {
    total_errors: count(),
    critical_errors: count_if(severity = "critical"),
    warning_ratio: count_if(severity = "warning") / count()
  } by service_name
Advanced Window Patterns
| summarize {
    moving_avg: avg(temperature)
  } by sensor_id
    over hopping_window(10m, 2m)
| summarize {
    session_duration: max(timestamp) - min(timestamp)
  } by user_id
    over session_window(5m)
Emit with Complex Logic
| summarize {
    error_rate: count_if(status = "error") / count()
  } by service_name
    emit when error_rate > 0.1
| summarize {
    current_users: count_distinct(user_id)
  } by minute
    emit every 1m using current_users
Aggregation Functions
Basic Functions
count() - Count of documents
count_distinct(field) - Count of unique values
sum(field) - Sum of values
avg(field) - Average of values
min(field) - Minimum value
max(field) - Maximum value
Statistical Functions
stddev(field) - Standard deviation
variance(field) - Variance
percentile(field, p) - Percentile value
median(field) - Median value
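The list above does not say whether stddev and variance use the population or the sample formula, nor which scale or interpolation method percentile uses. The Python sketch below shows how much those choices can matter on a small dataset. It uses the standard statistics module plus a hypothetical nearest-rank helper that assumes p is on a 0-100 scale; none of this is confirmed behavior of the operator.

```python
import math
import statistics

def percentile_nearest_rank(values, p):
    """Nearest-rank percentile, with p on a 0-100 scale (an assumption)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

values = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

population_stddev = statistics.pstdev(values)      # divides by n
sample_stddev = statistics.stdev(values)           # divides by n - 1
population_variance = statistics.pvariance(values)
median_value = statistics.median(values)           # mean of the two middle values
p50 = percentile_nearest_rank(values, 50)          # nearest rank, no interpolation
p90 = percentile_nearest_rank(values, 90)
```

On this data the population standard deviation is 2.0 while the sample standard deviation is about 2.14, and the interpolating median (4.5) differs from the nearest-rank 50th percentile (4.0). It is worth checking which definitions the operator uses before comparing its results with another tool.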
Conditional Functions
count_if(condition) - Count documents matching condition
sum_if(field, condition) - Sum values matching condition
avg_if(field, condition) - Average values matching condition
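The conditional functions can be read as "filter, then aggregate". The Python sketch below expresses that reading. The helper names mirror the functions above but are illustrative only, and returning None when no document matches is an assumption, since the behavior for empty matches is not stated here.

```python
def count_if(docs, condition):
    """Count the documents for which the condition holds."""
    return sum(1 for doc in docs if condition(doc))

def sum_if(docs, field, condition):
    """Sum `field` over the documents for which the condition holds."""
    return sum(doc[field] for doc in docs if condition(doc))

def avg_if(docs, field, condition):
    """Average `field` over matching documents; None if nothing matches
    (assumed behavior)."""
    matched = [doc[field] for doc in docs if condition(doc)]
    return sum(matched) / len(matched) if matched else None

# Hypothetical log documents.
logs = [
    {"severity": "critical", "latency_ms": 900},
    {"severity": "warning", "latency_ms": 300},
    {"severity": "critical", "latency_ms": 700},
    {"severity": "info", "latency_ms": 100},
]

def is_critical(doc):
    return doc["severity"] == "critical"

critical_count = count_if(logs, is_critical)
critical_latency_total = sum_if(logs, "latency_ms", is_critical)
critical_latency_avg = avg_if(logs, "latency_ms", is_critical)
critical_ratio = critical_count / len(logs)
```

The last line corresponds to the ratio pattern count_if(...) / count() used in the examples: the numerator counts matching documents and the denominator counts all documents in the group.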
Performance Considerations
- Grouping by many fields increases the number of distinct groups to track, which can impact performance
- Time-based windows require timestamp ordering
- Emit conditions are evaluated frequently, so complex conditions add overhead
- Choose window sizes appropriate for your use case