Sessionization Guide


Quick Reference

Analysis TypeBusiness QuestionCore TechniqueComplexity
Session splittingHow do you group continuous events into discrete sessions?LAG + TIMESTAMPDIFF + SUM⭐⭐⭐
Session aggregationWhat are the average session duration and page views?MIN/MAX + COUNT + GROUP BY⭐⭐
Bounce rateHow many users left after viewing only one page?CASE WHEN COUNT = 1⭐⭐
Path analysisWhich pages did users visit within a session?GROUP_CONCAT + ORDER BY⭐⭐
Conversion analysisHow many sessions completed a target action?MAX(CASE WHEN event_type = ...)⭐⭐

Sample Data

All examples in this guide are based on the following user behavior event stream:

-- User behavior events table CREATE TABLE user_events ( user_id BIGINT, event_time TIMESTAMP_LTZ, page VARCHAR, event_type VARCHAR -- view, add_to_cart, complete, etc. );

The sample data simulates 3 users with 18 events total, covering 5 sessions (including bounce sessions and conversion sessions).


1. Session Splitting (Core)

Session Definition

  • Session timeout: If a user is inactive for 30 minutes, the next activity starts a new session.
  • Session ID: Increments from 1 within each user partition.

Four-Step Splitting Method

WITH events AS ( SELECT user_id, CAST(event_time AS TIMESTAMP_LTZ) AS event_time, page, event_type FROM user_events ), -- Step 1: Retrieve the previous event time with_prev AS ( SELECT user_id, event_time, page, event_type, LAG(event_time) OVER (PARTITION BY user_id ORDER BY event_time) AS prev_event_time FROM events ), -- Step 2: Calculate the time gap and flag new session starts with_gap AS ( SELECT *, CASE WHEN prev_event_time IS NULL THEN 1 -- First event, new session WHEN TIMESTAMPDIFF(MINUTE, prev_event_time, event_time) > 30 THEN 1 -- Timeout, new session ELSE 0 END AS is_new_session FROM with_prev ), -- Step 3: Accumulate flags to generate session IDs with_session AS ( SELECT *, SUM(is_new_session) OVER (PARTITION BY user_id ORDER BY event_time) AS session_id FROM with_gap ) SELECT user_id, session_id, event_time, page, event_type FROM with_session ORDER BY user_id, session_id, event_time;

Sample output:

user_idsession_idevent_timepageevent_type
112026-03-01 10:00:00homeview
112026-03-01 10:05:00productsview
112026-03-01 10:15:00product_detailview
122026-03-01 14:00:00homeview
122026-03-01 14:10:00cartview
212026-03-01 09:00:00landingview
222026-03-01 11:00:00homeview

Key Notes

  • TIMESTAMPDIFF(MINUTE, start, end) returns the difference in minutes between two timestamps.
  • DATEDIFF on TIMESTAMP_LTZ columns returns the difference in days (not minutes) by default — always use TIMESTAMPDIFF for sub-day precision.
  • The session timeout threshold can be adjusted to fit your business: 30 minutes is common for e-commerce; 15 minutes works well for content platforms.

2. Session Aggregation Metrics

SQL Implementation

-- Aggregate on top of the session-splitting result SELECT user_id, session_id, MIN(event_time) AS session_start, MAX(event_time) AS session_end, TIMESTAMPDIFF(MINUTE, MIN(event_time), MAX(event_time)) AS session_duration_min, COUNT(*) AS page_views, COUNT(DISTINCT page) AS unique_pages, -- Bounce session: only 1 page view CASE WHEN COUNT(*) = 1 THEN 1 ELSE 0 END AS is_bounce, -- Conversion session: contains a target event MAX(CASE WHEN event_type IN ('add_to_cart', 'complete') THEN 1 ELSE 0 END) AS is_converted FROM with_session GROUP BY user_id, session_id ORDER BY user_id, session_id;

Sample output:

user_idsession_idsession_startsession_endduration_minpage_viewsis_bounceis_converted
112026-03-01 10:00:002026-03-01 10:15:0015300
122026-03-01 14:00:002026-03-01 14:10:0010200
212026-03-01 09:00:002026-03-01 09:00:000110
222026-03-01 11:00:002026-03-01 11:12:0012501
312026-03-01 08:00:002026-03-01 08:30:0030701

3. Overall Session Statistics

SQL Implementation

SELECT COUNT(*) AS total_sessions, COUNT(DISTINCT user_id) AS unique_users, ROUND(AVG(page_views), 1) AS avg_pages_per_session, ROUND(AVG(session_duration_min), 1) AS avg_duration_min, ROUND(SUM(CASE WHEN page_views = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS bounce_rate_pct, ROUND(SUM(is_converted) * 100.0 / COUNT(*), 1) AS conversion_rate_pct FROM session_agg;

Sample output:

total_sessionsunique_usersavg_pagesavg_duration_minbounce_rateconversion_rate
533.613.420.040.0

4. User Path Analysis

SQL Implementation

SELECT user_id, session_id, GROUP_CONCAT(page ORDER BY event_time SEPARATOR ' → ') AS page_path, COUNT(*) AS path_length FROM with_session GROUP BY user_id, session_id ORDER BY user_id, session_id;

Sample output:

user_idsession_idpage_pathpath_length
11home → products → product_detail3
12home → cart2
21landing1
22home → products → product_detail → cart → checkout5
31home → products → product_detail → reviews → cart → checkout7

Top Path Patterns

-- Find the 5 most common paths SELECT page_path, COUNT(*) AS session_count, COUNT(DISTINCT user_id) AS unique_users FROM ( SELECT user_id, session_id, GROUP_CONCAT(page ORDER BY event_time SEPARATOR ' → ') AS page_path FROM with_session GROUP BY user_id, session_id ) GROUP BY page_path ORDER BY session_count DESC LIMIT 5;


5. Session Quality Segmentation

Segmentation Criteria

TierPage ViewsDurationDefinition
Bounce10 minLeft after viewing a single page
Shallow2–3< 5 minQuick browse
Medium4–65–15 minModerate engagement
Deep7+> 15 minHigh engagement

SQL Implementation

SELECT user_id, session_id, page_views, session_duration_min, CASE WHEN page_views = 1 THEN 'bounce' WHEN page_views <= 3 AND session_duration_min < 5 THEN 'shallow' WHEN page_views <= 6 AND session_duration_min < 15 THEN 'medium' ELSE 'deep' END AS session_quality FROM session_agg ORDER BY user_id, session_id;

Quality Distribution

SELECT session_quality, COUNT(*) AS session_count, ROUND(COUNT(*) * 100.0 / SUM(COUNT(*)) OVER (), 1) AS pct FROM ( SELECT CASE WHEN page_views = 1 THEN 'bounce' WHEN page_views <= 3 AND session_duration_min < 5 THEN 'shallow' WHEN page_views <= 6 AND session_duration_min < 15 THEN 'medium' ELSE 'deep' END AS session_quality FROM session_agg ) GROUP BY session_quality ORDER BY session_count DESC;


6. Advanced Scenarios

Scenario 1: Session Quality by Traffic Source

SELECT u.traffic_source, COUNT(DISTINCT s.session_id) AS sessions, ROUND(AVG(s.page_views), 1) AS avg_pages, ROUND(SUM(CASE WHEN s.page_views = 1 THEN 1 ELSE 0 END) * 100.0 / COUNT(*), 1) AS bounce_rate FROM session_agg s JOIN users u ON s.user_id = u.user_id GROUP BY u.traffic_source ORDER BY bounce_rate;

Scenario 2: Step Numbering Within a Session

-- Label each event's position within its session (step 1, step 2, ...) SELECT user_id, session_id, event_time, page, ROW_NUMBER() OVER (PARTITION BY user_id, session_id ORDER BY event_time) AS step_number, CASE WHEN event_type = 'complete' THEN 'conversion' WHEN event_type = 'add_to_cart' THEN 'intent' ELSE 'browse' END AS event_category FROM with_session ORDER BY user_id, session_id, step_number;

Scenario 3: Cross-Session User Behavior Analysis

-- Calculate total sessions and average gap between sessions per user WITH session_times AS ( SELECT user_id, session_id, MIN(event_time) AS session_start FROM with_session GROUP BY user_id, session_id ), session_gaps AS ( SELECT user_id, session_id, session_start, LAG(session_start) OVER (PARTITION BY user_id ORDER BY session_start) AS prev_session_start, TIMESTAMPDIFF(HOUR, LAG(session_start) OVER (PARTITION BY user_id ORDER BY session_start), session_start) AS hours_between_sessions FROM session_times ) SELECT user_id, COUNT(*) AS total_sessions, ROUND(AVG(hours_between_sessions), 1) AS avg_hours_between_sessions FROM session_gaps WHERE hours_between_sessions IS NOT NULL GROUP BY user_id;


Common Issues

1. DATEDIFF vs TIMESTAMPDIFF

-- Wrong: DATEDIFF on TIMESTAMP_LTZ returns the difference in days (events minutes apart show 0) DATEDIFF(event_time, prev_event_time) -- returns 0 -- Correct: use TIMESTAMPDIFF with an explicit unit TIMESTAMPDIFF(MINUTE, prev_event_time, event_time) -- returns 5 TIMESTAMPDIFF(SECOND, prev_event_time, event_time) -- returns 300 TIMESTAMPDIFF(HOUR, prev_event_time, event_time) -- returns 0

2. Choosing a Session Timeout Threshold

Business TypeRecommended ThresholdReason
E-commerce30 minutesUsers may leave to compare prices
Content / News15 minutesUsers leave after finishing an article
SaaS tools60 minutesUsers may work for extended periods
Gaming10 minutesHigh activity frequency, short gaps

3. Session Duration for Single-Page Sessions

-- Bounce sessions have session_start = session_end, so duration is 0. -- This is correct. Do not use the gap between adjacent events as session duration. -- Always use MIN/MAX of event_time within the session. TIMESTAMPDIFF(MINUTE, MIN(event_time), MAX(event_time)) -- correct

4. GROUP_CONCAT Ordering

-- Wrong: without ORDER BY, path order is non-deterministic GROUP_CONCAT(page SEPARATOR ' → ') -- Correct: order by event_time to guarantee path sequence GROUP_CONCAT(page ORDER BY event_time SEPARATOR ' → ')

5. Performance at Scale

-- Optimization: filter by date partition first, then split sessions WITH filtered_events AS ( SELECT * FROM user_events WHERE event_time >= '2026-03-01' AND event_time < '2026-03-02' ), -- Remaining session-splitting logic is unchanged...


Performance Optimization Tips

ScenarioStrategy
Large-scale session splittingProcess day by day to avoid full-table window computations
High-frequency queriesMaterialize session aggregation results as a Dynamic Table
Path analysisBuild a Bloom Filter index on (user_id, event_time)
Cross-midnight sessionsUse DATE_TRUNC('DAY', event_time) as an additional partition key

-- Recommended: encapsulate session-splitting logic in a Dynamic Table CREATE DYNAMIC TABLE dws_user_sessions REFRESH INTERVAL 1 HOUR AS WITH with_prev AS (...), with_gap AS (...), with_session AS (...) SELECT user_id, session_id, event_time, page, event_type FROM with_session;