Week 9 Notes - Course Introduction

Published: November 10, 2025

Key Takeaways: Space-Time Prediction (Bike Share Demand)

1. Motivation and Problem Context

Real-world task: forecast bike share demand to support rebalancing.
Demand varies across space (stations) and time (hours).
Operational need: predict which stations will be empty or full ahead of time.
Models must capture temporal patterns and spatial differences simultaneously.

2. Panel Data Framework

Definition: same units observed repeatedly over time.
Each row = station-hour combination.
Enables analysis of both:
- Station-specific baselines (spatial effects).
- Time-based dynamics (temporal effects).

Examples of temporal variation:
- Morning/evening commute peaks.
- Weather and weekday effects.
- Weekend and holiday differences.

3. Data Preparation

Binning Trips

Aggregate trips into consistent time intervals (e.g., hourly).
Purpose: identify temporal patterns and create comparable time steps.
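
A minimal sketch of hourly binning, assuming each trip is a (station, start time) pair. The trip records and the helper name `floor_to_hour` are made up for illustration and are not from the course materials.

```python
from collections import Counter
from datetime import datetime

# Hypothetical trip records: (start station id, trip start time).
trips = [
    ("A", datetime(2025, 3, 3, 8, 5)),
    ("A", datetime(2025, 3, 3, 8, 40)),
    ("A", datetime(2025, 3, 3, 9, 15)),
    ("B", datetime(2025, 3, 3, 8, 59)),
]


def floor_to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour."""
    return ts.replace(minute=0, second=0, microsecond=0)


# Count trips per (station, hour) bin.
hourly_counts = Counter((station, floor_to_hour(ts)) for station, ts in trips)

for (station, hour), n in sorted(hourly_counts.items()):
    print(station, hour.isoformat(), n)
```

Every trip that starts within the same clock hour at the same station lands in one bin, so each bin becomes one station-hour row of the panel.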

Creating Features

Extract from timestamps:
- hour, day of week, week.
Label weekends and holidays.
Merge weather data (temperature, precipitation).
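
The feature steps above could look roughly like this. The holiday set, the weather table, and the field names (`temp_c`, `precip_mm`) are illustrative assumptions; a real project would load an official holiday calendar and an hourly weather feed.

```python
from datetime import date, datetime

# Hypothetical holiday list (assumption for illustration only).
HOLIDAYS = {date(2025, 5, 26)}

# Hypothetical hourly weather, keyed by the start of each hour.
weather = {datetime(2025, 5, 26, 8): {"temp_c": 18.0, "precip_mm": 0.0}}


def time_features(ts: datetime) -> dict:
    """Derive calendar features from an hourly timestamp."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),   # Monday = 0 ... Sunday = 6
        "week": ts.isocalendar()[1],   # ISO week number
        "is_weekend": ts.weekday() >= 5,
        "is_holiday": ts.date() in HOLIDAYS,
    }


def build_row(ts: datetime) -> dict:
    """Combine calendar features with weather for the same hour."""
    row = time_features(ts)
    # Hours without a weather record get None rather than a guessed value.
    row.update(weather.get(ts, {"temp_c": None, "precip_mm": None}))
    return row


print(build_row(datetime(2025, 5, 26, 8)))
```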

4. Temporal Lags

Past demand predicts future demand.
Common lag variables:
- lag1Hour, lag3Hours, lag12Hours, lag1day.
Capture persistence and daily cycles.
Lags must be computed within each station, ordered by time, so one station's history never fills another station's lag.
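
A sketch of per-station lag computation, assuming each station's series is already complete and ordered by hour. The lag names follow the notes; the helper `add_lags` and the toy series are hypothetical.

```python
# Lag names from the notes, mapped to the number of hours to look back.
LAGS = {"lag1Hour": 1, "lag3Hours": 3, "lag12Hours": 12, "lag1day": 24}


def add_lags(series, lags):
    """Return one row per time step with lagged values (None where unavailable)."""
    rows = []
    for t, trips in enumerate(series):
        row = {"t": t, "trips": trips}
        for name, k in lags.items():
            # Look back k steps within this station's own series only.
            row[name] = series[t - k] if t - k >= 0 else None
        rows.append(row)
    return rows


# Hypothetical panel: station id -> hourly trip counts, ordered by time.
panel = {"A": list(range(30)), "B": [100 + v for v in range(30)]}

# Lags are computed separately per station, never across stations.
lagged = {station: add_lags(series, LAGS) for station, series in panel.items()}

print(lagged["A"][25])
```

The first hours of each station have no history, so their lags stay `None`; those rows are usually dropped or handled explicitly before modeling.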

5. Building a Complete Panel

Missing station-hour combinations break lag calculations: a row-based lag skips the gap and pulls from the wrong hour.
Use expand.grid() to create all station-hour pairs.
Fill missing trip counts with zeros.
Join:
- Fixed attributes (station location, demographics).
- Time-varying features (weather, hour, weekday).
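
The notes name R's expand.grid() for this step. The sketch below swaps in Python's `itertools.product`, which produces the same kind of full cross of stations and hours. Station ids, coordinates, and observed counts are invented for illustration.

```python
from datetime import datetime, timedelta
from itertools import product

stations = ["A", "B"]
start = datetime(2025, 3, 3, 0)
hours = [start + timedelta(hours=h) for h in range(4)]

# Hypothetical observed counts; hours with no trips are simply absent.
observed = {("A", hours[0]): 3, ("A", hours[2]): 1, ("B", hours[1]): 2}

# Hypothetical fixed station attributes (made-up coordinates).
station_info = {
    "A": {"lat": 39.95, "lon": -75.16},
    "B": {"lat": 39.96, "lon": -75.18},
}

panel = []
for station, hour in product(stations, hours):
    row = {
        "station": station,
        "hour": hour,
        # Missing station-hour pairs are filled with zero trips.
        "trips": observed.get((station, hour), 0),
    }
    row.update(station_info[station])    # join fixed attributes
    row["hour_of_day"] = hour.hour       # join time-varying features
    row["day_of_week"] = hour.weekday()
    panel.append(row)

print(len(panel), "rows")
```

With 2 stations and 4 hours the panel always has 8 rows, regardless of how many station-hours actually saw trips.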

6. Temporal Validation

Rule: never train on the future to predict the past.
Split by time, not randomly.
- Train on early weeks → test on later weeks.
Mirrors operational forecasting: use past data to predict future behavior.
Prevents temporal leakage.
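
A minimal sketch of a time-based split, assuming four weeks of hourly rows and a three-week training window. The window lengths and the toy `trips` values are assumptions, not figures from the course.

```python
from datetime import datetime, timedelta

start = datetime(2025, 3, 3)

# Hypothetical panel rows for one station: four weeks of hourly data.
rows = [
    {"hour": start + timedelta(hours=h), "trips": h % 5}
    for h in range(24 * 7 * 4)
]

# Everything before the cutoff is training data; everything after is test data.
cutoff = start + timedelta(weeks=3)
train_rows = [r for r in rows if r["hour"] < cutoff]
test_rows = [r for r in rows if r["hour"] >= cutoff]

print(len(train_rows), "train rows,", len(test_rows), "test rows")
```

A random split would scatter hours from the final week into the training set, which lets the model see the future it is supposed to predict.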

7. Model Development Progression

Baseline: time + weather.
+ Temporal lags: add past demand variables.
+ Spatial features: add demographics.
+ Fixed effects: control for station-level baselines.
+ Holidays: capture disruptions in demand.
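
To illustrate the fixed-effects step: a station fixed effect gives each station its own intercept. In the simplest case, with no other predictors, the least-squares estimate of that intercept is the station's mean demand in the training data. The sketch below shows only that special case; the data and the fallback to the overall mean for unseen stations are assumptions.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical training data: (station id, hourly trips).
train = [("A", 10), ("A", 14), ("B", 2), ("B", 4), ("B", 3)]

# Group observed demand by station.
by_station = defaultdict(list)
for station, trips in train:
    by_station[station].append(trips)

# Station-only model: each station's effect is its training mean.
station_effect = {s: mean(v) for s, v in by_station.items()}
global_mean = mean(trips for _, trips in train)


def predict(station):
    """Predict a station's baseline; unseen stations fall back to the overall mean."""
    return station_effect.get(station, global_mean)


print(predict("A"), predict("B"))
```

In a full model the station intercepts are estimated jointly with the time, weather, and lag coefficients, so they absorb whatever station-level differences those features do not explain.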

Performance metric: MAE (Mean Absolute Error)
→ Interpretable as “average trips off by X.”
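
MAE is the mean of the absolute differences between actual and predicted counts. A small worked sketch with made-up numbers:

```python
def mean_absolute_error(actual, predicted):
    """Average absolute difference between actual and predicted values."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and the same length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


# Hypothetical hourly trip counts and model predictions.
actual = [4, 0, 7, 2]
predicted = [3, 1, 5, 2]

# Absolute errors are 1, 1, 2, 0, so the average is 4 / 4 = 1.0.
mae = mean_absolute_error(actual, predicted)
print(mae)
```

Here the model is off by one trip per station-hour on average.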

8. Typical Results

| Model | Key Additions | Improvement |
|---|---|---|
| 1. Baseline | Time + Weather | Base level |
| 2. + Lags | Temporal persistence | Major gain |
| 3. + Demographics | Neighborhood context | Moderate |
| 4. + Fixed Effects | Station baselines | Large gain |
| 5. + Holidays | Event adjustments | Small gain |

9. Space-Time Error Analysis

Examine residuals in both space and time.
Identify systematic under/over-prediction.
- Rush hours or weekends?
- Certain neighborhoods?
Map errors spatially to reveal missing features.
Check correlation between errors and demographics (equity concerns).
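
One way to start the error analysis is to average signed residuals by station and by hour. This is a sketch with invented records; the tuple layout and the helper `mean_error_by` are hypothetical.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical test-set records: (station, hour of day, actual, predicted).
records = [
    ("A", 8, 12, 9),
    ("A", 14, 5, 5),
    ("B", 8, 6, 4),
    ("B", 14, 2, 3),
]


def mean_error_by(records, key_index):
    """Mean signed residual (actual - predicted) for each group."""
    groups = defaultdict(list)
    for rec in records:
        groups[rec[key_index]].append(rec[2] - rec[3])
    return {key: mean(values) for key, values in groups.items()}


# Positive values mean under-prediction, negative values mean over-prediction.
error_by_station = mean_error_by(records, 0)
error_by_hour = mean_error_by(records, 1)

print(error_by_station)
print(error_by_hour)
```

In this toy data the 8 a.m. residuals are positive at both stations, the kind of systematic rush-hour under-prediction the checklist above is meant to surface. The per-station means can then be mapped or compared against neighborhood demographics.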

10. Policy Insights

Prediction accuracy is most critical at high-volume stations.
Temporal trends guide rebalancing schedules.
Persistent spatial errors may imply infrastructure or equity issues.
“Good enough” predictions depend on operational tolerance for error.

11. Broader Applications

Panel data + temporal lags apply to:
- Transit ridership
- Crime monitoring
- Housing dynamics
- Health and education outcomes
- Environmental monitoring

12. Core Concepts Recap

Panel data tracks the same spatial units over time.
Temporal lags encode short- and long-term persistence.
Temporal validation ensures realistic performance testing.
Station fixed effects handle unobserved spatial heterogeneity.
Space-time error analysis diagnoses model bias.
Equity assessment is essential for deployment decisions.