Week 9 Notes - Course Introduction
Published: November 10, 2025
Key Takeaways: Space-Time Prediction (Bike Share Demand)
1. Motivation and Problem Context
- Real-world task: forecast bike share demand to support rebalancing.
- Demand varies across space (stations) and time (hours).
- Operational need: predict which stations will be empty or full ahead of time.
- Models must capture temporal patterns and spatial differences simultaneously.
2. Panel Data Framework
- Definition: same units observed repeatedly over time.
- Each row = station-hour combination.
- Enables analysis of both:
  - Station-specific baselines (spatial effects).
  - Time-based dynamics (temporal effects).
Examples of temporal variation:
- Morning/evening commute peaks.
- Weather and weekday effects.
- Weekend and holiday differences.
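To make the two kinds of effects concrete, here is a minimal sketch on a hypothetical four-row panel (station names, hours, and counts are invented). Averaging over hours for each station gives a station baseline, which is the spatial effect. Averaging over stations for each hour gives an hourly profile, which is the temporal effect.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical toy panel: each row is one station-hour observation.
panel = [
    {"station": "A", "hour": 8, "trips": 12},
    {"station": "A", "hour": 17, "trips": 10},
    {"station": "B", "hour": 8, "trips": 3},
    {"station": "B", "hour": 17, "trips": 7},
]

def group_mean(rows, key):
    """Average trips for each value of `key` (e.g. station or hour)."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[key]].append(row["trips"])
    return {k: mean(v) for k, v in groups.items()}

station_baselines = group_mean(panel, "station")  # spatial effect: {"A": 11, "B": 5}
hourly_profile = group_mean(panel, "hour")        # temporal effect: {8: 7.5, 17: 8.5}
```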
3. Data Preparation
Binning Trips
- Aggregate trips into consistent time intervals (e.g., hourly).
- Purpose: identify temporal patterns and create comparable time steps.
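Binning can be sketched as truncating each trip's start time to the top of its hour and counting trips per station and hour. The trip records below are hypothetical. In R, `lubridate::floor_date(x, unit = "hour")` is a common way to do the truncation step.

```python
from collections import Counter
from datetime import datetime

# Hypothetical raw trip records: (start_station, start_time).
trips = [
    ("A", datetime(2025, 5, 1, 8, 5)),
    ("A", datetime(2025, 5, 1, 8, 47)),
    ("A", datetime(2025, 5, 1, 9, 10)),
    ("B", datetime(2025, 5, 1, 8, 30)),
]

def floor_to_hour(ts: datetime) -> datetime:
    """Truncate a timestamp to the start of its hour (the time bin)."""
    return ts.replace(minute=0, second=0, microsecond=0)

# Count trips per (station, hour bin); A's two 8 a.m. trips share one bin.
hourly_counts = Counter((station, floor_to_hour(ts)) for station, ts in trips)
```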
Creating Features
- Extract from timestamps: hour, day of week, week.
- Label weekends and holidays.
- Merge weather data (temperature, precipitation).
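A minimal sketch of the feature step follows. It assumes a hand-written holiday set and a weather lookup keyed by hourly timestamp, and both are hypothetical stand-ins for real calendar and weather sources. Day of week is encoded as 0 = Monday so the result does not depend on locale.

```python
from datetime import date, datetime

# Hypothetical holiday list; a real analysis would use an official calendar.
HOLIDAYS = {date(2025, 5, 26)}  # Memorial Day 2025

def time_features(ts: datetime) -> dict:
    """Derive calendar features from an hourly timestamp."""
    return {
        "hour": ts.hour,
        "day_of_week": ts.weekday(),     # 0 = Monday ... 6 = Sunday
        "week": ts.isocalendar()[1],     # ISO week number
        "weekend": ts.weekday() >= 5,    # Saturday or Sunday
        "holiday": ts.date() in HOLIDAYS,
    }

# Hypothetical hourly weather, keyed by the same hourly timestamps as the trips.
weather = {datetime(2025, 5, 26, 8): {"temperature": 68, "precipitation": 0.0}}

row = {"station": "A", "interval": datetime(2025, 5, 26, 8), "trips": 4}
row.update(time_features(row["interval"]))
# Merge weather; hours without a weather record get explicit missing values.
row.update(weather.get(row["interval"], {"temperature": None, "precipitation": None}))
```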
4. Temporal Lags
- Past demand predicts future demand.
- Common lag variables: `lag1Hour`, `lag3Hours`, `lag12Hours`, `lag1day`.
- Capture persistence and daily cycles.
- Must be computed within each station, so that one station's last hour never becomes the next station's lag.
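The sketch below computes lags by shifting trip counts within each station's own ordered series. In R this is typically `group_by(station)` followed by `dplyr::lag()`. The panel is hypothetical and assumed to be complete, with consecutive hourly rows. Only `lag1Hour` and `lag3Hours` are shown; `lag12Hours` and `lag1day` are the same shift with k = 12 and k = 24.

```python
from collections import defaultdict

# Hypothetical complete hourly panel: 4 consecutive hours for stations A and B.
panel = [
    {"station": "A", "hour_index": h, "trips": t} for h, t in enumerate([5, 7, 6, 9])
] + [
    {"station": "B", "hour_index": h, "trips": t} for h, t in enumerate([1, 0, 2, 3])
]

LAGS = {"lag1Hour": 1, "lag3Hours": 3}

def add_lags(rows, lags):
    """Add lagged trip counts in place, shifting only within each station."""
    by_station = defaultdict(list)
    for row in rows:
        by_station[row["station"]].append(row)
    for station_rows in by_station.values():
        station_rows.sort(key=lambda r: r["hour_index"])
        for i, row in enumerate(station_rows):
            for name, k in lags.items():
                # No history before the first k hours: leave the lag missing.
                row[name] = station_rows[i - k]["trips"] if i >= k else None
    return rows

add_lags(panel, LAGS)
```

Station B's first hour gets `lag1Hour = None` rather than station A's final count of 9. An ungrouped shift would produce that leak.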
5. Building a Complete Panel
- Missing station-hour combinations break lag calculations. If a station has no trips in an hour, that row is absent, so the "previous row" is no longer the previous hour.
- Use `expand.grid()` to create all station-hour pairs.
- Fill missing trip counts with zeros.
- Join:
  - Fixed attributes (station location, demographics).
  - Time-varying features (weather, hour, weekday).
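Here is a Python analogue of the `expand.grid()` step using `itertools.product`. It builds every station-hour pair, fills unobserved pairs with zero trips, and joins a fixed station attribute onto each row. All values are hypothetical, and `med_income` stands in for whatever demographic variables are actually joined.

```python
from itertools import product

stations = ["A", "B"]
hours = list(range(4))  # hypothetical: four consecutive hourly intervals

# Observed counts exist only where trips happened (B has none at hours 1 and 2).
observed = {("A", 0): 5, ("A", 1): 7, ("A", 2): 6, ("A", 3): 9, ("B", 0): 1, ("B", 3): 3}

# Hypothetical fixed station attribute, joined onto every hour for that station.
station_attrs = {"A": {"med_income": 52000}, "B": {"med_income": 38000}}

# Every station-hour pair (like R's expand.grid()), with zero-filled trip counts.
complete = [
    {"station": s, "hour_index": h, "trips": observed.get((s, h), 0), **station_attrs[s]}
    for s, h in product(stations, hours)
]
```

Without the zero rows, station B's hour-3 record would sit directly after its hour-0 record. A one-row lag would then silently become a three-hour lag.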
6. Temporal Validation
- Rule: never train on the future to predict the past.
- Split by time, not randomly.
- Train on early weeks → test on later weeks.
- Mirrors operational forecasting: use past data to predict future behavior.
- Prevents temporal leakage.
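A time-based split can be sketched as a cutoff on week number instead of a random sample. The rows and the cutoff week are hypothetical. The final check confirms that no training week comes after any test week.

```python
# Hypothetical station-hour rows tagged with their ISO week number.
rows = [{"week": w, "trips": t} for w, t in [(18, 4), (19, 6), (20, 5), (21, 7), (22, 3)]]

CUTOFF_WEEK = 21  # hypothetical: train on weeks before 21, test on 21 onward

train = [r for r in rows if r["week"] < CUTOFF_WEEK]
test = [r for r in rows if r["week"] >= CUTOFF_WEEK]

# Leakage check: every training week precedes every test week.
assert max(r["week"] for r in train) < min(r["week"] for r in test)
```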
7. Model Development Progression
- Baseline: time + weather.
- + Temporal lags: add past demand variables.
- + Spatial features: add demographics.
- + Fixed effects: control for station-level baselines.
- + Holidays: capture disruptions in demand.
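Written as a single equation, the full progression might look like the specification below. This is a sketch that assumes a linear regression. Here $y_{it}$ is trips at station $i$ in hour $t$, $\alpha_i$ is the station fixed effect, $\gamma_{h(t)}$ and $\delta_{d(t)}$ are hour-of-day and day-of-week effects, and the lag terms correspond to `lag1Hour`, `lag3Hours`, `lag12Hours`, and `lag1day`.

$$
\hat{y}_{it} = \alpha_i + \gamma_{h(t)} + \delta_{d(t)} + \beta_1\,\text{Temp}_t + \beta_2\,\text{Precip}_t + \rho_1\,y_{i,t-1} + \rho_3\,y_{i,t-3} + \rho_{12}\,y_{i,t-12} + \rho_{24}\,y_{i,t-24} + \eta\,\text{Holiday}_t
$$

In a linear model with a full set of station dummies, station demographics that do not change over time are absorbed by $\alpha_i$. This is one reason the fixed effects can outperform the demographic features.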
Performance metric: MAE (Mean Absolute Error), the average of |actual - predicted| across all station-hours.
→ Interpretable as "predictions are off by X trips on average."
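The metric itself is a one-line average. In this sketch on hypothetical counts, the absolute errors are 1, 1, 2, and 0, so the model is off by one trip per station-hour on average.

```python
def mean_absolute_error(actual, predicted):
    """Average absolute gap between observed and predicted trip counts."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)

# Hypothetical station-hour counts and model predictions.
actual = [4, 0, 7, 2]
predicted = [3.0, 1.0, 5.0, 2.0]
mae = mean_absolute_error(actual, predicted)  # (1 + 1 + 2 + 0) / 4 = 1.0
```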
8. Typical Results
| Model | Key Additions | Improvement |
|---|---|---|
| 1. Baseline | Time + Weather | Base level |
| 2. + Lags | Temporal persistence | Major gain |
| 3. + Demographics | Neighborhood context | Moderate |
| 4. + Fixed Effects | Station baselines | Large gain |
| 5. + Holidays | Event adjustments | Small gain |
9. Space-Time Error Analysis
- Examine residuals in both space and time.
- Identify systematic under/over-prediction:
  - Rush hours or weekends?
  - Certain neighborhoods?
- Map errors spatially to reveal missing features.
- Check correlation between errors and demographics (equity concerns).
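Systematic bias shows up in the mean signed residual (actual minus predicted) per group, which MAE hides because it discards direction. In this sketch on hypothetical test rows, a positive value means the model under-predicts for that hour or station. The per-station means are what you would map or correlate with demographics, for example with `statistics.correlation` in Python 3.10+.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical test-set rows with actual and predicted trips.
results = [
    {"station": "A", "hour": 8, "actual": 12, "predicted": 9},
    {"station": "A", "hour": 14, "actual": 4, "predicted": 4},
    {"station": "B", "hour": 8, "actual": 6, "predicted": 4},
    {"station": "B", "hour": 14, "actual": 2, "predicted": 3},
]

def mean_residual_by(rows, key):
    """Mean signed error (actual - predicted) per group.

    Positive values mean the model under-predicts demand for that group.
    """
    groups = defaultdict(list)
    for r in rows:
        groups[r[key]].append(r["actual"] - r["predicted"])
    return {k: mean(v) for k, v in groups.items()}

by_hour = mean_residual_by(results, "hour")        # temporal pattern: {8: 2.5, 14: -0.5}
by_station = mean_residual_by(results, "station")  # spatial pattern: {"A": 1.5, "B": 0.5}
```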
10. Policy Insights
- High-volume stations: prediction accuracy most critical.
- Temporal trends guide rebalancing schedules.
- Persistent spatial errors may imply infrastructure or equity issues.
- “Good enough” predictions depend on operational tolerance for error.
11. Broader Applications
- Panel data + temporal lags apply to:
  - Transit ridership
  - Crime monitoring
  - Housing dynamics
  - Health and education outcomes
  - Environmental monitoring
12. Core Concepts Recap
- Panel data tracks the same spatial units over time.
- Temporal lags encode short- and long-term persistence.
- Temporal validation ensures realistic performance testing.
- Station fixed effects handle unobserved spatial heterogeneity.
- Space-time error analysis diagnoses model bias.
- Equity assessment is essential for deployment decisions.