3 components: spatial=where, temporal=when, data=what.

Temporal data (TD): (1) Sequence of observations recorded at time intervals (typically regular time intervals) (2) Array of real numbers of size D x N; D = # of dimensions & N = # of samples.

Spatial data (SD): (1) Non-spatial (NS) information: • Same as data in traditional data processing: numerical, categorical, ordinal, Boolean (city name, city pop) (2) Spatial information: • Spatial attribute: geographically referenced 1. NBHD & extent 2. Location: longitude, latitude, elevation • SD representations: 1. Raster: gridded space 2. Vector (geometric): point, line, polygon 3. Graph: node, edge, path.

Space AND time: • For real-world problems it is not enough to consider 1. Just snapshots of a spatial process at a given time, 2. Or time series at a spatial location • The behavior from one time/spatial point to the next is important.

Why ST data? No history without geography (Tobler's Law). Many historical processes are dynamic, showing changes of spatial patterns through time: • Growth & decline of populations • Epidemics • Migrations • Housing prices.

Standard data analysis assumes statistical independence (SI): (1) Assume each observed data point is SI from the observation that preceded it (2) Classification: class of data point x(k) is not influenced by class of x(k-1) or any other data point (3) Prediction: value of specific field/point on a grid does not depend on other fields/neighboring points (4) Many important real-world problems do not satisfy this SI assumption.

Two generic properties of ST data: (1) Autocorrelation (AC) = not (statistically) independent: • Observations made at nearby locations and time stamps are not independent, but correlated with each other • Coherence of spatial observations (surface temperature values are consistent at nearby locations) • Smoothness in temporal observations (changes in traffic activity occur smoothly over time). (2) Heterogeneity: • ST datasets can show heterogeneity = non-stationary in both space & time (satellite measurements of vegetation at a location on Earth show a cyclical pattern in time due to seasonal cycles: winter vs summer).

ST data types: (1) Event (points): comprise discrete events occurring at point locations & times (incidences of crime events in city) (2) Trajectory: trajectory of moving bodies being measured (3) Lattice: polygons (4) Point reference (PR): continuous ST field with different measuring points (measurements of surface temperature collected using weather balloons) (5) Raster: fixed measuring points with fixed ST grid (fMRI scans of brain activity).

Converting between data types: • Vector to Raster: aggregate counts of events at every cell of a ST grid • Raster to Vector: event extraction • PR to Raster: interpolate or aggregate counts • Raster to PR: take every vertex of the grid as a ST reference point.

Points: • Tuple contains spatial & temporal info of discrete observation • Metric available that captures strength of interaction (AC) among points.

Time Series: • Set of observations at every spatial cell in ST grid can be considered a time series • Multiple dimensions in trajectory data correspond to the spatial identifiers (e.g., location coordinates) traversed by the moving objects.

Trajectories: Multi-dimensional sequences that contain a temporally ordered list of locations visited by the moving object, with any other information recorded by the object.

Spatial maps: • ST raster data can be viewed as a collection of spatial maps observed at every time stamp • Common approach for extracting features among spatial maps is using image segmentation • […] capture 2D space & 3rd dim. captures time • fMRI data: 4D array w/ 3 dim. capturing 3D space & 4th dim. captures time.

Clustering ST Event Data: Cluster = collection of data objects; unsupervised learning (no labelled data). Clustering algorithms: (1) Partitioning: construct various partitions & evaluate them by criterion (k-means) (2) Density-based (DB): based on connectivity & density functions. Grow cluster as long as density in NBHD exceeds threshold (DBSCAN).

K-Means (KM): • Separate samples in n groups of equal variance • Requires # of clusters to be specified. 1. Choose # of clusters 2. Randomly choose initial positions of centroids 3. Assign each point to nearest centroid (depending on dist. measure) 4. Recompute centroid positions 5. If solution converges → Stop, else go to step 3. Objective function for KM: • Finds a local min. of the sum of squared distances • New data points can be assigned cluster membership based on existing clusters. Restriction of cluster shapes: • Clusters = Voronoi diagrams of centers • Clusters always convex in space. Limitations of KM: • Only simple cluster shapes • Boundaries determined from centers • Can't model covariances well in anisotropically distributed clusters.

DB clustering methods: • Doesn't require user to set # of clusters a priori • Based on density (local cluster criterion), such as density-connected points, or based on an explicitly constructed density function • Major features: (1) Discover clusters of arbitrary shape (2) Handle noise (3) One scan (4) Need density params.

DBSCAN = DB spatial clustering of applications with noise: • Density: # of sample points within specified radius (epsilon) • Core point: sample with more than specified # of points (MinPts) within epsilon • Border point: fewer than MinPts within epsilon but is in NBHD of core point • Noise point: any point that is not a core point or a border point.

DBSCAN: • Allows complex cluster shapes • Can detect outliers • Needs 2 parameters to adjust, epsilon is hard to pick (can be done based on the number of clusters) • Can learn arbitrary cluster shapes • Finds core samples of high density & expands clusters from them • Sample is a "core sample" if more than MinPts samples are within epsilon ("dense region") • Steps: 1. Start with a core sample 2. Recursively find neighbors that are core samples and add to cluster 3. Add samples within epsilon that are not core samples (but don't recurse) 4. If no other points are "reachable", pick another core sample, start new cluster 5. Remaining points are labeled outliers • Limitations: (1) Varying densities (2) High-dimensional data.

ST-DBSCAN: • Clusters ST data according to its NS, spatial, & temporal attributes • Assigns density factor to each cluster • Compares avg. value of cluster with new incoming value • This addresses the issue of differing border object values when clusters are adjacent to each other & there is little difference between NS values of neighbor objects in each cluster.

Evaluate clustering result: (1) Elbow plot: • Compute sum of squared distances (SSE) between data points and their assigned clusters' centroids • Pick desired # of clusters at the spot where SSE starts to flatten out and form an elbow (2) Silhouette coefficient: • Measures separation between 2 clusters • For individual point i: a = avg. distance of i to points in same cluster; b = avg. distance of i to points in closest cluster; s(i) = (b - a)/max(a, b) • Range [-1, 1]. If 0: i is very close to neighboring clusters. If 1: i is far away. If -1: i assigned to wrong cluster (3) DB clustering validation.

L2 Fundamentals of Time Series Analysis

Time series (temporal): • Current time: t • Observation at current time: […] (… check if model is suitable, randomness, outliers) • Future/forecast time: t + n.

Time features: • Measure how 2 variables change together? (1) Correlation (/ - \) (2) Autocorrelation (mountain). Autocorrelation function (ACF): vary the value of the lag k • Seasonal: seasonal pattern appears when time series is affected by seasonal factors (seasonality has known fixed time period); repeating short-term cycle • Trend: long-term increase/decrease in data (doesn't need to be linear) • Cyclic: rise & fall pattern that doesn't have fixed frequency (duration of these fluctuations is usually at least 2 years).

Additive/multiplicative model: • Additive: f(t) = St + Tt + Rt (St=seasonal component, Tt=trend-cycle, Rt=remainder) • Multiplicative: if the variations in St or Tt are proportional to t: f(t) = St * Tt * Rt. Convert into additive model: take the log, log f(t) = log St + log Tt + log Rt. Moving avg. smoothing (compute trend of data): avg. values of time series within k periods of time. Compute seasonal component: simply avg. de-trended values for that season.

Forecasting time series: (1) Last observed value (2) Avg. of observed values (equally important) (3) Drift of values.

Regression models: • Linear relation between time series: y(t) and some measurements over time x(1,t), x(2,t)… (time series), e.g. US consumption & personal income • Linear regression model: y = xβ • Least squares: numerically evaluate how good a model is? Sum of squared errors (SSE); find β to minimize squared errors • Forecasts: with estimate β̂ on observed data, forecast on unseen data (predict distribution).

Confidence intervals for Normal distribution: • Prediction interval: interval where we expect forecasted value to lie, with a desired confidence.

Forecasting time series (smoothing): • Moving avg. smoothing: avg. of previous k values • Exponential moving avg. smoothing: attach larger weights to the most recent observations & lower to distant observations (0 < α < 1) • Add offset • Estimate α by minimizing SSE • State equation (state-space models): l(t) = l(t-1) + αε(t).

Stationary vs non-stationary: (1) Stationary: its statistical properties don't depend on time (2) Non-stationary: trend & seasonality affect value of time series at different times. Converting non-stationary to stationary: • Logarithms can help stabilize variance of time series • Differencing can help stabilize avg. of time series (reducing trend & seasonality). (1) Differences of consecutive observations: y'(t) = y(t) - y(t-1) (2) If still non-stationary, differences of differences: y''(t) = y'(t) - y'(t-1) (3) Compute seasonal differences: y'(t) = y(t) - y(t-m) where m is season period (month, year).

Autoregressive (AR) models: • Regression models: linear combin. of predictors x(t) • AR: linear combin. of past values (lags).

Moving avg. (MA) models: • Linear combination of past estimation errors • Unlike x(t) & y(t), errors ε(t) aren't observed • Follows unit normal distribution ε(t) ~ N(0,1) • MA ↔ AR conversion is possible.

AR integrated MA (ARIMA) model: • Based on assumption that underlying time series is stationary or can be made stationary by differencing it one/more times • Combines differencing with AR and MA models. ARIMA(p,d,q): p=order of the AR, q=order of the MA, d=order of the differencing (typically 1) • For MA(q) pick q by looking at ACF at different lags • AR(p) uses partial AC function (PACF): measures relationship between y(t) and y(t-k) after removing the effects of lags {1,2,3,…,k-1}. PACF is last coefficient φ(k) in an AR(k) model • Using max likelihood estimator (MLE): find parameters that maximize the probability of obtaining the observed data • ε(t) depends on previous errors & previous observations (so model is not linear).

Box-Jenkins modeling approach: […]

A=Time series w/ seasonal component, B=Stationary time series, C=White noise, D=Time series w/ trend component. AR(p)=0: no AC. AR(p): • ACF tails off gradually • PACF cuts off after lag p | MA(q): • ACF cuts off after lag q • PACF tails off gradually.

Vector AR (VAR): • Exogenous variables aren't explained by other vars within model • Endogenous vars are explained by other vars • These vars influence each other's predictions • For K vars and p lags: K + pK^2 learnable coefficients (K=2, p=1: 2 + 1*2^2 = 6).

L3 Data Splitting and Time Series Classification

Regression vs. forecasting: • Regression: (1) Inputs are independent & identically distributed (2) No input ordering (random splits) (3) Each input has 1 output • Forecasting: (1) Inputs are dependent (2) Input ordering (splits wrt order) (3) 1 input may have more outputs: fh (4) Use previous values.

Bias-variance trade-off: • Split data D into K independent datasets: {D(1),…,D(K)} • On each dataset fit a model by minimizing the error • Bias = extent to which the avg. prediction over all datasets differs from the desired regression function • Variance = extent to which solutions for individual datasets vary around avg. prediction (measures extent to which model is sensitive to particular choice of dataset) • Simple models: high bias, low variance (underfitting) • Complex models: low bias, high variance (overfitting).

Time series data splitting: • Walk-forward validation where a model is updated at each time step when new data is received (1) Sliding window: model is trained on the most recent data (2) Forward chaining: model is trained on all available data.

Time series classification: • Rather than (1) classifying parts of a time series (the current observation given the previous ones), (2) classify entire time series.

K-nearest neighbors (KNN): • Each time series is 1 point • Assign each test point to its closest training point: 1-NN • Small K values produce many small regions of each class • Large K leads to fewer, larger regions • How to find closest points? Euclidean distance: for 2 time series
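
KNN sketch: a minimal, stdlib-only illustration of the KNN notes above (each time series is 1 point, closest training series found by Euclidean distance). The function names, the toy series and the labels are assumptions made for illustration and do not come from the course material. It assumes all series have equal length.

```python
import math


def euclidean_distance(a, b):
    """Euclidean distance between two equal-length time series."""
    if len(a) != len(b):
        raise ValueError("series must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train_series, train_labels, query, k=1):
    """Label `query` by majority vote among its k closest training series."""
    # Rank training series from nearest to farthest.
    order = sorted(
        range(len(train_series)),
        key=lambda i: euclidean_distance(train_series[i], query),
    )
    votes = {}
    for i in order[:k]:
        votes[train_labels[i]] = votes.get(train_labels[i], 0) + 1
    # On a tie, the label whose first vote came from the nearer neighbor wins.
    return max(votes, key=votes.get)


# Hypothetical toy data: each inner list is one whole time series.
train = [[0, 1, 2, 3], [1, 2, 3, 4], [3, 2, 1, 0], [4, 3, 2, 1]]
labels = ["rising", "rising", "falling", "falling"]

print(knn_classify(train, labels, [0, 1, 2, 4], k=1))
print(knn_classify(train, labels, [5, 3, 2, 0], k=3))
```

With k=1 this is the 1-NN rule from the notes; larger k smooths the decision regions, as described above.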
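
Walk-forward split sketch: one possible way to generate the sliding-window and forward-chaining splits described in the time series data splitting notes above. The function name and its parameters (`initial_train`, `horizon`, `window`) are hypothetical, chosen for this illustration only.

```python
def walk_forward_splits(n_samples, initial_train, horizon=1, window=None):
    """Yield (train_indices, test_indices) pairs that respect time order.

    window=None -> forward chaining: train on all data seen so far.
    window=w    -> sliding window: train on only the w most recent points.
    """
    end = initial_train
    while end + horizon <= n_samples:
        start = 0 if window is None else max(0, end - window)
        yield list(range(start, end)), list(range(end, end + horizon))
        end += horizon  # move forward once the test points have been observed


# Forward chaining: the training set grows at every step.
for train_idx, test_idx in walk_forward_splits(6, initial_train=3):
    print("chain ", train_idx, test_idx)

# Sliding window: the training set keeps a fixed size of 3.
for train_idx, test_idx in walk_forward_splits(6, initial_train=3, window=3):
    print("window", train_idx, test_idx)
```

In both variants the test indices always come after the training indices, which is the point of splitting wrt order instead of randomly.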
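
Differencing sketch: a small illustration of the three differencing rules in the stationarity notes above (consecutive, second-order and seasonal differences). The helper name `difference` and the toy series are assumptions for illustration.

```python
def difference(series, lag=1):
    """Return y'(t) = y(t) - y(t-lag); the result is `lag` items shorter."""
    return [series[t] - series[t - lag] for t in range(lag, len(series))]


# Quadratic trend: the first difference still trends, the second is constant.
y = [1, 4, 9, 16, 25, 36]
first = difference(y)        # y'(t)  = y(t)  - y(t-1)
second = difference(first)   # y''(t) = y'(t) - y'(t-1)
print(first, second)

# Seasonal differencing with season period m = 4.
seasonal = [10, 20, 30, 40, 12, 22, 32, 42]
print(difference(seasonal, lag=4))  # y'(t) = y(t) - y(t-m)
```

The d in ARIMA(p,d,q) corresponds to how many times the lag-1 difference is applied before fitting the AR and MA parts.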