Statistical Methods in Movement Ecology: From Animal Tracking to Biomedical Applications

Levi James, Nov 26, 2025

Abstract

This article provides a comprehensive overview of the statistical frameworks used to analyze animal movement data, tailored for researchers and drug development professionals. It explores foundational concepts like hierarchical movement models and Statistical Movement Elements (StaMEs), details the application of methods including Resource Selection Functions (RSFs), Step-Selection Functions (SSFs), and Hidden Markov Models (HMMs), addresses common analytical challenges and data integration issues, and offers a comparative validation of different modeling approaches. By linking ecological insights with biomedical research, particularly in preclinical behavioral analysis, this guide serves as a critical resource for selecting and implementing the most appropriate statistical models for specific research questions.

Deconstructing Animal Movement: From Raw Tracks to Ecological Insight

The Movement Ecology Paradigm (MEP) was formally introduced to unify the study of organismal movement by proposing an integrative framework that links the internal state, motion capacity, and navigation capacity of an individual with the external environment [1] [2]. This paradigm posits that movement paths are the outcome of the interaction between these four core components, providing a mechanistic approach applicable to all movement types and organisms [3]. The MEP aims to develop a general theory for understanding the causes, mechanisms, patterns, and consequences of all movement phenomena, moving beyond the taxon-specific and specialized approaches that had previously characterized movement research [1].

The field has grown rapidly, fueled by advances in tracking technology and data analysis [2]. Modern movement ecology sits at the interface of multiple research fields, including physics, physiology, data science, and ecology, and leverages massive quantities of tracking data collected at ever-finer spatiotemporal resolutions [2].

Core Components of the Movement Ecology Paradigm

Conceptual Framework and Interrelationships

The MEP framework is built upon four fundamental components that interact to shape movement paths [1] [2]:

Internal State: The intrinsic motivation to move, encompassing all factors specific to the individual that affect its propensity to move
Motion Capacity: The suite of traits that enables the individual to move
Navigation Capacity: The suite of traits that enables the individual to orient its movement in space and/or time
External Factors: All social and environmental factors that affect the movement of the individual

The interaction between these components produces movement paths that can be classified according to their functionality during an individual's life, with the sum of movements constituting the "lifetime track" of an individual [3].

Quantitative Assessment of Research Trends

An analysis of movement ecology literature from 2009-2018 reveals distinct patterns in how these components have been studied. Research has predominantly focused on the effects of external factors on movement, with motion and navigation capacities receiving comparatively less attention [2].

Table 1: Research focus on MEP components (2009-2018)

MEP Component	Research Attention	Primary Methods	Knowledge Gaps
External Factors	High (dominant focus)	Remote sensing, environmental data layers, SSFs	Integration with other components
Internal State	Moderate	Accelerometers, physiological sensors, HMMs	Direct measurement of motivation
Motion Capacity	Low	Biomechanical modeling, movement metrics	Trade-offs with other traits
Navigation Capacity	Low	Experimental displacement, sensor data	Cognitive processes in wild populations

The technological landscape has also evolved significantly, with increased use of GPS devices and accelerometers, and a majority of studies now using the R software environment for statistical computing [2]. This period has been described as a "golden era of biologging" due to the widespread diffusion of animal-borne sensors [2].

Statistical Methods in Movement Ecology

Statistical models for analyzing movement data have become increasingly sophisticated. Three mainstream approaches are commonly used to relate animal movement data to environmental covariates: Resource Selection Functions (RSFs), Step Selection Functions (SSFs), and Hidden Markov Models (HMMs) [4]. Each method answers different ecological questions and requires a different data resolution.

Table 2: Comparison of statistical methods in movement ecology

Method	Temporal Scale	Primary Application	Key Advantages	Limitations
Resource Selection Function (RSF)	Coarse-scale	Habitat selection at home range scale	Ease of implementation; broad-scale patterns	Does not account for movement autocorrelation
Step Selection Function (SSF)	Fine-scale	Movement and habitat selection	Accounts for movement constraints; high-resolution insights	Requires high-frequency data
Hidden Markov Model (HMM)	Fine-scale	Linking behavior to environmental covariates	Identifies behavioral states; handles unobserved states	Complex implementation; computational intensity

Resource Selection Functions (RSF)

A Resource Selection Function is a widely used function that relates habitat characteristics to the relative probability of use by an animal [4]. RSFs compare observed animal locations ("used" locations) to randomly selected locations within an animal's home range ("available" locations) [4]. The RSF, $w(\mathbf{x})$, is typically defined in exponential form:

$$w(\mathbf{x}) = \exp\left(\beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{k} x_{k}\right)$$

where $\mathbf{x} = \{x_{1}, \dots, x_{k}\}$ denotes the values of $k$ predictor habitat variables and $\beta_{1}, \dots, \beta_{k}$ are the associated selection coefficients [4]. In practice, coefficients are estimated using logistic regression, modeling the probability that a resource unit is used given its environmental covariates.
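The used-versus-available logic can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure of any cited study: the data are synthetic, there is a single hypothetical habitat covariate, and `fit_logistic` is a hand-written gradient-ascent routine standing in for a proper logistic regression fit.

```python
import math
import random

def fit_logistic(xs, ys, lr=0.5, n_iter=1000):
    """Fit P(used | x) = 1 / (1 + exp(-(b0 + b1 * x))) by gradient ascent
    on the mean log-likelihood. Returns (b0, b1)."""
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(n_iter):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * x)))
            g0 += y - p
            g1 += (y - p) * x
        b0 += lr * g0 / n
        b1 += lr * g1 / n
    return b0, b1

def rsf(x, beta):
    """Exponential RSF w(x) = exp(beta * x); the intercept is dropped
    because the RSF is only defined up to a constant."""
    return math.exp(beta * x)

rng = random.Random(1)
# Synthetic data (illustrative only): used locations tend to have
# higher values of the habitat covariate than available locations.
used = [rng.gauss(1.0, 1.0) for _ in range(300)]
available = [rng.gauss(0.0, 1.0) for _ in range(300)]
xs = used + available
ys = [1] * len(used) + [0] * len(available)

b0, b1 = fit_logistic(xs, ys)
print(f"estimated selection coefficient: {b1:.2f}")
print(f"relative selection, x=1 vs x=0: {rsf(1.0, b1) / rsf(0.0, b1):.2f}")
```

Only the slope is carried into $w(\mathbf{x})$; the fitted intercept mostly reflects the arbitrary ratio of used to available points.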

Step Selection Functions (SSF)

Step Selection Functions extend RSFs by incorporating movement constraints and temporal autocorrelation [4]. SSFs are particularly valuable for inferring interactions between moving individuals while accounting for environmental factors [5]. These functions model the probability of selecting a movement step based on both environmental characteristics and movement constraints, providing a more mechanistic understanding of movement decisions.
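The conditional structure behind an SSF can be illustrated with a short Python sketch. Within each stratum (one observed step plus its available alternatives), the observed step is assumed to be chosen with probability proportional to $\exp(\boldsymbol{\beta} \cdot \mathbf{x})$, which is the conditional-logit form. The function names, covariate values, and coefficients below are hypothetical and are not taken from any package.

```python
import math

def step_probability(beta, stratum, chosen_index=0):
    """Conditional-logit probability of the chosen step within one stratum.

    stratum: list of covariate vectors, one per candidate step; by the
    convention assumed here, the observed step is at index 0.
    """
    scores = [sum(b * x for b, x in zip(beta, cov)) for cov in stratum]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    return weights[chosen_index] / sum(weights)

def log_likelihood(beta, strata):
    """Sum of log conditional probabilities over all strata."""
    return sum(math.log(step_probability(beta, s)) for s in strata)

# Hypothetical covariates per candidate step: [forest cover, distance to road].
strata = [
    [[0.8, 1.0], [0.2, 0.3], [0.1, 1.2]],
    [[0.6, 0.4], [0.7, 0.2], [0.0, 0.9]],
]
beta = [1.5, -0.5]
print(f"log-likelihood at beta={beta}: {log_likelihood(beta, strata):.3f}")
```

Maximizing this log-likelihood over `beta` is what a conditional logistic regression does; the sketch only evaluates it at one candidate value.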

Recent research has demonstrated that neglecting physical environmental features when analyzing interactions between moving animals leads to biased inference, where inter-individual interactions are spuriously inferred as affecting movement [5]. When landscape data is unavailable, applying 'Spatial+', a method that reduces bias from unmeasured spatial factors, can improve inference of inter-individual interactions [5].

Hidden Markov Models (HMM)

Hidden Markov Models are particularly powerful for identifying discrete behavioral states from movement data and linking these states to environmental covariates [4]. HMMs assume that an animal switches between a finite number of behavioral states, each characterized by different movement patterns, with transitions between states following a Markov process.

The advantage of HMMs lies in their ability to reveal variable associations with environmental factors across different behaviors [4]. For example, a case study on ringed seals demonstrated that HMMs can identify positive relationships between prey diversity and specific behavioral states (e.g., slow-movement behavior) that might be missed by other methods [4].
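As a rough illustration of how an HMM assigns a likelihood to a sequence of step lengths, the following Python sketch implements the scaled forward algorithm for a model with fixed transition probabilities. The exponential step-length distribution is an assumption chosen for brevity (gamma or Weibull distributions are more common in practice), and all parameter values are hypothetical.

```python
import math

def exp_density(x, mean):
    """Exponential density with the given mean (assumed emission model)."""
    return math.exp(-x / mean) / mean

def hmm_log_likelihood(steps, delta, gamma, means):
    """Scaled forward algorithm.

    steps: observed step lengths
    delta: initial state probabilities
    gamma: transition matrix, gamma[i][j] = P(state j at t+1 | state i at t)
    means: state-dependent mean step lengths
    """
    n_states = len(delta)
    alpha = [delta[i] * exp_density(steps[0], means[i]) for i in range(n_states)]
    scale = sum(alpha)
    loglik = math.log(scale)
    alpha = [a / scale for a in alpha]
    for x in steps[1:]:
        alpha = [
            sum(alpha[i] * gamma[i][j] for i in range(n_states))
            * exp_density(x, means[j])
            for j in range(n_states)
        ]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik

# Hypothetical two-state model: state 0 = slow movement, state 1 = fast travel.
steps = [0.5, 0.4, 0.8, 9.0, 11.0, 7.5]
delta = [0.5, 0.5]
gamma = [[0.9, 0.1], [0.1, 0.9]]
print(f"{hmm_log_likelihood(steps, delta, gamma, [1.0, 10.0]):.3f}")
```

Covariate effects would enter by making the entries of `gamma` functions of environmental variables at each time step; that extension is omitted here.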

Experimental Protocols and Application Notes

Integrated Workflow for Movement Analysis

A comprehensive approach to movement ecology research involves multiple stages from study design to statistical analysis and interpretation. The following workflow integrates technological, conceptual, and analytical components within the MEP framework.

Protocol 1: Habitat Selection Analysis using SSF

Application: Quantifying fine-scale habitat selection while accounting for movement constraints and environmental heterogeneity.

Materials and Reagents:

GPS tracking devices (minimum 5 Hz sampling rate)
Environmental data layers (remotely sensed or field-collected)
R statistical environment with amt, sf, and terra packages
High-performance computing resources for large datasets

Procedure:

Data Preparation (Duration: 2-3 days)
- Import and clean GPS tracking data, accounting for fix rates and measurement error
- Extract environmental covariates at each observed location
- Generate available points by sampling from a movement kernel around each observed location

Step Selection Analysis (Duration: 1-2 days)
- Define movement steps and turning angles between consecutive locations
- Fit SSF using conditional logistic regression
- Include interaction terms between environmental variables and movement parameters

Model Validation (Duration: 1 day)
- Assess model fit using cross-validation
- Check for residual spatial autocorrelation
- Implement "Spatial+" approach if landscape data is incomplete [5]

Interpretation (Duration: 1 day)
- Calculate relative selection strengths for environmental covariates
- Map predicted probability of use across the study area
- Relate selection patterns to internal state hypotheses
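The "generate available points" step of the procedure above can be sketched as follows. This is a minimal Python illustration that assumes a gamma distribution for step lengths and a von Mises distribution for turning angles; the function name and all parameter values are hypothetical. In R, the `amt` package named in the materials list provides its own tooling for this step.

```python
import math
import random

def sample_available_steps(x, y, heading, n_steps, shape, scale, kappa, rng):
    """Draw candidate end points around the start of an observed step.

    Step lengths come from Gamma(shape, scale); turning angles come from a
    von Mises distribution centred on zero (i.e. on the previous heading).
    """
    candidates = []
    for _ in range(n_steps):
        step_length = rng.gammavariate(shape, scale)
        turn = rng.vonmisesvariate(0.0, kappa)  # radians in [0, 2*pi)
        new_heading = heading + turn
        candidates.append((x + step_length * math.cos(new_heading),
                           y + step_length * math.sin(new_heading)))
    return candidates

rng = random.Random(42)
candidates = sample_available_steps(x=100.0, y=200.0, heading=0.0, n_steps=10,
                                    shape=2.0, scale=50.0, kappa=1.5, rng=rng)
for cx, cy in candidates[:3]:
    print(f"candidate end point: ({cx:.1f}, {cy:.1f})")
```

Environmental covariates would then be extracted at each candidate end point, and the observed step and its candidates together form one stratum for the conditional logistic regression.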

Protocol 2: Behavioral State Estimation using HMM

Application: Identifying discrete behavioral states from movement data and linking them to environmental conditions.

Materials and Reagents:

Multi-sensor biologgers (GPS + accelerometer + magnetometer)
Environmental data synchronized in time and space
R statistical environment with momentuHMM package
High-resolution habitat maps

Procedure:

Data Integration (Duration: 2 days)
- Synchronize data streams from multiple sensors
- Calculate movement metrics (step length, turning angle, speed, acceleration)
- Extract environmental conditions along movement path

Model Specification (Duration: 1 day)
- Define number of behavioral states based on exploratory analysis
- Specify probability distributions for observation processes
- Design matrix for state transition probabilities

Model Fitting (Duration: 1-3 days)
- Initialize parameters using method of moments
- Fit HMM using maximum likelihood estimation
- Address local maxima by using multiple initial values

State Decoding and Interpretation (Duration: 1 day)
- Apply Viterbi algorithm for most likely state sequence
- Calculate state-dependent environmental preferences
- Relate behavioral states to internal state and external factors
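The state-decoding step can be illustrated with a compact Viterbi implementation in Python. As a simplifying assumption, step lengths are given an exponential distribution in each state; the parameter values are hypothetical and would normally come from the fitted model.

```python
import math

def viterbi(steps, delta, gamma, means):
    """Most likely state sequence for an HMM with exponential step-length
    distributions (an assumption made for brevity).

    delta: initial state probabilities; gamma: transition matrix;
    means: state-dependent mean step lengths.
    """
    n = len(delta)

    def log_emit(x, i):
        return -x / means[i] - math.log(means[i])

    score = [math.log(delta[i]) + log_emit(steps[0], i) for i in range(n)]
    back = []  # back[t][j] = best predecessor of state j at time t + 1
    for x in steps[1:]:
        pointers, new_score = [], []
        for j in range(n):
            best_i = max(range(n), key=lambda i: score[i] + math.log(gamma[i][j]))
            pointers.append(best_i)
            new_score.append(score[best_i] + math.log(gamma[best_i][j]) + log_emit(x, j))
        back.append(pointers)
        score = new_score

    # Trace the best path backwards from the most likely final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(back):
        state = pointers[state]
        path.append(state)
    return path[::-1]

steps = [0.5, 0.4, 9.0, 11.0]
states = viterbi(steps, delta=[0.5, 0.5],
                 gamma=[[0.9, 0.1], [0.1, 0.9]], means=[1.0, 10.0])
print(states)
```

With these values the two short steps are assigned to the slow state (0) and the two long steps to the fast state (1).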

The Scientist's Toolkit: Essential Research Solutions

Table 3: Key research reagents and solutions for movement ecology studies

Tool Category	Specific Tools	Function	Application Examples
Tracking Technologies	GPS loggers, Accelerometers, Radio-telemetry	Recording movement paths at various spatiotemporal scales	Quantifying habitat selection [4], identifying behavioral states [4]
Environmental Data	Remote sensing imagery, Habitat maps, Climate data	Characterizing external factors affecting movement	Resource selection functions [4], step selection analysis [5]
Statistical Software	R packages (`amt`, `momentuHMM`), Python libraries	Implementing statistical models for movement analysis	Fitting SSFs and HMMs [4], assessing interactions [5]
Field Equipment	Animal handling gear, Data download stations, Weatherproof housing	Deploying and maintaining tracking equipment	Long-term movement studies, sensor deployment and retrieval

Advanced Applications and Future Directions

The MEP provides a powerful foundation for addressing complex ecological questions, including species responses to environmental change, conservation planning, and understanding ecological processes across scales. Future research directions should focus on better integration of all MEP components, particularly the underexplored areas of motion capacity and navigation capacity [2].

Technological advancements continue to open new possibilities, with increasingly sophisticated sensors providing direct measurements of internal state (e.g., physiological sensors) and navigation capacity (e.g., magnetometers) [2]. The integration of movement ecology with other disciplines, including human mobility science, offers promising avenues for developing more general theories of movement [2].

Methodological challenges remain, particularly in accounting for landscape heterogeneity when inferring inter-individual interactions [5] and appropriately scaling from individual movement paths to population-level consequences [3]. The continued development of statistical methods such as SSFs and HMMs, coupled with the conceptual foundation of the MEP, provides a robust framework for addressing these challenges and advancing our understanding of organismal movement.

Movement ecology has increasingly focused on deconstructing the lifetime tracks of animals into hierarchically organized segments to understand the drivers and consequences of movement across spatiotemporal scales [6]. This hierarchical path-segmentation (HPS) framework is essential for elucidating how behavior, cognition, and physiology develop in relation to environmental changes [6]. The most robustly definable segments within an individual's trajectory are its diel activity routines (DARs), which represent repeated 24-hour movement path segments anchored by a fixed-duration biological clock [7] [6]. These DARs are themselves composed of smaller-scale behavioral units, including canonical activity modes (CAMs) and fundamental movement elements (FuMEs) [6] [8].

Analyzing movement through this hierarchical lens allows researchers to bridge the gap between fine-scale biomechanical processes and broader ecological patterns, facilitating predictions about how individuals may respond to environmental changes such as climate shifts and habitat modification [7] [8]. This paper presents application notes and protocols for implementing hierarchical movement analysis, with a specific focus on the transition from Fundamental Movement Elements to diel routines, framed within statistical methods for movement ecology research.

Core Concepts and Definitions

The hierarchical framework organizes movement into discrete but interconnected levels:

Fundamental Movement Elements (FuMEs): These represent elemental biomechanical movements that serve as the basic building blocks of all movement tracks, analogous to nucleotides in DNA sequences. Examples include individual steps, wing flaps, or fin strokes [6] [8]. In practice, FuMEs are often difficult to extract from standard relocation data alone and may require accelerometer data or video analysis for precise identification [8].
Statistical Movement Elements (StaMEs): When actual FuMEs cannot be identified, StaMEs (previously called metaFuMEs) serve as statistical proxies. These are derived from the statistical properties (e.g., means, standard deviations, correlations) of short, fixed-length segments of relocation tracks, typically comprising 10-30 consecutive points [8].
Canonical Activity Modes (CAMs): These are short, fixed-length sequences of FuMEs or StaMEs that represent interpretable activities such as dithering, ambling, directed walking, or running [6] [8].
Behavioral Activity Modes (BAMs): Variable-length sequences of CAMs characterize behavioral states such as foraging, resting, or traveling. These represent characteristic mixtures of CAMs that serve specific behavioral functions [8].
Diel Activity Routines (DARs): These 24-hour movement segments represent the daily activity routines of individuals, composed of sequences of BAMs and CAMs. DARs provide a biological anchor for movement analysis due to their fixed duration [7] [6].
Lifetime Movement Phases (LiMPs): Supra-diel segments consisting of multiple DARs, such as seasonal ranges or migrations [6].
Lifetime Tracks (LiTs): The complete movement record of an individual from birth to death, comprising sequences of LiMPs [6].

Table 1: Hierarchical Levels in Movement Path Segmentation

Level	Definition	Duration	Composition
FuME	Fundamental Movement Element	Variable (sub-second to seconds)	Elemental biomechanical movements
StaME	Statistical Movement Element	Fixed (short segments)	Statistical properties of relocation sequences
CAM	Canonical Activity Mode	Fixed	Sequences of FuMEs/StaMEs
BAM	Behavioral Activity Mode	Variable	Characteristic mixtures of CAMs
DAR	Diel Activity Routine	24 hours	Sequences of BAMs/CAMs
LiMP	Lifetime Movement Phase	Variable (days to months)	Sequences of DARs
LiT	Lifetime Track	Lifetime	Sequences of LiMPs

Analytical Framework and Workflow

The analytical workflow for hierarchical movement analysis proceeds through several interconnected stages, from data collection to the classification of diel routines.

Diagram 1: Hierarchical Movement Analysis Workflow. The analytical process flows from data collection through successive stages of segmentation and classification, with hierarchical levels shown on the right.

Whole-Path Metrics for DAR Categorization

At the DAR level, geometric whole-path metrics that are relatively insensitive to data resolution are particularly useful for categorization [7]. These scalar metrics characterize the geometry of daily movement trajectories:

Net Displacement: Distance between start and end points of the diel path
Maximum Displacement: Maximum distance from the start point reached during the diel period
Maximum Diameter: Largest distance between any two points in the path
Maximum Width: Breadth of the path perpendicular to its main axis [7]

These metrics can be used in multivariate analyses such as Principal Component Analysis (PCA) to reduce dimensionality. In barn owl research, PC1 accounted for 86.5% of variation and represented a DAR scale factor, while PC2 accounted for 8.4% of variation and captured the "openness" of the DAR (whether animals returned to their start point) [7].

Table 2: Whole-Path Metrics for DAR Geometric Categorization

Metric	Definition	Interpretation	Calculation
Net Displacement	Distance between start and end points	Measures "openness" of the path; indicates whether animal returns to origin	Straight-line distance between first and last fix
Maximum Displacement	Maximum distance from start point	Indicates maximum range of movement from starting location	Maximum of distances between each fix and start point
Maximum Diameter	Largest distance between any two points	Represents the overall spatial extent of the daily movement	Maximum of pairwise distances between all fixes
Maximum Width	Breadth perpendicular to main axis	Captures the perpendicular spread of movement relative to primary direction	Computed using convex hull or similar methods
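The four metrics can be computed directly from a list of projected coordinates, as in the Python sketch below. The first three follow the table; the width calculation is one possible reading of "breadth perpendicular to the main axis", in which the main axis is assumed to be the line joining the two most distant fixes. The function name is hypothetical.

```python
import math

def whole_path_metrics(path):
    """path: list of (x, y) fixes for one diel period, in projected units."""
    start, end = path[0], path[-1]
    net_disp = math.dist(start, end)
    max_disp = max(math.dist(start, p) for p in path)

    # Maximum diameter: brute-force search over all pairs of fixes.
    max_diam, a, b = 0.0, start, start
    for i, p in enumerate(path):
        for q in path[i + 1:]:
            d = math.dist(p, q)
            if d > max_diam:
                max_diam, a, b = d, p, q

    # Assumed definition of width: spread of signed perpendicular offsets
    # from the axis through the two most distant fixes (a and b).
    if max_diam == 0.0:
        max_width = 0.0
    else:
        ux, uy = (b[0] - a[0]) / max_diam, (b[1] - a[1]) / max_diam
        offsets = [ux * (p[1] - a[1]) - uy * (p[0] - a[0]) for p in path]
        max_width = max(offsets) - min(offsets)

    return {"net_displacement": net_disp, "max_displacement": max_disp,
            "max_diameter": max_diam, "max_width": max_width}

# A closed rectangular loop: the animal returns to its starting point.
loop = [(0.0, 0.0), (4.0, 0.0), (4.0, 3.0), (0.0, 3.0), (0.0, 0.0)]
metrics = whole_path_metrics(loop)
print(metrics)
```

For this loop the net displacement is 0 (a closed DAR), while the maximum displacement and maximum diameter both equal the rectangle's diagonal, 5.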

Experimental Protocols and Methodologies

Data Collection Requirements

High-frequency movement data are essential for robust hierarchical analysis. The studies cited here collected data at intervals ranging from sub-second to minutes [7] [9]. For DAR-level analysis, a minimum sampling rate of 2-20 relocations per hour is recommended, though higher frequencies enable more detailed FuME/StaME analysis [7].

The appropriate start and end times for segmenting 24-hour DARs should be determined based on species-specific behavioral rhythms. For example, in black rhinos, 6:00 AM was identified as a better start/finish point than noon, 6:00 PM, or midnight due to reduced variation in spatial displacements [6].

DAR Categorization Protocol

Objective: To categorize diel movement paths into distinct geometric types based on whole-path metrics.

Materials:

High-frequency movement data (minimum 2-20 relocations per hour)
Statistical software with clustering capabilities (R recommended)
Computing resources adequate for multivariate analysis

Procedure:

Segment movement tracks into 24-hour periods using biologically relevant start times
Calculate the four whole-path metrics (net displacement, maximum displacement, maximum diameter, maximum width) for each DAR
Perform data normalization (z-scoring) of metrics to ensure equal weighting
Conduct Principal Component Analysis (PCA) on the normalized metrics
Apply Ward clustering algorithm to the principal components
Determine optimal cluster number using:
- Variance explanation thresholds
- Ecological interpretability
- Sufficient sample sizes per category
Assign descriptive labels to DAR categories based on geometric properties

Application Note: In the barn owl case study, this protocol categorized 6,230 DARs into 7 distinct types: 5 closed (returning to same roost), 1 partially open (returning to nearby roost), and 1 fully open (leaving for another region) [7].
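The normalization and clustering steps of this protocol can be sketched in plain Python. The example below z-scores the metric columns and then applies a naive agglomerative clustering with Ward's criterion. It omits the PCA step for brevity, uses invented metric values, and is far slower than a library implementation, so treat it as an illustration of the logic only.

```python
import math

def zscore_columns(rows):
    """Standardize each column to mean 0 and unit sample standard deviation.
    Assumes no column is constant."""
    n, k = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(k)]
    sds = [math.sqrt(sum((r[j] - means[j]) ** 2 for r in rows) / (n - 1))
           for j in range(k)]
    return [[(r[j] - means[j]) / sds[j] for j in range(k)] for r in rows]

def ward_clusters(rows, n_clusters):
    """Naive agglomerative clustering: repeatedly merge the two clusters whose
    union gives the smallest increase in within-cluster sum of squares."""
    clusters = [[i] for i in range(len(rows))]

    def centroid(members):
        return [sum(rows[i][j] for i in members) / len(members)
                for j in range(len(rows[0]))]

    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                ca, cb = centroid(clusters[a]), centroid(clusters[b])
                na, nb = len(clusters[a]), len(clusters[b])
                cost = na * nb / (na + nb) * sum((u - v) ** 2 for u, v in zip(ca, cb))
                if best is None or cost < best[0]:
                    best = (cost, a, b)
        _, a, b = best
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]

    labels = [0] * len(rows)
    for label, members in enumerate(clusters):
        for i in members:
            labels[i] = label
    return labels

# Invented DAR metrics: [net disp., max disp., max diameter, max width] in km.
dars = [[0.1, 1.0, 1.2, 0.5], [0.2, 1.1, 1.3, 0.6], [0.0, 0.9, 1.1, 0.4],
        [5.0, 6.0, 6.5, 2.0], [5.5, 6.2, 6.8, 2.2], [4.8, 5.9, 6.4, 1.9]]
labels = ward_clusters(zscore_columns(dars), n_clusters=2)
print(labels)
```

Here the three small, closed DARs and the three large, open DARs fall into separate clusters. In a full analysis the clustering would be run on principal component scores and the number of clusters chosen using the criteria listed above.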

Statistical Movement Elements (StaMEs) Extraction Protocol

Objective: To identify statistical building blocks of movement when FuMEs cannot be directly observed.

Materials:

High-resolution relocation data (≥5 relocations per minute ideal)
Computational resources for time-series analysis

Procedure:

Extract step-length (SL) and turning-angle (TA) time series from relocation data
Divide the track into fixed-length segments (typically 10-30 consecutive points)
For each segment, calculate statistical properties:
- Means and standard deviations of SL and TA
- Autocorrelations at various lags
- Derived quantities (radial and tangential velocities)
Apply clustering algorithms to the statistical vectors
Identify cluster centroids representing distinct StaME types
Validate StaME categories through behavioral observation where possible

Application Note: StaMEs serve as substitutes for FuMEs in hierarchical construction of movement tracks and can be classified into categories such as "directed fast movement" versus "random slow movement" elements [8].
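The first three steps of this protocol (extracting step lengths and turning angles, segmenting, and summarizing) can be sketched in Python as follows. The function names and the synthetic track are hypothetical. Turning angles are summarized with ordinary means and standard deviations to match the protocol as written, although circular statistics may be preferable for real data.

```python
import math

def steps_and_turns(track):
    """track: list of (x, y). Returns aligned step lengths and turning angles
    (radians, wrapped to [-pi, pi])."""
    lengths, headings = [], []
    for (x0, y0), (x1, y1) in zip(track, track[1:]):
        lengths.append(math.hypot(x1 - x0, y1 - y0))
        headings.append(math.atan2(y1 - y0, x1 - x0))
    turns = []
    for h0, h1 in zip(headings, headings[1:]):
        d = h1 - h0
        turns.append(math.atan2(math.sin(d), math.cos(d)))  # wrap the difference
    # Drop the first step so lengths[i] and turns[i] describe the same step.
    return lengths[1:], turns

def mean_sd(values):
    m = sum(values) / len(values)
    var = sum((v - m) ** 2 for v in values) / (len(values) - 1)
    return m, math.sqrt(var)

def lag1_autocorr(values):
    m = sum(values) / len(values)
    denom = sum((v - m) ** 2 for v in values)
    if denom == 0.0:
        return 0.0  # undefined for a constant series; reported as 0 here
    return sum((a - m) * (b - m) for a, b in zip(values, values[1:])) / denom

def segment_summaries(lengths, turns, seg_len):
    """One row per non-overlapping segment:
    [mean SL, SD SL, mean TA, SD TA, lag-1 autocorrelation of SL]."""
    rows = []
    for start in range(0, len(lengths) - seg_len + 1, seg_len):
        sl = lengths[start:start + seg_len]
        ta = turns[start:start + seg_len]
        rows.append([*mean_sd(sl), *mean_sd(ta), lag1_autocorr(sl)])
    return rows

# Synthetic track: a fast straight run followed by a slow zigzag.
track = [(2.0 * i, 0.0) for i in range(6)]
track += [(10.0 + 0.2 * i, 0.2 * (i % 2)) for i in range(1, 6)]
sl, ta = steps_and_turns(track)
for row in segment_summaries(sl, ta, seg_len=4):
    print([round(v, 2) for v in row])
```

Each output row is the statistical vector for one base segment; these vectors are the input to the clustering step.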

Research Toolkit

Table 3: Essential Analytical Tools for Hierarchical Movement Analysis

Tool Category	Specific Methods/Software	Application in Hierarchy	Key Functions
Data Collection	GPS loggers, ATLAS reverse-GPS, accelerometers, camera traps	All levels	High-frequency relocation data collection, behavioral validation
Path Segmentation	Behavioral Change Point Analysis (BCPA), Hidden Markov Models (HMMs)	CAM/BAM identification	Identifying transitions between behavioral states
Cluster Analysis	Ward algorithm, k-means, model-based clustering	StaME, CAM, DAR classification	Categorizing movement elements and routines
Multivariate Statistics	Principal Component Analysis (PCA), Factor Analysis	Dimensionality reduction for DAR metrics	Identifying major axes of variation in path geometry
Movement Metrics	Step-length/turning-angle distributions, net displacement, maximum diameter	FuME/StaME and DAR characterization	Quantifying geometric properties of movement
Statistical Modeling	Generalized Linear Mixed Models (GLMMs), Resource Selection Functions (RSFs)	Testing effects of covariates on movement	Assessing influence of age, sex, environment on DARs
Specialized Software	R packages (amt, momentuHMM), Numerus ANIMOVER simulator	All analytical stages	Implementing specialized movement analyses and simulations

Case Studies and Applications

Barn Owl DAR Categorization

A study of 44 barn owls (Tyto alba) in northeastern Israel demonstrated the application of hierarchical movement analysis, specifically at the DAR level [7]. Researchers employed ATLAS reverse-GPS technology to collect high-frequency movement data, then applied the DAR categorization protocol described above.

The analysis revealed that DARs were significantly larger in young owls than adults and in males compared to females, demonstrating how this approach can detect biologically meaningful patterns [7]. The study also constructed spatio-temporal distributions of DAR types for individuals and groups aggregated by age, sex, and seasonal quadrimester, identifying idiosyncratic behaviors within family groups in relation to location [7].

Elephant Diel Movement Analysis

Research on African savannah elephants (Loxodonta africana) illustrated the value of analyzing diel movement patterns in relation to environmental and social factors [9]. Using multi-year, high-resolution (hourly) GPS tracking data, researchers examined two key movement descriptors:

Diel Displacement (DD): Daily sum of net displacements, serving as a proxy for energy expenditure
Movement Predictability (MP): Degree of autocorrelated movement activity at diel timescales

The study found that both DD and MP increased with forage availability, but with significant interactions between forage availability and social rank, highlighting how social status influences movement strategies [9].

Baboon Diel Activity Patterns

A broad-scale camera trap study of chacma baboons (Papio ursinus) across 29 sites in Southern Africa demonstrated how diel activity patterns shift in response to environmental gradients [10]. Researchers analyzed over a million camera-trap detections to test hypotheses about thermoregulation, foraging optimization, and predation risk avoidance.

The findings revealed that baboons adjusted their diel activity patterns by avoiding midday heat but increasing dawn and night activity under predator pressure, demonstrating temporal flexibility as an adaptive strategy [10].

Integration with Statistical Habitat Models

Hierarchical movement analysis can be integrated with statistical models of species-habitat association to enhance ecological inference. Three prominent approaches include:

Resource Selection Functions (RSFs): Model the relative probability of use based on habitat characteristics, typically comparing "used" versus "available" locations [4]
Step Selection Functions (SSFs): Extend RSFs by incorporating movement constraints, analyzing habitat selection conditional on the animal's previous location [4]
Hidden Markov Models (HMMs): Relate discrete behavioral states to environmental covariates, allowing investigation of how habitat associations vary with behavioral modes [4]

Each method offers distinct advantages for different levels of the movement hierarchy, with HMMs being particularly well-suited for analyzing CAMs and BAMs in relation to environmental factors.

Hierarchical movement analysis from FuMEs to diel routines provides a powerful framework for understanding animal movement across spatiotemporal scales. The protocols and applications outlined here offer researchers a structured approach for implementing this framework in ecological research. By decomposing movement into hierarchically organized elements, researchers can bridge fine-scale biomechanical processes with broader ecological patterns, ultimately enhancing predictions of how animals respond to environmental change.

Introducing Statistical Movement Elements (StaMEs) as Building Blocks for Path Segmentation

Core Concept and Hierarchical Framework

Statistical Movement Elements (StaMEs) are a novel analytical construct that serves as the smallest achievable statistical building blocks for the hierarchical decomposition and synthesis of animal movement paths. In reality, animal movement paths are a concatenation of fundamental movement elements (FuMEs), such as a single step or wing flap. However, these are generally not extractable from standard relocation time-series data (e.g., GPS fixes). StaMEs are proposed as a practical substitute, derived from the statistical properties of short, fixed-length track segments [8] [11].

Diagram: Hierarchical framework for path segmentation, from raw data to complex behavioral routines.

This framework allows researchers to dissect real movement tracks and generate realistic synthetic ones, providing a general tool for testing hypotheses in movement ecology, such as evaluating an individual's response to landscape changes or identifying unusual stress [8].

Quantitative Framework and Data Presentation

StaMEs are generated by computing statistics for short, fixed-length segments of a movement track from which step-length (SL) and turning-angle (TA) time series have been extracted. The statistics of these variables for each segment form a vector that can be clustered into different StaME types [8] [11].

Table 1: Key Statistical Measures for Characterizing StaMEs

Measure Category	Specific Metrics	Computational Description
Central Tendency	Mean SL, Mean TA	Average values of step-lengths and turning angles within the fixed-length segment.
Dispersion	Standard Deviation of SL, Standard Deviation of TA	Variability in movement pace and directionality within the segment.
Temporal Correlation	SL Autocorrelation, TA Autocorrelation	Measures of serial dependence, indicating persistence in speed or direction.
Derived Quantities	Radial Velocity, Tangential Velocity	Kinematic measures computed at each relocation point [8].

These segment-specific vectors are clustered, and the centroids of these clusters define a set of distinct StaME categories (e.g., "directed fast movement" versus "random slow movement") [8]. The parameters for this clustering process are crucial for the method's resolution.

Table 2: Clustering Parameters for Hierarchical Segmentation

Parameter	Typical Value/Range	Description and Impact
Segment Length (μ)	10-30 relocation points [8]	Duration of the ultra-fine "base segment". Influences the granularity of StaMEs.
Word Length (m)	Number of base segments [11]	Number of StaMEs combined to form a "word" for CAM classification.
StaME Categories (n)	Determined by clustering (e.g., k-means)	Number of unique statistical movement elements identified.
CAM Categories (k)	Determined by clustering [11]	Number of canonical activity modes identified from "words".

Experimental Protocol: Implementing StaME-Based Path Segmentation

Protocol 1: From Raw Tracking Data to StaME Classification

Objective: To process raw animal relocation data into classified Statistical Movement Elements (StaMEs).

Reagents & Materials:

Input Data: A time series of animal relocations (e.g., GPS coordinates) with a consistent sampling frequency [8].
Software: Computational environment suitable for movement data analysis (e.g., R or Python), with packages for clustering (e.g., stats, cluster in R) [11].

Workflow:

Data Preprocessing: Import relocation data. Clean the data by removing or fixing any erroneous fixes.
Time-Series Extraction: From the cleaned relocation data, calculate the time series of:
- Step-length (SL): The distance between consecutive relocations.
- Turning-angle (TA): The change in direction between consecutive steps [8].
Segment Generation: Divide the SL and TA time series into short, consecutive, fixed-length segments. A segment length of 10-30 points is a common starting point [8].
Statistical Summarization: For each segment, compute a vector of summary statistics. The core set should include, at a minimum:
- Mean(SL), Standard Deviation(SL)
- Mean(TA), Standard Deviation(TA)
- Autocorrelation(SL, lag=1)
Clustering: Apply a clustering algorithm (e.g., k-means, hierarchical clustering) to the matrix where rows are segments and columns are the summarized statistics. This will group segments with similar statistical properties.
StaME Classification: Assign each segment a StaME type label based on its cluster assignment. The cluster centroids represent the prototypical StaMEs [8] [11].
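The clustering and classification steps can be illustrated with a small k-means routine in Python. This is a sketch under stated assumptions: the segment vectors are invented two-column summaries (mean step length, SD of turning angle), the features are left unscaled, and the hand-written `kmeans` function stands in for a library implementation.

```python
import random

def kmeans(vectors, k, n_iter=100, seed=0):
    """Lloyd's algorithm. Returns (labels, centroids)."""
    rng = random.Random(seed)
    centroids = [list(v) for v in rng.sample(vectors, k)]
    labels = [0] * len(vectors)
    for _ in range(n_iter):
        # Assignment step: nearest centroid by squared Euclidean distance.
        new_labels = [
            min(range(k),
                key=lambda c: sum((a - b) ** 2 for a, b in zip(v, centroids[c])))
            for v in vectors
        ]
        # Update step: recompute each centroid; keep the old one if empty.
        for c in range(k):
            members = [v for v, lab in zip(vectors, new_labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
        if new_labels == labels:
            break
        labels = new_labels
    return labels, centroids

# Invented segment summaries: [mean step length, SD of turning angle].
segments = [[10.0, 0.2], [10.5, 0.3], [9.8, 0.1],   # fast, directed
            [1.0, 1.5], [1.2, 1.4], [0.9, 1.6]]     # slow, tortuous
labels, centroids = kmeans(segments, k=2)
print(labels)
for c in centroids:
    print([round(v, 2) for v in c])
```

The centroids play the role of prototypical StaMEs. With real data the columns would usually be standardized first so that no single statistic dominates the distance calculation.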

Protocol 2: Hierarchical Segmentation into CAMs and BAMs

Objective: To aggregate StaMEs into Canonical Activity Modes (CAMs) and Behavioral Activity Modes (BAMs).

Workflow:

CAM Formation ("Words"):
- Concatenate sequences of m consecutive StaMEs to form "words" [11].
- Cluster these words into k categories. These are the "raw" Canonical Activity Modes.
- Optional Rectification: Implement a process where all word segments coded by the same sequence of m StaMEs are identified with the same "rectified" CAM type. This enhances consistency [11].
- Behavioral Interpretation: Assign an ecological interpretation to each CAM type based on its constituent StaMEs (e.g., "fast, directed movement," "slow, random search") [8].
BAM Identification: Identify longer, variable-length segments of the track that are characterized by a specific mixture or sequence of CAMs. These segments represent coherent Behavioral Activity Modes.
- Example: A "resource gathering" BAM might be composed of a characteristic sequence of "slow random movement" (searching) and "stationary" CAMs [8] [11].
Validation: Use the percentage of reassignment errors and information theory measures (e.g., Jensen-Shannon divergence) to compare the coding efficiency of different parameter sets (μ, m, n, k) [11].
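
As a rough illustration of the word-formation and validation steps, the sketch below splits a sequence of StaME labels into words of length m and computes the Jensen-Shannon divergence between two word-frequency distributions. The helper names are hypothetical, and what exactly is compared (here, two label sequences) is an assumption of this example; the source only states that such measures are used to compare parameter sets.

```python
import math
from collections import Counter

def make_words(stame_labels, m):
    """Split a label sequence into consecutive, non-overlapping words of length m."""
    return [tuple(stame_labels[i:i + m])
            for i in range(0, len(stame_labels) - m + 1, m)]

def distribution(items):
    """Relative frequency of each distinct item."""
    counts = Counter(items)
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def js_divergence(p, q):
    """Jensen-Shannon divergence in bits between two discrete distributions."""
    keys = set(p) | set(q)
    mix = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in keys}

    def kl_to_mix(a):
        return sum(a[k] * math.log2(a[k] / mix[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_mix(p) + 0.5 * kl_to_mix(q)

# Hypothetical StaME label sequences from two tracks (labels 0..2).
words_a = make_words([0, 0, 1, 1, 2, 2, 0, 0], m=2)
words_b = make_words([0, 0, 0, 0, 1, 1, 1, 1], m=2)
jsd = js_divergence(distribution(words_a), distribution(words_b))
```

With base-2 logarithms the divergence lies between 0 (identical distributions) and 1 (no shared words).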

The Scientist's Toolkit: Essential Research Reagents

Table 3: Key Analytical Tools and Resources for StaME Analysis

Tool / Resource	Type	Function in Analysis
Relocation Data (GPS/ATLAS)	Primary Data	The foundational time-series of animal positions used to derive step-lengths and turning angles [8] [4].
Clustering Algorithm (e.g., k-means)	Computational Method	Groups track segments based on their statistical properties to define StaME and CAM categories [11].
Information Theory Measures	Analytical Metric	Quantifies the efficiency and performance of the hierarchical segmentation, aiding in parameter selection [11].
R Packages (e.g., `amt`)	Software Tool	Provides functions for handling movement data, calculating derived quantities, and implementing related models like SSFs [4].
Numerus Studio Platform	Simulation Environment	A user-friendly platform to run and explore multi-modal movement simulators like ANIMOVER_1, which can be used to test the StaME framework [8].

Integration with Established Movement Ecology Methods

The StaME framework is designed to complement, not replace, existing segmentation methods like Behavioral Change Point Analysis (BCPA) and Hidden Markov Models (HMMs) [8] [11]. It acts as a "magnifying lens" on the segments identified by these top-down methods, revealing how broader behavioral states (BAMs) are themselves composed of finer-scale canonical activities (CAMs) built from fundamental StaMEs [11]. This multi-scale, bottom-up approach provides a refined coding scheme for understanding the complex hierarchical structure of animal movement.

Linking Movement Patterns to Ecological Processes and Fitness Outcomes

Understanding the relationship between animal movement and ecology is fundamental for conservation and understanding biological processes. Statistical models transform raw movement data into insights about habitat selection, behavioral states, and ultimately, fitness outcomes. The choice of model is critical, as different methods are designed to answer specific ecological questions and operate at different spatial and temporal scales [4].

This document provides application notes and protocols for three primary statistical methods used to link movement patterns to ecology: Resource Selection Functions (RSF), Step Selection Functions (SSF), and Hidden Markov Models (HMM). We detail their implementation, required data, and interpretation, providing a toolkit for researchers to connect movement paths to underlying ecological processes.

Model Descriptions and Mathematical Foundations

Resource Selection Functions (RSF)

Description: RSFs are a widely used method to quantify habitat selection by comparing environmental conditions at locations used by an animal to those available within its home range or study area. They estimate the relative probability of use of a resource unit as a function of environmental covariates [4].

Mathematical Foundation: The RSF, \(w(\mathbf{x})\), is typically defined in an exponential form [4]:

\[ w(\mathbf{x}) = \exp( \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{k} x_{k} ) \]

where \(\mathbf{x} = \{x_{1}, \dots, x_{k}\}\) are the values of k environmental predictor variables and \(\beta_{1}, \dots, \beta_{k}\) are the selection coefficients to be estimated. In practice, these coefficients are often estimated using logistic regression. The probability that a location i is used, given its covariates \(\mathbf{x}_{i}\), is modeled as [4]:

\[ \Pr(y_{i} = 1 \mid \mathbf{x}_{i}) = \frac{\exp( \beta_{1} x_{1,i} + \beta_{2} x_{2,i} + \cdots + \beta_{k} x_{k,i} )}{1 + \exp( \beta_{1} x_{1,i} + \beta_{2} x_{2,i} + \cdots + \beta_{k} x_{k,i} )} \]
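
To make the two formulas concrete, the following sketch evaluates the exponential RSF and its logistic counterpart for two locations. The coefficient values and covariates (forest cover, distance to road) are hypothetical and chosen only for illustration.

```python
import math

def rsf(beta, x):
    """Exponential RSF: w(x) = exp(beta_1 * x_1 + ... + beta_k * x_k)."""
    return math.exp(sum(b * v for b, v in zip(beta, x)))

def prob_used(beta, x):
    """Logistic form used for estimation: w(x) / (1 + w(x))."""
    w = rsf(beta, x)
    return w / (1.0 + w)

# Hypothetical coefficients: selection for forest cover, avoidance of roads.
beta = [0.8, -0.5]
site_a = [1.0, 0.0]   # forested, far from a road (standardized units)
site_b = [0.0, 1.0]   # open, close to a road

# Only ratios of w(x) are interpretable: site A is exp(0.8 + 0.5) times
# as likely to be selected as site B, given equal availability.
ratio = rsf(beta, site_a) / rsf(beta, site_b)
```

Because w(x) is only a relative quantity, comparisons between locations (as in `ratio`) are the meaningful output, not the raw values.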

Step Selection Functions (SSF)

Description: SSFs extend RSFs by explicitly incorporating movement dynamics into the analysis of habitat selection. They compare observed movement steps (the straight-line path between two consecutive locations) and their associated environmental covariates to a set of available, but not chosen, random steps originating from the same starting point [4]. This method integrates movement with habitat selection, addressing autocorrelation in the data.

Mathematical Foundation: The SSF shares a similar mathematical form with the RSF but is applied to a different conceptual framework. The likelihood of selecting a step to location i is proportional to [4]:

\[ w(\mathbf{x}, \mathbf{z}) = \exp( \beta_{1} x_{1} + \cdots + \beta_{k} x_{k} + \gamma_{1} z_{1} + \cdots + \gamma_{m} z_{m} ) \]

Here, \(\mathbf{x}\) represents habitat covariates at the endpoint of the step, while \(\mathbf{z}\) can represent movement-related characteristics such as step length or turning angle, linking the habitat selection directly to the movement process.
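
One way to read this expression is as a choice among the observed step and its control steps: within each matched set, the probability of a step is its w value divided by the sum of w over the set. The sketch below computes these within-stratum probabilities. The function name and the example covariates are hypothetical.

```python
import math

def step_choice_probs(beta, gamma, steps):
    """Probability of each step within one stratum (observed + controls).

    steps: list of (x, z) pairs, where x holds habitat covariates at the
    step endpoint and z holds movement covariates of the step.
    """
    scores = [
        sum(b * v for b, v in zip(beta, x)) + sum(g * v for g, v in zip(gamma, z))
        for x, z in steps
    ]
    top = max(scores)  # subtract the maximum for numerical stability
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Hypothetical stratum: one habitat covariate, one movement covariate.
stratum = [
    ([2.0], [1.0]),   # observed step ends in high-quality habitat
    ([0.0], [1.0]),   # control step
    ([0.0], [1.0]),   # control step
]
probs = step_choice_probs(beta=[1.0], gamma=[0.0], steps=stratum)
```

The conditional logistic likelihood used to fit an SSF is the product, over strata, of the probability assigned to the observed step.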

Hidden Markov Models (HMMs)

Description: HMMs are a powerful tool for identifying latent (unobserved) behavioral states from sequential movement data. The model assumes that an animal's movement path is generated by a finite number of behavioral states (e.g., "foraging," "exploring," "resting"). Each state is characterized by a distinct probability distribution for movement metrics (e.g., step length, turning angle). The animal transitions between these states according to a probability matrix [12].

Mathematical Foundation: A basic HMM consists of [12]:

State Process: An unobserved (hidden) Markov chain \(\{S_t\}\), where \(S_t\) denotes the behavioral state at time t. The state transition probabilities are defined by \(\Gamma = (\gamma_{ij})\), where \(\gamma_{ij} = \Pr(S_t = j \mid S_{t-1} = i)\).
Observation Process: The observed data \(\{X_t\}\), which are the movement metrics (e.g., step lengths, turning angles). The state-dependent probability distributions are defined by \(f_i(x) = p(X_t = x \mid S_t = i)\).

The model is fitted by maximizing the likelihood of the observations, marginalizing over all possible state sequences.
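
Marginalizing over all state sequences sounds expensive, but the forward algorithm does it in time linear in the number of observations. The sketch below computes the log-likelihood with a scaled forward recursion and, for a short series, checks it against brute-force enumeration of every state sequence. The two-state parameters and gamma step-length densities are hypothetical values chosen for illustration.

```python
import math
from itertools import product

def gamma_pdf(x, shape, scale):
    """Gamma density, a common choice for step lengths."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def hmm_log_likelihood(obs, delta, tpm, densities):
    """Scaled forward algorithm: log-likelihood of obs, summed over all state paths."""
    n = len(delta)
    alpha = [delta[i] * densities[i](obs[0]) for i in range(n)]
    loglik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [
                sum(alpha[i] * tpm[i][j] for i in range(n)) * densities[j](obs[t])
                for j in range(n)
            ]
        scale = sum(alpha)
        loglik += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return loglik

def brute_force_likelihood(obs, delta, tpm, densities):
    """Explicit sum over every possible state sequence (feasible only for short series)."""
    total = 0.0
    for states in product(range(len(delta)), repeat=len(obs)):
        p = delta[states[0]] * densities[states[0]](obs[0])
        for t in range(1, len(obs)):
            p *= tpm[states[t - 1]][states[t]] * densities[states[t]](obs[t])
        total += p
    return total

# Hypothetical two-state model: short steps vs. long steps.
delta = [0.5, 0.5]
tpm = [[0.9, 0.1], [0.2, 0.8]]
densities = [lambda x: gamma_pdf(x, 2.0, 0.2), lambda x: gamma_pdf(x, 3.0, 2.0)]
obs = [0.3, 0.5, 4.0, 6.5]

loglik = hmm_log_likelihood(obs, delta, tpm, densities)
brute = brute_force_likelihood(obs, delta, tpm, densities)
```

Both routes give the same likelihood; fitting then amounts to passing the negative of `loglik` to a numerical optimizer over the model parameters.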

Experimental Protocols and Workflows

Protocol 1: Fitting a Resource Selection Function (RSF)

Aim: To quantify second-order habitat selection (selection of a home range within the population's range) or third-order selection (selection of habitat within the home range) [4].

Detailed Methodology:

Data Preparation: Start with cleaned animal relocation data ("used" locations).
Define Availability: Generate a spatial polygon representing the area available to the animal. For third-order selection, this is typically the individual's home range, estimated using methods like Minimum Convex Polygon (MCP) or Kernel Density Estimation (KDE) [4].
Generate Available Points: Randomly sample points within the availability polygon. The number of available points per used point can vary (e.g., 10:1 is common).
Extract Covariates: For each used and available location, extract values for all relevant environmental covariates (e.g., elevation, vegetation type, distance to water).
Model Fitting: Fit a logistic regression model where the response variable is 1 for used points and 0 for available points. The predictor variables are the extracted environmental covariates. The estimated coefficients (\(\beta\)) represent the strength and direction of selection for each covariate.
Validation: Use cross-validation techniques (e.g., k-fold) to assess the model's predictive performance. This involves repeatedly fitting the model on a subset of data and testing its prediction on the remaining data.
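
The use-availability logic of this protocol can be illustrated end to end on simulated data. In the sketch below, "used" points are drawn with probability proportional to exp(2x) for a single covariate x, "available" points are drawn uniformly at a 10:1 ratio, and a logistic regression is fitted by Newton-Raphson. Everything here is an assumption made for illustration (the single covariate, the true coefficient of 2.0, the hand-written fitting routine); in practice one would use an established routine such as `glm` in R.

```python
import math
import random

def fit_logistic_1d(x, y, iters=25):
    """Newton-Raphson for logit Pr(y = 1) = b0 + b1 * x (one covariate)."""
    b0 = b1 = 0.0
    for _ in range(iters):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p            # gradient, intercept
            g1 += (yi - p) * xi     # gradient, slope
            h00 += w                # entries of X'WX
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

random.seed(1)
true_beta = 2.0  # hypothetical selection strength

# "Used" points: rejection sampling so that use is proportional to exp(true_beta * x).
used = []
while len(used) < 300:
    x = random.random()
    if random.random() < math.exp(true_beta * (x - 1.0)):
        used.append(x)

# "Available" points: uniform over the same domain, 10 per used point.
available = [random.random() for _ in range(10 * len(used))]

x_all = used + available
y_all = [1] * len(used) + [0] * len(available)
b0, b1 = fit_logistic_1d(x_all, y_all)
```

The slope `b1` estimates the selection coefficient, while the intercept `b0` mainly reflects the used-to-available sampling ratio and is not interpreted in a use-availability design.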

Protocol 2: Fitting a Step Selection Function (SSF)

Aim: To integrate movement constraints with habitat selection, providing a more mechanistic understanding of animal movement at a fine spatiotemporal scale [4].

Detailed Methodology:

Data Preparation: Use regularly spaced telemetry data. Calculate step lengths (distances between consecutive points) and turning angles (changes in direction).
Generate Control Steps: For each observed step, generate a set of random "control" steps (e.g., 10) that start from the same origin. The step lengths and turning angles for these control steps are typically drawn from distributions estimated from the observed data, representing movement options available to the animal.
Extract Covariates: At the endpoint of every observed and control step, extract the values of the environmental covariates of interest.
Model Fitting: Fit a conditional logistic regression model, which is commonly implemented as a stratified Cox proportional hazards model. Each stratum is a matched set consisting of one observed step and its associated control steps. This conditions the analysis on the starting point and available movement options, isolating the effect of habitat on destination choice.
Interpretation: The resulting coefficients represent habitat selection while accounting for the underlying movement process. The model can be used to simulate movement paths in environmental space.
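
The control-step generation described above might look like the following sketch. It fits a gamma distribution to observed step lengths by the method of moments (an assumption made for brevity; dedicated packages typically use maximum likelihood), draws turning angles from a von Mises distribution centered on zero, and projects each control step from the shared origin. Function names and parameter values are hypothetical.

```python
import math
import random
from statistics import mean, variance

def gamma_moments(step_lengths):
    """Method-of-moments gamma fit: returns (shape, scale)."""
    m, v = mean(step_lengths), variance(step_lengths)
    return m * m / v, v / m

def control_steps(origin, prev_heading, n, shape, scale, kappa, rng):
    """Draw n control steps starting at origin.

    Each result is (x_end, y_end, step_length, turning_angle).
    """
    x0, y0 = origin
    out = []
    for _ in range(n):
        sl = rng.gammavariate(shape, scale)
        ta = rng.vonmisesvariate(0.0, kappa)  # angle in radians
        heading = prev_heading + ta
        out.append((x0 + sl * math.cos(heading), y0 + sl * math.sin(heading), sl, ta))
    return out

rng = random.Random(42)
observed_sl = [120.0, 80.0, 200.0, 150.0, 95.0, 310.0]  # hypothetical, in meters
shape, scale = gamma_moments(observed_sl)
controls = control_steps(origin=(500.0, 500.0), prev_heading=0.0, n=10,
                         shape=shape, scale=scale, kappa=1.5, rng=rng)
```

Covariates would then be extracted at the endpoints of the observed step and of each tuple in `controls`, and the eleven steps would form one stratum in the conditional logistic regression.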

Protocol 3: Fitting a Hidden Markov Model (HMM)

Aim: To identify latent behavioral states from movement data and link these states to environmental conditions [12].

Detailed Methodology:

Data Preparation: Start with regular time-step data. Calculate step lengths and turning angles. Step lengths are often transformed (e.g., log) to improve model fitting.
Model Specification: Decide on the number N of behavioral states to include in the model. This can be informed by the biology of the study species and model selection criteria (e.g., AIC, BIC).
Initialization: Provide initial values for the parameters of the state-dependent distributions (e.g., gamma for step length, von Mises for turning angle) and the state transition probability matrix. This is often a critical step, as the likelihood surface can have local maxima.
Model Fitting: Find the parameter values that maximize the likelihood of the observed data, either by direct numerical maximization or with the Expectation-Maximization (EM) algorithm. The forward algorithm computes the likelihood efficiently, and the forward-backward algorithm supplies the state probabilities that EM requires.
State Decoding: Use the Viterbi algorithm to determine the most probable sequence of behavioral states that generated the observed movement path.
Post-hoc Analysis: Relate the decoded behavioral states to environmental covariates, for example, by using the states as a predictor in a subsequent statistical model to understand which habitats are associated with specific behaviors.
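
The state-decoding step can be illustrated with a compact Viterbi implementation. The sketch assumes the model has already been fitted; the two states ("resting", "transit"), their exponential step-length densities, and the transition matrix are hypothetical values, not estimates from real data.

```python
import math

def safe_log(v):
    return math.log(v) if v > 0 else float("-inf")

def viterbi(obs, delta, tpm, densities):
    """Most probable state sequence under a fitted HMM (log-space Viterbi)."""
    n = len(delta)
    score = [safe_log(delta[i]) + safe_log(densities[i](obs[0])) for i in range(n)]
    backpointers = []
    for x in obs[1:]:
        pointers, new_score = [], []
        for j in range(n):
            # Best predecessor state for arriving in state j.
            candidates = [score[i] + safe_log(tpm[i][j]) for i in range(n)]
            best = max(range(n), key=lambda i: candidates[i])
            pointers.append(best)
            new_score.append(candidates[best] + safe_log(densities[j](x)))
        backpointers.append(pointers)
        score = new_score
    # Trace back from the best final state.
    state = max(range(n), key=lambda i: score[i])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

def exp_pdf(x, mean_step):
    """Exponential density with the given mean (kept simple for the example)."""
    return math.exp(-x / mean_step) / mean_step

# Hypothetical fitted model: state 0 = resting (short steps), state 1 = transit.
delta = [0.5, 0.5]
tpm = [[0.9, 0.1], [0.1, 0.9]]
densities = [lambda x: exp_pdf(x, 0.1), lambda x: exp_pdf(x, 5.0)]
steps = [0.05, 0.1, 0.08, 4.0, 6.0, 5.5, 0.1]

decoded = viterbi(steps, delta, tpm, densities)
```

The decoded sequence can then be joined back to the relocation timestamps and used as a response or predictor in the post-hoc analysis.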

Comparative Analysis and Data Presentation

The choice between RSF, SSF, and HMM depends heavily on the research question, the scale of inference, and the nature of the available data. The table below provides a direct comparison to guide method selection.

Table 1: Comparison of Key Statistical Models in Movement Ecology

Feature	Resource Selection Function (RSF)	Step Selection Function (SSF)	Hidden Markov Model (HMM)
Primary Ecological Question	Where does an animal use space relative to what is available?	How does an animal select habitat while moving?	What behavioral states is an animal in, and how do they change?
Scale of Inference	Home range (2nd/3rd order selection)	Within-home range, fine-scale (3rd order)	Behavioral process scale
Handles Autocorrelation	Poorly; requires careful sampling of available points	Explicitly accounts for it via conditional likelihood	Explicitly models it as a state process
Key Input Data	Used locations, availability polygon, environmental layers	Used steps, distributions for step length & turning angle, environmental layers	Time-series of step lengths & turning angles
Typical Output	A map of relative probability of use	A model integrating movement and habitat selection	A sequence of predicted behavioral states
Key Advantage	Conceptual and implementation simplicity; broad-scale insight	Mechanistic link between movement and habitat selection	Direct inference of unobserved behaviors

Table 2: Quantitative Data Requirements and Outputs

Model	Minimum Required Data Points (per individual)	Typical Temporal Resolution	Key Analytical Outputs
RSF	30-50+ locations to define use	Low to moderate (hours-days)	Selection coefficients (\(\beta\)), p-values, RSF map
SSF	100+ locations for step distributions	High (minutes-hours)	Selection coefficients (\(\beta\)), movement parameters, integrated step selection map
HMM	100+ locations for time-series analysis	High (minutes-hours)	State transition matrix, state-dependent distribution parameters, decoded state sequence

The Scientist's Toolkit: Research Reagent Solutions

Table 3: Essential Computational Tools and Packages for Movement Ecology

Tool / Software Package	Primary Function	Key Features / Notes
R Statistical Environment	Platform for all statistical analysis and modeling.	The primary environment for ecological statistics; all below are typically R packages.
`amt` R Package	Manages tracking data and fits RSFs & SSFs [4].	Provides a coherent framework for data management, analysis, and visualization for steps and tracks.
`momentuHMM` R Package	Fits complex HMMs to animal movement data [4].	Extends `moveHMM`, allows for multiple data streams and hierarchical structures [12].
`moveHMM` R Package	Fits basic Hidden Markov Models to movement data [12].	A user-friendly introduction to HMMs for step length and turning angle analysis.
GPS Biologging Devices	Collects high-resolution location data from free-ranging animals [4].	The primary source of the movement data used in these analyses.
Geographic Information System (GIS)	Manages, analyzes, and visualizes spatial data (e.g., environmental covariates).	Used to process spatial layers and extract covariate values at animal locations (e.g., using QGIS or ArcGIS).

From Patterns to Processes: Linking Movement to Fitness

The ultimate goal in movement ecology is often to understand how movement decisions translate into survival and reproductive success (fitness). The statistical models described are a critical intermediate step.

HMMs and Energetics: Decoded behavioral states from HMMs (e.g., "foraging," "transit") can be linked to energy expenditure models. Time spent in high-cost versus high-reward states can be a proxy for fitness.
SSFs and Resource Acquisition: SSFs can identify habitats selected during foraging, allowing researchers to model the potential caloric or nutrient intake along a movement path.
RSFs and Population Dynamics: RSF-derived maps of high-quality habitat can be overlaid with mortality data (e.g., from predator activity or human infrastructure) to create risk landscapes. The overlap between selection and risk can directly inform survival probabilities.

By combining these movement models with demographic and environmental data, researchers can build integrated models that move beyond correlation toward a mechanistic understanding of how movement patterns in a heterogeneous environment ultimately drive fitness outcomes.

The Critical Role of GPS, Biologging, and Sensor Technologies in Data Collection

Application Notes: Current Capabilities and Research Outputs

The integration of advanced biologging technologies has fundamentally transformed movement ecology, enabling unprecedented data collection on animal behavior, physiology, and environmental interactions. The table below summarizes the quantitative capabilities and primary research outputs of modern biologging platforms.

Table 1: Quantitative Data and Research Applications of Biologging Technologies

Technology Type	Measured Parameters	Research Applications	Example Scale/Resolution
GPS & Satellite Loggers	Horizontal position (latitude/longitude), altitude, speed [13]	Migration routes, habitat selection, distribution mapping, space use [14] [4]	Global coverage; 7.5 billion location points in Movebank (2025) [13]
Multi-sensor Biologgers	Depth, acceleration, angular velocity, body temperature, water salinity, atmospheric pressure [13] [14]	Diving/flight behavior, energy expenditure, physiology, identification of mortality events [14]	Data on 1478 taxa; can record for >1 year [13] [14]
Animal-Borne Ocean Sensors	Water temperature, salinity [13]	Physical oceanography, climate change monitoring, complementing Argo float data [13]	Data volume from seals comparable to Argo floats in polar regions [13]
Vertical-Looking Radars (VLRs)	Flight altitude, wing movement, timing, track, size/shape of flying animals [15]	Migration ecology, stopover behavior, movement phenology [15]	Detection up to ~2 km above ground [15]

The data collected by these technologies serve dual purposes. First, they provide direct insight into the lives of individual animals, revealing fine-scale behaviors and their drivers [16]. Second, they contribute to large-scale environmental monitoring, turning animals into mobile sensors of the world's oceans and atmospheres [13]. Platforms like the Biologging intelligent Platform (BiP) have been developed to standardize, store, and share these complex datasets, facilitating collaborative research across disciplines such as ecology, oceanography, and meteorology [13]. A key feature of BiP is its suite of Online Analytical Processing (OLAP) tools, which can calculate environmental parameters like surface currents and ocean winds from the data collected by animals [13].

Experimental Protocols

This section outlines detailed methodologies for employing biologging technologies within a movement ecology research framework, from study design to data interpretation.

Protocol 1: Investigating Species-Habitat Associations

Application: Identifying critical habitat and understanding the environmental drivers of animal space use [4].

Materials:

Animal capture and handling equipment (e.g., traps, nets, tranquilizer darts).
GPS biologgers with appropriate attachment method (e.g., collar, harness, glue).
Access to environmental covariate datasets (e.g., vegetation, prey diversity, terrain).

Procedure:

Device Deployment: Deploy GPS loggers on study animals. Record detailed metadata, including individual animal traits (species, sex, body size), device specifications, and deployment information (location, date, method) [13].
Data Collection: Collect animal location data at a temporal resolution appropriate for the research question. For fine-scale movement, high-frequency data (e.g., every few minutes) is required [4].
Data Standardization: Upload sensor data and metadata to a standardized platform like BiP to ensure interoperability and facilitate future reuse [13].
Define "Used" and "Available" Locations: For each observed ("used") animal location, generate a set of "available" locations within the animal's potential movement range (e.g., its home range) at that time [4].
Model Fitting: Relate the used and available locations to environmental covariates using an appropriate statistical model.
- Resource Selection Function (RSF): Uses logistic regression to model the relative probability of use as a function of habitat covariates [4]. The model is: Pr(use) = exp(β₁x₁ + β₂x₂ + ... + βₖxₖ) / (1 + exp(β₁x₁ + β₂x₂ + ... + βₖxₖ)) where x are covariates and β are selection coefficients.
- Step-Selection Function (SSF): Accounts for movement constraints by defining availability based on the animal's starting point and movement capabilities [4].
Interpretation: Positive selection coefficients (β) indicate a preference for a habitat feature, while negative coefficients indicate avoidance.

Protocol 2: Inferring Behavior and Fitness from Multi-sensor Data

Application: Linking movement and behavior to individual fitness outcomes like survival and reproduction [14].

Materials:

Multi-sensor biologgers (e.g., combining GPS, accelerometer, and temperature sensors).
Computational resources for analyzing high-resolution data.

Procedure:

Device Deployment: Deploy multi-sensor loggers on a cohort of individuals from a study population.
Data Collection: Collect synchronized data streams (e.g., GPS for location, accelerometry for behavior classification, temperature for physiology) over an extended period (e.g., a full annual cycle or lifetime) [14].
Behavioral State Inference: Use statistical models like Hidden Markov Models (HMMs) to classify raw sensor data into discrete behavioral states (e.g., resting, foraging, migrating). HMMs assume an animal's movement metrics (e.g., step length, turning angle) are dependent on its underlying, unobserved behavioral state [4].
Fitness Metric Extraction:
- Reproduction: Identify recursive movements to a central place (e.g., a nest) via GPS data. Validate with accelerometer data showing characteristic nesting behaviors [14].
- Survival/Mortality: Identify mortality events from long-term stationary sensor readings, sentinel alerts from the device, or a combination of sensor data (e.g., lack of movement and a drop in body temperature) [14].
- Energetics: Derive proxies for energy expenditure, such as Overall Dynamic Body Acceleration (ODBA) or Vectorial Dynamic Body Acceleration (VeDBA), from accelerometer data [14].
Spatio-Temporal Mapping: Map the inferred fitness metrics (energy expenditure, reproduction sites, mortality locations) onto environmental layers to create a "fitness landscape" and understand the environmental conditions associated with success or failure [14].
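
For the energetics step, dynamic body acceleration is usually obtained by removing the static (gravitational) component from each accelerometer axis and combining what remains. The sketch below estimates the static component with a centered running mean and returns both the summed-absolute (ODBA) and vectorial (VeDBA) variants. The window length and the toy signal are assumptions for illustration; appropriate smoothing windows depend on the species and sampling rate.

```python
import math

def running_mean(v, window):
    """Centered running mean; the window is truncated at the series edges."""
    half = window // 2
    out = []
    for i in range(len(v)):
        chunk = v[max(0, i - half): i + half + 1]
        out.append(sum(chunk) / len(chunk))
    return out

def dynamic_body_acceleration(ax, ay, az, window):
    """Return per-sample (ODBA, VeDBA) series from tri-axial acceleration."""
    static = [running_mean(axis, window) for axis in (ax, ay, az)]
    odba, vedba = [], []
    for i in range(len(ax)):
        dx = ax[i] - static[0][i]
        dy = ay[i] - static[1][i]
        dz = az[i] - static[2][i]
        odba.append(abs(dx) + abs(dy) + abs(dz))
        vedba.append(math.sqrt(dx * dx + dy * dy + dz * dz))
    return odba, vedba

# Hypothetical signal in units of g: gravity on the z axis plus a brief burst.
ax = [0.0, 0.0, 0.6, -0.4, 0.0, 0.0]
ay = [0.0, 0.0, 0.2, 0.1, 0.0, 0.0]
az = [1.0, 1.0, 1.3, 0.8, 1.0, 1.0]
odba, vedba = dynamic_body_acceleration(ax, ay, az, window=3)
```

Averaging either series over a behavioral bout gives a simple activity proxy that can be mapped alongside the GPS locations.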

Protocol 3: Wildlife Disease Surveillance through Movement Data

Application: Early detection and management of zoonotic disease outbreaks [17].

Materials:

GPS transmitters that allow for near-real-time data transmission.
Access to disease outbreak databases (e.g., EMPRES-i).
Computational tools for analyzing movement anomalies.

Procedure:

Sentinel Species Selection: Identify and tag potential wildlife sentinel species known to be hosts or bridge hosts for pathogens of concern (e.g., waterfowl for avian influenza) [17].
Baseline Behavior Establishment: Collect individual movement data to establish baseline behavior and social interaction patterns.
Anomaly Detection: Monitor near-real-time data streams for behavioral anomalies predictive of disease, such as:
- Reduced movement capacity or local movements [17].
- Changes in social behavior (e.g., reduced contact rates).
- Unusual mortality events detected via sensor alerts.
Data Integration and Alerting: Integrate movement anomaly data with other surveillance data (e.g., dead bird reports) in a unified platform. Trigger alerts for targeted sampling when anomalies are detected [17].
Management and Modeling: Use real-time movement data to model potential disease spread and inform management actions, such as establishing movement barriers or issuing public health warnings [17].
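
A very simple form of the anomaly-detection step compares recent daily displacement with the same individual's baseline. The sketch below flags days whose z-score falls below a threshold. The threshold of -2, the use of daily displacement as the metric, and the example values are all assumptions for illustration; an operational system would likely use more robust statistics and several behavioral metrics.

```python
from statistics import mean, stdev

def movement_anomalies(baseline, recent, z_threshold=-2.0):
    """Flag recent daily displacements far below the individual's baseline."""
    mu, sd = mean(baseline), stdev(baseline)
    flags = []
    for d in recent:
        z = (d - mu) / sd if sd > 0 else 0.0
        flags.append(z <= z_threshold)
    return flags

# Hypothetical daily displacement (km) for one tagged bird.
baseline_km = [10.0, 12.0, 11.0, 9.0, 13.0, 10.0, 12.0]
recent_km = [11.0, 2.0]
alerts = movement_anomalies(baseline_km, recent_km)
```

Days flagged in `alerts` would be candidates for the targeted sampling described in the alerting step.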

Workflow Visualizations

Figure: Biologging Research Data Pipeline (diagram not reproduced)

Figure: From Sensor Data to Behavioral States (diagram not reproduced)

The Scientist's Toolkit: Essential Research Reagents and Materials

Table 2: Key Tools and Platforms for Biologging Research

Tool/Platform Name	Type	Primary Function	Relevance to Movement Ecology
Biologging intelligent Platform (BiP)	Data Repository & Analysis Platform	Standardized storage, sharing, and analysis of biologging data with metadata [13]	Facilitates interdisciplinary research; includes OLAP tools for estimating environmental parameters [13]
Movebank	Data Repository	Global database for animal tracking data [13]	Largest repository; contains 7.5 billion location points across 1478 taxa (2025) [13]
AniBOS	Observation Network	Global ocean observation system using animal-borne sensors [13]	Gathers physical environmental data worldwide to complement other observation systems [13]
`ctmm` R package	Statistical Software	Continuous-time movement modeling for animal tracking data [18]	Addresses autocorrelation and location error in tracking data; used for home-range analysis and habitat suitability [18]
`amt` R package	Statistical Software	Analysis of animal movement data [4]	Used for fitting Resource Selection Functions (RSFs) and Step-Selection Functions (SSFs) [4]
`momentuHMM` R package	Statistical Software	Analysis of animal movement data using Hidden Markov Models (HMMs) [4]	Infers latent behavioral states from movement data [4]
Vertical-Looking Radar (VLR)	Field Sensor	Detects and characterizes individual flying animals [15]	Studies migration ecology and flight behavior without requiring animal capture [15]

Core Statistical Models: RSFs, SSFs, HMMs, and Their Practical Implementation

Resource Selection Functions (RSFs) are statistical models used to estimate the relative probability of an animal selecting a resource unit based on environmental covariates, providing crucial insights into species-habitat relationships [19]. As a foundational tool in movement ecology, RSFs compare environmental attributes at locations used by animals against those available within their domain of use, enabling researchers to quantify habitat selection patterns across landscapes [20] [21]. This use-availability framework distinguishes RSFs from use-unused approaches and allows researchers to model habitat preference across multiple spatial and temporal scales, from second-order selection (home range placement in the landscape) to third-order selection (resource use within a home range) [20] [19].

The theoretical foundation of RSFs rests on the principle that animals selectively use landscape features disproportionate to their availability, indicating preference or avoidance [19]. By relating animal occurrence data to environmental predictors, RSFs facilitate understanding of critical habitat requirements, movement corridors, and species distributions, all of which are essential for effective conservation planning and wildlife management [19]. These models have become indispensable in ecological research, particularly with the increasing availability of high-resolution tracking data from GPS and other biologging technologies [20].

Table 1: Key Definitions in Resource Selection Analysis

Term	Definition
Habitat	The set of environmental covariates that characterize the space an animal inhabits [19]
Habitat Selection	The process whereby individuals preferentially use or occupy habitats [19]
Habitat Availability	The accessibility, prevalence, and procurability of habitat components by animals [19]
Use-Availability Design	Sampling design comparing environmental conditions at used locations versus available locations [21]

Theoretical Foundation and Mathematical Formulation

The RSF is typically defined as any function proportional to the probability of selection of a spatial resource unit [20]. In its most common parametric form, a RSF is an exponential function:

w(x) = exp(β₁x₁ + β₂x₂ + ··· + βₖxₖ) [19]

where x = {x₁, ..., xₖ} represents a vector of k environmental predictor variables, and β = {β₁, ..., βₖ} are the selection coefficients representing the strength and direction of selection for each covariate [19]. The exponential form ensures the RSF remains non-negative, representing a relative probability of use rather than an absolute probability.

In practice, RSF coefficients are commonly estimated using logistic regression within a use-availability framework [22] [19]. For a total of n resource units, the response variable y = {y₁, ..., yₙ} consists of binary random variables where yᵢ = 1 indicates a used unit and yᵢ = 0 indicates an available unit. The probability that resource unit i is used given its environmental covariates xᵢ is modeled as:

Pr(yᵢ = 1|xᵢ) = exp(β₁x₁ᵢ + β₂x₂ᵢ + ··· + βₖxₖᵢ) / [1 + exp(β₁x₁ᵢ + β₂x₂ᵢ + ··· + βₖxₖᵢ)] [19]

An alternative formulation represents RSFs as inhomogeneous Poisson point processes (IPPs) in geographic space, modeling the density of animal locations as a function of spatial predictors [19]. The intensity function λ(s) takes a similar exponential form:

λ(s) = exp(β₀ + β₁x₁(s) + β₂x₂(s) + ... + βₖxₖ(s))

where s represents a location in geographical space, x₁(s), ..., xₖ(s) are habitat variables at that location, β₀ is an intercept term, and β₁, ..., βₖ are the selection coefficients [19]. The IPP formulation provides a rigorous connection to spatial point process theory while yielding equivalent selection coefficients to the logistic regression approach when availability samples are large [19].

Experimental Design and Protocols

Study Design Considerations

Proper experimental design is paramount for valid RSF inference. Researchers must carefully define the sampling extent (the area within which availability is measured) and sampling grain (the resolution of analysis units) based on the ecological question and species biology [20]. The sampling extent typically corresponds to the individual's home range for third-order selection studies or the population range for second-order selection [20] [19]. Temporal matching of used and available samples is equally critical, as availability may vary seasonally or diurnally [20].

Defining availability represents one of the most challenging aspects of RSF design. Availability should reflect the area accessible to an animal within the relevant temporal frame, considering movement constraints, memory, and territoriality [19]. Common approaches include using minimum convex polygons, kernel density estimates, or time-varying Brownian bridges to characterize available space [21]. For population-level inference, researchers often employ mixed-effects models with random intercepts for individual animals to account for unbalanced sampling and correlation within individuals [22].

Data Collection Protocols

Movement data collection for RSF analysis requires careful consideration of fix rate (sampling frequency), which should align with the temporal scale of the ecological process under investigation [20]. Higher fix rates (e.g., <1 hour intervals) capture fine-scale movement decisions but increase autocorrelation, while lower fix rates may miss important habitat selection events [20]. Modern GPS collars can record locations with high accuracy (6-10m error) at programmable intervals, balancing battery life against data resolution [21].

Environmental covariate data should be collected at spatial resolutions matching or exceeding the animal location data. Remote sensing platforms (e.g., Landsat, MODIS) provide extensive spatial coverage for variables like vegetation indices, while LiDAR and aerial photography offer fine-scale terrain information [20]. Field measurements may be necessary for ground-truthed variables like food resource availability or precise vegetation composition [21]. All environmental variables should be checked for collinearity prior to analysis, with highly correlated predictors (|r| > 0.7) removed or combined [22].
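
The collinearity screen mentioned above can be as simple as computing pairwise Pearson correlations and listing the pairs that exceed the |r| > 0.7 threshold. The sketch below does this for a dictionary of covariate vectors; the covariate names and values are hypothetical.

```python
import math
from itertools import combinations

def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)

def collinear_pairs(covariates, threshold=0.7):
    """Return (name1, name2, r) for every covariate pair with |r| above threshold."""
    flagged = []
    for (n1, v1), (n2, v2) in combinations(covariates.items(), 2):
        r = pearson(v1, v2)
        if abs(r) > threshold:
            flagged.append((n1, n2, r))
    return flagged

# Hypothetical covariate values sampled at five locations.
covariates = {
    "elevation": [1.0, 2.0, 3.0, 4.0, 5.0],
    "slope": [2.0, 4.0, 6.0, 8.0, 10.5],
    "ndvi": [3.0, 1.0, 4.0, 1.0, 5.0],
}
flagged = collinear_pairs(covariates)
```

For each flagged pair, one covariate is typically dropped or the two are combined before model fitting.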

Table 2: Data Requirements for RSF Analysis

Data Type	Description	Collection Methods	Considerations
Animal Locations	GPS coordinates of animal positions	GPS collars, VHF telemetry	Fix rate, accuracy, temporal coverage
Environmental Covariates	Habitat variables influencing selection	Remote sensing, field sampling	Resolution, temporal alignment with tracking data
Availability Samples	Random points within accessible area	GIS-based random sampling	Definition of availability domain, sample size ratio
Individual Metadata	Animal attributes (sex, age, etc.)	Field measurements, observation	Potential random effects in models

Statistical Implementation Protocol

The following step-by-step protocol outlines RSF implementation using R, the most common platform for ecological modeling:

Step 1: Data Preparation and Exploration

Import and clean animal location data, addressing any outliers or obvious errors
Extract environmental covariate values at used locations using GIS tools or R packages like raster or terra
Generate available points using appropriate sampling design (typically 10-30 random points per used point within the availability domain) [21]
Combine used and available data, creating a binary response variable (1 for used, 0 for available)
Standardize continuous covariates to mean = 0 and SD = 1 to improve model convergence and coefficient interpretability [22]

Step 2: Model Formulation and Selection

Develop a priori candidate models based on ecological knowledge and hypotheses
For population-level inference with individual variation, use mixed-effects logistic regression with the lme4 package: glmer(use ~ covariate1 + covariate2 + (1|animal_id), data = data, family = binomial(link = "logit")) [22]
Compare candidate models using Akaike's Information Criterion (AIC) or similar information-theoretic approaches [22] [21]
Select the most parsimonious model balancing fit and complexity
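To make the information-theoretic comparison concrete, the sketch below ranks candidate models by AIC, computed as 2k - 2 ln(L). It is written in Python for illustration and is independent of the R workflow; the model names, log-likelihoods, and parameter counts are hypothetical placeholders for values that a fitted model would report.

```python
def aic(log_likelihood, n_params):
    """Akaike's Information Criterion: 2k - 2 ln(L)."""
    return 2 * n_params - 2 * log_likelihood

def rank_by_aic(candidates):
    """Rank models by AIC; each row is (name, AIC, delta AIC relative to the best)."""
    scored = [(name, aic(ll, k)) for name, (ll, k) in candidates.items()]
    best = min(score for _, score in scored)
    rows = [(name, score, score - best) for name, score in scored]
    return sorted(rows, key=lambda row: row[1])

# Hypothetical (log-likelihood, number of estimated parameters) per candidate model.
candidates = {
    "forest": (-520.4, 2),
    "forest + road": (-512.1, 3),
    "forest + road + ndvi": (-511.9, 4),
}
ranking = rank_by_aic(candidates)
for name, score, delta in ranking:
    print(f"{name}: AIC = {score:.1f}, delta = {delta:.1f}")
```

Here the extra covariate in the largest model improves the log-likelihood too little to offset its penalty, so the two-covariate model ranks first.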

Step 3: Model Validation and Prediction

Validate models using k-fold cross-validation (typically 5-fold), calculating Spearman rank correlation between RSF scores and area-adjusted frequency of use [21]
Generate spatial predictions of relative probability of use across the study area
Create prediction maps visualizing habitat selection patterns
Classify predictions into discrete bins (e.g., 5 classes of equal area) for management applications [22]

Advanced Applications and Integration with Movement Ecology

Comparison with Step-Selection Functions

Step-Selection Functions (SSFs) extend RSF methodology by explicitly incorporating movement dynamics into habitat selection analysis [20]. While RSFs typically consider habitat availability across an animal's home range, SSFs condition availability on the animal's previous location and movement capabilities, comparing each observed step (the linear segment between consecutive locations) with random steps drawn from distributions of step lengths and turning angles [20]. This approach better accounts for temporal autocorrelation and movement constraints in high-frequency tracking data [20].

SSFs are particularly valuable for studying fine-scale habitat selection during movement phases, identifying movement corridors, and understanding how animals respond to linear features like roads or rivers [20]. The SSF takes a similar exponential form to the RSF but conditions selection on the starting point: w(x|uₙ₋₁) = exp(βx(uₙ)), where uₙ represents the step and availability is defined conditional on the previous location uₙ₋₁ [20]. Integrated Step-Selection Functions (iSSFs) further extend this framework by simultaneously modeling movement parameters and habitat selection [19].
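As a rough illustration of how the exponential form is used when an observed step is compared with its random steps, the Python sketch below computes the conditional probability of each step in one stratum, exp(βx) for that step divided by the sum of exp(βx) over all steps in the stratum. The covariates, coefficient values, and function name are assumptions made for the example, not values from any cited study.

```python
import math

def step_choice_probabilities(covariates, beta):
    """Conditional probability of each step in a stratum under w = exp(beta . x)."""
    scores = [sum(b * x for b, x in zip(beta, row)) for row in covariates]
    max_score = max(scores)  # subtracting the maximum avoids overflow in exp()
    weights = [math.exp(s - max_score) for s in scores]
    total = sum(weights)
    return [w / total for w in weights]

# Row 0 is the observed step; the remaining rows are random (available) steps.
# Hypothetical columns: forest cover at the end point, distance to road (km).
stratum = [
    [0.8, 1.2],
    [0.2, 0.3],
    [0.5, 2.0],
    [0.1, 0.1],
]
beta = [1.5, 0.4]  # hypothetical selection coefficients

probs = step_choice_probabilities(stratum, beta)
log_lik_contribution = math.log(probs[0])  # this stratum's term in the conditional likelihood
```

Summing such log-probabilities over all observed steps gives the quantity that conditional logistic regression maximizes with respect to β.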

Integration with Behavioral State Modeling

Recent advances integrate RSFs with state-space models and hidden Markov models (HMMs) to account for behavioral heterogeneity in habitat selection [20] [19]. These approaches recognize that animals may select habitats differently depending on their behavioral state (e.g., foraging, resting, migrating). By first classifying locations into behavioral states, researchers can estimate state-specific RSFs that provide more mechanistic understanding of habitat selection drivers [19].

For example, a study on ringed seals demonstrated that HMMs could reveal variable associations with prey diversity across different behaviors, with positive relationships detected only during slow-movement behavioral states [19]. This state-dependent approach often identifies different "important" areas compared to traditional RSFs, highlighting the value of incorporating behavioral context into habitat selection analyses [19].

For social species, a novel contact-RSF framework has been developed to distinguish landscape factors driving contact locations from those driving general space use [21]. This approach tests whether contacts occur randomly with respect to habitat selection or are concentrated in specific landscape features. The contact-RSF defines contact locations as "used" points and non-contact locations within home range overlaps as "available," using logistic regression to identify habitat characteristics associated with contact probability [21].

A wild pig case study demonstrated that landscape predictors (wetlands, linear features, food resources) played different roles in habitat selection versus contact processes, challenging the assumption that contact hotspots simply mirror habitat selection patterns [21]. This specialized RSF application has important implications for understanding disease transmission dynamics, social interactions, and predator-prey encounters across landscapes [21].

The Researcher's Toolkit

Table 3: Essential Research Reagents and Computational Tools for RSF Analysis

Tool/Resource	Type	Function	Implementation
GPS Telemetry Equipment	Hardware	Collect animal movement data	GPS collars, satellite tags
GIS Software	Software	Spatial data management and analysis	ArcGIS, QGIS, R spatial packages
R Statistical Environment	Software	Statistical modeling and analysis	R Core Team
amt Package	R Package	Animal movement tracking and analysis	amt (e.g., random_points(), fit_rsf())
lme4 Package	R Package	Mixed-effects modeling	glmer() function
Remote Sensing Data	Data	Environmental covariate layers	Landsat, MODIS, LiDAR
AIC Model Selection	Analytical Framework	Model comparison and selection	AICcmodavg package

Resource Selection Functions provide a powerful statistical framework for quantifying animal-environment relationships across multiple spatial and temporal scales. When properly implemented with careful consideration of availability definition, sampling design, and model assumptions, RSFs yield robust insights into habitat selection patterns essential for ecological understanding and conservation application. The ongoing integration of RSFs with movement models (SSFs) and behavioral state models (HMMs) represents an exciting frontier in movement ecology, promising more mechanistic understanding of how animals perceive and respond to their environment across different behavioral contexts and spatial scales.

Step-Selection Functions (SSFs) are powerful statistical tools in movement ecology that integrate data on an animal's movement mechanics with its habitat selection preferences [4]. They model the probability of an animal selecting a subsequent location based on both the dynamic availability of locations given its previous movement and the environmental characteristics of those locations [23]. This method represents a significant advancement over traditional Resource Selection Functions (RSFs) by explicitly incorporating movement constraints into habitat selection analysis [4] [24]. SSFs accomplish this by comparing used steps (the actual movements between consecutive observed locations) with available steps (potential movements the animal could have made but did not) [24]. The core SSF framework can be expressed as a weighted distribution where the probability of an animal moving to a location depends on both a movement kernel and a habitat selection function [23] [24].

Fundamental Concepts and Analytical Framework

Core Mathematical Formulation

The SSF framework models the probability of finding an individual at location (s_{t+1}) given its past positions (s_t) and (s_{t-1}) using the following relationship [23]:

[ u(s_{t+1}) = \frac{\phi(s_{t+1}, s_t, s_{t-1}; \gamma)\, w(x(s_{t+1}); \beta)}{\int_{s \in G} \phi(s, s_t, s_{t-1}; \gamma)\, w(x(s); \beta)\, ds} ]

Where:

(\phi) represents the animal's movement kernel, typically defined by step-length and turning-angle distributions with parameters (\gamma)
(w) is the habitat-selection function reflecting the animal's preferences (\beta) for environmental characteristics (x) at location (s_{t+1})
The denominator ensures proper normalization of the probability distribution [23]

In most applications, the habitat-selection function (w) is modeled as a log-linear function: (w = \exp(x^\top \beta)) [23].
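A discrete version of this weighted distribution may help fix ideas. In the Python sketch below, the integral over the domain is replaced by a sum over a handful of candidate cells, the movement kernel is a simple exponential decay with distance (an assumption made for brevity, ignoring turning angles), and the selection function is the log-linear form with a single hypothetical covariate.

```python
import math

def step_probabilities(current, candidates, covariate, beta, kernel_scale):
    """Discrete analogue of u(s_{t+1}): kernel times selection, normalized over candidates."""
    numerators = []
    for cell in candidates:
        distance = math.dist(current, cell)
        phi = math.exp(-distance / kernel_scale)  # hypothetical movement kernel
        w = math.exp(beta * covariate[cell])      # log-linear selection function
        numerators.append(phi * w)
    total = sum(numerators)  # discrete stand-in for the integral in the denominator
    return [value / total for value in numerators]

current = (0, 0)
candidates = [(1, 0), (0, 2), (3, 0)]
forest_cover = {(1, 0): 0.1, (0, 2): 0.9, (3, 0): 0.9}  # hypothetical covariate values

probs = step_probabilities(current, candidates, forest_cover, beta=2.0, kernel_scale=1.5)
kernel_only = step_probabilities(current, candidates, forest_cover, beta=0.0, kernel_scale=1.5)
```

Setting beta to zero recovers the movement kernel alone, which makes the separate contributions of movement and selection easy to inspect.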

Integrated Step-Selection Analysis (iSSA)

Integrated Step-Selection Analysis (iSSA) extends the basic SSF framework by jointly estimating parameters for both movement ((\gamma)) and habitat selection ((\beta)) [23]. This integrated approach enables researchers to:

Model how animals respond to environmental heterogeneity while accounting for movement constraints
Update initial step-length and turning-angle distributions based on estimated coefficients
Develop fully mechanistic movement models that can simulate space use under novel conditions [23]
Quantify landscape resistance and identify movement corridors [23]

Table 1: Key Components of Step-Selection Analyses

Component	Description	Typical Implementation
Movement Kernel ((\phi))	Probability distribution of movement directions and distances	Parametric distributions (log-normal, gamma, Rayleigh) for step lengths; uniform or von Mises for turning angles [24]
Selection Function ((w))	Habitat preference function	Exponential form: (w = \exp(x^\top \beta)) [23]
Available Steps	Control steps representing potential movement choices	Random steps generated from movement distributions [24]
Estimation Method	Statistical fitting procedure	Conditional logistic regression comparing used vs. available steps [23]

Addressing Temporal Irregularity in Movement Data

The Challenge of Missing Data

A fundamental assumption of traditional iSSAs is that animal location data are collected at a constant sampling frequency, producing regular step durations [23]. However, real-world datasets frequently contain missing locations due to device limitations, with one comprehensive study reporting an average success rate of only 78% for obtaining scheduled animal locations [23]. This missingness introduces temporal irregularity that complicates analysis.

The conventional approach of using only "bursts" of regular data (sequences of locations equally spaced in time) results in substantial data loss [23]. As shown in Figure 1, a single missing location can reduce the effective sample size by three steps (the step before the gap, the step after the gap, and the turning angle at the subsequent location). With 25% missingness, the number of valid steps decreases by approximately 58% [23].
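The arithmetic behind these figures can be checked with a short Python sketch. It assumes that a step is usable only when three consecutive fixes are present (two to define the step, plus the preceding one to define its turning angle) and that fixes go missing independently, which is a simplification of real device failures.

```python
import random

def count_valid_steps(present):
    """Count steps whose start, end, and preceding fix are all present."""
    return sum(
        present[i - 2] and present[i - 1] and present[i]
        for i in range(2, len(present))
    )

def expected_valid_fraction(success_rate):
    """Expected fraction of retained steps under independent missingness."""
    return success_rate ** 3

def simulated_valid_fraction(success_rate, n_fixes=100_000, seed=1):
    """Monte Carlo check of the expected fraction."""
    rng = random.Random(seed)
    present = [rng.random() < success_rate for _ in range(n_fixes)]
    return count_valid_steps(present) / (n_fixes - 2)

complete = [True] * 10
one_gap = [True] * 10
one_gap[5] = False  # a single missing location in the middle of the burst

lost_steps = count_valid_steps(complete) - count_valid_steps(one_gap)
loss_at_25_percent = 1.0 - expected_valid_fraction(0.75)
```

With a 75% fix success rate the retained fraction is 0.75 cubed, about 0.42, which corresponds to the roughly 58% loss quoted above.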

Methodological Approaches for Irregular Data

Several methodological approaches have been developed to address temporal irregularity resulting from missing data:

Imputation Approach: Fit a continuous-time correlated random walk movement model to the collected data and use the fitted model to impute missing locations [23]. This approach reconstructs regular trajectories for analysis using traditional techniques and is implemented in the R package crawl [23].
Naïve Approach with Duration Scaling: Generate random steps by sampling step durations, step speeds, and turning angles, assuming step lengths scale linearly with step duration [23]. This method scales generated random steps by the observed step duration.
Dynamic Model with Duration-Specific Distributions: Fit separate movement distributions to steps of different durations, acknowledging potentially non-linear relationships between step duration and movement parameters [23].
Ecological Diffusion Equation (EDE) Framework: Utilize continuous-time availability distributions derived from ecological diffusion principles, including a Rayleigh step-length distribution and uniform turning angle distribution that naturally accommodate irregular time intervals [24].

Table 2: Comparison of Methods for Handling Temporally Irregular Data

Method	Key Principle	Advantages	Limitations
Bursts of Regular Data	Use only sequences with regular step durations	Simple implementation; maintains standard assumptions	Substantial data loss; reduced statistical power [23]
Imputation	Reconstruct missing locations using movement models	Maximizes data utilization; produces regular trajectories	Introduces model dependence; potential imputation bias [23]
Duration Scaling	Scale movement parameters by step duration	Accommodates varying intervals; relatively simple	Assumes linear scaling; may not hold for longer intervals [23]
EDE Framework	Use continuous-time distributions from diffusion theory	Mechanistically grounded; handles irregular intervals naturally	Less familiar to practitioners; requires specialized implementation [24]

Experimental Protocols for SSF Implementation

Data Preparation Workflow

Diagram 1: SSF Data Preparation Workflow

Protocol for Basic Integrated Step-Selection Analysis

Step 1: Data Collection and Cleaning

Collect animal location data at consistent time intervals using GPS loggers or other telemetry devices [25]
Record environmental covariates representing resources, risks, or other relevant landscape features
Clean data to remove obvious errors and ensure proper formatting

Step 2: Address Temporal Irregularity

Assess data for missing locations and irregular time intervals
Choose appropriate method for handling irregularity based on data characteristics and research questions (refer to Table 2)
For the EDE approach, pre-estimate homogenized motility coefficient using temporal moving average [24]:

[ \bar{\delta}(t_i) \approx \sum_{t_j \sim t_i} \frac{(\mathbf{s}(t_j)-\mathbf{s}(t_{j-1}))'(\mathbf{s}(t_j)-\mathbf{s}(t_{j-1}))}{4 n_i \Delta t_j} ]
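One way to read the moving-average estimator is sketched below in Python. The sketch assumes that the neighbourhood of each time point is a flat window of fixed half-width, that n_i is the number of steps falling in that window, and that each step's duration is the time since the previous fix; the function name and window choice are illustrative and not taken from [24].

```python
import math

def motility_moving_average(times, xs, ys, half_window):
    """Average squared displacement / (4 * duration) over steps near each time point."""
    step_times, step_values = [], []
    for j in range(1, len(times)):
        dt = times[j] - times[j - 1]
        dx = xs[j] - xs[j - 1]
        dy = ys[j] - ys[j - 1]
        step_times.append(times[j])
        step_values.append((dx * dx + dy * dy) / (4.0 * dt))
    estimates = []
    for t_i in times:
        window = [v for t_j, v in zip(step_times, step_values)
                  if abs(t_j - t_i) <= half_window]
        # n_i is len(window); time points with no nearby step get NaN
        estimates.append(sum(window) / len(window) if window else float("nan"))
    return estimates

# Synthetic, irregularly sampled track built so that |displacement|^2 = 4 * D * dt.
D = 2.5
times = [0.0, 1.0, 3.0, 4.0, 8.0]
xs = [0.0]
for j in range(1, len(times)):
    xs.append(xs[-1] + math.sqrt(4.0 * D * (times[j] - times[j - 1])))
ys = [0.0] * len(times)

estimates = motility_moving_average(times, xs, ys, half_window=2.0)
```

Because each squared displacement is divided by its own step duration, the estimate stays comparable across steps of unequal length, which is what makes the approach suitable for irregular data.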

Step 3: Generate Available Steps

For each observed step, generate multiple random available steps from appropriate distributions
For regular data, use parametric distributions (e.g., log-normal for step lengths, von Mises for turning angles) [24]
For irregular data using EDE framework, employ Rayleigh step-length distribution and uniform turning angle distribution [24]
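Generating available steps for regular data can be sketched with Python's standard library. The example draws step lengths from a gamma distribution (one of the parametric options listed in Table 1, used here in place of the log-normal) and turning angles from a von Mises distribution centered on zero. All parameter values are hypothetical; in practice they would be estimated from the observed steps.

```python
import math
import random

def random_steps(x, y, heading, n_steps, shape, scale, kappa, rng):
    """Draw available steps from a start point and the heading of the previous step."""
    steps = []
    for _ in range(n_steps):
        length = rng.gammavariate(shape, scale)  # step length, always positive
        turn = rng.vonmisesvariate(0.0, kappa)   # returned in [0, 2*pi)
        if turn > math.pi:
            turn -= 2.0 * math.pi                # re-express as a signed turning angle
        new_heading = heading + turn
        steps.append({
            "x_end": x + length * math.cos(new_heading),
            "y_end": y + length * math.sin(new_heading),
            "step_length": length,
            "turning_angle": turn,
        })
    return steps

rng = random.Random(42)
available = random_steps(x=500.0, y=300.0, heading=0.0, n_steps=10,
                         shape=2.0, scale=50.0, kappa=1.0, rng=rng)
```

Each dictionary holds the end point at which covariates would be extracted, together with the movement descriptors used later in model fitting.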

Step 4: Extract Covariate Values

For each observed and available step endpoint, extract values of relevant environmental covariates
Include movement descriptors (step length, log(step length), cosine of turning angle) to jointly estimate movement parameters [23]

Step 5: Model Fitting

Fit conditional logistic regression model comparing used vs. available steps
Include habitat covariates to estimate selection coefficients ((\beta))
Include movement descriptors to update tentative movement distributions ((\gamma)) [23]
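The update of the tentative movement distributions follows from multiplying the tentative density by the exponential of the fitted movement terms. A minimal Python sketch, assuming a gamma step-length distribution (shape and scale) and a von Mises turning-angle distribution centered on zero, is given below; the coefficient values are hypothetical.

```python
def update_gamma(shape0, scale0, beta_sl, beta_log_sl):
    """Adjust tentative gamma parameters using the step-length and log step-length coefficients."""
    shape = shape0 + beta_log_sl
    rate = 1.0 / scale0 - beta_sl
    if shape <= 0 or rate <= 0:
        raise ValueError("updated parameters do not define a valid gamma distribution")
    return shape, 1.0 / rate

def update_von_mises(kappa0, beta_cos_ta):
    """Adjust the tentative concentration using the cos(turning angle) coefficient."""
    return kappa0 + beta_cos_ta

# Tentative parameters used to draw random steps, plus hypothetical fitted coefficients.
shape, scale = update_gamma(shape0=2.0, scale0=50.0, beta_sl=-0.005, beta_log_sl=0.3)
kappa = update_von_mises(kappa0=1.0, beta_cos_ta=0.25)
mean_step_length = shape * scale
```

The gamma rule works because a density proportional to l^(k-1) exp(-l/q), multiplied by exp(beta_sl * l + beta_log_sl * ln l), is again of gamma form with shape k + beta_log_sl and rate 1/q - beta_sl.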

Step 6: Model Validation and Interpretation

Validate model using cross-validation or goodness-of-fit assessments
Interpret selection coefficients as relative selection strength (exponential form) or absolute probability of use (logistic form) [24]

Table 3: Key Research Tools and Software for Step-Selection Analyses

Tool/Software	Primary Function	Application Context
amt R package	SSF and iSSA implementation	General step-selection analyses; track management; burst identification [23]
crawl R package	Continuous-time movement modeling	Location imputation for missing data [23]
momentuHMM R package	Hidden Markov Models	Behavioral state-specific habitat selection [4]
GPS Loggers	Animal location data collection	Fine-scale movement data acquisition (e.g., i-got U GT-600) [25]
GIS Software	Environmental covariate processing	Spatial data management; raster and distance calculations [25]
Acoustic Telemetry	Underwater movement tracking	Freshwater fish movement near barriers [26]

Advanced Applications and Methodological Extensions

Behavioral State Integration

Advanced SSF implementations can incorporate behavioral states using Hidden Markov Models (HMMs) or related methods [4] [26]. This approach recognizes that animals may exhibit different movement patterns and habitat selection preferences depending on their behavioral mode (e.g., foraging, resting, migrating). A study on freshwater fish demonstrated that combining HMMs with SSFs enables analysis of behavioral-state specific habitat selection, though individual variation may be high [26].

Cross-Disciplinary Applications

While developed in wildlife ecology, SSFs have proven adaptable to other fields including infectious disease epidemiology [25]. A study on leptospirosis transmission in urban slums used SSFs to analyze how human movement patterns influence exposure to environmental risk factors, revealing gender-based differences in interactions with contaminated waterways [25]. This demonstrates the methodological transferability of SSFs to human mobility research.

Individual Variability Modeling

Recent methodological advances enable quantification of variability among animals in their space-use patterns through the incorporation of random effects in iSSA [27]. While applications have primarily focused on habitat selection parameters, there is growing recognition of the importance of modeling individual variability in movement parameters, which plays a crucial role in ecological processes across organizational levels [27].

In movement ecology, statistical methods are indispensable for transforming raw tracking data into meaningful biological insights. Hidden Markov Models (HMMs) have emerged as a powerful framework for this purpose, capable of segmenting continuous movement trajectories into discrete, latent (unobserved) behavioral states. The core premise of HMMs is that an animal's observed movement patterns (e.g., step lengths and turning angles) are generated by its underlying, unobserved behavioral state, such as resting, foraging, or traveling. These models assume the system evolves as a Markov process, where the next behavioral state depends only on the current state, providing a robust structure for inferring behavioral dynamics from serial correlation in movement data. This document, framed within a broader thesis on movement ecology statistical methods, provides detailed application notes and protocols for employing HMMs in behavioral state classification, featuring validated methodologies from recent research.

Fundamental Concepts and Model Specification

An HMM is defined by two interconnected stochastic processes: a latent state sequence and an observation sequence. In movement ecology, the latent states are the behaviors, and the observations are the movement metrics derived from tracking data.

Core Components of an HMM for Movement Ecology

Observation Model: This model defines the probability of observing a particular movement metric (e.g., a short step length) given that the animal is in a specific behavioral state (e.g., foraging). The observation model typically uses probability distributions like the gamma distribution for step lengths (which are non-negative) and the von Mises distribution for turning angles (which are circular).
Transition Probability Matrix: This matrix defines the dynamics between behavioral states. Each element, γ_{ij}, represents the probability of transitioning from state i at time t to state j at time t+1. This matrix captures the persistence and switching patterns of behavior.
Initial State Distribution: This defines the probability of starting in each behavioral state at the beginning of the observation sequence.

The Canonical Problems and Solutions

Working with HMMs involves addressing three canonical problems, for which efficient algorithms exist:

Evaluation: Computing the probability of the observed sequence given the model parameters (solved by the forward pass of the Forward-Backward algorithm).
Decoding: Determining the most likely sequence of hidden states that produced the observed data (solved by the Viterbi algorithm).
Learning: Estimating the most likely model parameters (the observation and transition distributions) from the observed data (achieved using the Baum-Welch algorithm, an expectation-maximization algorithm).
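The evaluation and decoding problems can be illustrated with a small two-state example. The Python sketch below implements a scaled forward pass and the Viterbi algorithm for gamma-distributed step lengths; the state labels, parameter values, and observations are invented for illustration and do not come from any cited study.

```python
import math

def gamma_pdf(x, shape, scale):
    """Density of a gamma distribution with the given shape and scale."""
    return x ** (shape - 1) * math.exp(-x / scale) / (math.gamma(shape) * scale ** shape)

def forward_log_likelihood(obs, init, trans, params):
    """Evaluation: log-likelihood of the observations via the scaled forward pass."""
    n = len(init)
    alpha = [init[s] * gamma_pdf(obs[0], *params[s]) for s in range(n)]
    log_lik = 0.0
    for t in range(len(obs)):
        if t > 0:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n))
                     * gamma_pdf(obs[t], *params[j]) for j in range(n)]
        norm = sum(alpha)
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]  # rescale to avoid numerical underflow
    return log_lik

def viterbi(obs, init, trans, params):
    """Decoding: most likely state sequence, computed in log space."""
    n = len(init)
    delta = [math.log(init[s]) + math.log(gamma_pdf(obs[0], *params[s])) for s in range(n)]
    backpointers = []
    for x in obs[1:]:
        new_delta, pointers = [], []
        for j in range(n):
            best = max(range(n), key=lambda i: delta[i] + math.log(trans[i][j]))
            pointers.append(best)
            new_delta.append(delta[best] + math.log(trans[best][j])
                             + math.log(gamma_pdf(x, *params[j])))
        delta = new_delta
        backpointers.append(pointers)
    state = max(range(n), key=lambda s: delta[s])
    path = [state]
    for pointers in reversed(backpointers):
        state = pointers[state]
        path.append(state)
    return path[::-1]

# State 0 = "resting" (short steps), state 1 = "travelling" (long steps); hypothetical values.
params = [(2.0, 5.0), (4.0, 100.0)]  # (shape, scale) per state
init = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
step_lengths = [8.0, 12.0, 5.0, 350.0, 420.0, 390.0, 9.0, 6.0]

log_lik = forward_log_likelihood(step_lengths, init, trans, params)
decoded = viterbi(step_lengths, init, trans, params)
```

The learning problem wraps the forward pass in an optimizer or an expectation-maximization loop, which is the part that dedicated packages take care of.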

Quantitative Data and Behavioral State Definitions

The following tables synthesize quantitative findings on behavioral states identified by HMMs across various species, as reported in the cited studies.

Table 1: Summary of Behavioral States Identified via HMMs in Different Species

Species	Identified Behavioral States	Key Movement Metrics (State-Dependent Distributions)	Citation
Mouse (Mus musculus)	Resting, Exploring, Navigating	Step length and turning angle modulations in response to visual depth cues.	[28]
Red-billed Tropicbird (Phaethon aethereus)	Resting, Foraging, Travelling	Step length, turning angle; foraging state was the most difficult to distinguish (low sensitivity/precision).	[29]
Eurasian Wild Boar (Sus scrofa)	Resting, Foraging, Travelling	Step length and turning angle; behaviors showed varying spatial expansiveness.	[30]
Macaque & Mouse (Comparative)	Internal States (e.g., attentive)	Inferred from facial features; states predicted reaction times and task outcomes.	[31]

Table 2: HMM Specifications and Performance Metrics from Literature

Study Focus	HMM Variant / Key Feature	Software/Tool Used	Reported Performance / Validation	Citation
Mouse Visual Cognition	Standard HMM on circular apparatus data	DeepLabCut for tracking	Distinguished visually-guided behavior from general exploration.	[28]
Seabird Foraging Ecology	Semi-supervised HMM	`momentuHMM` R package	Accuracy improved from 0.77 ± 0.01 to 0.85 ± 0.01 with 9% supervised data.	[29]
Wild Boar Movement	Autoregressive HMM (AR-HMM)	Python libraries (e.g., `smm`)	Incorporated movement history into observation process.	[30]
Cross-species Internal States	Markov-Switching Linear Regression (MSLR)	Custom software package	Identified states that reliably predicted reaction times and task outcomes.	[31]

Experimental Protocol: Applying HMMs to Animal Tracking Data

This protocol outlines the steps for implementing an HMM to classify behavioral states from GPS tracking data, incorporating insights from recent studies.

Data Collection and Preprocessing

GPS Tracking: Collect high-frequency GPS location data. The studies cited used intervals ranging from 5 minutes for seabirds to higher frequencies for wild boars [29] [30].
Data Cleaning: Remove obvious GPS fix errors and implausible movements based on speed thresholds.
Derive Movement Metrics:
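- Step Length (SL): Calculate the Euclidean distance between consecutive GPS fixes. ( SL_t = \sqrt{(x_t - x_{t-1})^2 + (y_t - y_{t-1})^2} )
- Turning Angle (TA): Calculate the change in heading between consecutive steps, using the two-argument arctangent so that headings are resolved in all four quadrants, and wrap the difference into the range from (-\pi) to (\pi). ( TA_t = \operatorname{atan2}(y_t - y_{t-1}, x_t - x_{t-1}) - \operatorname{atan2}(y_{t-1} - y_{t-2}, x_{t-1} - x_{t-2}) )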
Consider Data Transformation: Log-transform step lengths to reduce skewness, which can help with model fitting.
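The two movement metrics can be derived from projected coordinates as in the Python sketch below. It assumes planar coordinates in metres (for example UTM) and uses the two-argument arctangent so that headings are resolved in all four quadrants; the function name is illustrative.

```python
import math

def movement_metrics(xs, ys):
    """Return step lengths and turning angles (radians) for a track of planar coordinates."""
    step_lengths, headings = [], []
    for t in range(1, len(xs)):
        dx = xs[t] - xs[t - 1]
        dy = ys[t] - ys[t - 1]
        step_lengths.append(math.hypot(dx, dy))
        headings.append(math.atan2(dy, dx))
    turning_angles = []
    for k in range(1, len(headings)):
        diff = headings[k] - headings[k - 1]
        # wrap the heading change into the interval [-pi, pi)
        diff = (diff + math.pi) % (2.0 * math.pi) - math.pi
        turning_angles.append(diff)
    return step_lengths, turning_angles

# A track that turns left twice, then a separate track with a single right turn.
left_lengths, left_turns = movement_metrics([0.0, 1.0, 1.0, 0.0], [0.0, 0.0, 1.0, 1.0])
right_lengths, right_turns = movement_metrics([0.0, 1.0, 1.0], [0.0, 0.0, -1.0])
```

A track with n fixes yields n - 1 step lengths and n - 2 turning angles, so the first step of every track has no turning angle.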

Model Fitting and State Decoding

Initialization: Provide initial guesses for the parameters of the state-dependent distributions (e.g., mean and standard deviation for step length) and the transition probability matrix. This is often the most sensitive step, and multiple random initializations should be tried to avoid local maxima.
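Model Fitting: Estimate the model parameters by maximizing the likelihood of the observed data, either with the Baum-Welch (expectation-maximization) algorithm or by direct numerical maximization of the likelihood computed with the forward algorithm. Packages like moveHMM and momentuHMM handle this step internally.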
State Decoding: Use the Viterbi algorithm to find the most likely sequence of hidden behavioral states given the fitted model and the observed data.
Model Selection: Use information criteria like AIC or BIC to select the optimal number of behavioral states. The biological plausibility of the decoded states must also be considered.

Advanced Workflow: Semi-Supervised HMM with Auxiliary Data

For species with subtle behavioral distinctions (e.g., "foraging on the go" in homogenous environments), a semi-supervised approach significantly improves accuracy [29]. The workflow integrates auxiliary sensor data to guide the HMM.

Protocol for Semi-Supervised HMM:

Collect Auxiliary Data: Deploy a subset of tags with sensors like accelerometers (for fine-scale movement), wet-dry sensors (for immersion), or Time-Depth Recorders (for diving) alongside GPS [29].
Label Behaviors from Auxiliary Data: Use sensor-specific thresholds (e.g., high variance in accelerometer data indicates activity; dry sensor indicates resting for seabirds) to assign known behavioral states to a subset of GPS fixes.
Incorporate Labels into HMM: During model fitting, "fix" the state sequence for the known-data points. This directly informs the model about the movement metrics associated with confirmed behaviors, dramatically improving the classification of the remaining, unlabeled data [29].
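One common way to fix known states during fitting is to set the state-dependent likelihood to zero for every state that contradicts a label, so that only paths consistent with the labels contribute to the likelihood. The Python sketch below shows this inside a scaled forward pass; the emission values and labels are hypothetical, and the sketch is not the implementation used in [29].

```python
import math

def forward_with_labels(emission_probs, init, trans, known_states=None):
    """Scaled forward pass; known_states[t] is a state index or None when unlabeled."""
    n = len(init)
    alpha = None
    log_lik = 0.0
    for t, probs in enumerate(emission_probs):
        probs = list(probs)
        if known_states is not None and known_states[t] is not None:
            # rule out every state that disagrees with the auxiliary-sensor label
            probs = [p if s == known_states[t] else 0.0 for s, p in enumerate(probs)]
        if alpha is None:
            alpha = [init[s] * probs[s] for s in range(n)]
        else:
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * probs[j]
                     for j in range(n)]
        norm = sum(alpha)
        log_lik += math.log(norm)
        alpha = [a / norm for a in alpha]
    return log_lik, alpha

# emission_probs[t][s]: density of observation t under state s (hypothetical values).
emission_probs = [[0.6, 0.1], [0.5, 0.2], [0.3, 0.3], [0.1, 0.7]]
init = [0.5, 0.5]
trans = [[0.8, 0.2], [0.3, 0.7]]
labels = [0, None, None, 1]  # first and last fixes labeled from auxiliary sensors

unlabeled_ll, unlabeled_probs = forward_with_labels(emission_probs, init, trans)
labeled_ll, labeled_probs = forward_with_labels(emission_probs, init, trans, labels)
```

Maximizing the labeled likelihood over the model parameters then pulls the state-dependent distributions toward the movement metrics observed at the labeled fixes.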

The following table details key software, data, and analytical tools required for implementing HMMs in movement ecology research.

Table 3: Research Reagent Solutions for HMM-Based Movement Analysis

Item Name / Category	Specifications / Function	Example Use in Protocol
GPS Tracking Loggers	High-frequency, GPS-GSM or GPS-UHF collars/tags; small and lightweight for species.	Primary data source for animal locations. Essential for calculating step length and turning angle. [29] [30]
Auxiliary Sensors	Tri-axial accelerometers, wet-dry sensors, Time-Depth Recorders (TDR).	Provides ground-truth data for semi-supervised learning. Validates and improves HMM classifications. [29]
DeepLabCut	Deep learning-based software for markerless pose estimation from video.	Tracked mouse body parts in a circular visual cliff apparatus to generate high-precision movement data for HMM input. [28]
R Package `momentuHMM`	Comprehensive R package for fitting complex HMMs to animal movement data.	Handles data preprocessing, model fitting, state decoding, and visualization. Supports semi-supervision. [29]
R Package `moveHMM`	User-friendly R package for fitting basic HMMs to animal track data.	Suitable for introductory HMM analysis with step length and turning angle. [30]
Python Library `smm`	Python library for fitting various stochastic models, including HMMs.	Used in wild boar study to implement an Autoregressive HMM (AR-HMM). [30]
Markov-Switching Linear Regression (MSLR)	A specialized HMM variant where the observation model is a linear regression.	Used to infer internal states of mice and monkeys from facial features, predicting reaction times. [31]

Hidden Markov Models provide a statistically rigorous and flexible framework for uncovering the latent behavioral structure in animal movement trajectories. The integration of semi-supervised learning techniques, leveraging auxiliary sensor data, represents a significant advancement, enabling robust behavioral classification even in challenging ecological contexts. Furthermore, the development of specialized variants like Autoregressive HMMs and Markov-Switching Linear Regression expands the applicability of these methods to more complex data structures and research questions. As a core component of the movement ecology statistical toolkit, HMMs empower researchers to move beyond simple trajectory description to a deeper, mechanistic understanding of animal behavior and its drivers.

Understanding and predicting how animals move through fragmented landscapes is a central challenge in movement ecology and conservation biology [32] [4]. A key task is identifying dispersal routes and wildlife corridors, which typically relies on quantifying the resistance or permeability of a landscape [32]. However, a significant gap has existed between raw movement data and connectivity analysis, often necessitating arbitrary transformations of habitat suitability into resistance values [32]. The Time-Explicit Habitat Selection (TEHS) model is a novel analytical framework designed to bridge this gap by decomposing the movement process into two fundamental, quantifiable components: a time component and a habitat selection component [32]. This protocol details the application of the TEHS model, using the foundational case study of giant anteaters in the Pantanal wetlands to provide a clear, reproducible methodology for researchers [32] [33].

Model Foundations and Mathematical Framework

The TEHS model is grounded in the principle that movement decisions can be separated into where an animal chooses to go, and how long it takes to get there. These components provide complementary information on space use [32].

Core Model Equation

The model probabilistically describes the movement from a starting pixel (i) to a subsequent pixel (j) over a time interval (\Delta t). Using Bayes' theorem, the permeability matrix, which is central to connectivity analysis, is formulated as:

[ p(P_{t+\Delta t} = j \mid \Delta t, P_t = i) = \frac{p(\Delta t \mid P_{t+\Delta t} = j, P_t = i)\, p(P_{t+\Delta t} = j \mid P_t = i)}{\sum_{k=1}^{N} p(\Delta t \mid P_{t+\Delta t} = k, P_t = i)\, p(P_{t+\Delta t} = k \mid P_t = i)} ]

Where:

Time Component (p(\Delta t \mid P_{t+\Delta t} = j, P_t = i)): The likelihood that time (\Delta t) is required for this move. It quantifies how landscape features speed up or slow down movement.
Selection Component (p(P_{t+\Delta t} = j \mid P_t = i)): The strength of habitat selection for pixel (j) given a start in pixel (i), independent of time. It reflects the intrinsic preference for or avoidance of a habitat [32].
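A small numerical example shows how the two components combine through Bayes' theorem. The Python sketch below computes one row of the permeability matrix for a single origin pixel; the three destination pixels and their time and selection values are hypothetical.

```python
def permeability_row(time_likelihoods, selection_probs):
    """Combine time and selection components for one origin pixel, normalizing over destinations."""
    joint = [t * s for t, s in zip(time_likelihoods, selection_probs)]
    total = sum(joint)  # the sum over all candidate destinations in the denominator
    return [value / total for value in joint]

# Three candidate destination pixels reachable from origin pixel i (hypothetical values).
selection_probs = [0.5, 0.3, 0.2]   # selection component for each destination
time_likelihoods = [0.1, 0.4, 0.4]  # time component: likelihood of the observed duration

row = permeability_row(time_likelihoods, selection_probs)
```

In this example the most strongly selected pixel ends up with the lowest probability because the observed time interval is unlikely for that move, which is the kind of distinction a single resistance value cannot express.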

Conceptual Framework: Linking Time and Selection

The decomposition into time and selection allows researchers to infer the potential motivation behind an animal's interaction with a landscape feature. The following conceptual diagram illustrates how these two axes interact to define habitat types.

Diagram Title: TEHS Model Conceptual Framework

This framework classifies each habitat into one of four types, defined by the combination of selection strength and movement speed [32]. This classification provides ecological insight beyond what a single resistance value conveys.

Application Protocol: Giant Anteater Case Study

This section provides a step-by-step protocol for applying the TEHS model, based on the study of giant anteaters (Myrmecophaga tridactyla) in the Pantanal wetlands of Brazil [32] [33].

Research Reagent Solutions and Essential Materials

Table 1: Essential Materials and Analytical Tools for TEHS Modeling

Item Category	Specific Example / Function	Purpose in TEHS Workflow
Data Collection	GPS biologging devices	To collect high-resolution, timestamped location data from study animals.
Environmental Data	GIS raster layers (e.g., land cover, vegetation indices, temperature)	To characterize habitat covariates for each location in the landscape.
Statistical Software	R programming environment with specialized packages (e.g., `amt`) [4]	For data management, statistical fitting of model components, and visualization.
Connectivity Framework	Spatial Absorbing Markov Chain (SAMC) framework [32]	To integrate TEHS parameters and simulate movement/connectivity in fragmented landscapes.

Step-by-Step Experimental and Analytical Methodology

Step 1: Data Preparation and Processing

Animal Movement Data: Collect GPS tracking data at a temporal resolution appropriate for the species and research question. For the anteater study, data was used to extract steps (consecutive locations) and calculate derived movement metrics like speed and turning angles [32] [12].
Environmental Covariates: Process spatial layers to extract values at each animal location and, crucially, at a set of available or control locations that the animal could have used but did not. Common covariates include land cover type, topographic features, and climatic data [32] [4].

Step 2: Model Specification and Fitting

The two model components are fitted separately, often using conditional logistic regression within a used-available framework [4].

Fitting the Selection Component: The selection function $p(\text{destination} \mid \text{origin})$ is modeled by comparing the habitat covariates at the used destination $j$ to those at random available destinations generated from a movement kernel around the origin $i$ [32].
Fitting the Time Component: The time likelihood $p(\Delta t \mid \text{origin}, \text{destination})$ is modeled by relating the observed step duration $\Delta t$ to the environmental conditions along the step. This can be achieved using generalized linear models with a Gamma distribution [32].

Step 3: Parameter Estimation and Interpretation

The giant anteater analysis yielded the following results, summarized in Table 2 by the sign of each coefficient and its ecological interpretation.

Table 2: Example TEHS Model Results from Giant Anteater Study [32] [33]

Model Component	Habitat Covariate	Parameter Influence	Ecological Interpretation
Time Model	Wetlands	Negative coefficient (Faster movement)	Wetlands act as corridors for faster transit.
	Forest & Savanna	Positive coefficient (Slower movement)	Complex terrain or resource use slows movement.
	Nocturnal Period (8pm-5am)	Negative coefficient (Faster movement)	Crepuscular/nocturnal behavior facilitates movement.
Selection Model	Wetlands	Negative coefficient (Avoidance)	Wetlands are generally avoided as suboptimal habitat.
	Forest & Savanna	Positive coefficient (Selection)	These are selected, likely for resources or shelter.
	Forest × Temperature	Positive interaction (Stronger selection)	Forests are selected as thermal shelter at high temperatures.

Step 4: Connectivity Analysis using the Spatial Absorbing Markov Chain (SAMC)

Integration: The estimated probabilities from the TEHS model are used to populate the permeability matrix (Q) of the SAMC framework [32].
Simulation: The SAMC uses this matrix, along with a defined initial distribution of animals and potential mortality risks (R), to simulate movement paths and compute time-explicit connectivity metrics [32].
Outcome: In the anteater study, this revealed that animals often do not take the shortest-distance path between habitat patches, instead detouring to avoid non-preferred habitats like wetlands, a critical insight for corridor planning [32] [33].
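
The time-explicit part of the simulation step can be illustrated with a toy example. The sketch below is not the SAMC software itself: it assumes a hypothetical three-cell landscape, a hand-written matrix `Q` of movement probabilities among non-absorbed states, and a per-step absorption (mortality) probability equal to one minus each row sum of `Q`. Under those assumptions, the distribution of surviving animals after `t` steps is the initial distribution multiplied by `Q` a total of `t` times.

```python
def step_distribution(psi, Q):
    """Advance the distribution one time step: new[j] = sum_i psi[i] * Q[i][j]."""
    n = len(Q)
    return [sum(psi[i] * Q[i][j] for i in range(n)) for j in range(n)]

def distribution_after(psi, Q, t):
    """Distribution of non-absorbed individuals after t time steps."""
    for _ in range(t):
        psi = step_distribution(psi, Q)
    return psi

# Hypothetical landscape: cells 0 and 2 are habitat, cell 1 is a risky crossing.
# Row 1 sums to 0.90, so an animal in cell 1 is absorbed with probability 0.10.
Q = [
    [0.70, 0.30, 0.00],
    [0.25, 0.40, 0.25],
    [0.00, 0.30, 0.70],
]
psi0 = [1.0, 0.0, 0.0]          # all animals start in cell 0 (assumed)

psi1 = distribution_after(psi0, Q, 1)
psi5 = distribution_after(psi0, Q, 5)
surviving_fraction = sum(psi5)   # below 1 because of absorption in cell 1
```

Comparing `psi5` with the result for a larger `t` shows how short-term and long-term connectivity can differ for the same landscape.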

The TEHS model provides a powerful, principled framework that directly links animal movement data to landscape connectivity. By explicitly decomposing movement into time and habitat selection components, it avoids arbitrary resistance transformations and offers deeper ecological insight into how animals perceive and interact with their landscape. The integration with the SAMC framework allows for the generation of time-explicit connectivity maps, providing robust, model-based tools for conservation planning and the identification of functional wildlife corridors [32].

In movement ecology, a significant gap often exists between statistical model output and practical conservation application. While statistical models like Step Selection Functions (SSFs) can quantify species-environment relationships, translating these complex results into actionable insights for landscape planning remains a challenge [4]. Connectivity analysis, which identifies crucial wildlife corridors and dispersal routes, typically relies on resistance surfaces that are often derived from arbitrary transformations of habitat suitability [32]. This protocol addresses this methodological gap by presenting a structured framework for using movement models to directly parameterize connectivity analysis, moving beyond correlation to mechanistic understanding.

The following sections provide application notes and detailed protocols for implementing the Time-Explicit Habitat Selection (TEHS) model and connecting it to connectivity analysis using the Spatial Absorbing Markov Chain (SAMC) framework [32]. This approach decomposes movement into time and selection components, providing complementary information for interpreting animal space use and generating time-explicit connectivity results.

Theoretical Foundation: Decomposing Movement into Time and Selection Components

Animal movement patterns result from distinct behavioral processes that can be characterized along two primary axes: habitat selection and time to traverse the landscape [32]. The conceptual framework in the table below illustrates how these axes interact to create different functional habitat types.

Table 1: Conceptual Framework for Interpreting Movement Patterns Based on Selection Strength and Time to Traverse

Selection Strength	Time to Traverse (Fast)	Time to Traverse (Slow)
Selected	Displacement Habitat: Used for directed movement (e.g., migratory corridors) [32]	Resource Use Habitat: Used for activities requiring longer residence (e.g., foraging, shelter) [32]
Avoided	Permeable-Risky Habitat: Crossed quickly due to perceived risk [32]	Resistant-Risky Habitat: Creates movement barriers due to physical resistance and risk [32]
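
The two-by-two framework in the table can be written as a small lookup. This is a simplified sketch: the function name `classify_habitat` is hypothetical, the classification uses only the sign of each coefficient, and it ignores estimation uncertainty, which a real analysis would need to consider. A positive time coefficient is taken to mean a longer time to traverse (slower movement).

```python
def classify_habitat(selection_coef, time_coef):
    """Map the signs of the selection and time coefficients to a habitat type."""
    selected = selection_coef > 0      # positive = selected, otherwise avoided
    slow = time_coef > 0               # positive = slower to traverse
    if selected and not slow:
        return "Displacement"
    if selected and slow:
        return "Resource Use"
    if not selected and not slow:
        return "Permeable-Risky"
    return "Resistant-Risky"

# Hypothetical coefficient values chosen only to show each quadrant
examples = {
    "selected, fast": classify_habitat(0.6, -0.2),
    "selected, slow": classify_habitat(0.6, 0.4),
    "avoided, fast": classify_habitat(-0.5, -0.3),
    "avoided, slow": classify_habitat(-0.4, 0.5),
}
```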

This decomposition is formally expressed in the Time-Explicit Habitat Selection (TEHS) model through a probabilistic framework based on Bayes' theorem [32]:

$$p\left( P_{t + \Delta t} = j \mid \Delta t, P_{t} = i \right) = \frac{p\left( \Delta t \mid P_{t + \Delta t} = j, P_{t} = i \right) p\left( P_{t + \Delta t} = j \mid P_{t} = i \right)}{\sum\nolimits_{k = 1}^{N} p\left( \Delta t \mid P_{t + \Delta t} = k, P_{t} = i \right) p\left( P_{t + \Delta t} = k \mid P_{t} = i \right)}$$

Where:

$p\left( P_{t + \Delta t} = j \mid \Delta t, P_{t} = i \right)$ represents the probability of selecting pixel $j$ from pixel $i$ given time constraint $\Delta t$
$p\left( \Delta t \mid P_{t + \Delta t} = j, P_{t} = i \right)$ represents the time component (likelihood of requiring $\Delta t$ to move from $i$ to $j$)
$p\left( P_{t + \Delta t} = j \mid P_{t} = i \right)$ represents the selection component (strength of selection for pixel $j$ regardless of time)
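
A small numerical example may make the normalization explicit. The values below are invented for illustration (three candidate destination pixels, with made-up time likelihoods and selection probabilities); the function name `tehs_posterior` is likewise hypothetical. The code multiplies the time component by the selection component for each candidate pixel and divides by the sum over all candidates, as in the expression above.

```python
def tehs_posterior(time_lik, selection_prob):
    """Combine time and selection components over N candidate pixels.

    time_lik[k]:       p(dt | destination k, origin i)
    selection_prob[k]: p(destination k | origin i)
    Returns p(destination k | dt, origin i) for every k.
    """
    joint = [lik * sel for lik, sel in zip(time_lik, selection_prob)]
    total = sum(joint)                     # denominator: sum over k = 1..N
    return [value / total for value in joint]

# Hypothetical values for three candidate destination pixels
time_lik = [0.10, 0.40, 0.20]
selection_prob = [0.50, 0.20, 0.30]

posterior = tehs_posterior(time_lik, selection_prob)
```

In this example the second pixel is the least selected of the three a priori, yet it receives the highest probability once the observed step duration is taken into account, because that duration is much more likely for it.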

The following diagram illustrates the complete analytical workflow from movement data collection to connectivity mapping:

Experimental Protocols

Protocol 1: TEHS Model Implementation

Purpose: To decompose movement patterns into time and selection components using the TEHS model framework.

Materials and Software Requirements:

GPS tracking data (regular time intervals)
Environmental raster layers (habitat classification, topography, etc.)
R statistical environment
TEHS model implementation code [32]

Procedure:

Data Preparation:
- Format movement data as a track with regular time steps
- Extract environmental covariates at each observed location
- Generate available points using a movement model (e.g., correlated random walk)
Model Specification:
- Define the selection component formula based on hypotheses
- Define the time component formula based on movement constraints
- Specify the perceptual range parameter based on species ecology
Parameter Estimation:
- Fit the TEHS model using maximum likelihood estimation
- Calculate the permeability matrix $Q$ using Equation 1
- Validate model fit using k-fold cross-validation
Interpretation:
- Compare coefficient estimates for selection and time components
- Classify habitats according to the conceptual framework (Table 1)
- Generate spatial predictions of selection and movement speed

Troubleshooting Tips:

If model convergence fails, simplify the model structure
Check for collinearity among environmental predictors
Ensure the perceptual range parameter is biologically realistic

Protocol 2: Connectivity Analysis Using Spatial Absorbing Markov Chains

Purpose: To translate TEHS model output into connectivity predictions using the SAMC framework.

Materials and Software Requirements:

Permeability matrix $Q$ from Protocol 1
Landscape raster data
R package for SAMC implementation [32]
GIS software for visualization

Procedure:

Framework Setup:
- Define the landscape as a discrete grid of cells
- Initialize the permeability matrix $Q$ with TEHS model results
- Specify the mortality/absorption matrix $R$ if relevant
- Define the initial distribution $\Psi$ based on study design
Connectivity Metrics Calculation:
- Compute the fundamental matrix $N = (I - Q)^{-1}$
- Calculate occupancy time expectations
- Compute first-passage time distributions
- Generate between-node connectivity metrics
Time-Explicit Analysis:
- Specify time intervals of interest (e.g., daily, seasonal)
- Compute time-explicit occupancy probabilities
- Calculate short-term vs. long-term connectivity patterns
Visualization and Application:
- Map connectivity corridors using circuit theory or least-cost path analogs
- Identify potential barrier locations
- Prioritize areas for conservation interventions

Analytical Notes:

The SAMC framework provides both time-explicit and long-term analytical solutions
Connectivity results can be sensitive to the initial distribution $\Psi$
Validation using independent movement data is recommended when possible
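
The fundamental matrix in the procedure above can be computed directly for a toy landscape. The sketch below is an illustration only, not the SAMC package: it uses a hand-written Gauss-Jordan inverse so that it runs without external libraries, and a hypothetical two-cell matrix `Q` whose rows each sum to 0.9 (an assumed absorption probability of 0.1 per step). Entry `N[i][j]` is then the expected number of time steps spent in cell `j` before absorption for an animal starting in cell `i`.

```python
def invert(A):
    """Invert a small square matrix by Gauss-Jordan elimination with pivoting."""
    n = len(A)
    aug = [[float(v) for v in row] + [1.0 if i == j else 0.0 for j in range(n)]
           for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("matrix is singular; is there any absorption?")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        p = aug[col][col]
        aug[col] = [v / p for v in aug[col]]
        for r in range(n):
            if r != col:
                f = aug[r][col]
                aug[r] = [a - f * b for a, b in zip(aug[r], aug[col])]
    return [row[n:] for row in aug]

def fundamental_matrix(Q):
    """N = (I - Q)^-1 for the transient part of an absorbing Markov chain."""
    n = len(Q)
    i_minus_q = [[(1.0 if i == j else 0.0) - Q[i][j] for j in range(n)]
                 for i in range(n)]
    return invert(i_minus_q)

# Hypothetical two-cell landscape; each row sums to 0.9 (0.1 absorption per step)
Q = [[0.5, 0.4],
     [0.3, 0.6]]
N = fundamental_matrix(Q)
expected_lifetime = [sum(row) for row in N]   # expected steps before absorption
```

For real landscapes with many thousands of cells, a dense inverse like this is impractical, and sparse linear solvers are the usual choice.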

The Scientist's Toolkit: Essential Research Solutions

Table 2: Key Research Reagents and Computational Tools for Movement-to-Connectivity Analysis

Tool/Resource	Function/Purpose	Implementation Notes
GPS Telemetry Devices	Collection of high-resolution movement data	Select appropriate fix rate for ecological questions [4]
Environmental Rasters	Characterization of habitat features	Resolution should match movement scale [5]
R Statistical Environment	Core platform for analysis	Use `amt`, `momentuHMM`, or custom TEHS code [4]
Step Selection Functions (SSF)	Quantifying habitat selection	Accounts for movement constraints when estimating selection [4] [32]
Time-Explicit Habitat Selection (TEHS) Model	Decomposing movement into time and selection components	Novel approach that avoids arbitrary resistance transformations [32]
Spatial Absorbing Markov Chain (SAMC)	Connectivity analysis framework	Generates time-explicit connectivity results [32]
EcoNicheS Platform	Integrated modeling workflow	Shiny-based interface for ecological niche modeling [34]

Application Case Study: Giant Anteaters in the Pantanal

To illustrate the practical application of this protocol, we summarize a case study on giant anteaters (Myrmecophaga tridactyla) in the Pantanal wetlands of Brazil [32]:

Table 3: TEHS Model Results for Giant Anteater Movement and Habitat Selection

Habitat Type	Selection Coefficient	Time Coefficient	Ecological Interpretation
Wetlands	Avoided (Negative)	Faster Movement (Negative)	Permeable but risky habitat [32]
Forest	Selected (Positive)	Slower Movement (Positive)	Resource exploration habitat [32]
Savanna	Selected (Positive)	Slower Movement (Positive)	Resource exploration habitat [32]
Temperature Interaction	Increased forest selection with higher temperature	Not significant	Forests act as thermal shelters [32]

The connectivity analysis revealed that giant anteaters often do not use the shortest-distance path between habitat patches due to avoidance of wetlands, demonstrating how the integration of movement behavior improves connectivity predictions [32].

Advanced Methodological Considerations

Addressing Landscape Heterogeneity in Interaction Inference

When analyzing inter-individual interactions from movement data, neglecting physical environmental features can lead to spurious interactions [5]. The following diagram illustrates this methodological challenge and solution:

The Spatial+ method can reduce bias from unmeasured spatial factors when complete environmental data is unavailable [5]. This approach removes the effect of space on social covariates before inclusion in SSFs, providing more robust inference of inter-individual interactions.

Comparison of Movement Modeling Approaches

Table 4: Comparison of Statistical Models for Characterizing Species-Habitat Associations

Model Type	Appropriate Scale	Key Advantages	Limitations
Resource Selection Function (RSF)	Population-level, home range scale [4]	Simplicity, ease of implementation [4]	Does not account for movement autocorrelation [4]
Step Selection Function (SSF)	Fine-scale, incorporating movement constraints [4]	Accounts for serial correlation in locations [4]	Requires high-temporal resolution data [4]
Hidden Markov Model (HMM)	Behaviorally-explicit analysis [4]	Identifies behavioral states and state-dependent selection [4]	Increased computational complexity [4]
Integrated SSF (iSSA)	Mechanistic movement modeling [32]	Jointly models movement and habitat selection [32]	Complex implementation [32]
TEHS Model	Connectivity-focused analysis [32]	Decomposes movement into time and selection; direct connectivity application [32]	Novel method with limited application to date [32]

This protocol has outlined a comprehensive framework for bridging the gap between movement models and connectivity analysis. By implementing the TEHS model within the SAMC framework, researchers can move beyond arbitrary resistance surfaces to mechanistic, behaviorally-informed connectivity assessment. The case study demonstrates how this approach reveals ecologically meaningful patterns that would be obscured by traditional methods.

Future methodological developments should focus on integrating population dynamics with movement-based connectivity models, incorporating individual variation in movement strategies, and extending these approaches to multi-species interactions. The continued refinement of these methods will enhance our ability to design effective conservation corridors in increasingly fragmented landscapes.

Overcoming Analytical Hurdles in Movement Ecology Studies

Addressing Data Gaps, Locational Error, and Biases in Tracking Data

The analysis of animal movement data is fundamental to understanding species-habitat associations, behavior, and conservation needs [4]. However, the path from raw tracking data to ecological insight is fraught with methodological challenges. Three pervasive issues (data gaps, locational error, and various forms of bias) can significantly compromise the validity of research findings if not properly addressed. These challenges are particularly critical in movement ecology, where statistical models such as resource selection functions (RSFs), step-selection functions (SSFs), and hidden Markov models (HMMs) are widely used to infer species-habitat relationships and behavior [4] [35]. The increasing reliance on telemetry data for identifying critical habitat and informing conservation policy [4] makes it essential that researchers employ robust protocols to identify and mitigate these data quality issues. This application note provides detailed methodologies for detecting and addressing these challenges within a movement ecology research framework.

Quantifying and Classifying Data Anomalies

A Typology of Tracking Data Imperfections

Table 1: Classification of Common Data Challenges in Movement Ecology

Challenge Type	Primary Causes	Impact on Analysis	Detection Methods
Data Gaps	Tag failure, satellite coverage issues, habitat obstruction (e.g., canopy cover)	Incomplete movement paths, biased inference of space use, misrepresentation of behaviors	Time interval analysis, sequence plotting, habitat-based gap analysis
Locational Error	GPS precision limitations, habitat-induced signal degradation (e.g., forest cover, urban canyons)	Misidentification of habitat use, inflated movement parameters (step length, turning angles)	Dilution of Precision (DOP) filtering, speed-based filters, habitat-specific error assessment
Sampling Bias	Non-random tag deployment, unequal sampling across sexes/age classes, geographic biases in study sites	Unrepresentative population inferences, limited generalizability, confounding of habitat selection studies	Demographic representation analysis, geographic coverage assessment, sampling effort mapping
Model Specification Bias	Omission of relevant environmental covariates in SSFs	Spurious detection of inter-individual interactions, confounding of environmental and social effects	Covariate importance testing, residual spatial autocorrelation analysis, Spatial+ implementation [5]

Quantitative Assessment Protocols

Protocol 1: Data Gap Analysis Framework

Calculate the distribution of time intervals between consecutive locations
Flag gaps exceeding 2× the nominal sampling interval as significant
Map gap locations relative to environmental features (e.g., urban areas, dense forest) to identify habitat-specific sampling problems
For HMMs, assess whether gaps coincide with behavioral state transitions by examining pre- and post-gap movement parameters [35]
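
The first two steps of this protocol can be sketched as follows. The function name `flag_gaps`, the use of timestamps in seconds, and the example values are assumptions for illustration; the factor of 2 follows the threshold stated in the protocol.

```python
def flag_gaps(timestamps, nominal_interval, factor=2.0):
    """Return (index, start, end, duration) for each interval longer than
    factor * nominal_interval. Timestamps are assumed sorted, in seconds."""
    gaps = []
    for i, (t0, t1) in enumerate(zip(timestamps, timestamps[1:])):
        dt = t1 - t0
        if dt > factor * nominal_interval:
            gaps.append((i, t0, t1, dt))
    return gaps

# Hypothetical hourly schedule with two missed fixes after the third location
timestamps = [0, 3600, 7200, 18000, 21600]
gaps = flag_gaps(timestamps, nominal_interval=3600)
```

The returned indices can then be joined to the locations on either side of each gap for the habitat-based mapping described in the third step.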

Protocol 2: Locational Error Validation

Extract and plot Dilution of Precision (DOP) values when available
Implement speed-filtering algorithms to identify physiologically implausible movements
Calculate relative positioning errors across habitat types by examining positional jitter in stationary tests
For SSF analysis, document error magnitude relative to the scale of environmental covariates [5]
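
A simple speed filter of the kind mentioned in this protocol might look like the sketch below. It is one of several possible designs and makes explicit assumptions: fixes are `(t_seconds, latitude, longitude)` tuples, the first fix is trusted, distances use the haversine formula on a spherical Earth, and the 5 m/s threshold is a placeholder rather than a species-specific value.

```python
import math

EARTH_RADIUS_M = 6371000.0   # mean Earth radius, spherical approximation

def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in metres between two points given in degrees."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dphi = p2 - p1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))

def speed_filter(fixes, max_speed_ms):
    """Keep fixes whose implied speed from the last kept fix is plausible."""
    if not fixes:
        return []
    kept = [fixes[0]]                       # assumption: the first fix is valid
    for t, lat, lon in fixes[1:]:
        t0, lat0, lon0 = kept[-1]
        dt = t - t0
        if dt <= 0:
            continue                        # skip duplicate or out-of-order timestamps
        if haversine_m(lat0, lon0, lat, lon) / dt <= max_speed_ms:
            kept.append((t, lat, lon))
    return kept

# Hypothetical fixes; the third implies roughly 19 m/s and is dropped
fixes = [
    (0, -19.000, -56.0),
    (600, -19.001, -56.0),
    (1200, -18.900, -56.0),
    (1800, -19.002, -56.0),
]
kept = speed_filter(fixes, max_speed_ms=5.0)
```

Because each fix is compared with the last kept fix, a single outlier does not cause the following valid location to be discarded as well.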

Experimental Protocols for Bias Mitigation

Accounting for Landscape Heterogeneity in Interaction Studies

Failure to incorporate landscape data when analyzing interactions between moving individuals generates spurious results [5]. The following protocol mitigates this bias:

Protocol 3: Landscape-Aware Interaction Analysis

Data Requirements: Simultaneous tracking data from ≥2 individuals, high-resolution landscape data (resources, barriers)
Method Selection: Use SSF-based approaches (SSF-OD or SSF-DIST) rather than Dynamic Interaction (DI) indices, as the latter cannot incorporate environmental covariates [5]
Model Specification:
- Include both social (e.g., distance to conspecific) and environmental (e.g., habitat quality, barriers) covariates
- Use integrated Step Selection Analysis (iSSA) to account for how habitat influences both speed and selection [32]
Spatial Bias Mitigation: Apply Spatial+ method to reduce bias from unmeasured spatial factors when complete landscape data is unavailable [5]
Validation: Compare models with and without environmental covariates; spurious social interactions will diminish when proper environmental drivers are included

Addressing Sampling Biases in Study Design

Documented biases in movement ecology include geographic disparities between author affiliations and study sites, and demographic misrepresentation of studied populations [36]. These biases limit the generalizability of findings.

Protocol 4: Bias-Aware Research Design

Geographic Representation Assessment:
- Map study sites relative to researcher institutions to identify "parachute science" patterns
- Actively involve local researchers throughout the research process
- Ensure study designs are relevant to local conservation needs
Demographic Representation:
- Document sex, age, and social status of studied individuals
- Analyze whether sampling proportions reflect population demographics
- Use stratified sampling when certain demographics are systematically underrepresented
Publication Practice Audit:
- Document author affiliations and funding sources
- Ensure appropriate credit to local contributors and knowledge sources

Visualizing Methodological Approaches

Workflow for Bias-Aware Movement Analysis

Time-Explicit Habitat Selection Framework

The Time-Explicit Habitat Selection (TEHS) model bridges movement data and connectivity analysis while separately assessing drivers of time to traverse landscapes and habitat selection [32]. This decomposition helps distinguish between different movement motivations.

Protocol 5: Implementing TEHS Analysis

Data Preparation: Regularize tracking data to constant time intervals
Model Components:
- Time component: Models time required to move between locations
- Selection component: Models habitat preference regardless of time
Analysis Implementation:
- Use Spatial Absorbing Markov Chain (SAMC) framework for connectivity analysis
- Calculate permeability matrices incorporating both time and selection
Interpretation: Classify habitats into four functional types based on selection strength and movement speed (Fig. 1)

The Scientist's Toolkit: Research Reagent Solutions

Table 2: Essential Analytical Tools for Addressing Movement Data Challenges

Tool/Platform	Primary Function	Application Context	Implementation Considerations
amt R package [4]	RSF and SSF implementation	Habitat selection studies, resource preference analysis	User-friendly but requires careful definition of "available" points
momentuHMM R package [4]	Hidden Markov Model fitting	Behavioral state identification, state-dependent habitat selection	Computationally intensive; requires adequate data for state estimation
`wildlifeDI` R package [5]	Dynamic Interaction indices	Quantifying social interactions from movement data	Does not account for environmental covariates; use with caution
Spatial+ method [5]	Bias reduction from unmeasured spatial factors	All movement models when complete environmental data is unavailable	Relatively new method; requires spatial regression implementation
Time-Explicit Habitat Selection (TEHS) [32]	Decomposing movement into time and selection components	Connectivity analysis, corridor identification	Links movement models with Spatial Absorbing Markov Chains
Integrated Step Selection Analysis (iSSA) [32]	Joint modeling of movement and habitat selection	Path simulation, connectivity mapping	Accounts for how habitat affects both speed and direction

Concluding Recommendations

Addressing data gaps, locational error, and biases in tracking data requires integrated approaches throughout the research lifecycle. Key recommendations include: (1) proactively collecting detailed environmental covariate data alongside tracking data; (2) applying SSF-based methods with environmental covariates rather than simple interaction indices when studying social behavior; (3) implementing the TEHS framework to decompose movement into time and selection components for connectivity analysis; and (4) conducting systematic audits of geographic and demographic representation in study designs. These protocols enable researchers to produce more robust ecological inferences from imperfect tracking data, ultimately supporting more effective conservation decisions.

In movement ecology, a Resource Selection Function (RSF) is a statistical model that relates habitat characteristics to the relative probability of use by an animal [4]. The core principle underpinning any RSF is a comparison between the environmental conditions at locations used by an animal and those that were available to it [37] [4]. Mathematically, the RSF, $w(\mathbf{x})$, is often defined as:

$$w\left( \mathbf{x} \right) = \exp\left( \beta_{1} x_{1} + \beta_{2} x_{2} + \cdots + \beta_{k} x_{k} \right)$$

where $\mathbf{x}$ represents habitat variables and $\beta$ are the selection coefficients [4]. However, the estimation of these coefficients is entirely contingent on how the available landscape is defined. This choice is frequently described as the most subjective and influential decision in the RSF workflow, as it directly imposes a hypothesis about the spatial and ecological constraints an animal experiences [37]. An improper definition can lead to spurious results, misrepresenting the true habitat selection process and potentially leading to flawed conservation or management actions.
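
Because the RSF is only proportional to the probability of use, it is most naturally read as a ratio between two locations. The sketch below evaluates the exponential form given above for two hypothetical locations; the coefficient values and covariates (forest cover as a proportion, distance to road in km) are invented for illustration and do not come from any fitted model.

```python
import math

def rsf(x, beta):
    """Exponential RSF: w(x) = exp(beta_1 * x_1 + ... + beta_k * x_k)."""
    return math.exp(sum(b * xi for b, xi in zip(beta, x)))

# Hypothetical selection coefficients: forest cover (0-1), distance to road (km)
beta = [0.8, -0.5]

location_a = [0.9, 2.0]    # dense forest, far from a road
location_b = [0.2, 0.5]    # sparse forest, near a road

# Ratio of RSF values: how much more (or less) likely A is to be used than B,
# if both were equally available
relative_strength = rsf(location_a, beta) / rsf(location_b, beta)
```

Refitting the model with a different definition of availability would change `beta` and therefore this ratio, which is the sensitivity the paragraph above warns about.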

Conceptual and Methodological Framework

The Hierarchical Nature of Habitat Selection

Habitat selection operates across multiple spatial scales, which Johnson (1980) formally classified into four orders [38]. The definition of availability is intrinsically linked to this hierarchy. The following table outlines the common scales at which availability is defined and their ecological interpretations.

Table 1: Hierarchical Scales for Defining Habitat Availability in RSF Studies

Selection Order	Definition of 'Available'	Typical Analytical Extent	Ecological Interpretation
First Order	Geographic range of the population	Species' global or continental range	Selection of a species' geographical distribution.
Second Order	Individual home range within population range	Annual home range (e.g., MCP, KDE)	Selection of an individual's home range from the population range.
Third Order	Local patches within a home range	Local context around relocations	Selection of habitat patches within an individual's home range.
Fourth Order	Specific resources within a patch	Immediate vicinity of a relocation	Selection of actual food items or specific resources.

For RSFs based on telemetry data, second-order (selection of a home range) and third-order (selection within a home range) are the most common frameworks [4] [38]. The choice between them fundamentally alters the biological inference. A second-order design asks, "What habitats does this animal select for its home range?" while a third-order design asks, "Given its home range, how does this animal use habitats disproportionately within it?"

Common Methodologies for Defining Availability

The two predominant paradigms for defining availability in RSF studies are the use-availability design and the inhabited vs. uninhabited range approach.

Use-Availability Design: This is the most common approach for telemetry data. It compares used locations (GPS fixes) to a set of available locations randomly sampled from an "available distribution" [37] [4]. The central challenge is defining the spatial and temporal boundaries of this distribution. Common, though often simplistic, methods include using the Minimum Convex Polygon (MCP) or Kernel Density Estimate (KDE) of all observed locations to represent the available area [4].
Inhabited vs. Uninhabited Range: This method, more common in species distribution modeling, compares conditions within an animal's inhabited range (e.g., home range) to conditions in the surrounding, potentially suitable but unused, landscape [38].

The following workflow diagram illustrates the critical decision points in defining availability for an RSF analysis.

Practical Application: Protocols and Best Practices

Step-by-Step Experimental Protocol for a Use-Availability RSF

This protocol outlines the steps for a standard use-availability RSF analysis, highlighting key decisions regarding habitat availability.

Step 1: Data Preparation and Cleaning

Animal Movement Data: Obtain and clean GPS telemetry data. Filter out 2D/3D fixes with high horizontal dilution of precision (HDOP) values. For data from platforms like Argos, apply a movement-based filter (e.g., `sdafilter` from the `argosfilter` R package) to remove obviously spurious locations [39].
Environmental Covariates: Compile a geospatial raster stack of environmental variables hypothesized to influence habitat selection (e.g., land cover, vegetation indices, elevation, distance to water). Ensure all rasters are at the same resolution and projected to the same coordinate system.

Step 2: Define the Available Distribution (The Critical Choice)

For Population-Level Inference (2nd Order): For each individual, generate a Minimum Convex Polygon (MCP) or, preferably, a utilization distribution from a Kernel Density Estimate (KDE) based on all telemetry points. A 99% isopleth is often used to exclude outliers. This area represents the home range from which available points are sampled [4].
For Within-Home-Range Inference (3rd Order): For each used GPS location, define a local availability domain. A common method is to generate a buffer around the previous location. The radius should be ecologically relevant, often based on the animal's maximum observed step length or velocity [40].

Step 3: Generate Available Points

Randomly sample a large number of available points (typically 10-100x the number of used points) from the available distribution defined in Step 2. The high ratio ensures statistical efficiency and convergence of parameter estimates [40].
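
For the buffer-based (third-order) design in Step 2, available points can be drawn uniformly from a disc around the previous location. The sketch below is illustrative: the function name `sample_available`, the 250 m radius, and the 20 points per used location are assumptions, and coordinates are assumed to be projected (metres). Taking the square root of the uniform draw keeps points evenly spread over the disc's area rather than clustered near its centre.

```python
import math
import random

def sample_available(origin, radius, n, rng):
    """Draw n points uniformly from a disc of the given radius around origin."""
    x0, y0 = origin
    points = []
    for _ in range(n):
        r = radius * math.sqrt(rng.random())      # sqrt gives uniform density by area
        theta = rng.uniform(0.0, 2.0 * math.pi)
        points.append((x0 + r * math.cos(theta), y0 + r * math.sin(theta)))
    return points

rng = random.Random(42)                            # fixed seed for reproducibility
previous_location = (500.0, 500.0)                 # hypothetical projected coordinates
available = sample_available(previous_location, radius=250.0, n=20, rng=rng)
```

For a second-order design, the same idea applies with points sampled inside the home-range polygon instead of a disc.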

Step 4: Extract Covariate Values

For every used and available location, extract the values from the environmental covariate rasters. Combine these into a single data frame with a binary response variable (e.g., 1 for used, 0 for available).
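
The extraction step can be sketched with a toy raster held as a nested list. Everything here is a simplified stand-in for real GIS tooling: the 3 × 3 `forest` grid, its 100 m cell size, its origin, and the point coordinates are all hypothetical. The row index counts down from the top edge of the raster, as is conventional for north-up grids.

```python
def extract(raster, x, y, x_min, y_max, cell):
    """Return the raster value at (x, y), or None if the point is off the grid."""
    col = int((x - x_min) // cell)
    row = int((y_max - y) // cell)       # row 0 is the top (northern) edge
    if 0 <= row < len(raster) and 0 <= col < len(raster[0]):
        return raster[row][col]
    return None

# Hypothetical forest-cover raster (proportion of cell covered), 100 m cells
forest = [
    [0.9, 0.8, 0.1],
    [0.7, 0.2, 0.0],
    [0.3, 0.1, 0.0],
]
X_MIN, Y_MAX, CELL = 0.0, 300.0, 100.0

used = [(50.0, 250.0), (150.0, 150.0)]
available = [(250.0, 50.0), (50.0, 50.0), (250.0, 250.0)]

# One record per location, with the binary response described in this step
rows = [{"used": flag, "x": x, "y": y,
         "forest": extract(forest, x, y, X_MIN, Y_MAX, CELL)}
        for flag, pts in ((1, used), (0, available))
        for x, y in pts]
```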

Step 5: Model Fitting via Logistic Regression

Fit a logistic regression model to the used/available data. The probability of use is modeled as:

$$\Pr\left( y_{i} = 1 \mid \mathbf{x}_{i} \right) = \frac{\exp\left( \beta_{0} + \beta_{1} x_{1,i} + \cdots + \beta_{k} x_{k,i} \right)}{1 + \exp\left( \beta_{0} + \beta_{1} x_{1,i} + \cdots + \beta_{k} x_{k,i} \right)}$$

Note: The intercept $\beta_{0}$ is not directly interpretable in a use-availability design, but the selection coefficients $\beta_{1}, \ldots, \beta_{k}$ are. The exponential of a coefficient, $\exp(\beta)$, indicates the relative change in the odds of selection for a one-unit change in the covariate [4].
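
To show what the fitting step does, the sketch below estimates an intercept and a single selection coefficient by Newton-Raphson on a tiny invented data set. It is a teaching example under stated assumptions, not a recommended workflow: the data are made up, only one covariate is used, the available-to-used ratio is far below the 10-100x suggested in Step 3, and in practice a standard GLM routine would be used instead of this hand-written loop.

```python
import math

def fit_logistic(x, y, n_iter=25):
    """Maximum-likelihood fit of Pr(y=1) = logistic(b0 + b1*x) by Newton-Raphson."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for xi, yi in zip(x, y):
            p = 1.0 / (1.0 + math.exp(-(b0 + b1 * xi)))
            w = p * (1.0 - p)
            g0 += yi - p                 # gradient with respect to b0
            g1 += (yi - p) * xi          # gradient with respect to b1
            h00 += w                     # entries of the information matrix
            h01 += w * xi
            h11 += w * xi * xi
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1

# Hypothetical forest-cover values at used (1) and available (0) locations
x = [0.9, 0.8, 0.7, 0.6, 0.3, 0.1, 0.2, 0.4, 0.5, 0.8, 0.3, 0.0, 0.6]
y = [1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]

b0, b1 = fit_logistic(x, y)
odds_ratio = math.exp(b1)    # relative change in odds per unit of forest cover
fitted = [1.0 / (1.0 + math.exp(-(b0 + b1 * xi))) for xi in x]
```

Only `b1` (and `odds_ratio`) carries ecological meaning here; `b0` mainly reflects the ratio of used to available points.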

Step 6: Model Validation and Interpretation

Use k-fold cross-validation (with folds based on individual animals or temporal blocks) to evaluate the model's predictive performance [40].
Interpret the sign and magnitude of the coefficients $\beta$ in the context of the specific availability definition. A positive $\beta$ for a covariate indicates selection for that habitat feature relative to what was defined as available.
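
Folds based on individual animals can be built by assigning whole animals, rather than single locations, to each fold. The sketch below shows one simple way to do this; the function name `folds_by_animal`, the round-robin assignment, and the example IDs are assumptions for illustration.

```python
def folds_by_animal(animal_ids, k):
    """Split record indices into k (train, test) pairs so that all records
    from one animal fall in the same fold."""
    unique = sorted(set(animal_ids))
    fold_of = {animal: i % k for i, animal in enumerate(unique)}  # round-robin
    folds = []
    for f in range(k):
        test = [i for i, a in enumerate(animal_ids) if fold_of[a] == f]
        train = [i for i, a in enumerate(animal_ids) if fold_of[a] != f]
        folds.append((train, test))
    return folds

# Hypothetical animal ID for each used/available record
animal_ids = ["A", "A", "B", "B", "B", "C", "D", "D"]
folds = folds_by_animal(animal_ids, k=2)
```

Keeping each animal's records together prevents the model from being evaluated on individuals it has already seen, which would overstate predictive performance.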

Table 2: Key Research Reagent Solutions for RSF Analysis

Reagent / Tool	Type	Primary Function in RSF Analysis
GPS/GPS-GSM Loggers	Hardware	Provides high-resolution, highly accurate spatiotemporal location data (the "used" points). Essential for modern movement ecology [39].
R Statistical Software	Software	The primary environment for statistical analysis of ecological data. Provides a unified platform for data manipulation, analysis, and visualization.
R Package: `amt`	Software	Provides a coherent toolkit for animal movement telemetry analyses. Core functions include track manipulation, generating random steps (availability), and fitting Step Selection Functions (SSFs) [4].
R Package: `glmmTMB`/`lme4`	Software	Enables fitting of generalized linear mixed models (GLMMs), allowing the inclusion of random effects (e.g., individual animal ID) to account for grouped data and pseudo-replication [38].
GIS Software (e.g., QGIS, ArcGIS)	Software	Used for managing, processing, and analyzing spatial data; crucial for creating and processing raster stacks of environmental covariates.
Land Cover Datasets (e.g., NLCD, Copernicus)	Data	Pre-processed, often freely available spatial layers that serve as key candidate covariates in habitat selection models [38].
MCP/KDE Algorithms	Method	Standard geometric and probabilistic methods for delineating home ranges, which form the basis for sampling available points in second-order RSFs [4].

Advanced Considerations and Methodological Validation

Addressing Subjectivity and Bias

The subjective nature of defining availability can be mitigated through several approaches:

Sensitivity Analysis: A critical validation step is to test how robust the RSF results are to different, biologically plausible definitions of availability. This involves re-running the analysis with varying MCP percentiles, KDE smoothing parameters, or buffer radii and comparing the resulting selection coefficients [40].
Integrated Step Selection Analysis (iSSA): The iSSA framework explicitly models movement and habitat selection simultaneously. It defines availability dynamically for each relocation based on the animal's movement capacity (step length distribution) and directionality (turning angle distribution) [40]. This provides a more mechanistic and less arbitrary definition of what was accessible to the animal at each point in time. Studies have shown that iSSAs maintain nominal Type I error rates and often have higher statistical power than static RSFs [40].
Functional Responses in Habitat Selection: A habitat selection functional response occurs when the strength of selection for a habitat type depends on its availability [38]. This can be modeled by including an interaction term between the covariate and its availability within the home range or study area. Ignoring functional responses can lead to incomplete or misleading inference at the population level.

Comparative Validation of Methods

A simulation study by [40] provides a quantitative comparison of different methods for analyzing tracking data. Their key findings regarding methods that rely on defining availability are summarized below.

Table 3: Comparative Performance of Statistical Methods for Habitat Selection Analysis

Statistical Method	Handling of Autocorrelation	Definition of Availability	Type I Error Rate	Statistical Power
Spatial Logistic Regression (SLRM)	Poor (ignores it)	User-defined (e.g., MCP)	Frequently exceeds nominal levels	Moderate, but biased
Spatio-Temporal Point Process (ST-PPM)	Good (models it)	Mathematically derived from point process	Nominal	High
Step Selection Function (SSF)	Moderate (via data stratification)	Dynamic, based on movement	Slightly exceeds nominal levels	High
Integrated SSF (iSSA)	Excellent (explicitly models it)	Dynamic, based on movement	Nominal	Highest

This validation demonstrates that while traditional RSFs (SLRMs) are widely used, their performance is often suboptimal. The iSSA framework, with its mechanistic definition of availability, is recommended for its robust statistical properties and richer ecological inference [40].

Integrating Terrestrial and Aquatic Movement Analytics for Cross-Ecosystem Insights

Movement ecology has traditionally developed within ecosystem-specific silos, with distinct methodologies for terrestrial, aquatic, and aerial organisms [41]. However, a comprehensive understanding of ecological processes such as nutrient transfer, species interactions, and the effects of global change requires an integrated, cross-ecosystem perspective [42]. The movement of animals themselves constitutes a fundamental biological mechanism linking landscapes and seascapes. This protocol outlines methods for integrating terrestrial and aquatic movement analytics, providing a unified framework to quantify cross-ecosystem connectivity and derive novel ecological insights. This approach is framed within advanced statistical methodologies for movement ecology, emphasizing the synthesis of disparate data types across ecosystem boundaries.

Application Notes: Theoretical and Analytical Framework

Integrating movement data across ecosystems allows researchers to address questions about resource use, migration corridors, and energy flows at landscape and seascape scales. The following notes detail the core components of this framework.

The Hierarchical Segmentation Framework: Animal movement occurs across multiple spatio-temporal scales. A hierarchical framework partitions an individual's trajectory into a nested hierarchy of behavioral modes and phases [41]. This can connect fine-scale foraging bouts in one ecosystem (e.g., a bear fishing in a river) to larger-scale migratory phases between ecosystems (e.g., the same bear moving to terrestrial denning sites), thereby improving forecasts of how animals adapt their space use under environmental change [41].
The Net Watershed Exchange (NWE) Concept Adapted for Movement Ecology: Originally developed for carbon accounting, the NWE framework uses the watershed as the fundamental spatial unit that integrates terrestrial and aquatic ecosystems [42]. This concept can be powerfully adapted for movement ecology. By defining the watershed as the analytical unit, researchers can quantify the in- and out-flux of animals, their energy, and transported nutrients, constraining estimates of cross-ecosystem connectivity and its demographic consequences.
Cocreation of Visualizations for Interpretability: The complexity of integrated movement data necessitates clear communication. Involving end-users (e.g., researchers, managers, and even patients in health-related mobility studies) in the visualization design process ensures that the outputs are meaningful and actionable [43]. Recommendations include using large, clear fonts, ensuring color choices are friendly for those with vision impairments, and adding contextual factors (e.g., medication cycles, weather) to reflect the nuances of movement behavior [43].

Experimental Protocols

Protocol 1: Cross-Ecosystem Tracking and Data Collection

This protocol describes the simultaneous collection of movement data from linked terrestrial and aquatic fauna.

I. Research Reagent Solutions

Item	Function in Protocol
GPS Tracking Tags	Provides high-resolution spatio-temporal location data for terrestrial and aerial species. Essential for delineating home ranges, migration routes, and identifying aquatic foraging sites [41].
Biologging Devices	Miniaturized sensors (accelerometers, gyroscopes, depth sensors) deployed on aquatic or marine species to record fine-scale movement and behavior in environments where GPS is unreliable [41] [43].
Passive Integrated Transponder (PIT) Systems	A cost-effective method for detecting tagged individuals (e.g., fish, amphibians) at specific locations like streams or river gates, ideal for measuring movement between aquatic and terrestrial habitats.
Wearable Inertial Measurement Units (IMUs)	Body-worn sensors (e.g., McRoberts MoveMonitor+, Axivity AX6) used to quantify detailed mobility outcomes like real-world walking speed and stride length in both animal and human studies [43].

II. Procedure

Site Selection: Choose a watershed or coastline that represents a clear terrestrial-aquatic interface and is known for fauna that utilize both ecosystems (e.g., seabirds, amphibians, anadromous fish, riparian mammals).
Animal Capture and Tagging: Follow ethical and permitted procedures for capturing target species. Deploy appropriate tags:
- Fit terrestrial animals (e.g., bears, otters) with GPS tags.
- Fit aquatic animals (e.g., fish, marine mammals) with biologging devices.
- For small species or high-density studies, use PIT tags.
Infrastructure Deployment: Install stationary PIT tag antennae at ecosystem boundaries (e.g., river mouths, the land-sea interface) to detect movement of tagged individuals.
Data Collection Period: Conduct tracking over a time scale relevant to the ecological question (e.g., a full seasonal cycle, a migration period). Synchronize data timestamps from all devices to UTC.
Data Retrieval: Recover data via remote download or device recovery. For biologgers, this may require recapturing the animal.

Protocol 2: Integrated Trajectory Analysis for Encounter Modeling

This protocol uses a simplified, cost-effective method to extract and analyze fine-scale movement trajectories, applicable to small aquatic and terrestrial organisms, to model cross-system encounters.

I. Research Reagent Solutions

Item	Function in Protocol
High-Resolution Smartphone Camera	Serves as a primary data acquisition tool for recording movement in controlled settings. Modern smartphones (~40 megapixels) provide sufficient resolution for tracking small animals (~1 mm) [44].
Fiji/ImageJ Software with Manual Tracking Plugin	A freely available, open-source image processing platform. Its manual tracking package is used to digitize movement trajectories from video data, generating X,Y-coordinate time series [44].
Six-Well Culture Plate	A standardized experimental arena for observing small aquatic organisms like copepods, providing a controlled volume to assess swimming behavior [44].
LED Illumination System	Provides continuous, uniform illumination beneath the experimental arena to avoid phototactic responses in the study organisms and ensure consistent video quality [44].

II. Procedure

Video Setup: Position a smartphone camera overlooking a well of a culture plate. Place a uniform LED light source beneath the plate with a translucent frosted cover to diffuse light [44].
Sample Acclimation: Introduce a single specimen into the well and allow it to acclimate for at least one hour to avoid physical shock [44].
Video Recording: Record the organism's swimming or movement behavior for a set duration (e.g., 10 seconds) at a high frame rate (e.g., 30 fps).
Trajectory Extraction:
- Import the video into Fiji software.
- Use the "Manual Tracking" plugin to click on the organism's central point in each frame.
- Export the tracked data as a table of X,Y-coordinates over time.
Motion Parameter Calculation: Calculate key motion parameters from the coordinate data. The instantaneous swimming speed \(V_t\) (mm s⁻¹) at time step \(t\) is computed as: \[ V_t = \sqrt{(x_{t+1} - x_t)^2 + (y_{t+1} - y_t)^2} \times \alpha \times p \] where \(\alpha\) is a unit conversion constant and \(p\) is the recording speed [44]. Analyze other parameters such as total distance traveled and jump frequency.
Encounter Rate Modeling: Apply reaction-diffusion theory to the extracted trajectories. Treat encounters as first-passage events to derive well-behaved probabilities for interactions between individuals, which is more rigorous than simple distance-threshold overlaps and better reflects realistic diffusive movement [41].
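
The motion parameter calculation in the procedure above can be sketched in a few lines of Python. This is a minimal illustration that assumes coordinates in pixels, a conversion constant `alpha` in mm per pixel, and a recording speed `p` in frames per second; the function name and the toy coordinates are hypothetical.

```python
import numpy as np

def instantaneous_speed(x, y, alpha, p):
    """Speed between consecutive frames: displacement * alpha * p."""
    dx = np.diff(x)
    dy = np.diff(y)
    return np.hypot(dx, dy) * alpha * p

# Toy trajectory in pixel coordinates (illustrative values only).
x = np.array([0.0, 3.0, 3.0, 6.0])
y = np.array([0.0, 4.0, 4.0, 8.0])
alpha = 0.1   # assumed mm per pixel
p = 30.0      # assumed frames per second

speeds = instantaneous_speed(x, y, alpha, p)                   # mm per second
total_distance = np.hypot(np.diff(x), np.diff(y)).sum() * alpha  # mm
print(speeds, total_distance)
```

Frame-to-frame displacements here are 5, 0, and 5 pixels, so the speeds are 15, 0, and 15 mm per second and the total distance is 1 mm.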

Data Integration and Visualization Protocols

Workflow for Integrated Data Synthesis

The following diagram outlines the logical workflow for synthesizing multi-source movement data into cross-ecosystem insights.

Figure: Data Synthesis Workflow

Visualization and Accessibility Standards

Effective visualization is critical for interpreting complex, integrated datasets. Adhere to the following standards:

Color Contrast: Ensure sufficient contrast between text and its background. The Web Content Accessibility Guidelines (WCAG) recommend a contrast ratio of at least 4.5:1 for standard text and 3:1 for large text [45]. Tools like Stark for Figma/Sketch can automate these checks [46].
Strategic Color Use: Use color as a functional element. Sequential color palettes show magnitude, diverging palettes show deviation from a midpoint, and qualitative palettes distinguish categories [47]. Avoid red-green and red-black combinations, which are problematic for the most common forms of color vision deficiency [46].
Beyond Color: Do not use color as the only visual means of conveying information. Supplement color coding with shapes, patterns, or icons to ensure accessibility for all users [46] [48].
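
The contrast thresholds above can be checked programmatically. The sketch below follows the WCAG 2.1 definitions of relative luminance and contrast ratio for 8-bit sRGB colors; the function names are hypothetical and the code is an illustration rather than a replacement for a dedicated accessibility tool.

```python
def relative_luminance(rgb):
    """Relative luminance of an 8-bit sRGB color, per the WCAG 2.1 definition."""
    def linearize(channel):
        c = channel / 255.0
        return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4
    r, g, b = (linearize(c) for c in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(fg, bg):
    """Contrast ratio (L_lighter + 0.05) / (L_darker + 0.05), from 1 to 21."""
    l1, l2 = relative_luminance(fg), relative_luminance(bg)
    lighter, darker = max(l1, l2), min(l1, l2)
    return (lighter + 0.05) / (darker + 0.05)

def meets_wcag_aa(fg, bg, large_text=False):
    """Apply the 4.5:1 (standard text) or 3:1 (large text) threshold."""
    threshold = 3.0 if large_text else 4.5
    return contrast_ratio(fg, bg) >= threshold

print(contrast_ratio((0, 0, 0), (255, 255, 255)))  # black on white, the maximum ratio
```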

The following tables summarize key motion parameters and analytical outputs from integrated movement studies.

Table 1: Experimentally Derived Motion Parameters for a Small Aquatic Organism (Eodiaptomus japonicus) [44]

Parameter	Value	Notes / Context
Average Swimming Speed	9.8 mm s⁻¹	Measured over a 10-second trajectory.
Predominant Cruising Speed	~5.0 mm s⁻¹	Most frequently observed speed.
Maximum Instantaneous Speed	190.1 mm s⁻¹	Achieved during a spontaneous "jump" event.
Total Distance (10s)	98.5 mm	--
Jump Frequency (10s)	16 jumps	Characterized by sudden bursts of movement.

Table 2: Analytical Outputs from Integrated Movement Models

Analytical Output	Description	Ecological Application
First-Encounter Probabilities	Well-normalized probabilities of encounter derived from reaction-diffusion theory and first-passage events [41].	Quantifying predation risk, disease transmission, and social contact rates.
Cumulative Threat Overlap	Spatial overlap index between animal movement hotspots and anthropogenic threats (e.g., shipping traffic, infrastructure) [41].	Proactive conservation planning and spatial prioritization for mitigation.
Energetics-Informed Migration Network	A pathfinding model (e.g., using modified Dijkstra's algorithm) that incorporates energy constraints and environmental drivers like wind [41].	Predicting migration routes and identifying critical stopover habitats under climate change.

The integration of terrestrial and aquatic movement analytics represents a paradigm shift in movement ecology. By adopting watershed or seascape perspectives, employing hierarchical segmentation, and leveraging cost-effective tracking technologies, researchers can transcend traditional ecosystem boundaries. The protocols outlined herein, from data collection and trajectory analysis to visualization and modeling, provide a statistically robust foundation for uncovering the complex ecological linkages driven by animal movement. This integrated approach is paramount for forecasting ecological outcomes in a rapidly changing world.

Advanced statistical models for analyzing animal movement data have become fundamental tools in ecological research and are increasingly essential for informing conservation and management actions [4] [49]. The proliferation of biologging technologies has generated an explosion of movement data, creating both unprecedented opportunities and significant analytical challenges for practitioners [41]. Despite the development of sophisticated modeling approaches, a substantial science-practice gap persists, limiting the effective translation of analytical outputs into conservation outcomes. This gap stems from the complex landscape of available methods, each with distinct mathematical assumptions, data requirements, and interpretive frameworks [4] [49].

Movement ecology as a discipline has reached a critical juncture where methodological innovation must be paired with enhanced accessibility. Resource selection functions (RSF), step-selection functions (SSF), and hidden Markov models (HMMs) represent three prominent approaches for relating animal movement to environmental covariates, yet each yields varying ecological insights and identifies different "important" areas for conservation [4]. This variability in outputs creates confusion for managers seeking unambiguous guidance for conservation planning. Furthermore, method selection and temporal scale significantly influence ecological inferences on estimated animal behavioral states, with consequences for resource allocation and management decisions [49].

This protocol provides a structured framework for conservation practitioners to navigate the complex landscape of movement models, with specific guidance on method selection, implementation, and interpretation for applied conservation contexts. By bridging the science-practice gap, we aim to empower researchers and managers to leverage cutting-edge analytical tools for more effective conservation outcomes.

Foundational Movement Models: Comparative Analysis

Core Methodologies and Their Applications

Table 1: Comparative analysis of core movement modeling approaches

Model Type	Primary Ecological Question	Data Requirements	Spatial Scale	Key Outputs	Conservation Applications
Resource Selection Function (RSF)	Habitat preference relative to availability [4]	GPS locations, environmental layers [4]	Population or home range scale (2nd order selection) [4]	Relative probability of use across landscape [4]	Identification of critical habitat; protected area design [4]
Step-Selection Function (SSF)	Habitat selection during movement [4]	High-frequency GPS data, environmental layers [4]	Movement path scale (3rd order selection) [4]	Conditional selection probability given movement constraints [4]	Movement corridor identification; connectivity planning [4]
Hidden Markov Model (HMM)	Behavioral state-environment relationships [4] [49]	Regular time-series data, multiple movement metrics [49]	Multiple scales via behavioral states [49]	Behavioral state sequences; state-specific habitat associations [4] [49]	Behavior-specific habitat protection; disturbance impact assessment [4]
Movement Persistence Model (MPM)	Continuous variation in movement behavior [49]	Irregular time-series, error-prone data [49]	Individual movement scale	Move persistence parameter; resting/foraging periods [49]	Identification of fine-scale behavioral patterns; resting site protection [49]

Quantitative Performance Metrics

Table 2: Method performance across temporal scales based on green sea turtle case study [49]

Model	Temporal Resolution	Behavioral States Identified	State Interpretation	Handling Location Error	Computational Demand
HMM	1-hour	3-5 states	Variable prey associations by behavior [4] [49]	Moderate (requires regular data) [49]	High
HMM	8-hour	3-5 states	Distinguishes ARS from migration [49]	Moderate (requires regular data) [49]	Medium
MPM	1-hour	Multiple persistence levels	Identifies resting during migration [49]	High (incorporates error directly) [49]	Medium
MPM	8-hour	Broad behavioral categories	Distinguishes ARS from migration [49]	High (incorporates error directly) [49]	Low
M4	1-hour	3-5 states	Similar to HMM but with mixed membership [49]	Low (handles missing values well) [49]	High
M4	8-hour	3-5 states	Similar to HMM but with mixed membership [49]	Low (handles missing values well) [49]	Medium

Decision Framework for Model Selection

The following workflow provides a systematic approach for conservation practitioners to select appropriate movement models based on their specific management questions, data characteristics, and analytical resources.

Figure 1: Decision workflow for selecting movement models. This framework guides practitioners through key questions about their conservation objectives, data characteristics, and resources to arrive at appropriate modeling approaches. RSF = Resource Selection Function; SSF = Step-Selection Function; HMM = Hidden Markov Model; MPM = Movement Persistence Model; M4 = Mixed-Membership Method for Movement.

Experimental Protocols for Movement Model Implementation

Protocol 1: Resource Selection Function for Critical Habitat Identification

Application Context: Identifying critical habitat for protection under species conservation legislation (e.g., Endangered Species Act) [4].

Materials and Data Requirements:

Animal location data: GPS telemetry points from deployed biologging devices
Environmental covariates: Geospatial layers representing habitat features, resources, and anthropogenic factors
Software: R statistical environment with amt package [4]

Procedure:

Data Preparation (Duration: 2-3 days)
- Import and clean animal location data, removing erroneous fixes
- Extract environmental covariate values at each animal location
- Generate available points using a specified availability domain (e.g., minimum convex polygon, kernel utilization distribution) [4]

Model Specification (Duration: 1 day)
- Define the RSF using an exponential form: \(w(\mathbf{x}) = \exp(\beta_{1}x_{1} + \beta_{2}x_{2} + \cdots + \beta_{k}x_{k})\), where \(\mathbf{x} = \{x_{1}, \dots, x_{k}\}\) denotes the values of \(k\) predictor habitat variables [4]
- Implement via logistic regression comparing used versus available locations: \(\Pr(y_{i} = 1 \mid \mathbf{x}_{i}) = \frac{\exp(\beta_{1}x_{1,i} + \beta_{2}x_{2,i} + \cdots + \beta_{k}x_{k,i})}{1 + \exp(\beta_{1}x_{1,i} + \beta_{2}x_{2,i} + \cdots + \beta_{k}x_{k,i})}\) [4]
Model Fitting and Validation (Duration: 2 days)
- Fit RSF using generalized linear models with binomial family
- Validate model using k-fold cross-validation with individual animals as folds
- Assess predictive performance using receiver operating characteristic (ROC) curves
Interpretation and Application (Duration: 2 days)
- Map relative probability of use across the landscape
- Identify areas exceeding probability threshold for conservation priority
- Designate critical habitat boundaries based on probability contours
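
The used-versus-available comparison in the procedure above can be illustrated with simulated data. The protocol itself names the R package `amt`; the sketch below instead uses plain NumPy with a hand-written Newton-Raphson logistic regression, purely to show the logic. The single covariate, the sample sizes, the true coefficient, and the added intercept term are all assumptions made for illustration.

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Newton-Raphson fit of a logistic regression; X must include an intercept column."""
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X @ beta))
        grad = X.T @ (y - p)
        hess = X.T @ (X * (p * (1.0 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(42)
true_beta = 1.0                                  # hypothetical selection coefficient
landscape = rng.normal(size=20000)               # covariate at candidate locations
weights = np.exp(true_beta * landscape)          # RSF: w(x) = exp(beta * x)
used = rng.choice(landscape, size=1000, p=weights / weights.sum())
available = rng.choice(landscape, size=10000)    # uniform sample of the availability domain

x = np.concatenate([used, available])
y = np.concatenate([np.ones(used.size), np.zeros(available.size)])
X = np.column_stack([np.ones(x.size), x])        # intercept + covariate

intercept, beta_hat = fit_logistic(X, y)
print(round(beta_hat, 2))  # should be close to true_beta; the intercept is not interpreted
```

In a use-availability design, only the slope coefficients are interpreted; the intercept mainly reflects the ratio of used to available points.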

Troubleshooting Tips:

If model convergence issues occur, check for collinearity among environmental covariates
If predictive performance is poor, consider non-linear relationships using generalized additive models (GAMs)
If availability domain is ambiguous, test sensitivity using multiple availability definitions

Protocol 2: Hidden Markov Model for Behavior-Specific Habitat Management

Application Context: Managing human activities to minimize disturbance during sensitive behavioral states (e.g., foraging, breeding) [49].

Materials and Data Requirements:

High-resolution movement data: Regular time-series from GPS or biologging devices
Movement metrics: Step lengths, turning angles, acceleration data
Environmental covariates: Spatially explicit habitat variables
Software: R with momentuHMM package [49]

Procedure:

Data Preparation and Processing (Duration: 3-4 days)
- Regularize tracking data to consistent time intervals
- Calculate movement metrics: step lengths and turning angles
- Extract environmental covariates at each tracking location
- Standardize covariates to mean = 0, standard deviation = 1

Model Specification (Duration: 2 days)
- Specify number of behavioral states (typically 2-5) based on biological knowledge
- Define state-dependent distributions for movement metrics (e.g., gamma for step lengths, von Mises for turning angles)
- Formulate state-transition probability matrix as function of covariates
- Specify state-dependent habitat selection formulas
Model Fitting and Selection (Duration: 3 days)
- Fit HMM using maximum likelihood estimation with numerical optimization
- Initialize multiple times with different starting values to avoid local maxima
- Compare models with different numbers of states using Akaike Information Criterion (AIC)
- Validate state decoding using pseudo-residuals
Behavioral State Mapping and Application (Duration: 2 days)
- Decode most likely behavioral state sequence using Viterbi algorithm
- Map spatial distribution of behavioral states
- Identify environmental correlates of each behavioral state
- Develop behavior-specific management recommendations (e.g., seasonal closures during breeding)
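
The decoding step above can be sketched as follows. This is a minimal log-space Viterbi decoder for a two-state HMM with gamma-distributed step lengths. The state parameters, transition matrix, and step values are hypothetical, and the code does not reproduce the `momentuHMM` interface.

```python
import numpy as np
from math import lgamma

def gamma_logpdf(x, shape, scale):
    """Log density of a gamma distribution with the given shape and scale."""
    return ((shape - 1) * np.log(x) - x / scale
            - shape * np.log(scale) - lgamma(shape))

def viterbi(log_emis, log_trans, log_init):
    """Most likely state sequence given log emission, transition, and initial probabilities."""
    T, K = log_emis.shape
    delta = np.empty((T, K))             # best log-probability ending in each state
    psi = np.zeros((T, K), dtype=int)    # back-pointers
    delta[0] = log_init + log_emis[0]
    for t in range(1, T):
        scores = delta[t - 1][:, None] + log_trans   # scores[i, j]: move from i to j
        psi[t] = scores.argmax(axis=0)
        delta[t] = scores.max(axis=0) + log_emis[t]
    states = np.empty(T, dtype=int)
    states[-1] = delta[-1].argmax()
    for t in range(T - 2, -1, -1):
        states[t] = psi[t + 1, states[t + 1]]
    return states

# Hypothetical step lengths (m): short steps, a travelling bout, then short steps again.
steps = np.array([5.0, 8.0, 6.0, 400.0, 520.0, 450.0, 7.0, 4.0])
shapes = np.array([2.0, 4.0])      # state 0: short steps; state 1: long steps
scales = np.array([3.0, 120.0])
log_emis = np.column_stack([gamma_logpdf(steps, a, s) for a, s in zip(shapes, scales)])
trans = np.array([[0.9, 0.1], [0.1, 0.9]])
init = np.array([0.5, 0.5])

decoded = viterbi(log_emis, np.log(trans), np.log(init))
print(decoded)  # [0 0 0 1 1 1 0 0]
```

In practice the emission and transition parameters would come from the fitted model rather than being set by hand, and turning angles would enter the emission term as well.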

Troubleshooting Tips:

If states are poorly identified, consider different movement metrics or covariate parameterizations
If model fitting is unstable, reduce number of states or simplify covariate relationships
If computational time is excessive, reduce data resolution or use data subsampling

Table 3: Essential tools and resources for movement ecology research and application

Tool Category	Specific Tools/Platforms	Primary Function	Application Context	Technical Requirements
Tracking Hardware	Argos-linked Fastloc GPS (Wildlife Computers) [49]	Animal location data collection	Marine and terrestrial species tracking	Satellite connectivity; attachment expertise
Analytical Software	R packages: `amt`, `momentuHMM` [4] [49]	Movement track analysis; model implementation	Statistical modeling of movement paths	R programming proficiency
Data Management	Movebank data repository	Centralized data storage and management	Collaborative movement data projects	Internet access; data standardization
Environmental Data	Remote sensing products (e.g., Copernicus, MODIS)	Habitat covariate extraction	Linking movement to environmental features	GIS software; spatial analysis skills
Visualization Tools	R packages: `ggplot2`, `sf`, `leaflet`	Spatial visualization of movement patterns	Results communication; mapping	Basic cartographic principles

The integration of advanced movement models into conservation practice requires careful consideration of methodological assumptions, data requirements, and management objectives. As demonstrated through the comparative analysis and decision framework presented here, no single modeling approach is universally superior; rather, model selection must be guided by the specific conservation question, data characteristics, and intended application [4] [49]. The experimental protocols provide actionable methodologies for implementing these models in real-world conservation contexts, while the toolkit equips practitioners with essential resources for movement ecology applications.

Future directions in movement ecology should prioritize the development of more accessible modeling frameworks, standardized implementation protocols, and enhanced training opportunities for conservation professionals. By bridging the science-practice gap, we can leverage the full potential of movement ecology to address pressing conservation challenges in an era of rapid environmental change.

Computational Solutions for Handling Massive Individual-Level Trajectory Datasets

The analysis of individual-level trajectory data is fundamental to advancing research in movement ecology and beyond. Such datasets, which record the path of an entity through time and space (or through a state-space), are inherently massive and complex, presenting significant challenges in data handling, imputation, and interpretation. The field is moving beyond simple path descriptions towards a hierarchical understanding of movement, decomposing tracks into statistically defined building blocks to infer underlying biological processes and external drivers [8]. Simultaneously, in biomedical research, analogous challenges are found in analyzing high-dimensional patient health or single-cell trajectories to predict outcomes and understand drug effects [50] [51]. This protocol outlines a suite of computational and statistical solutions designed to overcome these challenges, enabling researchers to transform raw, massive trajectory datasets into meaningful ecological and clinical insights.

Quantitative Comparison of Analytical Frameworks

The selection of an appropriate statistical model is crucial, as each framework operates under different assumptions and is suited to answering specific types of ecological questions. The table below summarizes the core characteristics of three common approaches.

Table 1: Comparison of Statistical Models for Analyzing Species-Habitat Associations from Movement Data

Model	Core Function	Data Scale & Requirements	Key Advantages	Primary Limitations
Resource Selection Function (RSF) [4]	Estimates the relative probability of habitat use based on environmental covariates.	Broad-scale; uses "used" vs. "available" locations.	Provides broad-scale information on species-habitat relationships; ease of implementation.	Does not explicitly account for movement autocorrelation or sequential decision-making.
Step-Selection Function (SSF) [4]	Models the selection of each subsequent step conditional on the animal's current location and state.	Fine-scale; requires high-temporal-resolution relocation data.	Explicitly accounts for movement autocorrelation and the sequential nature of movement decisions.	Requires a higher frequency of data compared to RSFs.
Hidden Markov Model (HMM) [4] [8]	Relates discrete, latent behavioral states to environmental covariates and movement metrics.	Fine-scale; links movement patterns (e.g., step-length, turning-angle) to behavior.	Infers unobserved behavioral states, providing a direct link between internal state and movement.	A fundamentally different model from selection functions, focused on state estimation rather than habitat selection.

Beyond these established ecological models, advanced computational approaches are demonstrating significant performance gains in trajectory forecasting. The Digital Twin-Generative Pretrained Transformer (DT-GPT) model, for instance, has been benchmarked against a range of state-of-the-art machine learning models on diverse clinical datasets.

Table 2: Benchmarking Performance of DT-GPT against State-of-the-Art Models on Forecasting Tasks [50]

Dataset	Forecasting Task	Best Performing Model (Scaled MAE)	Second Best Model (Scaled MAE)	Relative Improvement
Non-Small Cell Lung Cancer (NSCLC)	Predict 6 lab values weekly for 13 weeks post-therapy.	DT-GPT (0.55 ± 0.04)	LightGBM (0.57 ± 0.05)	3.4%
Intensive Care Unit (ICU)	Forecast next 24 hours for respiratory rate, magnesium, and oxygen saturation.	DT-GPT (0.59 ± 0.03)	LightGBM (0.60 ± 0.03)	1.3%
Alzheimer's Disease	Forecast cognitive scores over the next 24 months.	DT-GPT (0.47 ± 0.03)	Temporal Fusion Transformer (0.48 ± 0.02)	1.8%

MAE: Mean Absolute Error. Scaled MAE allows for comparison across variables.

Experimental Protocols

Protocol 1: Hierarchical Path Segmentation and StaME Identification

This protocol details the process of deconstructing a raw movement track into Statistical Movement Elements (StaMEs) and higher-order behavioral modes [8].

1. Data Preprocessing:

Input: A time-series of animal relocations (e.g., GPS fixes) including timestamp, longitude, and latitude.
Calculation: Derive step-length (SL) and turning-angle (TA) time series from the relocation data.
Quality Control: Filter and clean data to remove obvious outliers (e.g., GPS fixes with unrealistic velocities).

2. Segmenting the Track:

Divide the entire SL and TA time series into short, consecutive, fixed-length segments (e.g., 10-30 relocation points). The duration of these segments should be ecologically relevant and reflect the scale of canonical activities.

3. Calculating Segment Statistics:

For each fixed-length segment, compute a vector of summary statistics. This typically includes:
- Mean and standard deviation of step-lengths.
- Mean direction and concentration of turning-angles.
- Autocorrelations of SL and TA at lag 1.
This results in a data matrix where each row is a segment, described by its statistical vector.
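
Steps 1 to 3 can be sketched in Python as follows. The sketch assumes planar coordinates (e.g., projected metres), uses the mean resultant length as the concentration measure for turning angles, and computes only the lag-1 autocorrelation of step lengths for brevity. Function names and the simulated track are hypothetical.

```python
import numpy as np

def steps_and_turns(x, y):
    """Step lengths and turning angles, aligned so that both have length n - 2."""
    dx, dy = np.diff(x), np.diff(y)
    sl = np.hypot(dx, dy)
    heading = np.arctan2(dy, dx)
    ta = np.diff(heading)
    ta = (ta + np.pi) % (2 * np.pi) - np.pi   # wrap to [-pi, pi)
    return sl[1:], ta

def lag1_autocorr(v):
    """Lag-1 autocorrelation; returns 0.0 for a constant series."""
    v = v - v.mean()
    denom = (v ** 2).sum()
    return float((v[:-1] * v[1:]).sum() / denom) if denom > 0 else 0.0

def segment_statistics(sl, ta, seg_len=10):
    """One row per fixed-length segment: mean SL, SD SL, mean TA direction,
    TA concentration (mean resultant length), lag-1 autocorrelation of SL."""
    rows = []
    for k in range(len(sl) // seg_len):
        s = sl[k * seg_len:(k + 1) * seg_len]
        a = ta[k * seg_len:(k + 1) * seg_len]
        c, sn = np.cos(a).mean(), np.sin(a).mean()
        rows.append([s.mean(), s.std(), np.arctan2(sn, c),
                     np.hypot(c, sn), lag1_autocorr(s)])
    return np.array(rows)

rng = np.random.default_rng(0)
track = np.cumsum(rng.normal(size=(101, 2)), axis=0)   # simulated random walk
sl, ta = steps_and_turns(track[:, 0], track[:, 1])
stats = segment_statistics(sl, ta, seg_len=10)
print(stats.shape)  # (9, 5): 9 segments, 5 statistics each
```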

4. Clustering and StaME Classification:

Perform unsupervised clustering (e.g., k-means, Gaussian Mixture Model) on the matrix of segment statistics.
The optimal number of clusters can be determined using metrics like the Bayesian Information Criterion (BIC) or silhouette score.
Interpret the centroid of each cluster as a unique StaME (e.g., "directed fast movement," "random slow movement").
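
The clustering step can be illustrated with a small self-contained k-means and a mean silhouette score used to choose the number of clusters. The deterministic farthest-point initialisation and the synthetic two-feature data are assumptions made so that the example is reproducible; in practice an established library implementation would normally be used.

```python
import numpy as np

def kmeans(X, k, n_iter=50):
    """Plain k-means with farthest-point initialisation (an illustrative choice)."""
    centers = [X[0]]
    for _ in range(1, k):
        d = np.min([((X - c) ** 2).sum(axis=1) for c in centers], axis=0)
        centers.append(X[d.argmax()])
    centers = np.array(centers, dtype=float)
    for _ in range(n_iter):
        d = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = d.argmin(axis=1)
        new = np.array([X[labels == j].mean(axis=0) if np.any(labels == j)
                        else centers[j] for j in range(k)])
        if np.allclose(new, centers):
            break
        centers = new
    return labels, centers

def mean_silhouette(X, labels):
    """Average silhouette width over all points (requires at least two clusters)."""
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=2))
    scores = []
    for i in range(len(X)):
        same = labels == labels[i]
        if same.sum() == 1:
            scores.append(0.0)
            continue
        a = D[i, same].sum() / (same.sum() - 1)
        b = min(D[i, labels == c].mean() for c in np.unique(labels) if c != labels[i])
        scores.append((b - a) / max(a, b))
    return float(np.mean(scores))

# Synthetic segment statistics: three well-separated groups of 20 segments.
rng = np.random.default_rng(0)
group_centers = np.array([[0.0, 0.0], [5.0, 5.0], [0.0, 5.0]])
X = np.vstack([c + 0.3 * rng.normal(size=(20, 2)) for c in group_centers])

scores = {k: mean_silhouette(X, kmeans(X, k)[0]) for k in (2, 3, 4)}
best_k = max(scores, key=scores.get)
labels, centers = kmeans(X, best_k)
print(best_k)  # each row of `centers` is then a candidate StaME
```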

5. Constructing Higher-Order Modes:

A sequence of identical StaMEs constitutes a Canonical Activity Mode (CAM), a homogeneous movement bout.
A variable-length sequence of different, but characteristic, CAMs can be interpreted as a Behavioral Activity Mode (BAM).
These BAMs can be linked to overarching ecological behaviors like foraging or migration.
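
Grouping runs of identical StaMEs into CAMs amounts to run-length encoding of the segment labels. A minimal sketch, with hypothetical label names:

```python
from itertools import groupby

def canonical_activity_modes(stame_labels):
    """Collapse consecutive identical StaME labels into (label, start_index, run_length)."""
    cams = []
    start = 0
    for label, run in groupby(stame_labels):
        length = len(list(run))
        cams.append((label, start, length))
        start += length
    return cams

labels = ["slow", "slow", "fast", "fast", "fast", "slow"]
cams = canonical_activity_modes(labels)
print(cams)  # [('slow', 0, 2), ('fast', 2, 3), ('slow', 5, 1)]
```

The resulting sequence of CAMs is the input from which characteristic patterns (BAMs) would then be identified.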

Protocol 2: Tensor Imputation for Single-Cell Transcriptomic Trajectories (TIGERS)

This protocol describes the TIGERS method for predicting missing values in single-cell gene-expression data to enable the analysis of drug-induced transcriptomic trajectories [51].

1. Tensor Construction:

Input: Single-cell gene-expression data from multiple drugs, genes, and cells.
Structure: Organize the data into a third-order tensor \(\mathcal{T}\) of dimensions (Drugs × Genes × Cells). Note that this tensor will be highly sparse (>95% missing values).

2. Tensor-Train (TT) Decomposition:

Apply the TT decomposition algorithm to the sparse tensor \(\mathcal{T}\) to obtain a low-rank approximation.
The TT decomposition factorizes \(\mathcal{T}\) into a series of smaller core tensors, which collectively capture the multi-way relationships in the data.
The rank of the decomposition is a key hyperparameter that can be determined via cross-validation.

3. Imputation and Reconstruction:

Use the decomposed core tensors to reconstruct a complete tensor \(\hat{\mathcal{T}}\).
The values in \(\hat{\mathcal{T}}\) serve as the imputed values for all missing entries in the original tensor \(\mathcal{T}\).
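
The decomposition and reconstruction steps can be illustrated with a small NumPy sketch. It implements a truncated TT-SVD for a third-order tensor and a simple iterative imputation loop that refills missing entries from the low-rank reconstruction. This is a simplified stand-in, not the TIGERS algorithm from [51]: the tensor is tiny, only 20% of entries are masked (far less than the sparsity quoted above), and the ranks are assumed known rather than chosen by cross-validation.

```python
import numpy as np

def tt_svd(T, r1, r2):
    """Truncated TT-SVD of a third-order tensor into three cores."""
    n1, n2, n3 = T.shape
    U, s, Vt = np.linalg.svd(T.reshape(n1, n2 * n3), full_matrices=False)
    G1 = U[:, :r1]                                        # shape (n1, r1)
    rest = (s[:r1, None] * Vt[:r1]).reshape(r1 * n2, n3)
    U, s, Vt = np.linalg.svd(rest, full_matrices=False)
    G2 = U[:, :r2].reshape(r1, n2, r2)                    # shape (r1, n2, r2)
    G3 = s[:r2, None] * Vt[:r2]                           # shape (r2, n3)
    return G1, G2, G3

def tt_reconstruct(G1, G2, G3):
    """Contract the three cores back into a full tensor."""
    return np.einsum("ia,ajb,bk->ijk", G1, G2, G3)

def tt_impute(T_obs, mask, r1, r2, n_iter=100):
    """Iteratively refill missing entries (mask == False) from the TT approximation."""
    filled = np.where(mask, T_obs, T_obs[mask].mean())
    for _ in range(n_iter):
        approx = tt_reconstruct(*tt_svd(filled, r1, r2))
        filled = np.where(mask, T_obs, approx)
    return filled

# Synthetic low-rank "drugs x genes x cells" tensor (illustrative sizes).
rng = np.random.default_rng(0)
cores = (rng.normal(size=(6, 2)), rng.normal(size=(2, 8, 2)), rng.normal(size=(2, 10)))
truth = tt_reconstruct(*cores)
mask = rng.random(truth.shape) > 0.2          # True where an entry is observed
T_obs = np.where(mask, truth, np.nan)

imputed = tt_impute(T_obs, mask, r1=2, r2=2)
missing = ~mask
err_mean = np.sqrt(np.mean((truth[missing] - truth[mask].mean()) ** 2))
err_tt = np.sqrt(np.mean((truth[missing] - imputed[missing]) ** 2))
print(err_mean, err_tt)   # error of a mean fill versus the TT-based fill
```

Libraries such as TensorLy provide tensor-train routines that would normally replace the hand-written decomposition here.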

4. Trajectory Inference:

For each drug, use the imputed single-cell gene-expression profiles (from \(\hat{\mathcal{T}}\)) to construct a low-dimensional embedding (e.g., using UMAP or t-SNE).
On this embedding, infer cell-state trajectories, identifying branches and endpoints that represent different drug-induced outcomes.

5. Pathway Trajectory Analysis:

Map the gene-expression signatures along the inferred cell trajectories.
Use gene-set enrichment analysis to calculate the impact on specific biological pathways for each point (vertex) along the trajectory.
This reveals how pathway regulation changes dynamically as cells transition between states in response to a drug.

Successful implementation of the protocols requires a combination of data, software, and computational resources.

Table 3: Key Research Reagents and Computational Tools

Category	Item	Function & Application
Data Sources	GPS/ATLAS Relocation Data [8]	The primary empirical input for animal movement analysis, used to derive step-length and turning-angle time series.
	Electronic Health Records (EHR) [50]	Real-world data source for constructing clinical health trajectories, containing demographics, diagnoses, and lab results.
	Single-Cell RNA-Sequencing Data [51]	The foundational data for transcriptomic trajectory analysis, measuring gene expression at the level of individual cells.
Software & Packages	`amt` R package [4]	A specialized R package for managing animal movement data and fitting resource selection functions (RSFs) and step-selection functions (SSFs).
	`momentuHMM` R package [4]	An R package for implementing hidden Markov models (HMMs) and related state-space models on animal movement data.
	Tensor Decomposition Libraries (e.g., TensorLy)	Python libraries providing implementations of tensor decomposition algorithms like Tensor-Train for data imputation.
Computational Models	DT-GPT [50]	A fine-tuned large language model for forecasting multi-variable clinical trajectories from EHR data, handling missingness and noise.
	TIGERS [51]	A computational pipeline employing tensor imputation to predict drug-induced single-cell transcriptomic landscapes and pathway trajectories.

Model Selection, Validation, and Cross-Disciplinary Translation

In the field of movement ecology, statistical models are fundamental for translating raw animal tracking data into meaningful biological insights. The choice of model directly shapes our understanding of species-habitat relationships, animal behavior, and space use [4]. Among the most prevalent are Resource Selection Functions (RSFs), Step Selection Functions (SSFs), and Hidden Markov Models (HMMs). Although often applied to similar tracking datasets, these models operate on different principles, account for different processes, and can ultimately yield contrasting ecological inferences [4] [19]. This section compares the three methods directly. We outline their mathematical underpinnings, show their divergent outputs through a case study, and provide detailed protocols for their application, so that researchers can select and implement the most appropriate tool for their research questions.

Model Foundations and Mathematical Frameworks

Resource Selection Functions (RSFs)

RSFs are a classic approach used to quantify habitat selection by comparing environmental conditions at locations used by an animal to those available within a defined area, such as its home range [4] [19]. The RSF is typically an exponential function of the form: [ w(\mathbf{x}) = \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k) ] where (w(\mathbf{x})) is the relative probability of selection for a resource unit with habitat covariate values (\mathbf{x} = \{x_1, \dots, x_k\}), and (\beta_1, \dots, \beta_k) are the selection coefficients to be estimated [4]. In practice, these coefficients are often estimated using logistic regression, comparing "used" locations (coded as 1) to "available" locations (coded as 0) [19]. RSFs are powerful for identifying broad-scale habitat preferences but typically assume data points are independent and do not explicitly model the movement process linking them [4].
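
As a small illustration of the exponential form, the sketch below evaluates (w(\mathbf{x})) at two sites and takes their ratio. The coefficient values and covariates (prey diversity, distance to shore) are hypothetical and chosen only to make the arithmetic easy to follow.

```python
import numpy as np

def rsf(x, beta):
    """Exponential RSF: w(x) = exp(beta_1 x_1 + ... + beta_k x_k)."""
    return np.exp(np.asarray(x) @ np.asarray(beta))

beta = np.array([0.8, -0.3])    # hypothetical coefficients: prey diversity, distance to shore
site_a = np.array([2.0, 1.0])   # one unit more prey diversity than site_b
site_b = np.array([1.0, 1.0])

# Only ratios of w(x) are interpretable, because the RSF is a relative measure.
relative_strength = rsf(site_a, beta) / rsf(site_b, beta)
print(round(float(relative_strength), 3))
```

Because the two sites differ by one unit in the first covariate only, the ratio equals exp(0.8), about 2.23: site A is a little more than twice as likely to be selected as site B under these assumed coefficients.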

Step Selection Functions (SSFs)

SSFs extend the concept of RSFs by explicitly incorporating animal movement into the habitat selection framework. They assess selection by comparing the environmental and movement characteristics of the observed step (the straight-line segment between two consecutive locations) to a set of alternative, hypothetical steps the animal could have taken [52] [4]. The probability of an animal moving to location (\boldsymbol{y}_{t+1}) given its current location (\boldsymbol{y}_t) is modeled as: [ p(\boldsymbol{y}_{t+1} \mid \boldsymbol{y}_t) = \frac{w(\boldsymbol{y}_t, \boldsymbol{y}_{t+1}) \, \phi(\boldsymbol{y}_{t+1} \mid \boldsymbol{y}_t)}{\int_{\boldsymbol{z} \in \Omega} w(\boldsymbol{y}_t, \boldsymbol{z}) \, \phi(\boldsymbol{z} \mid \boldsymbol{y}_t) \, d\boldsymbol{z}} ] where (w) is a weighting function describing habitat selection and (\phi) is a movement kernel modeling intrinsic movement patterns (e.g., distributions of step lengths and turning angles) [52]. By integrating movement, SSFs model habitat selection at a finer spatiotemporal scale and automatically account for the autocorrelation inherent in tracking data.
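
The ratio above can be made concrete with a discrete approximation in which the integral over (\Omega) is replaced by a sum over a finite set of candidate endpoints. The sketch below assumes equal-area candidate cells, a gamma step-length density, a von Mises turning-angle density, and an exponential (w); all parameter values and function names are illustrative.

```python
import numpy as np
from math import gamma as gamma_fn

def gamma_pdf(length, shape, scale):
    """Gamma density for step lengths."""
    return length ** (shape - 1) * np.exp(-length / scale) / (gamma_fn(shape) * scale ** shape)

def von_mises_pdf(angle, mu, kappa):
    """von Mises density for turning angles (np.i0 is the modified Bessel function I0)."""
    return np.exp(kappa * np.cos(angle - mu)) / (2 * np.pi * np.i0(kappa))

def step_probabilities(habitat, step_len, turn_angle, beta, shape, scale, kappa):
    """Discrete version of p(y_{t+1} | y_t) over a set of candidate endpoints."""
    w = np.exp(habitat @ beta)  # habitat selection weight
    # Movement kernel over endpoint locations; dividing by step length converts
    # the (length, angle) density into a density per unit area.
    phi = gamma_pdf(step_len, shape, scale) * von_mises_pdf(turn_angle, 0.0, kappa) / step_len
    weights = w * phi
    return weights / weights.sum()

habitat = np.array([[0.0], [1.0], [2.0]])   # one covariate at three candidate endpoints
step_len = np.array([1.0, 1.0, 1.0])        # equal lengths and angles isolate the habitat effect
turn_angle = np.array([0.0, 0.0, 0.0])
probs = step_probabilities(habitat, step_len, turn_angle,
                           beta=np.array([0.5]), shape=2.0, scale=0.5, kappa=1.0)
print(probs)
```

With identical step lengths and turning angles, the movement kernel cancels and the probabilities depend only on the habitat covariate, which is the sense in which an SSF conditions selection on what the animal could reach.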

Hidden Markov Models (HMMs)

HMMs are a state-space modeling approach designed to infer latent, or "hidden," states from observed time-series data [53]. In movement ecology, the hidden states are typically discrete behavioural modes (e.g., "foraging," "transit"), and the observations are the recorded data (e.g., step lengths, turning angles, or even habitat measurements) [53] [54]. An HMM is characterized by two core processes:

The state process: A Markov chain of hidden states, (S_t), with transition probabilities (\gamma_{ij} = \Pr(S_{t+1} = j \mid S_t = i)), defining the probability of switching from state (i) to state (j).
The observation process: The probability of an observation (x_t) given the current state, defined by state-dependent distributions (f(x_t \mid S_t = i)) [53].

HMMs are particularly powerful for segmenting tracks into behavioural phases and for investigating how behaviour influences other processes, such as habitat selection [52] [54].
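
The two processes combine in the HMM likelihood, which the forward algorithm evaluates by recursion over time. The following is a minimal sketch with rescaling to avoid numerical underflow; the initial distribution, transition matrix, and density values are made-up numbers, and `dens[t, i]` stands for (f(x_t \mid S_t = i)).

```python
import numpy as np

def hmm_log_likelihood(delta, Gamma, dens):
    """Scaled forward algorithm.

    delta : initial state probabilities, shape (N,)
    Gamma : transition matrix with Gamma[i, j] = Pr(S_{t+1} = j | S_t = i)
    dens  : state-dependent densities, dens[t, i] = f(x_t | S_t = i), shape (T, N)
    """
    alpha = delta * dens[0]
    log_lik = 0.0
    for t in range(dens.shape[0]):
        if t > 0:
            alpha = (alpha @ Gamma) * dens[t]
        total = alpha.sum()
        log_lik += np.log(total)   # accumulate the log of each scaling factor
        alpha = alpha / total      # rescale so alpha stays a probability vector
    return log_lik

delta = np.array([0.5, 0.5])
Gamma = np.array([[0.9, 0.1],
                  [0.2, 0.8]])
dens = np.array([[0.60, 0.05],
                 [0.50, 0.10],
                 [0.02, 0.40]])
log_lik = hmm_log_likelihood(delta, Gamma, dens)
print(log_lik)
```

The cost grows linearly with the number of observations, whereas summing over every possible state sequence directly would grow exponentially.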

Table 1: Core Mathematical and Conceptual Differences Between RSFs, SSFs, and HMMs.

Feature	Resource Selection Function (RSF)	Step Selection Function (SSF)	Hidden Markov Model (HMM)
Primary Inference	Habitat preference (probability of use)	Movement-informed habitat selection	Latent behavioural states
Data Scale	Use vs. availability (often home range scale)	Step-level choices	Time-series of observations
Handles Autocorrelation	Typically no	Yes, explicitly	Yes, explicitly
Key Assumptions	Independence of locations; defined availability	Movement kernel form; conditional independence of steps	Markov property; state-dependent distributions
Typical Output	Habitat selection coefficients ((\beta))	Habitat & movement selection coefficients	State transition probabilities; state-dependent parameters

Divergent Ecological Inferences: A Case Study

A comparative analysis of a ringed seal (Pusa hispida) movement track starkly illustrates how RSFs, SSFs, and HMMs can lead to different ecological conclusions [4] [19]. The study related the seal's movements to an environmental covariate, prey diversity.

RSF Inference: The RSF analysis suggested a strong positive relationship between seal presence and prey diversity. The selection coefficient values appeared large, implying this was a key habitat feature [4].
SSF Inference: When the same data were analyzed with an SSF, which accounts for the autocorrelated nature of the movement path, the positive relationship with prey diversity was still present but notably weaker. Furthermore, after statistical testing, the relationship was found to be not statistically significant [4]. This highlights how SSFs can correct overly confident conclusions drawn from methods that ignore movement.
HMM Inference: The HMM revealed a more nuanced story. It identified multiple behavioural states (e.g., a "slow-moving" state and a "transit" state) from the movement data. The relationship with prey diversity was not consistent across states. Specifically, there was a strong positive association between prey diversity and the slow-moving behaviour, likely representing foraging, but no such association during transit [4] [19].

This case study demonstrates that model choice is not merely a statistical technicality. The RSF provided a broad, potentially misleading overview. The SSF offered a more robust, movement-conscious estimate of selection. The HMM, however, delivered the most mechanistically rich insight by revealing that habitat selection is behaviour-dependent [4]. Consequently, the "important" areas identified by each model differed, which has direct implications for conservation efforts like designating critical habitat [4].

Integrated and Advanced Frameworks

Recognizing the strengths and limitations of each method has led to the development of integrated models. A key advancement is the HMM-SSF, which combines the multi-state framework of HMMs with the movement-based habitat selection of SSFs [52]. In this model, the SSF forms the observation process of the HMM, allowing the animal's habitat selection rules to switch depending on its behavioural state [52]. For example, an application to plains zebra identified an "encamped" state and an "exploratory" state. While zebra selected for grassland in both states, this selection was significantly stronger during fast, directed exploratory movement [52]. This framework also allows researchers to include covariates on the transition probabilities between states, enabling investigation into what drives behavioural switches (e.g., a diel cycle) [52].

Another critical consideration is the environment's role in shaping interactions. Studies show that failing to account for landscape heterogeneity can lead to spurious inference of social interactions between animals, as individuals may independently be attracted to the same resource [5]. SSFs are flexible tools that can include landscape covariates to control for this confounding effect [5].

Experimental Protocols

Protocol 1: Fitting a Multi-State HMM with Movement Data

This protocol details the process of using an HMM to identify behavioural states from movement data [53] [54].

Data Preparation: From raw location data (e.g., GPS fixes), calculate derived movement metrics for each time step. The most common are:
- Step Lengths: The straight-line distance between consecutive locations.
- Turning Angles: The change in direction between consecutive steps.
Model Formulation:
- Define States: Choose the number of latent behavioural states, (N), to model (e.g., (N=2) or (3)).
- Specify Distributions: Select appropriate parametric distributions for the state-dependent observation distributions. Step lengths are typically modeled with a gamma distribution, and turning angles with a von Mises distribution.
- Define Structure: Specify the initial state probabilities (\boldsymbol{\delta}) and the (N \times N) state transition probability matrix (\boldsymbol{\Gamma}).
Model Fitting: Estimate the model parameters (transition probabilities and state-dependent distribution parameters) by numerically maximizing the likelihood, which the forward algorithm evaluates efficiently. This can be implemented in R using packages such as momentuHMM [4].
State Decoding: Use the Viterbi algorithm to find the most likely sequence of hidden states given the fitted model and the observed data. This assigns a behavioural state to every time point in the track.
Validation: Where possible, validate the inferred states against ground-truthed behavioural observations (e.g., from accelerometers or direct observation) [54].
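
The data-preparation step of this protocol can be sketched as follows. The example assumes locations are already in a projected coordinate system (e.g., metres) and sampled at regular intervals; `steps_and_turns` is an illustrative helper, not a function from any package named here.

```python
import numpy as np

def steps_and_turns(xy):
    """Return step lengths and turning angles from an (n, 2) array of projected coordinates."""
    d = np.diff(xy, axis=0)
    step_len = np.hypot(d[:, 0], d[:, 1])       # straight-line distance per step
    heading = np.arctan2(d[:, 1], d[:, 0])      # direction of travel per step
    turn = np.diff(heading)                     # change in direction between steps
    turn = (turn + np.pi) % (2 * np.pi) - np.pi # wrap to [-pi, pi)
    return step_len, turn

track = np.array([[0.0, 0.0], [1.0, 0.0], [1.0, 1.0], [0.0, 1.0]])
step_len, turn = steps_and_turns(track)
print(step_len, turn)
```

A track of n locations yields n - 1 step lengths and n - 2 turning angles, so the first step has no turning angle; HMM software typically treats that value as missing.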

[Figure: workflow diagram for Protocol 1 (data preparation, model formulation, model fitting, state decoding, validation).]

Protocol 2: Fitting an Integrated HMM-SSF

This protocol outlines the steps for implementing a joint HMM-SSF model to analyze behaviour-dependent habitat selection [52].

Preliminary HMM: Fit a standard HMM to the movement data (step lengths and turning angles) using Protocol 1 as a starting point. This provides an initial estimate of behavioural states.
Generate Available Steps: For each observed step, generate a set of (K) available, or control, steps. These are typically random steps drawn from the movement kernel (\phi) that the animal could have taken but did not.
Extract Covariates: For the endpoint of each observed and available step, extract the values of relevant environmental covariates (e.g., vegetation index, elevation, distance to water).
Specify HMM-SSF Model: Formulate an HMM where the observation process for a given state is defined by an SSF. The likelihood of an observation is proportional to (\exp(\boldsymbol{c}_h \cdot \boldsymbol{\beta}_h^{(s)} + \boldsymbol{c}_m \cdot \boldsymbol{\beta}_m^{(s)})), where (\boldsymbol{c}_h) are habitat covariates, (\boldsymbol{c}_m) are movement covariates, and (\boldsymbol{\beta}^{(s)}) are the state-specific selection coefficients.
Model Fitting: Fit the integrated model by numerically maximizing the joint likelihood, again evaluated with the forward algorithm. This simultaneously estimates state transition probabilities, movement parameters, and state-specific habitat selection coefficients.
Inference and Prediction: Interpret the state-specific selection coefficients ((\boldsymbol{\beta}_h^{(s)})) to understand how habitat selection differs by behaviour. The model can then be used for state classification and to simulate space use through, for example, utilization distributions [52].

[Figure: workflow diagram for the integrated HMM-SSF approach (Protocol 2).]

The Scientist's Toolkit

Table 2: Essential Software and Analytical Reagents for Movement Ecology Analysis.

Tool / Reagent	Type	Primary Function	Example Use Case
`amt` R Package [4]	Software Package	Data management, analysis and visualization for animal movement telemetry.	Creating tracks, calculating step lengths/turning angles, fitting RSFs and SSFs.
`momentuHMM` R Package [4]	Software Package	Fitting complex HMMs and related state-space models to animal tracking data.	Implementing multi-state HMMs with various observation distributions.
Step Selection Function (SSF) [52]	Statistical Framework	Modeling animal movement and habitat selection in a unified framework.	Comparing observed steps to available steps to estimate selection coefficients.
Viterbi Algorithm [53]	Computational Algorithm	Determining the most probable sequence of hidden states in an HMM.	"Global decoding" of an animal's most likely behavioural sequence from tracking data.
Forward Algorithm [53]	Computational Algorithm	Efficiently computing the likelihood of an HMM and state probabilities.	Core of model fitting (parameter estimation) for HMMs.
Ground-Truthed Data [54]	Data	Independent observations of animal behaviour.	Validating states inferred by an HMM (e.g., using video or accelerometer data).

The landscape of movement ecology statistical methods is rich and varied. RSFs, SSFs, and HMMs are not interchangeable; they are specialized tools designed for different questions. RSFs provide a foundational understanding of habitat preference at a population or home range scale. SSFs offer a more refined, mechanistic view of fine-scale habitat selection by explicitly incorporating movement constraints. HMMs shift the focus to the behavioural drivers behind movement patterns, uncovering the latent states that govern an animal's decisions.

The emerging trend is toward integration, as exemplified by the HMM-SSF, which acknowledges that animals do not follow a single set of rules. Their movement and habitat selection are behaviour-dependent [52] [4]. The direct comparison presented here underscores a critical conclusion: the choice of model is a consequential decision that directly shapes ecological inference. By understanding the assumptions, strengths, and limitations of each approach, researchers can better match their analytical tools to their biological questions, leading to more robust and insightful conclusions about the lives of moving animals.

The analysis of animal movement data is fundamental to advancing the field of movement ecology, with statistical model selection directly influencing ecological interpretation and conservation outcomes [4]. This Application Note provides a detailed protocol for applying three core statistical models (Resource Selection Functions, RSF; Step-Selection Functions, SSF; and Hidden Markov Models, HMM) to a single GPS tracking dataset from a ringed seal (Pusa hispida). The objective is to demonstrate how these models, with their differing assumptions and mathematical underpinnings, can yield complementary or contrasting insights into species-habitat associations [4]. The endangered Saimaa ringed seal (Pusa hispida saimensis) serves as an ideal case study due to its restricted habitat and the critical need for accurate habitat identification to inform its conservation [55].

Model Descriptions and Comparative Framework

The three models interrogate animal-environment relationships at different spatiotemporal scales and behavioral resolutions.

Resource Selection Functions (RSF): RSFs estimate the relative probability of habitat use based on environmental characteristics. They are typically implemented by comparing environmental covariates at "used" locations (animal GPS fixes) versus "available" locations sampled from the animal's home range or study area [4]. The RSF is defined as (w(\mathbf{x}) = \exp(\beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_k x_k)), where (\mathbf{x}) is a vector of habitat variables and (\beta_1, \dots, \beta_k) are the selection coefficients. These coefficients are commonly estimated using logistic regression [4].
Step-Selection Functions (SSF): SSFs extend RSFs by explicitly incorporating animal movement into the habitat selection analysis. They compare environmental covariates at the end of each observed movement step (distance and turning angle between consecutive locations) to those at the ends of random steps taken from the same starting point [4]. This conditions the analysis on the animal's movement path and immediate location, thereby accounting for autocorrelation in the data.
Hidden Markov Models (HMM): HMMs are state-space models that infer latent (unobserved) behavioral states from observed movement data, such as dive metrics or step lengths. The model assumes the animal switches between a finite number of behaviors (e.g., "Resting," "Transiting," "Foraging"), with each state characterized by a unique probability distribution for the observed data. Transition probabilities between states can be modeled as a function of environmental covariates [4] [55].

Model Comparison Table

Table 1: Comparative summary of RSF, SSF, and HMM for movement ecology studies.

Feature	Resource Selection Function (RSF)	Step-Selection Function (SSF)	Hidden Markov Model (HMM)
Primary Ecological Question	Broad-scale habitat preference; relative probability of use [4].	Fine-scale habitat selection conditioned on movement constraints [4].	Behavioral state identification and how state transitions relate to environment [4] [55].
Data Requirements	Used and available locations. Less temporally resolved data can be sufficient [4].	Regular time-step telemetry data; high temporal resolution is preferred [4].	Time-series data (e.g., dive metrics, step lengths); requires sufficient data per state [55].
Handling of Autocorrelation	Does not explicitly account for temporal autocorrelation.	Explicitly accounts for autocorrelation via conditional availability [4].	Explicitly models autocorrelation through state transition probabilities.
Key Advantage	Conceptual and mathematical simplicity; provides landscape-level insight [4].	Integrates movement with habitat selection; more robust inference at the step level.	Links discrete, latent behaviors to environmental drivers; provides a mechanistic understanding [55].
Key Limitation	"Used" vs. "Available" design can be confounded by movement and behavior [4].	More complex implementation than RSF; requires careful step and availability sampling.	Requires a priori assumption of the number of states; model selection can be challenging.
Interpretation of Output	Selection coefficients ((\beta)) indicate habitat preference.	Selection coefficients indicate habitat choice during movement.	State-dependent parameters and transition probabilities describe behavior-habitat relationships.

Experimental Protocol: Application to Ringed Seal Data

Data Collection and Pre-processing

Animal Tagging: Deploy Fastloc GPS-GSM telemetry tags on adult ringed seals. Tags should be attached to the dorsal pelage using two-component epoxy glue post-moult (e.g., late May - early June) [55].
Data Collection: Collect GPS location data. For diving species like ringed seals, collect or derive dive metrics including dive depth, dive duration, post-dive surface interval, and wiggles (proxy for prey capture attempts) [55].
Environmental Data: Extract spatially explicit environmental covariates for each GPS location and corresponding available locations. For ringed seals, relevant covariates include:
- Lake Depth (Bathymetry): Crucial for describing foraging habitat [55].
- Prey Diversity/Abundance: Can be proxied with stable isotope data or acoustic surveys [4] [55].
- Distance to Shore/Ice Edge: Important for risk assessment and access to haul-out sites.
- Season and Diel Period: Categorical variables (e.g., Summer/Winter, Day/Night) to account for temporal changes in behavior [55].

Analytical Workflow

[Figure: analytical workflow diagram covering data preparation and application of the RSF, SSF, and HMM.]

Detailed Model Application Protocols

Protocol 3.3.1: Resource Selection Function (RSF)

Define Availability: Generate random available points within the seal's seasonal home range (e.g., calculated via Minimum Convex Polygon or Kernel Density Estimate) [4]. A ratio of about 10 available points per used point is typical.
Extract Covariates: For all used (1) and available (0) points, extract the values of all environmental covariates (e.g., depth, prey diversity).
Model Fitting: Fit a logistic regression model in R, for example with glm() after preparing the used and available points with the amt package [4].
- Model Structure: glm(Used ~ Depth + Prey_Diversity + ..., data = points_data, family = binomial())
Interpretation: Exponentiated coefficients (exp(β)) represent Relative Selection Strength (RSS) for a one-unit increase in the covariate. An RSS > 1 indicates selection for that habitat covariate, while RSS < 1 indicates avoidance.
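
The use-availability logic of this protocol can be checked on simulated data. The sketch below is written in Python with NumPy rather than R, fits the logistic regression by Newton-Raphson, and uses a single made-up covariate (prey diversity on a 0 to 1 scale) with an assumed true selection coefficient of 1.5. It illustrates the design and is not a substitute for glm().

```python
import numpy as np

def fit_logistic(X, y, n_iter=25):
    """Logistic regression by Newton-Raphson; returns [intercept, slopes...]."""
    X1 = np.column_stack([np.ones(len(X)), X])
    beta = np.zeros(X1.shape[1])
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-X1 @ beta))
        grad = X1.T @ (y - p)
        hess = X1.T @ (X1 * (p * (1 - p))[:, None])
        beta = beta + np.linalg.solve(hess, grad)
    return beta

rng = np.random.default_rng(42)
true_beta = 1.5                                   # assumed selection coefficient
pool = rng.uniform(0.0, 1.0, size=20000)          # covariate values across the landscape
weights = np.exp(true_beta * pool)                # exponential RSF
used = rng.choice(pool, size=2000, p=weights / weights.sum())
available = rng.uniform(0.0, 1.0, size=20000)     # 10 available points per used point

x = np.concatenate([used, available])
y = np.concatenate([np.ones(used.size), np.zeros(available.size)])
intercept, slope = fit_logistic(x[:, None], y)
rss = np.exp(slope)                               # relative selection strength per unit
print(slope, rss)
```

The slope should land near the assumed value of 1.5, while the intercept mainly reflects the ratio of used to available points and carries no ecological meaning.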

Protocol 3.3.2: Step-Selection Function (SSF)

Build Steps and Random Steps: Using the amt package, generate observed steps from consecutive GPS locations. For each observed step, generate multiple (e.g., 10-20) random steps from the same starting location. Random steps are drawn from distributions of step lengths and turning angles derived from the empirical data [4].
Extract Covariates: At the endpoint of every observed and random step, extract the values of all environmental covariates.
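Model Fitting: Fit a conditional logistic regression model stratified by step, so that each observed step is compared only with its own random steps.
- Model Structure: clogit(Used ~ Depth + Prey_Diversity + ... + strata(step_id), data = steps_data), where clogit() and strata() come from the survival package in R.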
Interpretation: Similar to RSF, exponentiated coefficients indicate the relative selection strength for a habitat covariate, given the animal's movement trajectory.
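
To show what the stratified fit optimizes, the sketch below writes out the conditional logistic log-likelihood and maximizes it by Newton-Raphson on simulated strata. It is a NumPy illustration under assumed settings (one covariate, 10 random steps per observed step, a true coefficient of 1.0), not the clogit() implementation.

```python
import numpy as np

def conditional_log_lik(beta, X):
    """X[s, k, :] holds covariates for option k in stratum s; option 0 is the observed step."""
    eta = X @ beta
    eta = eta - eta.max(axis=1, keepdims=True)   # stabilise the exponentials
    return np.sum(eta[:, 0] - np.log(np.exp(eta).sum(axis=1)))

def fit_conditional_logit(X, n_iter=25):
    """Newton-Raphson maximisation of the conditional log-likelihood."""
    beta = np.zeros(X.shape[2])
    for _ in range(n_iter):
        eta = X @ beta
        p = np.exp(eta - eta.max(axis=1, keepdims=True))
        p /= p.sum(axis=1, keepdims=True)                 # within-stratum choice probabilities
        mean_x = np.einsum("sk,skp->sp", p, X)            # expected covariates per stratum
        grad = (X[:, 0, :] - mean_x).sum(axis=0)
        info = np.einsum("sk,skp,skq->pq", p, X, X) - mean_x.T @ mean_x
        beta = beta + np.linalg.solve(info, grad)
    return beta

rng = np.random.default_rng(1)
S, K = 2000, 10                       # strata (observed steps) and random steps per stratum
true_beta = np.array([1.0])           # assumed selection coefficient
X = rng.normal(size=(S, K + 1, 1))
p = np.exp(X @ true_beta)
p /= p.sum(axis=1, keepdims=True)
chosen = np.array([rng.choice(K + 1, p=row) for row in p])
idx = np.arange(S)
# Move the chosen option to position 0 so it plays the role of the observed step.
X[idx, 0], X[idx, chosen] = X[idx, chosen].copy(), X[idx, 0].copy()

beta_hat = fit_conditional_logit(X)
print(beta_hat, conditional_log_lik(beta_hat, X))
```

Because each stratum contributes only a within-stratum comparison, anything constant across the options of a step (including the starting location) drops out of the likelihood.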

Protocol 3.3.3: Hidden Markov Model (HMM)

Data Preparation: Use the momentuHMM package in R [4]. Prepare a time series of observed movement metrics. For ringed seals, use maximum dive depth and dive duration as observation data [55].
Model Specification:
- Define Number of States (N): Based on prior ecological knowledge, initial data exploration, and model selection criteria. For ringed seals, 3 (Winter) or 4 (Summer) states are often appropriate (e.g., "Resting," "Transiting," "Shallow Inactive," "Foraging") [55].
- Define State-Dependent Distributions: Specify probability distributions for the observed data (e.g., Gamma distributions for dive depth and duration).
- Incorporate Covariates: Environmental covariates (e.g., depth, diel period) can be included in the transition probability matrix between states to understand what drives behavioral switching [55].
Model Fitting & Decoding: Fit the HMM using numerical maximum likelihood. Use the Viterbi algorithm to decode the most probable sequence of latent behavioral states from the observed dive data.
Interpretation: Analyze the parameters of the state-dependent distributions to characterize each behavior. For example, the "Foraging" state may be associated with deep, long dives with high "wiggle" counts. Analyze the transition probability matrix to understand behavioral rhythms and how they are influenced by the environment.
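
The decoding step can be sketched with a small Viterbi implementation. The example assumes two states with gamma-distributed maximum dive depth; the depths, gamma parameters, and transition matrix are invented for illustration and are not estimates from [55].

```python
import numpy as np
from math import gamma as gamma_fn

def gamma_pdf(x, shape, scale):
    """Gamma density used here as the state-dependent distribution."""
    return x ** (shape - 1) * np.exp(-x / scale) / (gamma_fn(shape) * scale ** shape)

def viterbi(delta, Gamma, dens):
    """Most probable state sequence given dens[t, i] = f(x_t | S_t = i)."""
    T, N = dens.shape
    log_Gamma = np.log(Gamma)
    score = np.log(delta) + np.log(dens[0])
    back = np.zeros((T, N), dtype=int)
    for t in range(1, T):
        cand = score[:, None] + log_Gamma      # cand[i, j]: best path ending in i, then i -> j
        back[t] = cand.argmax(axis=0)
        score = cand.max(axis=0) + np.log(dens[t])
    states = np.zeros(T, dtype=int)
    states[-1] = score.argmax()
    for t in range(T - 1, 0, -1):              # trace the best path backwards
        states[t - 1] = back[t, states[t]]
    return states

depth = np.array([2.0, 3.0, 2.5, 20.0, 25.0, 22.0, 3.0, 2.0])   # hypothetical dive depths (m)
dens = np.column_stack([gamma_pdf(depth, 4.0, 0.7),     # state 0: shallow dives
                        gamma_pdf(depth, 20.0, 1.1)])   # state 1: deep dives
delta = np.array([0.5, 0.5])
Gamma = np.array([[0.9, 0.1],
                  [0.1, 0.9]])
states = viterbi(delta, Gamma, dens)
print(states)
```

With these well-separated distributions the three deep dives are assigned to state 1 and the rest to state 0; with overlapping distributions the transition matrix plays a larger role in smoothing the decoded sequence.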

Case Study Results and Interpretation

Expected Model Outputs and Contrasts

Applying these models to a single ringed seal track will yield different, yet complementary, results.

RSF Output: May indicate a strong positive selection for areas with high prey diversity across all data, suggesting these are important habitats within the seal's home range [4].
SSF Output: When accounting for movement, the positive relationship with prey diversity might be weaker or non-significant, suggesting that the seal's immediate movement decisions are less influenced by this covariate than the broad-scale RSF implies [4].
HMM Output: Will reveal that the association with prey diversity is state-specific. For instance, there may be a strong positive relationship between prey diversity and the probability of being in, or switching to, a "Foraging" state, but no relationship with the "Transiting" state [4] [55]. The HMM from [55] found foraging probability was highest at depths of 7-30 m in winter and >15 m in summer, and higher during daytime.

Synthesis of Findings Table

Table 2: Hypothesized outcomes from a multi-model analysis of a ringed seal track, demonstrating how model choice shapes ecological inference.

Research Question	RSF Inference	SSF Inference	HMM Inference
What is the relationship with prey diversity?	Strong, positive selection across all locations [4].	Weaker or non-significant relationship when conditioned on movement.	Strong, positive relationship only during the "Foraging" behavioral state [4].
Which areas are identified as "important"?	Broad areas of high prey diversity within the home range.	Linear corridors and specific movement pathways that coincide with prey patches.	Spatially explicit locations where the probability of foraging behavior is high.
How does behavior change from day to night?	Cannot be directly inferred.	Can show diel patterns in habitat types selected during movement.	Quantifies the proportion of time spent in each state (e.g., higher foraging probability during daytime) and diel patterns in state transitions [55].

Table 3: Key software, data, and analytical resources for implementing movement ecology models.

Resource	Type	Primary Function	Reference/Location
`amt` R Package	Software	Comprehensive platform for building and analyzing RSFs and SSFs; handles track creation, step generation, and model fitting [4].	`amt` package documentation
`momentuHMM` R Package	Software	Specialized package for fitting complex HMMs (and related state-space models) to animal movement data, including multi-state models with covariates [4].	`momentuHMM` package documentation
Fastloc GPS-GSM Tag	Hardware	Biologging device that provides high-resolution location data essential for SSF and HMM analyses, even during brief surface intervals [55].	Sea Mammal Research Unit Instrumentation
Lake Bathymetry Layer	Data	A crucial environmental covariate for aquatic species like ringed seals, used as a predictor variable in all models to explain depth-related habitat use [55].	National/Regional Hydrographic Services
Viz Palette Tool	Software	Online tool to test color palettes for data visualizations for color blindness accessibility, ensuring figures are interpretable by all audiences [56].	https://projects.susielu.com/viz-palette

This protocol demonstrates that no single model provides a complete picture of an animal's relationship with its environment. The RSF offers a broad-scale view of habitat preference, the SSF refines this by integrating movement, and the HMM provides a deep, mechanistic understanding of how behavior is linked to environmental drivers [4]. For conservation efforts, such as designating critical habitat for the endangered Saimaa ringed seal, employing this multi-model framework is not just an academic exercise but a critical step. It ensures that identified areas of importance are robust to different statistical assumptions and ecologically meaningful across different behavioral contexts, ultimately leading to more effective and defensible conservation policy.

Assessing Model Fit and Predictive Performance for Movement Paths

Analyzing animal movement requires robust statistical methods to dissect movement paths, identify underlying behaviors, and assess the quality of models fitted to tracking data. Movement ecologists commonly use path segmentation and step selection analysis to understand how internal states and external environmental factors shape movement trajectories. This protocol provides a standardized framework for evaluating the fit and predictive performance of statistical models applied to animal movement data, with a focus on methods that account for the hierarchical structure of movement and the influence of landscape heterogeneity.

Theoretical Framework and Key Concepts

Hierarchical Organization of Movement

Animal movement can be deconstructed into a hierarchy of behavioral segments, providing a foundation for statistical modeling and analysis. This hierarchical approach allows researchers to analyze movement across multiple spatiotemporal scales, from fundamental movement elements to diel activity routines.

Figure 1: Hierarchical organization of animal movement tracks, from fundamental movement elements (FuMEs) that represent basic locomotion to diel activity routines (DARs) that represent daily patterns. Statistical Movement Elements (StaMEs) serve as analyzable proxies for FuMEs when high-resolution sensor data is unavailable [8].

Statistical Movement Elements (StaMEs) as Building Blocks

Statistical Movement Elements (StaMEs) serve as essential analytical constructs when direct observation of fundamental movement elements (FuMEs) is impossible due to data resolution limitations. StaMEs are derived by computing statistics (e.g., means, standard deviations, correlations) for step-length (SL) and turning-angle (TA) time series across short, fixed-length track segments (typically 10-30 relocation points). These statistical vectors are then clustered to identify different movement types, with cluster centroids representing distinct StaME categories (e.g., directed fast movement versus random slow movement) [8].

The table below outlines the quantitative standards for movement path segmentation and analysis:

Table 1: Quantitative Standards for Movement Path Segmentation and Analysis

Parameter	Standard Value/Range	Measurement Context	Statistical Purpose
Segment Length for StaMEs	10-30 consecutive points	High-resolution relocation data (≥5 fixes/minute)	Capture fundamental movement statistics [8]
Step-Length Distribution	Gamma distribution (scale=0.15, shape=6)	Biased correlated random walk simulations	Generate realistic movement steps [5]
Turning-Angle Distribution	von Mises distribution (concentration=4)	Biased correlated random walk simulations	Model directional persistence [5]
WCAG Contrast Ratio (Text)	4.5:1 (minimum)	Visualizations and publications	Ensure accessibility for low vision users [57] [58]
WCAG Contrast Ratio (Large Text)	3:1 (minimum)	Visualizations and publications	Ensure accessibility for large text elements [58] [45]
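
The step-length and turning-angle rows of Table 1 can be turned into a small simulation. The sketch below draws gamma step lengths (shape 6, scale 0.15) and von Mises turning angles (concentration 4) and accumulates them into a path. It simulates a plain correlated random walk with no bias toward a target, which is a simplification of the biased correlated random walk named in the table; `simulate_crw` is an illustrative name.

```python
import numpy as np

def simulate_crw(n_steps, shape=6.0, scale=0.15, kappa=4.0, seed=0):
    """Correlated random walk: gamma step lengths, von Mises turning angles."""
    rng = np.random.default_rng(seed)
    step_len = rng.gamma(shape, scale, size=n_steps)
    turn = rng.vonmises(0.0, kappa, size=n_steps)
    heading = np.cumsum(turn)                    # persistence: heading drifts by each turn
    dx = step_len * np.cos(heading)
    dy = step_len * np.sin(heading)
    steps = np.column_stack([dx, dy])
    return np.vstack([np.zeros((1, 2)), np.cumsum(steps, axis=0)])

path = simulate_crw(5000)
print(path.shape, path[-1])
```

The expected step length is shape times scale, here 0.9 distance units, and larger concentration values produce straighter paths.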

Experimental Protocols

Protocol 1: Extracting and Classifying Statistical Movement Elements (StaMEs)

Purpose: To decompose raw movement trajectories into statistically analyzable elements that serve as building blocks for hierarchical movement analysis.

Materials Required:

Animal relocation data (GPS, ATLAS, or acoustic telemetry)
R statistical software with required packages (spmodel, unmarked, ctmm, WildlifeDI)
Computing resources sufficient for time-series analysis and clustering

Procedure:

Data Preparation: Import cleaned relocation data and calculate step-lengths (SL) and turning angles (TA) between consecutive points [8].
Segment Definition: Divide the movement track into fixed-length segments of 10-30 relocation points, ensuring segments are consecutive and non-overlapping.
Statistical Characterization: For each segment, compute descriptive statistics including:
- Mean and standard deviation of step-lengths
- Mean turning angle and circular variance
- Autocorrelation measures for both SL and TA series
Dimensionality Reduction: Apply Principal Component Analysis (PCA) to the statistical vectors to reduce dimensionality while preserving variation.
Cluster Analysis: Perform k-means or hierarchical clustering on the principal components to identify distinct StaME categories.
Validation: Validate cluster stability using silhouette analysis or bootstrap methods.
Canonical Activity Mode (CAM) Identification: Group sequences of identical StaME categories into fixed-length CAM segments representing consistent movement behaviors.
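
The segmentation, characterization, and clustering steps of this procedure can be sketched as follows. The example uses a hand-written k-means in NumPy, three summary statistics per segment, and a simulated track with two movement modes. It omits the PCA, autocorrelation features, and cluster validation steps for brevity, and all names and parameter values are illustrative.

```python
import numpy as np

def segment_features(step_len, turn, seg_len=10):
    """Per-segment statistics: mean and SD of step length, circular variance of turning angle."""
    n_seg = len(step_len) // seg_len
    feats = []
    for s in range(n_seg):
        sl = step_len[s * seg_len:(s + 1) * seg_len]
        ta = turn[s * seg_len:(s + 1) * seg_len]
        resultant = np.hypot(np.cos(ta).mean(), np.sin(ta).mean())
        feats.append([sl.mean(), sl.std(), 1.0 - resultant])
    return np.array(feats)

def kmeans(X, k, n_iter=50, seed=0):
    """Plain Lloyd's algorithm; returns labels and cluster centroids."""
    rng = np.random.default_rng(seed)
    centers = X[rng.choice(len(X), size=k, replace=False)].copy()
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centers

# Simulated track: 200 slow, tortuous steps followed by 200 fast, directed steps.
rng = np.random.default_rng(7)
n = 200
step_len = np.concatenate([rng.gamma(2.0, 0.05, n), rng.gamma(6.0, 0.3, n)])
turn = np.concatenate([rng.uniform(-np.pi, np.pi, n), rng.vonmises(0.0, 8.0, n)])

features = segment_features(step_len, turn, seg_len=10)
z = (features - features.mean(axis=0)) / features.std(axis=0)   # standardise before clustering
labels, centers = kmeans(z, k=2)
print(labels)
```

Each cluster centroid plays the role of a StaME category; runs of identical labels along the track are the raw material for the CAM identification step.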

Troubleshooting Tips:

If clusters are poorly separated, consider adjusting segment length or incorporating additional statistical features
For tracks with varying sampling rates, resample to consistent intervals before analysis
Validate StaME classification against known behavioral annotations when available

Protocol 2: Assessing Inter-Individual Interactions with Landscape Covariates

Purpose: To correctly identify whether correlated movement paths arise from social interactions or shared environmental responses by accounting for landscape heterogeneity.

Materials Required:

Simultaneous movement tracks for multiple individuals
Environmental GIS layers (habitat quality, resources, barriers)
R packages for step selection functions (amt, ResourceSelection) and spatial analysis (sp, raster)

Procedure:

Environmental Data Preparation: Process raster layers representing habitat quality, resources, and barriers relevant to the study species [5].
Step Selection Function Setup:
- For each observed step, generate 10-20 random available steps from the same starting point, sampling step lengths and turning angles from distributions fitted to the observed steps
- Extract environmental covariates at the end point of both observed and available steps
- Calculate distances between individuals for each step option
Model Fitting: Implement three alternative modeling approaches:
- SSF-OD: Use occurrence distribution of other individuals as a covariate [5]
- SSF-DIST: Use distance to other individuals as a covariate [5]
- Spatial+: Apply Spatial+ method to reduce bias from unmeasured spatial factors [5]
Model Comparison: Evaluate models using AIC or cross-validation to identify the best-performing approach.
Dynamic Interaction Analysis: As a comparative baseline, calculate the Dynamic Interaction (DI) index using the WildlifeDI R package without environmental covariates [5].
Bias Assessment: Compare interaction inferences between models with and without environmental covariates to quantify potential spurious correlation.
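
The generation of available steps and the social-distance covariate can be sketched as follows. This is a plain-Python illustration, not the `amt` implementation; the default gamma and von Mises parameters simply reuse the simulation values tabulated earlier (shape 6, scale 0.15, concentration 4), and the function and field names are hypothetical.

```python
import math
import random

def random_steps(x, y, heading, n=10, shape=6.0, scale=0.15, kappa=4.0, rng=None):
    """Draw n candidate steps from a start point and previous heading (radians)."""
    rng = rng or random.Random()
    candidates = []
    for _ in range(n):
        step_length = rng.gammavariate(shape, scale)   # alpha = shape, beta = scale
        turn = rng.vonmisesvariate(0.0, kappa)         # angle in [0, 2*pi), centred on 0
        new_heading = heading + turn
        candidates.append({
            "x": x + step_length * math.cos(new_heading),
            "y": y + step_length * math.sin(new_heading),
            "step_length": step_length,
            "turn": turn,
        })
    return candidates

def distance_to_other(candidate, other_xy):
    """Distance from a candidate end point to another individual's position."""
    return math.dist((candidate["x"], candidate["y"]), other_xy)
```

In an SSF-DIST style analysis, `distance_to_other` would be evaluated for the observed step and for every candidate, giving one covariate value per row of the case-control data set.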

Interpretation Guidelines:

Significant attraction/avoidance in SSF models with environmental covariates suggests true social interactions
Significant effects that disappear when adding environmental covariates indicate shared environmental responses
Spatial+ models should show reduced bias when relevant environmental predictors are missing

Protocol 3: Evaluating Model Fit and Predictive Performance

Purpose: To assess how well movement models capture observed patterns and generalize to new data using multiple validation metrics.

Materials Required:

Fitted movement models (Step Selection Functions, Hidden Markov Models, etc.)
Hold-out validation dataset not used for model fitting
Computing resources for cross-validation and simulation

Procedure:

Data Splitting: Reserve 20-30% of movement tracks as a validation dataset before model fitting.
Goodness-of-Fit Assessment:
- Calculate likelihood-based metrics (AIC, BIC) for fitted models
- Perform residual analysis where applicable
- Use approximate Bayesian computation (ABC) for complex models where likelihoods are intractable [59]
Predictive Performance Validation:
- Simulate movement paths from fitted models using the same starting points as validation data
- Compare simulated and observed distributions of:
  - Step-length and turning-angle distributions
  - Residence time distributions
  - Space use patterns (utilization distributions)
- Calculate dissimilarity metrics between observed and predicted distributions (Bhattacharyya distance, Earth Mover's Distance)
Cross-Validation: Implement k-fold cross-validation, ensuring entire tracks are kept together in folds.
Path-Level Predictions: Assess model ability to predict:
- First-passage time and residency patterns
- Encounter rates with environmental features
- Overall displacement distributions
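
Two pieces of this procedure lend themselves to a short sketch: comparing observed and simulated distributions, and building folds that keep whole tracks together. The code below is a minimal plain-Python illustration under stated assumptions: distributions are compared as histograms on shared bin edges, the Earth Mover's Distance is the one-dimensional form on equal-width bins, and all function names are hypothetical.

```python
import math

def histogram(values, edges):
    """Proportion of values per bin [edges[i], edges[i+1]); the last bin is closed."""
    counts = [0] * (len(edges) - 1)
    for v in values:
        for i in range(len(counts)):
            upper_ok = v < edges[i + 1] or (i == len(counts) - 1 and v == edges[-1])
            if edges[i] <= v and upper_ok:
                counts[i] += 1
                break
    total = sum(counts)
    if total == 0:
        raise ValueError("no values fall inside the bin edges")
    return [c / total for c in counts]

def bhattacharyya_distance(p, q):
    """-ln of the Bhattacharyya coefficient between two discrete distributions."""
    bc = min(1.0, sum(math.sqrt(a * b) for a, b in zip(p, q)))
    return float("inf") if bc == 0 else -math.log(bc)

def emd_1d(p, q, bin_width=1.0):
    """1-D Earth Mover's Distance: summed absolute difference of the two CDFs."""
    cumulative, total = 0.0, 0.0
    for a, b in zip(p, q):
        cumulative += a - b
        total += abs(cumulative)
    return total * bin_width

def track_folds(track_ids, k):
    """Assign whole tracks to k folds so no track is split across folds."""
    unique = sorted(set(track_ids))
    return [unique[i::k] for i in range(k)]
```

A typical use would histogram observed and simulated step lengths on the same `edges`, then report both distances; `track_folds` returns lists of track identifiers, and every relocation inherits the fold of its track.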

Performance Benchmarks:

Well-fitting models should generate simulated paths with summary statistics within 10-15% of observed values
Predictive performance should be consistent across training and validation datasets
Models with ΔAIC < 2 relative to the best model have substantial support; support declines as ΔAIC approaches 4 and beyond
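
The ΔAIC benchmark can be made concrete with a small calculation. The sketch below computes ΔAIC and Akaike weights from a set of AIC scores; the function names are illustrative, and any AIC values passed in are the user's own (none are taken from the cited studies).

```python
import math

def delta_aic(aic_by_model):
    """Difference between each model's AIC and the lowest AIC in the set."""
    best = min(aic_by_model.values())
    return {name: aic - best for name, aic in aic_by_model.items()}

def akaike_weights(aic_by_model):
    """Relative support for each model, normalised to sum to 1."""
    deltas = delta_aic(aic_by_model)
    relative = {name: math.exp(-0.5 * d) for name, d in deltas.items()}
    total = sum(relative.values())
    return {name: r / total for name, r in relative.items()}
```

For example, calling `delta_aic({"SSF-OD": 100.0, "SSF-DIST": 101.5, "Spatial+": 110.0})` with these hypothetical scores gives differences of 0, 1.5, and 10, so the first two models would both be retained under the benchmark above.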

The Scientist's Toolkit

Table 2: Essential Research Reagent Solutions for Movement Ecology Analysis

Tool/Category	Specific Implementation	Function/Purpose	Application Context
Statistical Software	R programming language	Primary environment for statistical analysis and modeling	All analytical workflows [18]
Movement Analysis Packages	`ctmm` (continuous-time movement modeling)	Path reconstruction, home range analysis, habitat suitability estimation	Analyzing autocorrelated tracking data [18]
Movement Analysis Packages	`unmarked`	Fitting hierarchical models of animal abundance and occurrence	Site occupancy and count data modeling [18]
Movement Analysis Packages	`spmodel`	Spatial statistical modeling	Geostatistical analysis of movement patterns [18]
Movement Analysis Packages	`WildlifeDI`	Dynamic interaction analysis	Quantifying inter-individual interactions [5]
Step Selection Framework	`amt` (animal movement tools)	Step selection analysis, track manipulation	SSF implementation and path segmentation [5]
Spatial Analysis	`raster`, `sf`, `terra`	Processing environmental covariate data	Landscape heterogeneity analysis [5]
Simulation Platforms	Numerus RAMP technology	Building multi-modal movement simulators	Generating synthetic paths for hypothesis testing [8]
Data Sources	Movebank data repository	Accessing curated animal tracking data	Method development and comparative studies [18]

Analytical Workflow Integration

The following diagram illustrates the integrated workflow for assessing model fit and predictive performance in movement path analysis:

Figure 2: Comprehensive workflow for assessing model fit and predictive performance in movement ecology. The iterative nature of model refinement emphasizes the need for multiple assessment cycles to achieve optimal performance.

Robust assessment of model fit and predictive performance is essential for advancing movement ecology. The protocols outlined here emphasize the importance of hierarchical movement decomposition, proper accounting for landscape heterogeneity, and rigorous validation using multiple metrics. By implementing these standardized approaches, researchers can more confidently identify the mechanisms driving movement decisions, distinguish social interactions from environmental correlations, and develop models with greater predictive power. Future methodological development should focus on improving multi-scale analysis, integrating more complex behavioral mechanisms, and developing more efficient computational approaches for large tracking datasets.

Validating AI-Based Motion-Capture Systems for High-Resolution Behavioral Phenotyping

The accurate quantification of animal behavior is a cornerstone of biomedical research, movement ecology, and drug development. High-resolution behavioral phenotyping (the precise measurement and interpretation of behavioral patterns) allows researchers to link genetic, neural, and environmental factors to specific behavioral outputs. Recent technological advances have progressively shifted the field from traditional manual observation to automated, data-driven approaches. Among these, AI-based motion-capture systems have emerged as powerful tools for obtaining detailed, quantitative data on animal movement and behavior. This document outlines standardized protocols for validating these AI-driven systems to ensure they produce reliable, high-fidelity data suitable for high-stakes research applications, particularly within the framework of movement ecology statistical methods that focus on understanding individual variation in movement behaviors [60].

The transition from marker-based to markerless tracking represents a significant paradigm shift in behavioral analysis. While marker-based motion capture (MBMC) systems have long been considered the gold standard for kinematic studies, their implementation in small animals like mice has been limited by technical challenges related to marker attachment and potential behavioral interference [61]. Conversely, markerless motion capture utilizing computer vision and deep learning has democratized access to detailed behavioral analysis but introduces new validation challenges related to accuracy, reliability, and context-dependency [62] [61]. This document provides a comprehensive validation framework to address these challenges, ensuring that AI-driven systems meet the rigorous demands of modern behavioral research.

Validation Framework for AI-Based Motion Capture Systems

Core Performance Metrics and Validation Criteria

Establishing a robust validation framework requires defining and quantifying key performance metrics that collectively capture system capabilities and limitations. The table below outlines essential validation criteria, measurement approaches, and performance targets based on current literature and technological standards.

Table 1: Core Validation Metrics for AI-Based Motion Capture Systems

Validation Metric	Measurement Approach	Target Performance	Relevant Standards
Spatial Accuracy	Comparison against marker-based ground truth [61] or high-speed video reference	Mean error <10% of body segment length [61]	Consistent with movement ecology precision requirements [60]
Temporal Resolution	System recording capability versus observable behavior	≥60 Hz for general locomotion; ≥100 Hz for fine kinematics [61]	Captures rapid behavioral transitions [63]
Inter-System Reliability	Cross-platform comparison on same subjects	CMC (Coefficient of Multiple Correlation) >0.8 [62]	Ensures reproducibility across labs
Signal-to-Noise Ratio	Analysis of trajectory smoothness	Minimize high-frequency "jitter" in trajectories [61]	Enables detection of subtle motor patterns
Behavioral Context Specificity	Performance across different behavioral states	Maintain accuracy across locomotion, rearing, grooming [64]	Supports comprehensive behavioral analysis

Quantitative Reliability Assessment

Recent studies provide specific quantitative benchmarks for AI-driven markerless systems. When evaluating pathological movement in Parkinson's disease patients, single- and multi-camera AI solutions demonstrated CMC values ranging from 0.53 to 0.92 for joint kinematics, with sagittal plane kinematics (like knee flexion-extension) typically showing higher reliability (CMC >0.91) than other movement planes [62]. The corresponding RMS (Root Mean Square) values for these measurements ranged from 6.94 to 12.91 degrees [62]. Although these figures come from human subjects, they provide useful reference points for animal system validation.

For rodent studies, specialized systems like the JAX Animal Behavior System (JABS) and Goblotrop have demonstrated capabilities for high-resolution tracking. JABS provides an integrated hardware and software solution that enables uniform data collection compatible with machine learning algorithms, while Goblotrop utilizes infrared cameras and neural networks to determine a rodent's 3D position with sufficient accuracy to detect starvation-induced hyperactivity patterns comparable to running wheel measurements [64] [65]. These systems represent the current state-of-the-art against which new AI-based solutions should be compared.

Experimental Protocols for System Validation

Protocol 1: Ground-Truth Validation Against Marker-Based Systems

Purpose: To establish spatial accuracy and precision of AI-based markerless systems by comparison with marker-based motion capture (MBMC) as ground truth.

Materials:

Marker-based motion capture system (e.g., Qualisys, Vicon) [61]
AI-based markerless system to be validated
Experimental subjects (recommended: 8-12 mice, 10-12 weeks old) [61]
Multi-dimensional arena enabling various behaviors [61]

Procedure:

Subject Preparation: Implant minimally invasive retroreflective markers at key anatomical landmarks (e.g., joints, body trunk) using established surgical protocols. Allow 11-14 days for recovery before testing [61].
System Synchronization: Temporally synchronize marker-based and markerless systems using either hardware triggers or post-hoc timestamp alignment.
Data Collection: Record each subject performing standardized behaviors in the following sequence:
- Open-field exploration (10 minutes)
- Vertical climbing (5 minutes)
- Narrow bridge traversal (3 minutes)
- Grooming sequence (capture naturally occurring)
Data Processing:
- For marker-based data: Apply gap-filling algorithms as needed; no extensive post-processing should be required [61].
- For markerless data: Process through the system's standard pipeline without custom corrections.
Analysis: Calculate agreement metrics between systems for:
- Trajectory similarity: Dynamic Time Warping (DTW) on limb and body paths
- Spatial accuracy: Mean absolute error (MAE) in mm for corresponding points
- Behavioral classification concordance: Cohen's Kappa for agreement on behavioral state annotation

Interpretation: A well-validated system should maintain spatial accuracy with mean errors <10% of body segment length across all behavioral contexts and high behavioral classification concordance (Kappa >0.8) [61].
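
The three agreement metrics named in the analysis step can be sketched as below. This is a plain-Python illustration under stated assumptions: "MAE" is taken here as the mean Euclidean distance between paired keypoints, DTW uses the classic unconstrained dynamic-programming recursion, and all function names are hypothetical.

```python
import math
from collections import Counter

def mean_absolute_error(reference, estimate):
    """Mean Euclidean distance between paired reference and estimated points."""
    return sum(math.dist(a, b) for a, b in zip(reference, estimate)) / len(reference)

def cohens_kappa(labels_a, labels_b):
    """Chance-corrected agreement between two frame-by-frame label sequences."""
    n = len(labels_a)
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    count_a, count_b = Counter(labels_a), Counter(labels_b)
    expected = sum(count_a[k] * count_b[k] for k in count_a) / (n * n)
    return 1.0 if expected == 1 else (observed - expected) / (1 - expected)

def dtw_distance(path_a, path_b):
    """Dynamic Time Warping cost between two sequences of coordinate tuples."""
    inf = float("inf")
    n, m = len(path_a), len(path_b)
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = math.dist(path_a[i - 1], path_b[j - 1])
            # Extend the cheapest of: insertion, deletion, or match.
            cost[i][j] = d + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]
```

To compare against the 10% threshold above, the value returned by `mean_absolute_error` would be divided by the relevant body segment length measured in the same units.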

Protocol 2: Behavioral Flow Analysis Validation

Purpose: To validate whether the AI system can detect subtle treatment effects through behavioral flow analysis, which examines transitions between behavioral states.

Materials:

AI-based motion capture system undergoing validation
Experimental groups with known behavioral differences (e.g., stress models, drug treatments) [63]
Computational resources for behavioral flow analysis

Procedure:

Video Acquisition: Record control and experimental groups (minimum n=12 per group) under standardized conditions.
Pose Estimation: Use the system's pose estimation algorithms to extract body keypoints across all video frames.
Behavioral Clustering:
- Compute features from pose data over sliding time windows (±15 frames) [63]
- Apply k-means clustering to identify discrete behavioral states
- Determine optimal cluster number (typically 25-70) representing 95% of behavioral frames [63]
Behavioral Flow Mapping: For each animal, document all transitions between behavioral clusters across the recording session.
Statistical Validation:
- Implement Behavioral Flow Analysis (BFA): Compute Manhattan distance between group means across all behavioral transitions [63]
- Use permutation testing (≥1000 iterations) to establish significance of group differences
- Compare results to traditional behavioral analysis (e.g., time in center, distance moved)

Interpretation: A validated system should detect known group differences with higher statistical power using BFA compared to traditional analysis methods, successfully identifying altered behavioral transition patterns that correspond to treatment effects [63].
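
The statistical validation step can be illustrated with a compact sketch. It is not the BehaviorFlow package: it assumes cluster labels are integers from 0 to `n_clusters - 1`, uses raw transition counts without normalisation, and shuffles group labels to build the null distribution. All names are hypothetical.

```python
import random

def transition_vector(states, n_clusters):
    """Flattened count of transitions between different clusters for one animal."""
    vec = [0.0] * (n_clusters * n_clusters)
    for a, b in zip(states, states[1:]):
        if a != b:  # only changes of state count as transitions
            vec[a * n_clusters + b] += 1
    return vec

def group_mean(vectors):
    """Element-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]

def manhattan(u, v):
    """Sum of absolute element-wise differences."""
    return sum(abs(a - b) for a, b in zip(u, v))

def bfa_permutation_test(group_a, group_b, n_perm=1000, seed=0):
    """Observed Manhattan distance between group means and its permutation p-value."""
    rng = random.Random(seed)
    observed = manhattan(group_mean(group_a), group_mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    hits = 0
    for _ in range(n_perm):
        shuffled = pooled[:]
        rng.shuffle(shuffled)
        d = manhattan(group_mean(shuffled[:n_a]), group_mean(shuffled[n_a:]))
        if d >= observed:
            hits += 1
    # Add-one correction keeps the p-value strictly above zero.
    return observed, (hits + 1) / (n_perm + 1)
```

Each animal contributes one vector from `transition_vector`; the two lists of vectors (control and treated) are then passed to `bfa_permutation_test`.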

The following diagram illustrates the complete validation workflow, integrating both ground-truth and behavioral analysis approaches:

Essential Research Reagents and Solutions

Successful implementation of AI-based motion capture validation requires specific hardware, software, and analytical tools. The following table details key components of a comprehensive behavioral phenotyping toolkit.

Table 2: Research Reagent Solutions for Behavioral Phenotyping Validation

Category	Specific Tool/Resource	Function/Purpose	Example Implementation
Hardware Platforms	JABS Data Acquisition Module [64]	Standardized hardware for uniform data collection	Open-source 3D designs for controlled environment
Hardware Platforms	Goblotrop Infrared System [65]	24/7 tracking in home-cage environment	Dual IR cameras with 3D positioning
Software Solutions	BehaviorFlow Package [63]	Behavioral flow analysis pipeline	Transition pattern detection in open-field tests
Software Solutions	JABS-AL Module [64]	Active learning for behavior annotation	GUI for classifier training and validation
Analytical Frameworks	Variance Partitioning [60]	Quantifying individual vs. population variation	Mixed-effects models for behavioral plasticity
Reference Datasets	BXD RI Mouse Strains [66]	Genetically diverse reference population	High-throughput phenotyping with genetic controls
Validation Standards	Marker-Based Motion Capture [61]	Ground-truth kinematic assessment	Qualisys system with implanted markers

Integration with Movement Ecology Statistical Methods

The validation approaches described align with emerging frameworks in movement ecology that seek to understand among-individual variation in movement behaviors. By applying variance partitioning methods, researchers can decompose behavioral variation into its constituent parts: intrinsic among-individual differences (animal personality), reversible behavioral plasticity, and residual within-individual variation [60]. These approaches allow movement ecologists to address fundamental questions about individual variation in:

Behavioral types: Consistent differences in average movement behavior among individuals [60]
Behavioral plasticity: Individual differences in responsiveness to environmental gradients [60]
Behavioral predictability: Consistent differences in residual within-individual variability around mean behavior [60]
Behavioral syndromes: Correlations among different movement behaviors at the individual level [60]

AI-based motion capture systems, when properly validated, provide the high-resolution, longitudinal data required for these sophisticated statistical approaches. For example, the worked example with African elephants mentioned in the movement ecology literature [60] demonstrates how individual differences in movement (average speed, adjustment rates, and predictability) can be quantified using mixed-effects models. Similar approaches can be applied to laboratory models using the validation frameworks described herein.
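
As a simplified stand-in for the mixed-effects models described above, the share of variance attributable to among-individual differences (repeatability) can be estimated from a one-way ANOVA decomposition. The sketch below assumes a balanced design (the same number of measurements per animal) and is not the method used in [60]; the function name and input layout are hypothetical.

```python
from statistics import mean

def repeatability(measurements):
    """ANOVA-based repeatability from {individual: [repeated measurements]}.

    Assumes every individual has the same number (>= 2) of measurements.
    """
    sizes = {len(values) for values in measurements.values()}
    if len(sizes) != 1:
        raise ValueError("balanced design required for this simple estimator")
    n_rep = sizes.pop()
    n_ind = len(measurements)
    grand = mean(x for values in measurements.values() for x in values)
    ss_among = sum(n_rep * (mean(v) - grand) ** 2 for v in measurements.values())
    ss_within = sum((x - mean(v)) ** 2 for v in measurements.values() for x in v)
    ms_among = ss_among / (n_ind - 1)
    ms_within = ss_within / (n_ind * (n_rep - 1))
    var_among = (ms_among - ms_within) / n_rep   # among-individual variance component
    return var_among / (var_among + ms_within)
```

Values near 1 indicate that most variation in, for example, mean speed lies among individuals; the estimator can return negative values when within-individual variation dominates, which is usually read as no detectable repeatability.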

This document presents comprehensive application notes and protocols for validating AI-based motion capture systems in high-resolution behavioral phenotyping. By implementing these standardized validation procedures, researchers can ensure their systems generate reliable, reproducible data suitable for detecting subtle behavioral phenotypes, quantifying individual differences, and advancing both basic research and drug development applications. The integration of these technological advances with sophisticated statistical frameworks from movement ecology promises to unlock new insights into the biological underpinnings of behavior.

The selection of model organisms in biomedical research has historically been driven by practical convenience and scientific tradition rather than optimal relevance to human pathology. The house mouse (Mus musculus) currently dominates preclinical research, constituting nearly 49% of all animals used in European research in 2018 [67]. This predominance exists despite significant limitations in the clinical translatability of findings, potentially contributing to the high (>90%) attrition rate in drug development [67]. Concurrently, movement ecology has developed sophisticated statistical approaches to quantify how animals interact with their complex environments and each other. This article explores the translational opportunity: applying these advanced ecological methodologies to enhance the validity and predictive power of preclinical animal research.

Foundational Statistical Models from Movement Ecology

Movement ecology employs several robust statistical frameworks to analyze animal-environment interactions. The table below summarizes the key models relevant to preclinical translation.

Table 1: Core Statistical Models in Movement Ecology and Their Preclinical Applications

Model	Primary Function	Data Requirements	Key Preclinical Application	Considerations
Resource Selection Function (RSF) [4]	Estimates the relative probability of habitat use based on environmental covariates.	Animal locations ("used") vs. random points in home range ("available").	Identifying environmental features in a home cage or testing arena that animals seek or avoid.	Provides broad-scale habitat preference; can be implemented as an Inhomogeneous Poisson Point Process (IPP) [4].
Step-Selection Function (SSF) [4] [5]	Models the choice of each movement step based on local environmental conditions and movement constraints.	High-frequency relocation data, with random steps generated from a movement distribution.	Understanding how pharmacological or disease states affect immediate decision-making and interaction with environmental stimuli.	Accounts for movement autocorrelation; more dynamic than RSF.
Hidden Markov Model (HMM) [4]	Relates movement data (e.g., step length, turning angle) to discrete, latent behavioral states.	Time-series movement data at a fine temporal resolution.	Deconstructing complex behavioral sequences (e.g., exploration, grooming, social interaction) and how they are modulated by treatment.	Reveals how environmental covariates differentially affect distinct behavioral states [4].
Dynamic Interaction Indices [5]	Quantifies spatial-temporal correlation between movement paths of two or more individuals.	Simultaneous tracking of multiple individuals.	Measuring social approach/avoidance behaviors in a pair or group-housing context.	Can spuriously detect interaction if animals respond to the same unmeasured environmental feature [5].

Experimental Protocol: Integrating SSFs into Preclinical Behavioral Pharmacology

The following protocol details the application of a Step-Selection Function to assess how a candidate anxiolytic drug affects an animal's interaction with aversive stimuli in an open field test, moving beyond traditional simple metrics like time spent in the center.

Materials and Reagents

Table 2: Essential Research Reagents and Solutions

Item	Function/Description
Video Tracking System	High-resolution system (e.g., EthoVision, AnyMaze) for automated, high-frequency (e.g., 25 Hz) positional data collection.
Behavioral Arena	Standard open field apparatus (e.g., 40 cm x 40 cm x 40 cm). A modified version with heterogeneous zones (e.g., bright vs. dark walls, textured floors) is preferred.
Pharmacological Agent	The drug under investigation (e.g., a GABA-A receptor modulator).
Vehicle Solution	Appropriate solvent (e.g., saline, DMSO) for preparing drug stock and serving as a vehicle control.
Statistical Software with SSF Capability	R statistical environment with packages such as `amt` [4] for SSF implementation and `momentuHMM` for HMMs [4].

Procedure

Experimental Groups & Dosing: Randomly assign subjects (e.g., C57BL/6J mice) to three groups:
- Group 1: Vehicle control
- Group 2: Low dose of anxiolytic candidate
- Group 3: High dose of anxiolytic candidate
Administer treatments via the chosen route (e.g., intraperitoneal injection) 30 minutes prior to behavioral testing.
Behavioral Testing & Data Collection:
- Place a single animal in the center of the behavioral arena.
- Record a 10-minute trial using the video tracking system.
- Export the animal's trajectory as a high-resolution time series of X-Y coordinates.
Data Pre-processing: In R, use the amt package to:
- Import the trajectory data and create a track (amt::make_track()).
- Create observed steps between consecutive locations (amt::steps()).
- For each observed step, generate a set of 10-20 random steps (amt::random_steps()) that originate from the same starting point, with step lengths and turning angles drawn from distributions fitted to the observed steps. This creates the "available" choices for each "used" step.
Environmental Covariate Extraction: For the end point of every used and available step, extract the values of relevant environmental covariates. For this example, covariates could include:
- Distance_to_Center: Euclidean distance from the arena center.
- Light_Intensity: A normalized value (0-1) based on the wall lighting.
- Zone_Risk: A categorical variable (e.g., 1 for "safe" near walls, 2 for "aversive" in center).
Model Fitting: Fit a conditional logistic regression model (amt::fit_ssf()) to the data, which is structured as a stratified case-control design. The basic model form is Used ~ Covariate_1 + Covariate_2 + ... + strata(step_id), where Used is a binary variable (1 for the observed step, 0 for random steps) and step_id is the stratification variable that groups each observed step with its random alternatives.
Interpretation: A positive selection coefficient (β) for Light_Intensity indicates the animal selects for brighter areas. The primary analysis would test if this relationship is significantly modulated by the Drug_Dose group, indicating a drug-induced shift in environmental preference.
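
To show what the conditional logistic regression is doing, the sketch below evaluates the conditional log-likelihood for a single covariate and maximises it with Newton's method. It is a plain-Python teaching example, not a substitute for `amt::fit_ssf()`: it handles one covariate only, assumes the data are not perfectly separated, and uses hypothetical function names and a hypothetical input layout of (used value, list of available values) per step.

```python
import math

def cond_logit_terms(beta, strata):
    """Log-likelihood, gradient, and information for one covariate.

    strata: list of (used_value, [available_values]), one entry per observed step.
    """
    loglik = grad = info = 0.0
    for used, available in strata:
        xs = [used] + list(available)
        shift = max(beta * x for x in xs)          # stabilise the exponentials
        weights = [math.exp(beta * x - shift) for x in xs]
        z = sum(weights)
        ex = sum(w * x for w, x in zip(weights, xs)) / z
        ex2 = sum(w * x * x for w, x in zip(weights, xs)) / z
        loglik += beta * used - (shift + math.log(z))
        grad += used - ex                           # observed minus expected covariate
        info += ex2 - ex * ex                       # within-stratum variance
    return loglik, grad, info

def fit_beta(strata, max_iter=25, tol=1e-10):
    """Newton-Raphson estimate of the selection coefficient."""
    beta = 0.0
    for _ in range(max_iter):
        _, grad, info = cond_logit_terms(beta, strata)
        if info <= 0:
            break  # no variation within strata; coefficient is not identifiable
        step = grad / info
        beta += step
        if abs(step) < tol:
            break
    return beta
```

A positive value returned by `fit_beta` for a covariate such as Light_Intensity corresponds to selection for higher values of that covariate, matching the interpretation given above.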

Workflow Diagram

Critical Caveats and Methodological Considerations

The Risk of Spurious Inference

A critical lesson from ecology is that failing to account for key environmental variables can lead to profoundly misleading conclusions. A study simulating animal movement demonstrated that correlated movement paths between two individuals, which might be interpreted as "social interaction," can arise purely because both individuals are independently attracted to the same unmeasured resource [5]. In a preclinical context, this translates to a major caveat: what appears to be a direct drug effect on social behavior might instead be a downstream consequence of the drug altering an animal's perception of, or response to, a feature of its physical environment.

Table 3: Strategies to Mitigate Spurious Inference in Preclinical Models

Challenge	Ecological Insight	Preclinical Mitigation Strategy
Unmeasured Spatial Confounding	Ignoring landscape heterogeneity biases inference of social interactions [5].	Always measure and include relevant environmental covariates (e.g., light, texture, shelter) in SSFs.
Correlated Trajectories	Dynamic Interaction indices can falsely signal social attraction if animals share a resource [5].	Use SSFs that include both social (e.g., distance to conspecific) and environmental predictors simultaneously.
Limited Model Diversity	Over-reliance on a few traditional species (e.g., Mus musculus) limits biological insight [67].	Consider using monogamous rodent species (e.g., voles) for research on social bonding and its pathologies [67].

The "Spatial+" Solution

When a critical environmental covariate cannot be measured, ecologists have developed a bias-reduction technique called "Spatial+" [5]. This method can be adapted to preclinical settings. It involves adding a spatial smooth (e.g., a smooth function of the animal's X-Y coordinates) to the SSF to partial out the effect of unmeasured spatial dependencies, thereby reducing spurious correlations and yielding more accurate estimates of the true effects of interest, such as drug treatment or social interaction.

Application Notes: Enhancing Model Validity and Translation

Incorporating Environmental Enrichment and Complex Housing

The principles of environmental enrichment (EE) in preclinical research align closely with the complex landscapes studied in ecology. EE modifies an animal's daily environment to create richness in spatial, structural, and social opportunities, promoting engagement in species-typical activities [68]. Applying movement ecology models to animals in enriched versus standard housing can quantitatively show how complexity alters natural exploratory behaviors and space use, providing a richer, more ethologically valid baseline for evaluating therapeutic interventions.

Accounting for Developmental and Life History Stages

Ecological models are inherently sensitive to life history and state-dependent behaviors. This is crucial for preclinical translation, as treatment efficacy can vary dramatically with age. For instance, adolescent rodents show impairments in fear extinction compared to adults, and pharmacological adjuncts effective in adults often fail in adolescents, especially after stress [69]. SSFs and HMMs can be used to model how a drug's effect on movement and environmental interaction is conditional on the developmental stage of the animal, leading to more age-specific treatment strategies.

A Framework for Model Evaluation (OPE Protocol)

To ensure rigorous application of these complex models, researchers can adopt the OPE (Objectives, Patterns, Evaluation) protocol from ecology [70]. This framework mandates clear documentation of:

Objectives: The specific scientific or clinical question the model is addressing.
Patterns: The defined ecological (behavioral) patterns the model is designed to capture.
Evaluation: The methodology used to assess the model's performance and skill.

This promotes standardization and transparency, which is paramount when translating novel methodologies into the regulated domain of drug development.

Integrated Workflow for Preclinical Translation

The following diagram summarizes the complete translational pipeline from experimental design to clinical insight, integrating the concepts and methods discussed.

Conclusion

The statistical toolbox of movement ecology, comprising RSFs, SSFs, HMMs, and emerging frameworks like TEHS, provides powerful methods to transform raw movement tracks into profound insights about animal behavior, habitat use, and connectivity. The choice of model is critical, as each offers distinct advantages and answers different questions, from broad-scale habitat selection to fine-scale behavioral states. For drug development, these methods offer a rigorous framework for quantifying behavioral phenotypes in animal models, potentially increasing the translational predictivity of preclinical studies. Future directions will be shaped by the integration of movement models with AI-driven analytics, the development of more sophisticated multi-modal simulators, and the creation of stronger, formalized bridges between ecological theory and biomedical application, ultimately leading to more human-relevant research models.

Statistical Methods in Movement Ecology: From Animal Tracking to Biomedical Applications


Abstract

Deconstructing Animal Movement: From Raw Tracks to Ecological Insight

Core Components of the Movement Ecology Paradigm

Conceptual Framework and Interrelationships

Quantitative Assessment of Research Trends

Statistical Methods in Movement Ecology

Resource Selection Functions (RSF)

Step Selection Functions (SSF)

Hidden Markov Models (HMM)

Experimental Protocols and Application Notes

Integrated Workflow for Movement Analysis

Protocol 1: Habitat Selection Analysis using SSF

Protocol 2: Behavioral State Estimation using HMM

The Scientist's Toolkit: Essential Research Solutions

Advanced Applications and Future Directions

Core Concepts and Definitions

Analytical Framework and Workflow

Whole-Path Metrics for DAR Categorization

Experimental Protocols and Methodologies

Data Collection Requirements

DAR Categorization Protocol

Statistical Movement Elements (StaMEs) Extraction Protocol

Research Toolkit

Case Studies and Applications

Barn Owl DAR Categorization

Elephant Diel Movement Analysis

Baboon Diel Activity Patterns

Integration with Statistical Habitat Models

Introducing Statistical Movement Elements (StaMEs) as Building Blocks for Path Segmentation

Core Concept and Hierarchical Framework

Quantitative Framework and Data Presentation

Experimental Protocol: Implementing StaME-Based Path Segmentation

Protocol 1: From Raw Tracking Data to StaME Classification

Protocol 2: Hierarchical Segmentation into CAMs and BAMs

The Scientist's Toolkit: Essential Research Reagents

Integration with Established Movement Ecology Methods

Linking Movement Patterns to Ecological Processes and Fitness Outcomes

Model Descriptions and Mathematical Foundations

Resource Selection Functions (RSF)

Step Selection Functions (SSF)

Hidden Markov Models (HMMs)

Experimental Protocols and Workflows

Protocol 1: Fitting a Resource Selection Function (RSF)

Protocol 2: Fitting a Step Selection Function (SSF)

Protocol 3: Fitting a Hidden Markov Model (HMM)

Comparative Analysis and Data Presentation

The Scientist's Toolkit: Research Reagent Solutions

From Patterns to Processes: Linking Movement to Fitness

The Critical Role of GPS, Biologging, and Sensor Technologies in Data Collection

Application Notes: Current Capabilities and Research Outputs

Experimental Protocols

Protocol 1: Investigating Species-Habitat Associations

Protocol 2: Inferring Behavior and Fitness from Multi-sensor Data

Protocol 3: Wildlife Disease Surveillance through Movement Data

Workflow Visualizations

Biologging Research Data Pipeline

From Sensor Data to Behavioral States

The Scientist's Toolkit: Essential Research Reagents and Materials

Core Statistical Models: RSFs, SSFs, HMMs, and Their Practical Implementation

Theoretical Foundation and Mathematical Formulation

Experimental Design and Protocols

Study Design Considerations

Data Collection Protocols

Statistical Implementation Protocol

Advanced Applications and Integration with Movement Ecology

Comparison with Step-Selection Functions

Integration with Behavioral State Modeling

Contact-RSF Framework for Social Species

The Researcher's Toolkit

Fundamental Concepts and Analytical Framework

Core Mathematical Formulation

Integrated Step-Selection Analysis (iSSA)

Addressing Temporal Irregularity in Movement Data

The Challenge of Missing Data

Methodological Approaches for Irregular Data

Experimental Protocols for SSF Implementation

Data Preparation Workflow

Protocol for Basic Integrated Step-Selection Analysis