This is a short summary for the paper Human Motion Trajectory Prediction: A Survey by Andrey Rudenko et al.

Introduction

Application domains of human motion prediction.

With growing numbers of intelligent autonomous systems in human environments, the ability of such systems to perceive, understand and anticipate human behavior becomes increasingly important. Specifically, predicting future positions of dynamic agents and planning that takes such predictions into account are key tasks for self-driving vehicles, service robots and advanced surveillance systems.

Understanding human motion is a key skill for intelligent systems to coexist and interact with humans. It involves aspects of representation, perception and motion analysis. Prediction plays an important part in human motion analysis: foreseeing how a scene involving multiple agents will unfold over time makes it possible to use this knowledge proactively, enabling enhanced active perception, predictive planning, model predictive control, or human-robot interaction. As such, human motion prediction has received increased attention in recent years across several communities.

The challenge of making accurate predictions of human motion arises from the complexity of human behavior and the variety of its internal and external stimuli. Motion behavior may be driven by the agent's own goal intent, the presence and actions of surrounding agents, social relations between agents, social rules and norms, or the environment with its topology, geometry, affordances and semantics. Most of these factors are not directly observable and need to be inferred from noisy perceptual cues or modeled from context information. Furthermore, to be effective in practice, motion prediction should be robust and operate in real time.

We categorize the state of the art, discuss typical properties, advantages and drawbacks of each category, and outline open challenges for future research. Finally, we raise three questions. Q1: Are the evaluation techniques used to measure prediction performance good enough, and do they follow best practices? Q2: Have all prediction methods arrived at the same performance level, so that the choice of modeling approach no longer matters? Q3: Is motion prediction solved?

Overview and Terminology

Stimuli: internal and external stimuli that determine motion behavior, including the agents' motion intent and other directly or indirectly observable influences.
Modeling approach: approaches to human motion prediction differ in the way they represent, parametrize, learn and solve the task.
Prediction: different methods produce different parametric, non-parametric or structured forms of predictions.

Typical elements of a motion prediction system

We use the term agent to denote dynamic objects of interest such as robots, pedestrians, cyclists, cars or other human-driven vehicles. The target agent is the dynamic object for which we make the actual motion prediction. We assume the agent's behavior to be non-erratic and goal-directed with regard to an optimal or near-optimal expected outcome. This assumption is typical, as the motion prediction problem would otherwise be much harder or even ill-posed.

Taxonomy

Publication trends in the literature reviewed for this survey, color-coded by modeling approach.

Contextual Cues

Illustration of the basic working principle of the modeling approaches. (a) Physics-based methods project the motion state of the agent using explicit dynamical models based on Newton's law of motion. (b) Pattern-based methods learn prototypical trajectories from observed agent behavior to predict future motion. (c) Planning-based methods include some form of reasoning about the likely goals and compute possible paths to reach those goals. In order to incorporate internal and external stimuli that influence motion behavior, approaches can be extended to account for different contextual cues.

To begin with, from the contextual-cues perspective, and based on how they define the relevant internal and external stimuli, previous studies can be roughly divided as follows:

Cues of the target agent include:

Motion state (position and possibly velocity)
Articulated pose (such as head orientation or full-body pose)
Semantic attributes (such as age and gender, personality, and awareness of the robot's presence)

Dynamic environment cues: (a) unaware, (b) individual-aware, (c) group-aware (accounting for social grouping cues, in green)

With respect to the dynamic environment, we distinguish:

Unaware methods (compute motion predictions without considering the presence of other agents)
Individual-aware methods (account for the presence of other agents)
Group-aware methods (account for the presence of other agents as well as social grouping information)
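The difference between unaware and individual-aware prediction can be illustrated with a minimal one-step sketch. It assumes a 2D point agent, a fixed time step, and a simplified social-force-style repulsion that decays exponentially with distance; `predict_step`, `rollout`, `strength` and `radius` are hypothetical names and values chosen for illustration, not the formulation of any specific cited method. Surrounding agents are held static for simplicity, and group-aware cues are not modeled.

```python
import numpy as np

def predict_step(pos, vel, others, dt=0.4, strength=2.0, radius=0.5, aware=True):
    """One prediction step for a single target agent.

    pos, vel: arrays of shape (2,) for the target agent.
    others:   array of shape (N, 2) with positions of surrounding agents.
    aware=False gives an 'unaware' constant-velocity step; aware=True adds a
    hypothetical exponential repulsion from each other agent (social-force style).
    """
    acc = np.zeros(2)
    if aware:
        for q in others:
            diff = pos - q
            dist = np.linalg.norm(diff)
            if dist > 1e-9:
                # push away from the neighbour, decaying with distance
                acc += strength * np.exp(-dist / radius) * diff / dist
    new_vel = vel + acc * dt
    new_pos = pos + new_vel * dt
    return new_pos, new_vel

def rollout(pos, vel, others, steps=5, aware=True):
    """Repeat predict_step; other agents stay where they are (simplification)."""
    traj = []
    for _ in range(steps):
        pos, vel = predict_step(pos, vel, others, aware=aware)
        traj.append(pos)
    return np.array(traj)
```

With another agent slightly ahead and to the left, the unaware step simply continues straight, while the individual-aware step slows down and veers away from the neighbour.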

Static environment cues: (a) unaware (ignoring any static objects, dashed line), (b) obstacle-aware (accounting for unmodeled obstacles, dotted line), (c) map-aware (accounting for a topometric environment model avoiding local minima, solid line), (d) semantics-aware (solid line).

With respect to the static environment, we distinguish:

Unaware methods (assume an open-space environment)
Obstacle-aware methods (account for the presence of individual static obstacles)
Map-aware methods (account for environment geometry and topology)
Semantics-aware methods (additionally account for environment semantics or affordances such as no-go zones, crosswalks, sidewalks, or traffic lights)

Modeling Approach

Physics-based methods (Sense – Predict)

Motion is predicted by forward simulating a set of explicitly defined dynamics equations that follow a physics-inspired model $f$. A common form for $f$ is $\dot{s}(t) = f(s(t), u(t), t) + w(t)$, where $s(t)$ is the state of the agent, $u(t)$ is the (unknown) control input and $w(t)$ the process noise. Motion prediction can thus be seen as inferring $s(t)$ and $u(t)$ from various estimated or observed cues. Such models typically use the building blocks of a recursive Bayesian filter or a multiple-model algorithm. Depending on the complexity of the model, physics-based methods fall into two groups:

1.1 Single-model methods define a single dynamic motion model, e.g. (Elnagar 2001; Zernetsch et al. 2016; Luber et al. 2010; Coscia et al. 2018; Pellegrini et al. 2009; Yamaguchi et al. 2011; Aoude et al. 2010; Petrich et al. 2013)
- Early works include constant velocity and constant acceleration models. The bicycle model is often used as an approximation of vehicle dynamics.
- A large number of works rely on kinematic models, typically combined with the Kalman filter (KF) or its variants, for their simplicity and acceptable performance under mild conditions such as tracking with little motion uncertainty and short prediction horizons.
- Some works adopt autoregressive models (ARM), which account for the history of states in the prediction. One example is Zhu (1991), who uses an ARM as the transition function of a Hidden Markov Model (HMM).
- A number of approaches extend physics-based models to account for information from a map, particularly for the task of tracking ground vehicles on roads.
1.2 Multi-model methods include a fixed or online adaptive set of multiple dynamics models and a mechanism to fuse or select the individual models, e.g. (Agamennoni et al. 2012; Pool et al. 2017; Kooij et al. 2019; Kaempchen et al. 2004; Althoff et al. 2008a; Gindele et al. 2010)
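To make the single-model case concrete, the following is a minimal sketch of the prediction step of a Kalman filter with a constant-velocity (CV) model, one of the early kinematic models mentioned above. It assumes a 2D state [x, y, vx, vy], a fixed time step `dt`, zero control input, and white-noise acceleration with standard deviation `sigma_a` as process noise; the function name and parameter values are illustrative and not taken from any of the cited works.

```python
import numpy as np

def cv_predict(mean, cov, horizon, dt=0.4, sigma_a=0.5):
    """Propagate a constant-velocity state [x, y, vx, vy] `horizon` steps ahead.

    Only the KF prediction step is applied (no measurement updates), so the
    returned covariances describe the growing uncertainty of the forecast.
    """
    # state transition: position integrates velocity, velocity stays constant
    F = np.array([[1, 0, dt, 0],
                  [0, 1, 0, dt],
                  [0, 0, 1, 0],
                  [0, 0, 0, 1]], dtype=float)
    # maps a random acceleration (ax, ay) into the state (process noise)
    G = np.array([[0.5 * dt**2, 0],
                  [0, 0.5 * dt**2],
                  [dt, 0],
                  [0, dt]])
    Q = sigma_a**2 * G @ G.T
    means, covs = [], []
    for _ in range(horizon):
        mean = F @ mean
        cov = F @ cov @ F.T + Q
        means.append(mean.copy())
        covs.append(cov.copy())
    return np.array(means), np.array(covs)
```

For example, `cv_predict(np.array([0, 0, 1, 0.5]), 0.01 * np.eye(4), horizon=5)` moves the mean to x = 2.0, y = 1.0 after five steps of 0.4 s, while the position variance grows at every step. This growth is one reason such models work best for short horizons. A multi-model method would run several such predictors (e.g., constant velocity and constant acceleration) and fuse or select their outputs.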

Pattern-based methods (Sense – Learn – Predict)

These methods approximate an arbitrary dynamics function from training data. They are able to discover statistical behavioral patterns in the observed motion trajectories, and fall into two groups:

2.1 Sequential methods learn conditional models over time and recursively apply learned transition functions for inference, e.g. (Kruse and Wahl 1998; Kucner et al. 2017; Liao et al. 2003; Aoude et al. 2011; Keller and Gavrila 2014; Vemula et al. 2017; Alahi et al. 2016; Goldhammer et al. 2014)
2.2 Non-sequential methods directly model the distribution over full trajectories without temporal factorization of the dynamics, e.g. (Bennewitz et al. 2005; Xiao et al. 2015; Keller and Gavrila 2014; Tay and Laugier 2008; Trautman and Krause 2010; Käfer et al. 2010; Luber et al. 2012)
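A non-sequential, pattern-based prediction can be sketched as matching the observed track against prototypical full trajectories and returning the continuation of the best match. This is a minimal sketch under stated assumptions: the prototypes are simply given (in practice they would be learned, for example by clustering training trajectories), matching uses mean Euclidean distance after aligning start points, and each prototype is assumed to be long enough to cover the observation plus the horizon. `predict_by_prototype` is a hypothetical name.

```python
import numpy as np

def predict_by_prototype(observed, prototypes, horizon):
    """Non-sequential, pattern-based prediction by nearest prototype.

    observed:   (T_obs, 2) observed positions of the target agent.
    prototypes: list of (T, 2) training trajectories with T >= T_obs + horizon.
    Returns the continuation of the best-matching prototype, shifted so that it
    starts at the agent's last observed position.
    """
    t_obs = len(observed)
    best, best_err = None, np.inf
    for proto in prototypes:
        # compare shapes after aligning first points (translation invariance)
        prefix = proto[:t_obs] - proto[0] + observed[0]
        err = np.mean(np.linalg.norm(prefix - observed, axis=1))
        if err < best_err:
            best, best_err = proto, err
    future = best[t_obs:t_obs + horizon]
    # offset the continuation so it joins the last observed point
    return future - best[t_obs - 1] + observed[-1]
```

The whole future segment is read off the matched trajectory at once rather than generated step by step, which is what distinguishes this from a sequential method that would recursively apply a learned transition function.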

Planning-based methods (Sense – Reason – Predict)

These methods explicitly reason about the agent's long-term motion goals and compute policies or path hypotheses that enable an agent to reach those goals. They fall into two groups:

3.1 Forward planning methods make an explicit assumption regarding the optimality criteria of an agent's motion, using a pre-defined reward function, e.g. (Vasquez 2016; Xie et al. 2013; Karasev et al. 2016; Yi et al. 2016; Rudenko et al. 2017; Galceran et al. 2015; Best and Fitch 2015; Bruce and Gordon 2004; Rösmann et al. 2017)
3.2 Inverse planning methods estimate the reward function or action model from observed trajectories using statistical learning techniques, e.g. (Ziebart et al. 2009; Kitani et al. 2012; Rehder et al. 2018; Kuderer et al. 2012; Pfeiffer et al. 2016; Chung and Huang 2012; Shen et al. 2018; Lee et al. 2017; Walker et al. 2014; Huang et al. 2016)
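A forward-planning predictor can be sketched on a 2D occupancy grid. This minimal sketch assumes a pre-defined reward of -1 per step (so the optimal path is a shortest path), 4-connected motion, and a known set of candidate goals; the helper names, the grid encoding (0 = free, 1 = obstacle) and the softmax-over-progress goal inference with parameter `beta` are illustrative assumptions rather than the formulation of any specific cited work. Because obstacles are part of the grid, the prediction is also map-aware.

```python
from collections import deque
import numpy as np

MOVES = [(1, 0), (-1, 0), (0, 1), (0, -1)]  # 4-connected grid moves

def cost_to_go(grid, goal):
    """BFS distance (in steps) from every free cell to `goal`; inf if unreachable."""
    h, w = grid.shape
    dist = np.full((h, w), np.inf)
    dist[goal] = 0
    queue = deque([goal])
    while queue:
        r, c = queue.popleft()
        for dr, dc in MOVES:
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and grid[nr, nc] == 0 and dist[nr, nc] == np.inf:
                dist[nr, nc] = dist[r, c] + 1
                queue.append((nr, nc))
    return dist

def plan_path(grid, start, goal):
    """Predicted path: greedy descent on the cost-to-go, i.e. a shortest path."""
    dist = cost_to_go(grid, goal)
    if dist[start] == np.inf:
        return []
    path, cell = [start], start
    while cell != goal:
        r, c = cell
        # obstacles have infinite cost-to-go, so they are never selected
        cell = min(((r + dr, c + dc) for dr, dc in MOVES
                    if 0 <= r + dr < grid.shape[0] and 0 <= c + dc < grid.shape[1]),
                   key=lambda p: dist[p])
        path.append(cell)
    return path

def goal_probabilities(grid, first, current, goals, beta=1.0):
    """Soft goal inference: goals the agent has made more progress toward are likelier."""
    progress = []
    for g in goals:
        d = cost_to_go(grid, g)
        ok = np.isfinite(d[first]) and np.isfinite(d[current])
        progress.append(d[first] - d[current] if ok else -np.inf)
    logits = beta * np.array(progress, dtype=float)
    p = np.exp(logits - logits.max())
    return p / p.sum()
```

In use, one would first weight the candidate goals with `goal_probabilities` from the first and latest observed cells, then call `plan_path` from the current cell to each likely goal to obtain path hypotheses. An inverse planning method would instead learn the per-cell costs from observed trajectories rather than fixing them in advance.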
