Novetta has been exploring ways to uncover patterns in maritime vessel traffic around the globe to help analysts flag and predict suspicious behaviors and anomalies. The effort began as an R&D program in which a customer challenged us to identify specific types of vessels engaged in specific types of activities. To do this, we developed the Trajectory Data Mining Framework (TDMF), a distributed application that operates on Automatic Identification System (AIS) trajectory data. AIS is a tracking system in which vessels broadcast sequences of time-stamped coordinates to aid safe navigation. Most commercial vessels are expected to broadcast AIS; however, some do not, and some alter their AIS identification to misrepresent themselves. These aspects of AIS make this type of pattern discovery both challenging and interesting. Although this original use case analyzed AIS data, TDMF is designed to help analysts generate insights from any trajectory data.
TDMF For Vessel Classification
Using TDMF, we can infer many interesting details about vessels from trajectory data alone. The system runs a sequence of detectors to capture features across a range of classes, including kinematic, temporal, and spatial. To do this, TDMF first classifies vessels by size (small, medium, and large), because our earliest experiments with trajectory data showed that vessel size is a key indicator for behavior detection modeling. For example, the table in Figure 1 lists the most informative features from a TDMF vessel classification. At the top of the list are the one-hot features representing vessel size. The features MediumCargo, SmallPleasure, LargeCargo, LargePassenger, LargeInland, SmallFishing, Tug, and LargePleasure are spatial features that represent the degree to which a vessel's trajectories were present in the areas of the region where those types of vessels were most common. The discovery that spatial features like these were significant in classifying vessel behavior was one of the most important breakthroughs in the development of TDMF.
FIGURE 1: Output from TDMF vessel classification. The importance and relative importance of each feature are given. The values are scaled so that the most important feature has a value of 1.
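The scaling described in the caption can be sketched as follows. This is a minimal illustration, not TDMF code: the function name and the sample importance values are hypothetical, and only the rule that the top feature ends up at 1 comes from the text.

```python
def relative_importance(importances):
    """Scale raw feature importances so the largest becomes 1.0.

    `importances` maps feature name -> raw importance (hypothetical values).
    """
    top = max(importances.values())
    if top <= 0:
        raise ValueError("at least one importance must be positive")
    return {name: value / top for name, value in importances.items()}


# Hypothetical raw importances, for illustration only.
raw = {"MediumCargo": 0.5, "SmallPleasure": 0.25, "Tug": 0.125}
scaled = relative_importance(raw)
```

With these made-up inputs, MediumCargo scales to 1.0 and the other features become fractions of it.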
TDMF delivers the behavior prediction as a probability vector, with one entry per vessel type, providing a profile of how the vessel behaved. The vessel type with the highest probability can be used as a vessel type prediction, which is how we answered the original challenge posed to us by our customer: “find instances of these types of vessels.” For example, Figure 2 depicts the fishing vessel trajectory findings from a behavior classification across a region of approximately 600 square kilometers. The known fishing vessel trajectories are shown in green, and trajectories whose classified behavior most resembled fishing vessels (i.e., the predictions) are in red. Two of the most encouraging things visible in that figure are the extent to which the algorithm appears to have learned from the back-and-forth movement apparent in some of the known trajectories at sea, and the number of predicted trajectories that are not immediately near known trajectories.
FIGURE 2: Fishing vessel trajectory prediction. Known trajectories are in green; predicted trajectories are in red. The filaments around the perimeter are waterways.
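Turning a behavior probability vector into a vessel type prediction amounts to taking the entry with the highest probability. The sketch below assumes a small, hypothetical subset of vessel types and made-up probabilities; the real ordering and length of TDMF's vector are not specified here.

```python
# Hypothetical subset of vessel types; the order is an assumption.
VESSEL_TYPES = ["SmallFishing", "MediumCargo", "Tug"]


def predict_vessel_type(probabilities, vessel_types=VESSEL_TYPES):
    """Return (vessel_type, probability) for the most likely entry."""
    if len(probabilities) != len(vessel_types):
        raise ValueError("need one probability per vessel type")
    best = max(range(len(probabilities)), key=lambda i: probabilities[i])
    return vessel_types[best], probabilities[best]


# Made-up behavior profile for one trajectory.
prediction = predict_vessel_type([0.7, 0.2, 0.1])
```

Keeping the full vector alongside the top entry preserves the behavior profile, which is useful when two types score similarly.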
Using Trajectory Data at Scale
The detectors that generate size and behavior features aggregate data on a spatial scale of approximately 1 square kilometer and a time scale of 4 hours, though these can be generalized based on use case and available compute resources. The real power of the analytics learned from this data is apparent when you consider data at scale. In a large region with an area of nearly a quarter of a million square kilometers over a period of one year, TDMF generated behavior predictions and other analytics for more than 105,000 vessels, resulting in a pattern of life rich enough for any number of further analytics, including port modeling and specialized anomaly detection.
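As a rough illustration of aggregating at those scales, the sketch below assigns each posit to a cell of about 1 kilometer on a side and a 4-hour window, then counts posits per bin. The flat-earth approximation, the reference latitude, and all names are assumptions made for this example; they are not a description of TDMF's detectors.

```python
import math
from collections import Counter

CELL_KM = 1.0            # spatial scale from the text
WINDOW_HOURS = 4         # time scale from the text
KM_PER_DEG_LAT = 111.32  # approximate; assumption for this sketch


def bin_key(lat, lon, epoch_seconds, ref_lat):
    """Map a posit to a (row, col, window) bin.

    Uses a simple equirectangular approximation around `ref_lat`,
    which is only reasonable for regional-scale data.
    """
    km_per_deg_lon = KM_PER_DEG_LAT * math.cos(math.radians(ref_lat))
    row = math.floor(lat * KM_PER_DEG_LAT / CELL_KM)
    col = math.floor(lon * km_per_deg_lon / CELL_KM)
    window = int(epoch_seconds // (WINDOW_HOURS * 3600))
    return row, col, window


def count_posits(posits, ref_lat):
    """Count posits per bin; each posit is (lat, lon, epoch_seconds)."""
    return Counter(bin_key(lat, lon, t, ref_lat) for lat, lon, t in posits)


# Made-up posits: same location, the third falls in the next 4-hour window.
posits = [(36.0, -75.0, 0), (36.0, -75.0, 3600), (36.0, -75.0, 4 * 3600)]
counts = count_posits(posits, ref_lat=36.0)
```

Changing `CELL_KM` or `WINDOW_HOURS` is the kind of adjustment the text alludes to when it says the scales can be tuned to the use case and available compute.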
For example, consider the Diurnal Activity graph below. The horizontal axis represents time, an average day from the same region for which we predicted behavior above, with a dotted line at noon and the “nighttime” hours in gray. We are able to discern a lot of information right away. For example, the blue, mounded line represents the activity of small pleasure vessels, peaking around mid-morning. The higher two lines represent the activity of the medium and large cargo vessels, exhibiting minor peaks in the nighttime hours, perhaps avoiding that pleasure vessel traffic.
FIGURE 3: Activity graph of vessels where the horizontal axis represents time of day, and the vertical axis represents activity level.
We refer to the processes in which we operate directly on vessel trajectories to detect features for learning as direct learning. For direct learning, TDMF relies on trajectories whose posits are relatively close in time—within 30 minutes by default—which we call focused trajectories. Focused trajectories usually only account for part of the trajectories in a region, however. There are usually many other trajectories whose posits are more than 30 minutes apart—sometimes hours or days apart—that we call fuzzy trajectories. We cannot use direct learning to make predictions about fuzzy trajectories; however, emboldened by the patterns we found in developing the direct analytics, we crafted an indirect learning protocol to try to shed light on fuzzy trajectories.
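The distinction between focused and fuzzy trajectories can be sketched with a check on the gaps between consecutive posits. The 30-minute default comes from the text; the function names and the choice to label a whole trajectory by its largest gap are simplifying assumptions for illustration.

```python
FOCUSED_GAP_SECONDS = 30 * 60  # 30-minute default from the text


def max_gap_seconds(timestamps):
    """Largest gap between consecutive posits, in seconds."""
    ordered = sorted(timestamps)
    return max((b - a for a, b in zip(ordered, ordered[1:])), default=0)


def trajectory_kind(timestamps, max_gap=FOCUSED_GAP_SECONDS):
    """Label a trajectory 'focused' or 'fuzzy' from its posit times.

    Assumption: a lone posit has no neighbor within the threshold,
    so it is treated as fuzzy here.
    """
    if len(timestamps) < 2:
        return "fuzzy"
    return "focused" if max_gap_seconds(timestamps) <= max_gap else "fuzzy"


# Made-up posit times in seconds.
dense = trajectory_kind([0, 600, 1500])    # gaps of 10 and 15 minutes
sparse = trajectory_kind([0, 600, 7200])   # one gap of 110 minutes
```

A production version might instead split one vessel's posits into several segments at each large gap; that variant is not shown here.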
TDMF indirect learning creates features from the information in the space and time around focused posits. Having performed direct learning, TDMF already gives us a good idea of what sorts of vessels and activities are present in that space and time. It turns out that the patterns in this data are strong enough that we can predict and assign behavior probability vectors to fuzzy trajectories just as we do for focused trajectories. In fact, even for the shortest fuzzy sequences (as short as a single posit, which we call a fleeting point), TDMF can generate excellent predictions, with accuracies often greater than 70%. This is exciting because it opens up several further applications. For example, we could treat the locations of images from a collection of satellite imagery as fleeting points and generate a behavior probability vector for each location. For machine learning practitioners working to classify vessels from imagery, our vectors would help by bringing in information about the pattern of life in the area.
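One plausible way to picture indirect learning is to summarize the behavior vectors of focused trajectories in each space-time bin and use that summary as context for a fleeting point in the same bin. The sketch below is only an illustration under that assumption: the averaging, the uniform fallback for empty bins, and all names are hypothetical and are not TDMF's actual features or model.

```python
def neighborhood_profiles(focused):
    """Average behavior vectors per space-time bin.

    `focused` is an iterable of (bin_key, probability_vector) pairs
    produced by direct learning (hypothetical input format).
    """
    sums, counts = {}, {}
    for key, vector in focused:
        if key not in sums:
            sums[key] = [0.0] * len(vector)
            counts[key] = 0
        sums[key] = [s + v for s, v in zip(sums[key], vector)]
        counts[key] += 1
    return {key: [s / counts[key] for s in total] for key, total in sums.items()}


def fleeting_point_features(bin_key, profiles, n_types):
    """Context features for a single posit in `bin_key`.

    Assumption: bins with no focused activity fall back to a uniform vector.
    """
    return profiles.get(bin_key, [1.0 / n_types] * n_types)


# Made-up focused results for two bins and two vessel types.
profiles = neighborhood_profiles([
    ("bin_a", [1.0, 0.0]),
    ("bin_a", [0.0, 1.0]),
    ("bin_b", [0.5, 0.5]),
])
features = fleeting_point_features("bin_a", profiles, n_types=2)
```

In practice such context features would feed a trained classifier rather than being used directly as the prediction.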
The rich pattern of life afforded by TDMF serves as an ideal platform for anomaly detection. For example, we have begun exploring supervised techniques that focus on rare but significant, and perhaps dangerous, events in the data, with the goal of developing a predictive capability for such events. Such analytics could aid safety of navigation and warn of disruptive events such as vessel interdictions.
At the request of Novetta’s Machine Learning Center of Excellence, the next phase of this research explores the new RAPIDS GPU libraries from NVIDIA and their applicability to geospatial data processing.