Novel biometric technologies are receiving increasing public attention and news coverage, driven in part by a strong uptick in mobile and wearable biometric applications. Media reporting, however, often overstates what these technologies can actually do. In February, for example, headlines proclaimed, “This Woman’s Fitbit Knew She Was Pregnant Before She Did,” and “Fitbit Fitness Tracker Detects Woman’s Pregnancy.”
Primed as we are by any number of influences (Hollywood, science fiction, high expectations), it’s easy to gloss over the fact that the Fitbit, in this case, did not diagnose a pregnancy. It recorded an increase in resting heart rate, which a human later associated with gestation. When judging the performance of consumer-grade technologies that rely on biometric analysis, it’s worth remembering that a significant gap remains between capturing a physiological signal and deriving a diagnostic-quality conclusion from it without a human in the loop.
One fundamental reason for this gap is the difficulty of obtaining data suitable for advanced biometric modeling and testing. Whether working in the commercial, academic, or government sector, researchers have two primary options: use existing datasets assembled by prior researchers (likely for a somewhat different purpose), or compile a custom dataset. Both options present conspicuous difficulties. Consequently, researchers are turning to more collaborative approaches to improve data availability and, in turn, technological advancement.
Using Existing Data
Using existing biometric data collected by another entity can present challenges or limitations for R&D activities, whether technical, legal, administrative, or other (e.g., ethical).
At the outset, it’s often difficult even to identify datasets that are demonstrably useful. Survey publications on novel biometric systems, whether the modality is anatomical, physiological, behavioral, or cognitive, typically report little about the datasets involved and rarely cover the full range of details needed to judge the data’s utility (e.g., thorough documentation of temporal factors, test population demographics, collection environment and materials, and regulatory compliance, to name just a few). It’s hard to be overly critical of this trend: dataset documentation is time consuming, and perhaps a luxury for many smaller-scale studies, particularly when the emphasis lies on study outcomes. However, this lack of reporting increases the time and effort future researchers must devote to vetting existing datasets.
If an existing dataset proves useful, additional steps follow: navigating administrative and legal agreements, addressing potential licensing fees, and ensuring that data sources are cited in resulting publications.
NIST Biometric & Forensic Research Database Catalog
The problems of discovering, vetting, and sharing existing datasets are increasingly familiar across research sectors in the novel biometrics space. In 2015, the National Institute of Standards and Technology (NIST), working with the National Institute of Justice (NIJ), launched an online Biometric and Forensic Research Database Catalog to serve as a centralized, searchable resource for publicly available biometric and forensic datasets worldwide. Currently featuring 299 databases, the Catalog remains under development and is regularly expanded with new datasets from a variety of related fields.
Collecting Custom Data
Alternatively, researchers can compile study-specific biometric datasets. However, assembling a sufficiently large and representative dataset can quickly become a complex and challenging endeavor. A sizeable data collection can carry unexpectedly high resource costs: personnel, equipment, establishing and maintaining a test environment, and materials and supplies, and let’s not even get into the difficulties of participant recruitment. Major decisions must be made. How will collections be planned, managed, and executed? Is the sensor suite appropriate for the collection? How, administratively, can the study handle extended collections or repeat visits? What steps are needed to document participant consent and comply with privacy regulations? The list goes on.
In short, gathering the data needed to test novel biometric systems thoroughly and appropriately often becomes a project in its own right. R&D organizations without plentiful funding and supporting resources are virtually unable to undertake it, leaving them to rely either on existing data that may be a poor fit for current R&D purposes or on extremely small collections that cannot provide the big-picture view needed to drive rapid technological advancement.
Collaborative Data Collection Efforts
Given the resource-intensive nature of biometric data collections, a collaborative approach to compiling new datasets helps reduce costs and spread them across partners, ensures that the resulting data is useful to multiple stakeholders, and creates opportunities for additional research.
Novetta has been fortunate to partner with the U.S. Military Academy (USMA) at West Point to collect custom data that meets the requirements of behavioral and cognitive biometric R&D. Novetta draws on decades of experience with biometric data to help shape collection plans, while USMA provides controlled collection facilities, skilled personnel experienced in government-regulated studies involving human data collection, and the participants: Cadets who can opt in to collections in return for class credit.
In 2015, in conjunction with the Defense Advanced Research Projects Agency (DARPA) and several performers in Phase 1 of DARPA’s Active Authentication Program, Novetta and USMA completed a research study entitled “Active Authentication through Keystroke Dynamic Biometrics, Behavioral Web Analytics, Power Consumption, and Masquerader Detection.” The study produced a behavioral and cognitive biometric database featuring keystroke, mouse, system activity, and web browsing data from both genuine and impostor users. The collection complied with all federal and academy-level regulations and was conducted with Institutional Review Board (IRB) approval.* The study also created opportunities for student-led research, resulting in an award-winning capstone project.
Another data collection effort is currently underway, this time aimed at determining whether computer-based behavioral and cognitive biometric modalities can indicate user stress (several physiological biometric sensors are also used to validate when stress occurs). Abnormally high stress levels are of interest as potential correlates of malicious behavior, such as workplace violence or insider threat activity.
Both data collections are sizeable efforts requiring months of planning, weeks of collection, and extensive post-processing and documentation. Still, building these datasets is a necessary starting point for drawing actionable insights into the operational utility of novel behavioral and cognitive biometric applications.
Given the resource-intensive nature of novel biometric data collections, the field would benefit from more shared collection efforts that produce publicly available data (cleaned, documented, and free of charge) for use in R&D and performance benchmarking. The drive to deploy all sorts of biometric technologies across a widening range of use cases is growing rapidly, and the need for suitable data grows accordingly.
To learn more about the Active Authentication database or to request a copy, please contact Research Associate Alysse Pulido (firstname.lastname@example.org).