Novetta began developing entity analytics technology more than 15 years ago to help our government customers better understand their data. These customers had many large data sources that were growing quickly and wanted to be able to rapidly and accurately use the data for analysis without requiring significant human interaction. Novetta spent years counting, measuring and analyzing many data types in interesting ways, and used this knowledge to create proven, statistically sound, empirical methods for processing and combining data from many sources. The result is Novetta Entity Analytics software.
Source-Agnostic Methods Combine Many Data Sources
Novetta Entity Analytics uses powerful source-agnostic methods to combine data from multiple sources in Hadoop. Whether data comes from different applications or services, or arrives in structured, semi-structured or unstructured formats, Novetta Entity Analytics can process and combine it, and make it available for use by other applications and services. The software imports any data source that is already part of HCatalog and lets users add new data sources as needed.
Novetta Entity Analytics uses a services layer to simplify the reading, writing and normalization of data as it comes into the system. When integrating new data sources, the software measures and characterizes the data to determine its uniqueness and applicability to the resolution process. Resolution rules, derived from built-in, customizable templates, characterize the measurements to determine whether records contain enough information to conclude that they reference the same real-world entity.
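To make the idea of measuring a data source concrete, here is a minimal Python sketch of field profiling. It is an illustration only and not the product's implementation: the function name profile_fields and the two measurements (fill rate and uniqueness) are assumptions chosen for the example.

```python
def profile_fields(records):
    """Profile a list of dict records (hypothetical helper, not a product API).

    For each field, report:
      fill_rate  - share of records that carry a non-empty value
      uniqueness - share of the non-empty values that are distinct
    """
    fields = {name for record in records for name in record}
    profile = {}
    for name in sorted(fields):
        # Treat missing keys, None and empty strings as "not filled".
        values = [r[name] for r in records if r.get(name) not in (None, "")]
        fill_rate = len(values) / len(records) if records else 0.0
        uniqueness = len(set(values)) / len(values) if values else 0.0
        profile[name] = {"fill_rate": fill_rate, "uniqueness": uniqueness}
    return profile


if __name__ == "__main__":
    sample = [
        {"name": "Mike Smith", "license_no": "A100"},
        {"name": "Mike Smith", "license_no": "B200"},
        {"name": "Ann Lee", "license_no": ""},
        {"name": "Ann Lee"},
    ]
    for field, stats in profile_fields(sample).items():
        print(field, stats)
```

In this sketch, a field that is well filled and highly unique (such as a license number) would be a strong candidate for resolution rules, while a field with low uniqueness (such as a common name) would need supporting attributes.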
Flexible Entity Resolution Model
Novetta Entity Analytics includes a flexible entity resolution model designed to resolve more than just people. The software can be used for people, organizations, locations or any other type of real-world entity. It can also detect relationships between entities, including shared information, observed interactions, and potentially collaborative behaviors. Stay tuned for a future blog post with more details about how Novetta Entity Analytics handles relationships.
Additionally, since the information within semi-structured and unstructured data sources is not presented as directly as structured data, Novetta Entity Analytics provides two options for handling these types of sources. The software either extracts attributes from text and metadata and performs entity resolution on the extracted fragments, or links semi-structured and unstructured documents to the appropriate resolved entity record.
Novetta Entity Analytics employs the same entity resolution process to merge structured, semi-structured and unstructured data sources using whatever attributes are present in the data. The software can link rich data sources with sources that contain less information. For example, a structured entity record, such as a driver’s license record with name, date of birth, address and other personally identifying information, can be linked to an unstructured document containing the same name and address.
Proven Empirical Methods Ensure Accurate Results
Novetta Entity Analytics includes proven empirical methods that leverage categorization processes to identify common names, locations, phone numbers, etc. Rules are created to account for commonalities and ensure the most accurate results possible. For example, if a new record is introduced for a person with an extremely common name like Mike Smith and an address in a large apartment building or a common phone number for a large organization, the software automatically determines what other data is needed to accurately combine the new record with an existing Mike Smith entity.
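One common way to account for commonality is to weight each agreeing value by how rare it is, so that a match on "Mike Smith" counts for less than a match on an unusual name. The Python sketch below illustrates that general idea under stated assumptions: the function names, the log-based weight and the sample data are hypothetical and are not taken from Novetta Entity Analytics.

```python
import math
from collections import Counter


def count_values(records, fields):
    """Count how often each value occurs per field (hypothetical helper)."""
    return {f: Counter(r[f] for r in records if r.get(f)) for f in fields}


def agreement_score(a, b, counts, total):
    """Sum a rarity weight for every field on which records a and b agree.

    Assumed weight: log(total / frequency). A value held by most records
    adds little evidence; a value seen once adds the most.
    """
    score = 0.0
    for field, counter in counts.items():
        value_a, value_b = a.get(field), b.get(field)
        if value_a and value_a == value_b:
            frequency = max(counter[value_a], 1)  # unseen values count as rare
            score += math.log(total / frequency)
    return score


if __name__ == "__main__":
    records = [
        {"name": "Mike Smith", "phone": "555-0100"},
        {"name": "Mike Smith", "phone": "555-0100"},
        {"name": "Mike Smith", "phone": "555-0199"},
        {"name": "Ann Lee", "phone": "555-0100"},
        {"name": "Zelda Quist", "phone": "555-0142"},
    ]
    counts = count_values(records, ["name", "phone"])
    print(agreement_score(records[0], records[1], counts, len(records)))
    print(agreement_score(records[4], records[4], counts, len(records)))
```

With a scheme like this, two records that agree only on a common name and a shared switchboard number score low, which signals that more data (for example a date of birth) is needed before they are combined.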
Once Novetta Entity Analytics has created candidate rule sets and performed entity resolution on the data, a human analyst can evaluate a statistically significant sample to determine how well the rules performed and how they can be improved to increase accuracy. Rules can then be modified and evaluated as much as needed until the resolution process is as accurate as possible.
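As a rough illustration of the review loop, the Python sketch below draws a reproducible random sample of resolved record pairs for an analyst and then estimates rule precision from the analyst's verdicts. The function names are hypothetical, and the Wilson score interval is one standard choice for a confidence interval on a proportion; the source does not say which statistical method the product uses.

```python
import math
import random


def draw_sample(resolved_pairs, k, seed=0):
    """Pick k resolved pairs at random for analyst review (reproducible)."""
    rng = random.Random(seed)
    return rng.sample(list(resolved_pairs), k)


def wilson_interval(correct, total, z=1.96):
    """Wilson score interval for the share of reviewed pairs judged correct.

    z=1.96 corresponds to roughly 95% confidence.
    """
    if total <= 0:
        raise ValueError("total must be positive")
    p = correct / total
    denom = 1 + z * z / total
    center = (p + z * z / (2 * total)) / denom
    half = z * math.sqrt(p * (1 - p) / total + z * z / (4 * total * total)) / denom
    return center - half, center + half


if __name__ == "__main__":
    pairs = [(i, i + 1) for i in range(1000)]  # stand-in for resolved pairs
    review_batch = draw_sample(pairs, 100, seed=7)
    # Suppose the analyst marks 90 of the 100 sampled pairs as correct.
    low, high = wilson_interval(90, 100)
    print(f"estimated precision between {low:.3f} and {high:.3f}")
```

If the lower bound is below the accuracy the organization needs, the rules are adjusted and a fresh sample is reviewed.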
Entity Resolution Is Just the Beginning
Entity resolution across many different types of data sources is ultimately about accurately grouping records that reference the same real-world entity. At Novetta, we view entity resolution as just the beginning: the first step our software must take to enable more actionable business decisions. Once Novetta Entity Analytics has merged and linked all of the data sources together, it creates a multidimensional entity index that maps records to entities based on the data’s content and entity attributes. The combined index and data from Novetta Entity Analytics can empower other analytic tools and applications to perform analysis on resolved entities instead of on unlinked data from multiple sources. The result is that organizations benefit from a much cleaner picture of their data and more actionable intelligence.
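The mapping from records to entities can be pictured with a small Python sketch. It assumes that resolution has already produced a list of matched record pairs and groups them with a union-find structure; the function name build_entity_index and the record identifiers are hypothetical, and the actual index in Novetta Entity Analytics is multidimensional and richer than this.

```python
def build_entity_index(record_ids, matched_pairs):
    """Group matched records into entities (illustrative sketch only).

    Returns two dicts: record id -> entity id, and entity id -> record ids.
    The entity id is simply the smallest record id in each group.
    """
    parent = {rid: rid for rid in record_ids}

    def find(x):
        # Walk up to the group representative, shortening the path as we go.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for a, b in matched_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Keep the smaller id as representative for deterministic output.
            if root_b < root_a:
                root_a, root_b = root_b, root_a
            parent[root_b] = root_a

    record_to_entity = {rid: find(rid) for rid in record_ids}
    entity_to_records = {}
    for rid, eid in record_to_entity.items():
        entity_to_records.setdefault(eid, []).append(rid)
    return record_to_entity, entity_to_records


if __name__ == "__main__":
    ids = ["dl-1", "doc-7", "crm-3", "crm-9"]
    pairs = [("dl-1", "doc-7"), ("doc-7", "crm-3")]
    by_record, by_entity = build_entity_index(ids, pairs)
    print(by_record)
    print(by_entity)
```

A downstream tool could then look up any record's entity, or pull every record for an entity, without repeating the resolution work.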