Accurately resolved entities are critical for analytics because analysts won’t rely on data they don’t trust, and won’t include it in their reporting. Novetta Entity Analytics was designed to rapidly deliver highly accurate entity results and to scale to billions of records across a Hadoop computing cluster.
Distributed Processing Returns Highly Accurate Results
High-quality entities that meet the needs of analysts are key to the success of Novetta Entity Analytics. To achieve these results, the software rapidly groups records into meaningful entities using an iterative process that begins with creating candidate entity resolution rules and applying those rules across data sources. The results from the candidate rules are then evaluated and measured, and the rules are modified and rerun, sometimes multiple times, until the required entity quality is achieved. Novetta Entity Analytics can process billions of records in hours, which enables it to rapidly derive the optimal rule sets needed to resolve large volumes of data.
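The evaluate-and-refine loop described above can be illustrated with a minimal Python sketch. It is not Novetta's implementation: the sample records, the hand-labeled `TRUE_MATCHES` set, the two candidate rules and the precision/recall thresholds are all assumptions made for illustration.

```python
from itertools import combinations

# Hypothetical sample records (illustrative only).
RECORDS = {
    "r1": {"name": "Jon Smith", "zip": "20001"},
    "r2": {"name": "Smith, Jon", "zip": "20001"},
    "r3": {"name": "Jon Smith", "zip": "94105"},
    "r4": {"name": "Ann Lee", "zip": "94105"},
}

# Hypothetical hand-labeled truth: pairs that refer to the same real-world entity.
TRUE_MATCHES = {frozenset({"r1", "r2"})}


def tokens(name):
    """Lowercased name tokens, ignoring commas and token order."""
    return frozenset(name.lower().replace(",", " ").split())


def same_name(a, b):
    """Candidate rule 1: names match regardless of token order."""
    return tokens(a["name"]) == tokens(b["name"])


def same_name_and_zip(a, b):
    """Candidate rule 2: a stricter revision that also requires the same ZIP."""
    return same_name(a, b) and a["zip"] == b["zip"]


CANDIDATE_RULES = [("same_name", same_name), ("same_name_and_zip", same_name_and_zip)]


def evaluate(rule):
    """Return (precision, recall) of a rule against the labeled pairs."""
    predicted = {
        frozenset({i, j})
        for i, j in combinations(RECORDS, 2)
        if rule(RECORDS[i], RECORDS[j])
    }
    true_positives = len(predicted & TRUE_MATCHES)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(TRUE_MATCHES) if TRUE_MATCHES else 0.0
    return precision, recall


def first_acceptable_rule(min_precision, min_recall):
    """Try candidate rules in order until one meets the quality target."""
    for name, rule in CANDIDATE_RULES:
        precision, recall = evaluate(rule)
        if precision >= min_precision and recall >= min_recall:
            return name
    return None
```

In this toy data set, the first rule links two different people named Jon Smith, so its precision is too low. The stricter revision meets the target, which mirrors the modify-and-rerun cycle on a very small scale.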
The software uses a unique distributed parallel processing methodology to partition data into small groups of records likely to be related to the same real-world entity. Novetta Entity Analytics efficiently distributes these small partitions across a Hadoop cluster, and rapidly performs entity resolution jobs on each partition. The resolved entities from all partitions are combined and automatically organized into a multidimensional, cross-system index that contains record identifiers for each entity across all data sources.
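The partition, resolve and combine flow described above resembles the general technique often called blocking. The following Python sketch shows the idea under stated assumptions: the ZIP-code partition key, the exact-token matching rule, the sample records and all function names are hypothetical, and the partitions are processed in a simple loop rather than on a Hadoop cluster.

```python
from collections import defaultdict

# Hypothetical sample records from two source systems (illustrative only).
RECORDS = [
    {"source": "crm", "id": "c1", "name": "Jon Smith", "zip": "20001"},
    {"source": "pos", "id": "p7", "name": "Smith, Jon", "zip": "20001"},
    {"source": "crm", "id": "c2", "name": "Ann Lee", "zip": "94105"},
    {"source": "pos", "id": "p9", "name": "Ann Lee", "zip": "94105"},
    {"source": "pos", "id": "p3", "name": "Bob Stone", "zip": "20001"},
]


def name_tokens(name):
    """Lowercased name tokens, ignoring commas and token order."""
    return frozenset(name.lower().replace(",", " ").split())


def partition(records):
    """Assumed partitioning: records sharing a ZIP code go to the same partition."""
    parts = defaultdict(list)
    for record in records:
        parts[record["zip"]].append(record)
    return parts


def resolve_partition(records):
    """Group the records of one partition whose name tokens match."""
    groups = defaultdict(list)
    for record in records:
        groups[name_tokens(record["name"])].append(record)
    return list(groups.values())


def build_index(records):
    """Combine per-partition results into entity id -> {source: [record ids]}."""
    index = {}
    parts = partition(records)
    # On a cluster each partition could go to a different worker;
    # here they are resolved one after another.
    for key in sorted(parts):
        for group in resolve_partition(parts[key]):
            entity_id = f"E{len(index) + 1}"
            by_source = defaultdict(list)
            for record in group:
                by_source[record["source"]].append(record["id"])
            index[entity_id] = dict(by_source)
    return index
```

Calling `build_index(RECORDS)` yields three entities. The first one lists record `c1` from the `crm` source and record `p7` from the `pos` source, which is the kind of cross-system lookup the index is meant to provide.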
Fast Entity Analysis Delivers New Insights
Novetta Entity Analytics can rapidly post-process the combined data to gain a better understanding of a specific entity or group of entities. For example, an analyst at a retailer with outlets nationwide might want to identify customers with transactions over $100 in a specific city. Novetta Entity Analytics can perform analytics on the retailer’s resolved entity data and deliver those results in seconds. The system can also easily identify these types of customers at any number of locations across the U.S., not just in a specific city, because it has already resolved and combined all of the data.
Without access to accurately resolved entity data, locating these types of customers would be much more complicated. First, an analyst would have to find all of the people in the specific location and determine who had more than $100 in transactions. The analyst would then most likely have to review much of the data manually to account for the many ambiguous customer references (e.g., misspelled names, initials vs. complete names, name token order and nicknames). This process could take hours, days or even weeks, depending on the quantity and quality of the data being reviewed. With Novetta Entity Analytics, there is no need for manual review of individual records, and the number of records generated by queries is greatly reduced, since multiple records about a single entity are already resolved into one.
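The retailer example can be sketched in a few lines of Python to show why resolved entities simplify the query. Everything here is hypothetical: the `ENTITY_OF` mapping stands in for the cross-system index, the transactions are made up, and the threshold is applied to each customer's total spend, which is one possible reading of "more than $100 in transactions."

```python
from collections import defaultdict

# Hypothetical mapping from source record id to resolved entity id.
ENTITY_OF = {"c1": "E1", "p7": "E1", "c2": "E2", "p9": "E3"}

# Hypothetical transactions, each tied to a source record (illustrative only).
TRANSACTIONS = [
    {"record_id": "c1", "city": "Denver", "amount": 60.0},
    {"record_id": "p7", "city": "Denver", "amount": 55.0},
    {"record_id": "c2", "city": "Denver", "amount": 80.0},
    {"record_id": "p9", "city": "Austin", "amount": 150.0},
]


def entities_over(threshold, city=None):
    """Return entity ids whose total spend exceeds the threshold.

    If city is None, transactions from every location are counted.
    """
    totals = defaultdict(float)
    for txn in TRANSACTIONS:
        if city is None or txn["city"] == city:
            totals[ENTITY_OF[txn["record_id"]]] += txn["amount"]
    return sorted(entity for entity, total in totals.items() if total > threshold)
```

Here `entities_over(100, "Denver")` returns `['E1']`. Records `c1` and `p7` are each below $100 on their own, so the customer would be missed if the two records had not already been resolved into one entity. Dropping the city argument widens the same query to all locations.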
Scalable to Accommodate All Data Requirements
Built-in linear scalability to hundreds of computing nodes allows Novetta Entity Analytics to deliver fast performance on the largest data sets. Linear scalability means that when computing capacity is doubled, Novetta Entity Analytics takes half the time to process the same amount of data, or the same time to process twice as much data.
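The definition of linear scalability above reduces to simple arithmetic, sketched here in Python. This is an idealized model rather than a measured benchmark; the function name and the baseline figures in the usage note are hypothetical.

```python
def ideal_runtime(base_hours, base_nodes, base_records, nodes, records):
    """Estimate runtime under ideal linear scaling.

    Runtime grows in proportion to the number of records and shrinks
    in proportion to the number of nodes, relative to a known baseline.
    """
    return base_hours * (records / base_records) * (base_nodes / nodes)
```

Assuming a hypothetical baseline of 8 hours for 1 billion records on 10 nodes, `ideal_runtime(8, 10, 1e9, nodes=20, records=1e9)` gives 4.0 hours, and `ideal_runtime(8, 10, 1e9, nodes=20, records=2e9)` gives 8.0 hours, matching the two cases in the definition.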
The ability of Novetta Entity Analytics to accurately, rapidly and automatically resolve and organize entities within large quantities of data also helps organizations make better use of their human talent. Because analysts no longer have to perform these tasks, they are free to review and analyze the richer data sets and ask more interesting questions. Novetta Entity Analytics makes this possible for a wide range of customers, from those with a few servers and 100 million records to organizations with hundreds of servers and billions of records.