In my last post, I introduced you to Novetta Entity Analytics and how companies can use it to achieve specific business goals by connecting and analyzing internal and external data sources stored in Hadoop. In this post, I’ll discuss five unique capabilities in Novetta Entity Analytics that enable advanced analytics not previously possible.
First and foremost, Novetta’s source-agnostic entity model ensures that any data source registered in Hadoop’s HCatalog can be loaded into Novetta Entity Analytics for analysis. For unstructured and semi-structured sources not already registered in HCatalog, the software performs an entity extraction process to convert the text and metadata within these sources into a structured format, and registers them in HCatalog for use by Novetta’s resolution engine. During the entity extraction process, Novetta Entity Analytics applies annotation dictionaries to text and metadata from these unstructured and semi-structured sources to identify attributes within the data, and invokes a series of algorithms and rules to form entity-record fragments from the attributes. Novetta’s entity resolution engine then links these fragments back to other data sources contained within the cluster.
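To make the extraction step more concrete, here is a minimal Python sketch of the general idea: an annotation dictionary is applied to raw text, and the attributes it finds are grouped into an entity-record fragment. This is not Novetta’s implementation. The `ANNOTATIONS` dictionary, its two patterns, and the `extract_fragment` function are hypothetical names invented purely for illustration.

```python
import re

# Hypothetical annotation dictionary (an assumption for this sketch):
# attribute name -> pattern that recognizes values of that attribute.
ANNOTATIONS = {
    "email": re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"),
    "phone": re.compile(r"\b\d{3}-\d{3}-\d{4}\b"),
}


def extract_fragment(doc_id, text):
    """Build one entity-record fragment from the attributes found in text."""
    fragment = {"source_id": doc_id}
    for attribute, pattern in ANNOTATIONS.items():
        matches = pattern.findall(text)
        if matches:
            # Keep every match; a later resolution step decides what to link.
            fragment[attribute] = matches
    return fragment


fragment = extract_fragment(
    "note-1", "Reach Ann at ann@example.com or 555-010-2000."
)
print(fragment)
```

A real extraction pipeline would rely on far richer dictionaries and rules, but the shape is the same: unstructured text goes in, and a structured fragment that a resolution engine can link comes out.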
Novetta Entity Analytics publishes combined entities and entity relationships back to HCatalog for use by other applications. Post-resolution classification and summarization processes within Novetta Entity Analytics can also use these enriched profiles and relationships to detect behavior patterns or trends. These patterns and trends can then be used to identify subsets of particular individuals, groups, organizations, or channels, to define an authoritative view, or to distill data for other uses.
Novetta Entity Analytics applies various data characterization techniques to the data throughout its data analysis and entity resolution processes to improve extraction, resolution and linking outcomes. During entity extraction, data is characterized to determine the appropriate fit for a particular annotated data value. An example would be determining whether a string of numbers is more likely to be a phone number or an identification number. During profiling, the software characterizes data columns to assist users in mapping source data into Novetta’s source-agnostic entity model, which is important because source systems can have vastly different data models and taxonomies. It also validates data within columns, which is needed when data is malformed or improperly labeled.
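The phone-number-versus-identification-number example can be sketched with a few simple heuristics. The rules below (digit counts, presence of separators) and the `characterize_digits` function are assumptions chosen for illustration only. They are not the characterization logic the product uses.

```python
import re


def characterize_digits(value):
    """Guess whether a numeric string looks more like a phone number or an ID.

    The scoring rules are illustrative assumptions, not a real product's logic.
    """
    digits = re.sub(r"\D", "", value)
    scores = {"phone": 0, "id": 0}
    if len(digits) in (10, 11):
        scores["phone"] += 1  # typical phone lengths (assumed, US-style)
    if re.search(r"[()\-\s+]", value):
        scores["phone"] += 1  # phone numbers are often written with separators
    if len(digits) == 9:
        scores["id"] += 1  # assumed length for the identification number
    if value.isdigit():
        scores["id"] += 1  # IDs are often stored as bare digits
    return max(scores, key=scores.get)


print(characterize_digits("(703) 555-0100"))
print(characterize_digits("123456789"))
```

In practice the evidence would also come from neighboring values in the same column and from the surrounding text, which is why characterization is applied throughout the process rather than to a single value in isolation.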
Novetta Entity Analytics standardizes the mapped raw data into a single optimized table to be used for data correlation during the entity resolution process. The software then performs statistical characterization of the underlying data to determine how best to group individual records into real-world entities across the mapped sources. It also provides users with an understanding of the data distribution and uniqueness that enables them to appropriately set thresholds for strategy matching rules applied during the entity resolution and linking processes.
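As a rough illustration of what statistical characterization can tell a user, the sketch below computes a fill rate and a uniqueness ratio for a single column. The `profile_column` function and the metric names are hypothetical. The intuition is that a highly unique column (such as an account number) can support a stricter matching threshold than a low-uniqueness column (such as a last name).

```python
from collections import Counter


def profile_column(values):
    """Summarize how complete and how unique a column's values are."""
    present = [v for v in values if v not in (None, "")]
    counts = Counter(present)
    return {
        # Share of rows that actually carry a value.
        "fill_rate": len(present) / len(values) if values else 0.0,
        # Distinct values divided by populated rows; 1.0 means every value is unique.
        "uniqueness": len(counts) / len(present) if present else 0.0,
        # The single most frequent value and its count, if any.
        "most_common": counts.most_common(1)[0] if counts else None,
    }


profile = profile_column(["Smith", "Smith", "Jones", None])
print(profile)
```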
Through its understanding of relationships between entities, Novetta Entity Analytics provides highly accurate entity resolution, and contextual and transactional awareness, within and between groups of related entities. Leveraging relationships between entities allows Novetta to provide much greater accuracy when resolving entities, which is important because two individuals may share the same name, date of birth, and city of birth, but have different mother, father, spouse or employer relationships. These differences become critically important as data becomes more fragmented and when dealing with homogeneous sources with low cardinality.
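The point about shared names and birth details can be sketched as follows. This is a simplified, hypothetical matching rule, not Novetta’s resolution engine: two records are treated as the same person only if their core fields agree and no relationship populated on both records contradicts the match. The field names and the `likely_same_person` function are assumptions made for this example.

```python
CORE_FIELDS = ("name", "dob", "birth_city")
RELATION_FIELDS = ("mother", "father", "spouse", "employer")


def likely_same_person(a, b):
    """Return True only if core fields agree and no shared relationship conflicts."""
    if any(a.get(f) != b.get(f) for f in CORE_FIELDS):
        return False
    for rel in RELATION_FIELDS:
        left, right = a.get(rel), b.get(rel)
        # Only a relationship populated on both records can contradict a match.
        if left and right and left != right:
            return False
    return True


base = {"name": "John Smith", "dob": "1980-01-01", "birth_city": "Boston"}
rec_a = {**base, "spouse": "Mary Smith"}
rec_b = {**base, "spouse": "Ann Smith"}   # same core fields, different spouse
rec_c = {**base, "employer": "Acme"}      # same core fields, no conflicting relation

print(likely_same_person(rec_a, rec_b))
print(likely_same_person(rec_a, rec_c))
```

A hard yes/no rule like this is deliberately crude. A production system would more plausibly weigh relationship evidence alongside other signals, but the sketch shows why relationship data can separate records that attribute matching alone would merge.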
Novetta Entity Analytics also uncovers and leverages relationships between individuals, organizations, locations and events to further improve its entity resolution process. The software provides context about these relationships and hierarchies to enable users to better determine customer roles within formal or informal groups. In addition, Novetta Entity Analytics understands the transactional relationships and context between customers, which can be leveraged by users to target specific groups based on behaviors and complex relationships.
The built-in intelligence and knowledge transfer discussed above provide analysts with self-service capabilities to allow them to perform tasks that would normally require data integrator or data scientist resources. In addition, Novetta Entity Analytics continuously enriches and enhances the HCatalog data sources it uses to extend knowledge transfer to other business intelligence, analytics and enterprise applications. Enterprise-quality dashboards can leverage and display data characterization and profiling information to provide governance insights into cluster data quality. Third-party visualization and search applications can further data discovery by exposing the resolved entities and relationships in different ways.
Finally, Novetta Entity Analytics is built to meet all customer data requirements and respond to advanced analytical queries in seconds. The software is linearly scalable and performs parallel processing on systems that range from small clusters with two or three data sources and a few million records, to those with thousands of nodes, many data sources and trillions of records. When choosing Novetta Entity Analytics for advanced analytics on Hadoop, organizations can be assured they can begin with a small number of data sources and records, and the system will scale with them to meet the growing number of data sets, records and analytics they will require over time.
Novetta Entity Analytics provides companies with unparalleled clarity into their enterprise data and the actionable business intelligence they need to better understand and connect with customers. Stay tuned for future posts, where we’ll discuss the five capabilities above in more detail and explain why they are critical for meeting most enterprises’ advanced analytics needs.
Click the links below to listen to our “What Makes Novetta Entity Analytics Unique” podcast series, where I discuss these components in further detail.