Machine Learning on AWS

State-of-the-Art Machine Learning
Built on Amazon SageMaker

Machine learning is transforming the advanced analytics landscape, delivering breakthroughs in applications ranging from face recognition to predictive modeling. Harnessing that power for real-world applications – for training, model development, and performance optimization – requires proven experience with massive datasets, on the order of billions of records.

Working with state-of-the-art machine learning platforms such as Amazon SageMaker, we develop machine learning solutions that address advanced analytics challenges and deliver unique insights. As an AWS Advanced Tier Consulting Partner, Novetta leverages Amazon SageMaker for its ease of use, flexibility, tight integration with the AWS ecosystem, and reduced time-to-deploy. Building on the full suite of libraries and algorithms available through SageMaker, we have improved and simplified deployment across our portfolio of machine learning solutions.

Our solutions extract meaningful content from structured and unstructured data, empowering analysts to deliver intelligence and insights. Our data scientists address complex imagery analysis challenges using neural network and deep learning algorithms through MXNet and TensorFlow. We identify opportunities where machine learning tools can complement or replace conventional, rules-based, or analyst-dependent functions to increase the scale, speed, and accuracy of large-scale data analytics. We also conduct rigorous benchmarking and performance evaluations to identify machine learning approaches, platforms, and tools that are best-suited for customer needs.

Our experience and expertise span machine learning libraries (scikit-learn, XGBoost) and deep learning libraries (TensorFlow, Keras, MXNet, Gluon) deployed on Amazon SageMaker; state-of-the-art neural network architectures (e.g., CNNs, RNNs); and methods and techniques for regression, classification, clustering, and dimensionality reduction.

OUR CAPABILITIES

Text and Entity Analytics

The ability to extract meaning from population-sized datasets is fundamental to defense and intelligence applications. We are integrating machine learning within our industry-leading entity analytics platforms to drive speed and accuracy. Machine learning empowers our data scientists to deliver more robust behavioral analytics, anomaly detection, and time-series analysis products.

Image and Sensor Analytics

Machine learning and deep learning are driving breakthroughs in the exploitation of low-resolution, noisy image and sensor data. We apply custom machine and deep learning techniques to real-world image and sensor data – including biometrics, geospatial, C2, and video – for applications such as detection, tracking, classification, and recognition.

AWS-Based Solutioning

As an AWS Advanced Tier Consulting Partner, we deliver cloud-based machine learning solutions using Amazon SageMaker. By offering on-demand access to a wide range of algorithms and platforms, SageMaker dramatically simplifies the implementation of machine learning for the majority of our customers, who already rely on AWS services such as EC2 and S3.
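
As an illustrative sketch (not a production configuration), the snippet below shows how a SageMaker built-in algorithm can be trained against data in S3 using the SageMaker Python SDK. The bucket names, role ARN, and hyperparameters are placeholders.

import sagemaker
from sagemaker import image_uris
from sagemaker.estimator import Estimator

session = sagemaker.Session()
role = "arn:aws:iam::123456789012:role/SageMakerExecutionRole"  # placeholder role ARN

# Resolve the managed container image for a built-in algorithm (XGBoost here).
container = image_uris.retrieve("xgboost", session.boto_region_name, version="1.5-1")

estimator = Estimator(
    image_uri=container,
    role=role,
    instance_count=1,
    instance_type="ml.m5.xlarge",
    output_path="s3://example-bucket/models/",  # placeholder S3 bucket
    sagemaker_session=session,
)
estimator.set_hyperparameters(objective="binary:logistic", num_round=100)

# Training data is read from S3; the model artifact lands back in S3.
estimator.fit({"train": "s3://example-bucket/data/train/"})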

Deep Learning Frameworks & Architectures

Deep learning is among the most promising yet most rapidly changing machine learning domains, and specialized expertise is necessary to fully leverage its potential. We apply deep learning to traditional domains, such as image and video analysis, as well as to more cutting-edge areas, such as NLP and structured data analysis.

USE CASES

Novetta Media Analytics Quote Extraction Built on SageMaker

As part of our Novetta Media Analytics (NMA) platform, we collect broadcast, print, and web news in multiple countries. This data is analyzed to determine sentiment, from which we develop actionable intelligence. Novetta has replaced manual, time-consuming tagging with NMA Quote Extraction, our machine-learning-based engine that processes articles and suggests attributable statements.

The use of SageMaker is critical to this project's success. At the heart of the service is a model hosted on SageMaker infrastructure and trained using a custom convolutional neural network (CNN) architecture. Training was accomplished using resources provisioned through SageMaker, made practical by straightforward parallelization and simple GPU provisioning. That combination of ease and speed enabled the rapid experimentation and validation necessary to develop a model that performs to the production standard.
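
A minimal sketch of how such a training job can be launched through the SageMaker Python SDK with managed GPU instances; the entry-point script name, role ARN, S3 paths, and hyperparameters are hypothetical, not the actual NMA configuration.

from sagemaker.mxnet import MXNet

estimator = MXNet(
    entry_point="train_quote_cnn.py",  # hypothetical Gluon training script
    role="arn:aws:iam::123456789012:role/SageMakerExecutionRole",  # placeholder
    instance_count=2,                  # parallelize training across instances
    instance_type="ml.p3.2xlarge",     # on-demand GPU instances
    framework_version="1.8.0",
    py_version="py37",
    hyperparameters={"epochs": 20, "batch-size": 128},
)

# Launch the managed training job against data staged in S3.
estimator.fit({"train": "s3://example-bucket/nma/train/"})

# Host the trained model behind a SageMaker endpoint for inference.
predictor = estimator.deploy(initial_instance_count=1, instance_type="ml.m5.xlarge")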

The custom architecture uses a Global Vectors for Word Representation (GloVe) model to generate an embedded representation of the free text, which serves as input to the convolutional feature-extraction layers. Implemented with a combination of MXNet and its imperative interface, Gluon, the model produces recommendations that greatly improve the curation of this natural language data set.
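
The exact architecture is not published, but a text CNN of this general shape can be sketched in Gluon as follows. The layer sizes, kernel widths, and class count are illustrative assumptions, and the pretrained GloVe matrix would be copied into the embedding weights after initialization.

from mxnet import nd
from mxnet.gluon import nn

class QuoteCNN(nn.Block):
    # Sketch of a text CNN: GloVe-initialized embeddings feeding parallel
    # 1-D convolutions of several kernel widths, max-pooled over the sequence.
    def __init__(self, vocab_size, embed_dim=100, num_filters=64, num_classes=2, **kwargs):
        super().__init__(**kwargs)
        self.embedding = nn.Embedding(vocab_size, embed_dim)  # load GloVe weights here
        self.convs = nn.Sequential()
        for width in (3, 4, 5):  # hypothetical kernel widths over token windows
            self.convs.add(nn.Conv1D(num_filters, kernel_size=width, activation="relu"))
        self.pool = nn.GlobalMaxPool1D()
        self.output = nn.Dense(num_classes)

    def forward(self, tokens):
        # tokens: (batch, seq_len) word indices
        x = self.embedding(tokens).transpose((0, 2, 1))  # -> (batch, embed_dim, seq_len)
        features = nd.concat(*[self.pool(conv(x)).flatten() for conv in self.convs], dim=1)
        return self.output(features)

net = QuoteCNN(vocab_size=50000)
net.initialize()
tokens = nd.random.randint(0, 50000, shape=(8, 200)).astype("float32")
scores = net(tokens)  # (8, 2) class scores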

Semantic Vector Powered on AWS

Non-English corpora, particularly technical corpora, can be mined for insights across a variety of intelligence missions. Because translators are high-demand, low-density assets, our customers needed an automated, scalable tool to prioritize and triage such corpora.

Our solution, Semantic Vector, uniquely combines unsupervised machine learning with crowdsourced ontologies to appropriately tag foreign-language documents with English concepts. By making foreign-language documents accessible without the need for translators, Semantic Vector dramatically expands the range of data that we can ingest into our mission analytics and entity resolution systems. Machine learning adds value by automating the discovery of connections and similarities within a large data set, leading to richer, more accessible results.

The Semantic Vector pipeline implements unsupervised machine learning through a combination of term frequency–inverse document frequency (TF-IDF) document representation and a customized version of the Lucene scoring model built into Elasticsearch. Leveraging a Spark cluster deployed on AWS gives our developers and data engineers the flexibility to experiment with the conceptual scoring model and data processing. Scoring and tagging use a vectorization of the non-English corpora measured against a language-specific ontology; this vectorization is computed with a customized Lucene scoring function based on the TF-IDF/bag-of-words paradigm. Documents are indexed with concepts in an Elasticsearch backend, enabling sub-second search returns even for free-text queries. Data ingest and ETL are performed using Spark (both PySpark and Scala).
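
As a rough sketch of the representation step only (not the production pipeline), the PySpark snippet below builds TF-IDF vectors over a corpus. The S3 path and column names are placeholders, and the customized Lucene scoring lives in the Elasticsearch index configuration rather than in this code.

from pyspark.sql import SparkSession
from pyspark.ml.feature import Tokenizer, HashingTF, IDF

spark = SparkSession.builder.appName("semantic-vector-sketch").getOrCreate()

# Hypothetical input: one document per row with a free-text "text" column.
docs = spark.read.json("s3://example-bucket/corpus/")

# Tokenize, hash term counts, then weight by inverse document frequency.
tokens = Tokenizer(inputCol="text", outputCol="tokens").transform(docs)
tf = HashingTF(inputCol="tokens", outputCol="tf", numFeatures=1 << 18).transform(tokens)
tfidf = IDF(inputCol="tf", outputCol="tfidf").fit(tf).transform(tf)

# Each document vector can now be scored against per-language ontology concept
# vectors, and the winning concept tags indexed into Elasticsearch.
tfidf.select("tfidf").show(5)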