Hadoop is a great place to offload data for analytics processing or to model larger volumes of a single data source than existing systems can handle. However, as companies bring data from many sources into Hadoop, demand is growing for analysis across those sources, which can be extremely difficult to achieve. This post is the first in a three-part series that explains the issues organizations face as they attempt to analyze different data sources and types within Hadoop, and how to resolve them. Today’s post focuses on the problems that occur when combining multiple internal sources. The next two posts explain why these problems grow more complex as external data sources are added, and how new approaches help to solve them.