Big data analytics: how it works

In some cases, Hadoop clusters and NoSQL systems serve primarily as landing pads and staging areas for data before it is loaded into a data warehouse or analytical database for analysis, usually in a summarized form that is more conducive to relational structures.
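As a minimal sketch of that staging step, the hypothetical snippet below rolls raw event records up into one summary row per user, the kind of aggregated, relational-friendly shape typically loaded into a warehouse. The event fields and names are illustrative assumptions, not from any particular system.

```python
from collections import defaultdict

# Hypothetical raw clickstream events as they might land in a staging area.
raw_events = [
    {"user": "u1", "page": "/home", "ms": 120},
    {"user": "u1", "page": "/cart", "ms": 340},
    {"user": "u2", "page": "/home", "ms": 95},
]

def summarize(events):
    """Roll raw events up to one row per user -- the summarized form
    that fits relational warehouse tables better than raw records."""
    totals = defaultdict(lambda: {"views": 0, "ms": 0})
    for event in events:
        row = totals[event["user"]]
        row["views"] += 1
        row["ms"] += event["ms"]
    return {user: dict(row) for user, row in totals.items()}

summary = summarize(raw_events)
# summary["u1"] is {"views": 2, "ms": 460}
```

In a real pipeline this aggregation would typically run as an ETL job on the cluster rather than in plain Python, but the shape of the transformation is the same.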

More frequently, however, big data analytics users adopt the concept of a Hadoop data lake that serves as the primary repository for incoming streams of raw data. In such architectures, data can be analyzed directly in a Hadoop cluster or run through a processing engine such as Spark. As in data warehousing, sound data management is a crucial first step in the big data analytics process. Data stored in the Hadoop Distributed File System (HDFS) must be organized, configured and partitioned properly to get good performance out of both extract, transform and load (ETL) integration jobs and analytical queries.
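One common way to partition data in HDFS is the Hive-style layout, where partition keys become key=value directories in the path, letting ETL jobs and queries read only the partitions they need. The sketch below builds such a path; the cluster address, table name and partition keys are made-up examples.

```python
from datetime import date

def partition_path(base, table, dt, region):
    """Build a Hive-style partitioned HDFS path (key=value directories).
    Queries filtered on dt or region can then skip unrelated directories."""
    return f"{base}/{table}/dt={dt.isoformat()}/region={region}"

path = partition_path("hdfs://nn:8020/lake", "events", date(2023, 1, 1), "emea")
# path is "hdfs://nn:8020/lake/events/dt=2023-01-01/region=emea"
```

Choosing partition keys that match the most frequent query filters (a date column is the classic choice) is what turns this directory convention into a real performance win.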

Once the data is ready, it can be analyzed with the software commonly used for advanced analytics processes. That includes tools for:

  • data mining, which sifts through data sets in search of patterns and relationships; 
  • predictive analytics, which builds models to forecast customer behavior and other future developments; 
  • machine learning, which taps algorithms to analyze large data sets; and 
  • deep learning, a more advanced offshoot of machine learning.
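To make the predictive analytics item concrete, here is a toy stand-in for a forecasting model: an ordinary least squares fit of a line to a series, then a projection one step ahead. The data points are invented, and real predictive models would use far richer features and a library such as scikit-learn or Spark MLlib.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a*x + b -- a toy example of the
    model-building step behind predictive analytics."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    slope = (sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
             / sum((x - mean_x) ** 2 for x in xs))
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical monthly purchase counts for a customer segment.
a, b = fit_line([1, 2, 3, 4], [10, 20, 30, 40])
forecast = a * 5 + b  # project the trend to month 5
```

The same fit-then-predict pattern, scaled up to many variables and much larger data sets, is what the machine learning and deep learning tools in the list above automate.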