Big Data Types and Data Warehousing
The only constant is change, and the change sweeping our generation is the move
to digital. With increased digitization comes a major increase in the data
available. In this post I will take a closer look at the two types of data that
people are most accustomed to using, structured and unstructured data, and at
the differences between them.
Figure: Structured vs. Unstructured
1) Structured Data: In my understanding, structured data is organized data. It
can be displayed in rows and columns, which makes it easy to analyze, easy for
machines to read and easy to search with basic algorithms. It is very important
for businesses and acts as a backbone for business insights. Examples include
sensor data (GPS data, manufacturing sensors, medical devices), point-of-sale
data (credit card information, location of sale, product information), call
detail records (time of call, caller and recipient information), web server logs
(page requests, other server activity) and any data entered into a computer,
such as age, zip code or gender. Structured Query Language (SQL) is the most
common way to query structured data, and operations such as insert, update and
delete can be performed on it.
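As a rough illustration of the operations mentioned above, here is a minimal sketch using Python's built-in `sqlite3` module. The `sales` table, its columns and the sample rows are made up for this example; they do not come from any real system.

```python
import sqlite3

# In-memory database; the table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, zip_code TEXT, amount REAL)"
)

# Insert: add a few rows of point-of-sale data.
conn.executemany(
    "INSERT INTO sales (product, zip_code, amount) VALUES (?, ?, ?)",
    [("laptop", "10001", 899.0), ("mouse", "10001", 19.5), ("desk", "94105", 240.0)],
)

# Update: correct the amount of one row.
conn.execute("UPDATE sales SET amount = ? WHERE product = ?", (21.0, "mouse"))

# Delete: remove one row.
conn.execute("DELETE FROM sales WHERE product = ?", ("desk",))

# Query: total sales per zip code.
totals = conn.execute(
    "SELECT zip_code, SUM(amount) FROM sales GROUP BY zip_code ORDER BY zip_code"
).fetchall()
print(totals)
```

Because every row has the same columns, a single query can summarize the whole table, which is what makes structured data easy to analyze.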
2) Unstructured Data: This, on the other hand, is not organized and does not
follow any predefined data model. It cannot be displayed in a fixed format such
as rows and columns, and hence cannot be analysed very easily. The main task
with this data is to make sense of it, in other words to give it structure, so
that it can help businesses make better decisions. Since this type is said to
make up around 85% of the data available, it is very important to make use of
it. Its biggest source is social media. Examples include emails, text documents
(Word docs, PDFs, etc.), social media posts, videos, audio files and images.
NoSQL databases and platforms such as Hadoop can be used to handle unstructured
data.
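To show what "making it structured" can look like in practice, here is a small sketch that pulls fixed fields out of free-text social media posts using Python's `re` module. The sample posts and the choice of fields (hashtags, mentions, word count) are assumptions made for illustration; real pipelines are far more involved.

```python
import re
from collections import Counter

# Hypothetical free-text posts with no fixed layout.
posts = [
    "Loving the new phone! #tech #gadgets @acme",
    "Battery died after a day... #tech @acme",
    "Great support from @acme today #happy",
]

def to_record(text):
    """Turn one free-text post into a row with fixed fields."""
    return {
        "hashtags": re.findall(r"#(\w+)", text),
        "mentions": re.findall(r"@(\w+)", text),
        "word_count": len(text.split()),
    }

records = [to_record(p) for p in posts]

# Once the fields exist, simple counting and aggregation become possible.
hashtag_counts = Counter(tag for r in records for tag in r["hashtags"])
print(hashtag_counts.most_common(1))
```

Each record now has the same keys, so the posts could be loaded into a table and queried like any other structured data.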
Types of Data:
An organization has many types of data available to it. In this post I will
list a few:
- Spatial Data: Data that has several dimensions. It includes geospatial and structo-spatial data. Location matters for this data, but the location does not have to be geographical.
- Integrated Operational Data: This consists of the operational data sets that cover a business. It is subject-oriented, integrated and time-current.
- Redundant Data: Duplicate data that is stored at multiple data sites. It has to be managed carefully so that information quality is maintained.
- Integrated Historical Data: Historical data is important to keep. It is composed of many different data types and comes from different sources, so it is very important to integrate and maintain it.
- Foredata: This is data that is developed beforehand. It consists of data about objects and events, and covers any data that an official interacts with.
- Legacy Data: It can come from virtually anywhere and supports legacy systems. It includes hierarchical, XML, network and object data, and is also called disparate data.
- Demographic Data: This deals with data about the human population. It covers identification, location, gender and other factors.
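Taking redundant data from the list above as an example, the sketch below shows one simple way duplicates across two data sites might be detected and dropped. The record layout and the matching rule (name plus zip code, ignoring case and stray spaces) are assumptions for illustration only.

```python
def normalize(record):
    """Build a comparison key that ignores case and surrounding spaces."""
    return (record["name"].strip().lower(), record["zip_code"].strip())

def deduplicate(records):
    """Keep the first occurrence of each customer and drop later copies."""
    seen = set()
    unique = []
    for record in records:
        key = normalize(record)
        if key not in seen:
            seen.add(key)
            unique.append(record)
    return unique

# Hypothetical customer records held at two different sites.
site_a = [{"name": "Ana Lee", "zip_code": "10001"}, {"name": "Bo Chen", "zip_code": "94105"}]
site_b = [{"name": "ana lee ", "zip_code": "10001"}, {"name": "Cy Diaz", "zip_code": "60601"}]

merged = deduplicate(site_a + site_b)
print(len(merged))
```

The second copy of "Ana Lee" differs only in case and spacing, so it is treated as the same customer and dropped.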
A data warehouse is a traditional method of integrating data. Data is extracted,
transformed and loaded (ETL) into the warehouse. Managing data from different
sources is extremely difficult, and a data warehouse has both benefits and
limitations, listed below.
Figure: Data Warehouse Structure
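The extract, transform and load steps described above can be sketched in a few lines of Python. This is only a toy version under stated assumptions: the two sources, their field names and the `fact_sales` table are hypothetical, and an in-memory SQLite database stands in for the warehouse.

```python
import csv
import io
import sqlite3

# Two hypothetical sources with different layouts for the same kind of data.
STORE_CSV = "product,amount_usd\nlaptop,899.00\nmouse,19.50\n"
WEB_ORDERS = [{"item": "Laptop", "cents": 85000}, {"item": "Desk", "cents": 24000}]

def extract():
    """Read raw rows from each source as they are."""
    store_rows = list(csv.DictReader(io.StringIO(STORE_CSV)))
    return store_rows, WEB_ORDERS

def transform(store_rows, web_rows):
    """Map both layouts onto one schema: (source, product, amount in dollars)."""
    rows = [("store", r["product"].lower(), float(r["amount_usd"])) for r in store_rows]
    rows += [("web", r["item"].lower(), r["cents"] / 100) for r in web_rows]
    return rows

def load(conn, rows):
    """Write the unified rows into the warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales (source TEXT, product TEXT, amount REAL)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?)", rows)

conn = sqlite3.connect(":memory:")
load(conn, transform(*extract()))

laptop_total = conn.execute(
    "SELECT SUM(amount) FROM fact_sales WHERE product = 'laptop'"
).fetchone()[0]
print(laptop_total)
```

The transform step is where most of the effort goes: each new source needs its own mapping onto the shared schema, which is one reason integration is time consuming.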
Benefits of a Data Warehouse:
- Since it provides better access to information, better decisions can be made based on it
- Tighter control of the data and better security
- Timely access to information
- Provides the ability to quickly analyze data
- High query success
Limitations of Data Warehouse:
- Data comes in various forms and is stored on different systems. Integrating it is difficult and time consuming, and it requires intensive manual processing
- Unstructured data cannot be stored, and there are no methods for storing real-time data
- No central place available to view the data
- No automated way to build reports based on the data
- Data is static and dated
- Limited flexibility for different types of users as it requires separate data marts for different types of users
- Additional time and high costs are associated with adding new data sets
- Security is a major issue as data owners lose control over their data
- High initial implementation costs
Future of Data Warehousing:
Data warehousing is a very common technique that organizations use to get
insights from data. It helps them integrate data from multiple sources and
allows processing of millions of rows of data. Along with its advantages it has
several disadvantages, and these should be mitigated in the future.
In this era where everything is cloud based, the future of data warehousing lies
in integrating with the cloud. Cloud-based warehousing will allow data analytics
to be delivered through various private or public clouds. This will give
organizations more storage space and, at the same time, let them customize
storage to their needs. The cloud-based model will also give organizations
access to analytics from anywhere.
Figure: Cloud based Data Warehouse
Data warehousing should also be able to incorporate real-time unstructured data,
in order to make sense of it and help organizations make better decisions. In
this age where everything is going digital, it is critical for companies to
capture this digital data, and hence data warehousing should support Big Data
and social media analytics.