Thursday, March 31, 2016


 A wise investment in your big data future

“Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it’s processed.”

In this blog post I will discuss several industries where data visualization is used.

1) Healthcare: This is one of the biggest industries making use of big data. Healthcare organizations need to come up with innovative ideas to solve real-world problems, and to stay informed they need to be able to view all the information they have in a clearer way. As an example, I will look at donor-recipient relationships and the purposes for which matches were found. This matters to the healthcare industry because it needs to understand exactly which donor-recipient pair is the best match for a given purpose.

Healthcare Visualization

The graph above is one way to look at this data, but there are other ways to view it too, for example:
  • Bar Graph
  • Line Graph
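Whichever chart type is chosen, the data behind it is usually a count of matches per purpose. The sketch below assumes match records shaped as (donor_id, recipient_id, purpose); the records and the names MATCHES, matches_by_purpose and text_bar_chart are hypothetical, and the bars are drawn as plain text only so the example needs nothing beyond the Python standard library. A real chart would use a plotting library.

```python
from collections import Counter

# Hypothetical match records: (donor_id, recipient_id, purpose).
# IDs and purposes are illustrative assumptions, not real data.
MATCHES = [
    ("D1", "R7", "kidney"),
    ("D2", "R3", "bone marrow"),
    ("D3", "R9", "kidney"),
    ("D4", "R1", "liver"),
    ("D5", "R2", "kidney"),
]


def matches_by_purpose(matches):
    """Count donor-recipient matches for each purpose."""
    return Counter(purpose for _donor, _recipient, purpose in matches)


def text_bar_chart(counts, symbol="#"):
    """Render counts as a horizontal bar chart in plain text, largest bar first."""
    width = max(len(label) for label in counts)
    lines = []
    for label, n in counts.most_common():
        # Pad labels so the bars line up in one column.
        lines.append(f"{label.ljust(width)} | {symbol * n} {n}")
    return "\n".join(lines)


if __name__ == "__main__":
    print(text_bar_chart(matches_by_purpose(MATCHES)))
```

Swapping the same counts into a line graph only makes sense when there is an ordered axis such as time, which is why a bar graph is the more natural alternative for categories like purpose.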

2) Telecommunication and Broadband Industry: Customers want to know which broadband services are available in the market and at what prices. Companies want to know in which parts of the world their services are offered and how to differentiate their service in one country from another. A map is the best way to compare prices across the world.

Usage of broadband services across the world
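Before prices can be shown on a map, they usually have to be rolled up to one value per region. A minimal sketch, assuming hypothetical plan records of (country code, provider, monthly price in USD) and arbitrary colour thresholds; PLANS, average_price_by_country and price_band are illustrative names, not part of any mapping tool.

```python
from collections import defaultdict
from statistics import mean

# Hypothetical broadband plans: (country code, provider, monthly price in USD).
# Countries, providers and prices are made up for illustration.
PLANS = [
    ("US", "ProviderA", 60.0),
    ("US", "ProviderB", 50.0),
    ("IN", "ProviderC", 10.0),
    ("DE", "ProviderD", 35.0),
    ("DE", "ProviderE", 45.0),
]


def average_price_by_country(plans):
    """Group plan prices by country and average them: one value per map region."""
    prices = defaultdict(list)
    for country, _provider, price in plans:
        prices[country].append(price)
    return {country: mean(values) for country, values in prices.items()}


def price_band(price, cheap=20.0, expensive=50.0):
    """Bucket a price into a colour band for the map legend (thresholds are assumptions)."""
    if price < cheap:
        return "low"
    if price < expensive:
        return "medium"
    return "high"


if __name__ == "__main__":
    for country, avg in average_price_by_country(PLANS).items():
        print(country, avg, price_band(avg))
```

The resulting dictionary, keyed by country code, is the kind of input a choropleth map expects. Discrete colour bands are one simple design choice; a continuous colour scale would work as well.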

3) E-commerce: Since everything is going digital, e-commerce is a booming industry that has really picked up. Everyone, from people selling products to those selling services, now relies on online shopping. It is more convenient and also reduces the cost of running physical stores and offices.

Data visualization makes it very easy for employees to track the performance of their products. E-commerce has many aspects, from maintaining a warehouse to managing a well-functioning UI to taking care of delivery. All these areas are related to one another, so it is essential to be able to view this information on one screen. This is where dashboards come into play: employees can look at exactly the information that will help them make decisions.

Sales Dashboard for E-commerce
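A dashboard like this is fed by a handful of headline numbers computed from records that span sales, the warehouse and delivery. A minimal sketch, assuming a hypothetical Order record with one field for each of those areas; the field names, sample orders and the choice of KPIs are illustrative, not taken from any particular dashboard product.

```python
from dataclasses import dataclass


@dataclass
class Order:
    # Hypothetical order record joining three areas from the post:
    # sales (amount), warehouse (in_stock) and delivery (delivered_on_time).
    order_id: str
    amount: float
    in_stock: bool
    delivered_on_time: bool


def dashboard_summary(orders):
    """Compute a few KPIs that a single dashboard screen could display."""
    total = len(orders)
    revenue = sum(o.amount for o in orders)
    return {
        "orders": total,
        "revenue": round(revenue, 2),
        # Guard against division by zero when there are no orders yet.
        "avg_order_value": round(revenue / total, 2) if total else 0.0,
        "stock_availability": round(sum(o.in_stock for o in orders) / total, 2) if total else 0.0,
        "on_time_delivery": round(sum(o.delivered_on_time for o in orders) / total, 2) if total else 0.0,
    }


ORDERS = [
    Order("A1", 40.0, True, True),
    Order("A2", 60.0, True, False),
    Order("A3", 20.0, False, True),
    Order("A4", 80.0, True, True),
]

if __name__ == "__main__":
    print(dashboard_summary(ORDERS))
```

Keeping all KPIs in one summary function mirrors the point of a dashboard: the related numbers are computed from the same records and viewed together.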

In conclusion, I think everyone would agree that data visualization is now essential for organizations to use their information effectively. As the amount of data grows, it is important to be able to exploit it, and this is exactly where data visualization comes into play.


Thursday, March 3, 2016

Big Data Types and Data Warehousing 

The only constant is change, and the change sweeping our generation is the move to digital. As digitization increases, so does the amount of data available. In this post I will talk about the two types of data that people are accustomed to using, structured and unstructured data, and take a closer look at the differences between them.

Structured vs. Unstructured
1) Structured Data: In my understanding, structured data is organized. It can be displayed in rows and columns, which makes it easy to analyze. It is easy for machines to read and understand, and easy to search with basic algorithms. It is very important for businesses and acts as the backbone of business insights. Examples include:
  • Sensor data: GPS data, manufacturing sensors, medical devices
  • Point of Sale data: credit card information, location of sale, product information
  • Call Detail Records: time of call, caller and recipient information
  • Web Server Logs: page requests, other server activity
  • Any data entered into a computer: age, zip code, gender, etc.
Structured Query Language (SQL) is the most common way to query this data, and operations such as insert, update and delete can be performed on it.
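Because every row has the same columns, insert, update, delete and query operations are straightforward on structured data. A minimal sketch using Python's built-in sqlite3 module with an in-memory database; the sales table, its columns and the sample rows are hypothetical point-of-sale data.

```python
import sqlite3


def demo_point_of_sale():
    """Create a small point-of-sale table, modify it with SQL and return totals per zip code."""
    conn = sqlite3.connect(":memory:")  # throwaway in-memory database
    cur = conn.cursor()
    # Structured data: every row has the same named, typed columns.
    cur.execute(
        "CREATE TABLE sales ("
        "id INTEGER PRIMARY KEY, product TEXT, zip_code TEXT, amount REAL)"
    )
    # INSERT: add hypothetical sales.
    cur.executemany(
        "INSERT INTO sales (product, zip_code, amount) VALUES (?, ?, ?)",
        [("laptop", "10001", 900.0), ("mouse", "10001", 25.0), ("monitor", "94105", 200.0)],
    )
    # UPDATE: correct the price of one sale.
    cur.execute("UPDATE sales SET amount = ? WHERE product = ?", (30.0, "mouse"))
    # DELETE: remove a cancelled sale.
    cur.execute("DELETE FROM sales WHERE product = ?", ("monitor",))
    conn.commit()
    # SELECT: total sales per zip code.
    rows = cur.execute(
        "SELECT zip_code, SUM(amount) FROM sales GROUP BY zip_code ORDER BY zip_code"
    ).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(demo_point_of_sale())
```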

2) Unstructured Data: Unstructured data, on the other hand, is not organized and does not follow a structured data model. It cannot be displayed in a fixed row-and-column format, so it cannot be analyzed as easily. The main task with this data is to make sense of it, in other words to give it structure, so that businesses can make better decisions. Since this type is estimated to make up about 85% of all available data, it is very important to make use of it. Social media is one of its biggest sources. Examples include emails, text documents (Word docs, PDFs, etc.), social media posts, videos, audio files and images. Object-oriented and NoSQL databases, as well as Hadoop, can be used to handle unstructured data.
Sources of Data
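Giving unstructured data some structure can start very small. The sketch below turns free-text social media posts into rows with fixed fields using Python's re module. This is a deliberately simple technique chosen for illustration, not how NoSQL databases or Hadoop process data at scale; the posts, the field names and the helper to_record are hypothetical.

```python
import re

# Hypothetical social media posts: free text with no fixed schema.
POSTS = [
    "Loving the new phone from @AcmeMobile! #unboxing #tech",
    "Delivery was late again @FastShip #fail",
]

HASHTAG = re.compile(r"#(\w+)")  # word characters after '#'
MENTION = re.compile(r"@(\w+)")  # word characters after '@'


def to_record(post_id, text):
    """Turn one unstructured post into a structured row with fixed fields."""
    return {
        "post_id": post_id,
        "mentions": MENTION.findall(text),
        "hashtags": [tag.lower() for tag in HASHTAG.findall(text)],
        "word_count": len(text.split()),
    }


records = [to_record(i, text) for i, text in enumerate(POSTS, start=1)]

if __name__ == "__main__":
    for record in records:
        print(record)
```

Once every post has the same fields, the records can be loaded into a table and queried like any other structured data.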

Types of Data:
There are many types of data available to an organization. Here are a few of them:
  • Spatial Data: Data that has several dimensions. It includes geospatial and structo-spatial data. Location matters for this data, but the location does not have to be geographical.
  • Integrated Operational Data: Data made up of operational data sets that cover a business. It is subject-oriented, integrated and time-current.
  • Redundant Data: Duplicate data stored at multiple data sites. It has to be managed to make sure that information quality is maintained.
  • Integrated Historical Data: Historical data is important to keep. It is composed of many different data types and comes from different sources, so it is very important to integrate and maintain it.
  • Foredata: Data developed beforehand that describes objects and events, including any data that an official interacts with.
  • Legacy Data: Data that can come from virtually anywhere and supports legacy systems. It includes hierarchical, XML, network and object data, and is also called disparate data.
  • Demographic Data: Data about the human population. It covers identification, location, gender and other factors.
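Handling redundant data usually means deciding which fields identify "the same" entity and keeping one copy of it. A minimal sketch, assuming hypothetical customer rows held at two data sites and assuming that a normalized email address plus zip code is a good enough match key; real systems often need fuzzier matching. SITE_A, SITE_B, match_key and deduplicate are illustrative names.

```python
# Hypothetical customer rows held at two data sites. The same customer
# appears at both sites with small formatting differences.
SITE_A = [
    {"name": "Asha Rao", "email": "Asha.Rao@example.com", "zip": "10001"},
    {"name": "Ben Lee", "email": "ben@example.com", "zip": "94105"},
]
SITE_B = [
    {"name": "asha rao ", "email": "asha.rao@example.com", "zip": "10001"},
    {"name": "Chen Wu", "email": "chen@example.com", "zip": "60601"},
]


def match_key(row):
    """Normalize the fields used to decide whether two rows are the same entity (an assumption)."""
    return (row["email"].strip().lower(), row["zip"].strip())


def deduplicate(*sites):
    """Merge rows from several sites, keeping the first copy seen of each entity."""
    seen = {}
    for site in sites:
        for row in site:
            seen.setdefault(match_key(row), row)
    return list(seen.values())


merged = deduplicate(SITE_A, SITE_B)

if __name__ == "__main__":
    for row in merged:
        print(row)
```

Keeping the first copy is the simplest survivorship rule; a real data-quality process might instead merge fields or prefer the most recently updated site.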
Growth of Data over time

A data warehouse is a traditional way to integrate data: data is extracted from its sources, transformed and loaded into the warehouse (ETL). Managing data from many different sources is extremely difficult, and a data warehouse has both benefits and limitations:

Data Warehouse Structure
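The extract, transform and load steps can be sketched in a few functions. This is a minimal sketch, not a production warehouse: it assumes two hypothetical sources (a CSV export from a store and a list of web orders) that record dates, product codes and amounts differently, and it uses an in-memory SQLite database from Python's sqlite3 module as a stand-in for the warehouse. The fact_sales table and all sample values are illustrative.

```python
import csv
import io
import sqlite3

# Extract sources (hypothetical): a store CSV export and web orders from another system.
STORE_CSV = "date,sku,amount\n2016-03-01,A-1,19.99\n2016-03-02,B-2,5.00\n"
WEB_ORDERS = [
    {"ordered": "03/02/2016", "product_code": "a-1", "total_cents": 1999},
]


def extract():
    """Read raw rows from each source in its own format."""
    store_rows = list(csv.DictReader(io.StringIO(STORE_CSV)))
    return store_rows, WEB_ORDERS


def transform(store_rows, web_rows):
    """Map both sources onto one schema: (ISO date, upper-case SKU, amount in dollars, channel)."""
    unified = []
    for r in store_rows:
        unified.append((r["date"], r["sku"].upper(), float(r["amount"]), "store"))
    for r in web_rows:
        month, day, year = r["ordered"].split("/")  # MM/DD/YYYY -> YYYY-MM-DD
        unified.append(
            (f"{year}-{month}-{day}", r["product_code"].upper(), r["total_cents"] / 100, "web")
        )
    return unified


def load(rows, conn):
    """Write the unified rows into a single fact table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS fact_sales "
        "(sale_date TEXT, sku TEXT, amount REAL, channel TEXT)"
    )
    conn.executemany("INSERT INTO fact_sales VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def run_etl():
    conn = sqlite3.connect(":memory:")
    load(transform(*extract()), conn)
    return conn


if __name__ == "__main__":
    conn = run_etl()
    print(conn.execute("SELECT sku, SUM(amount) FROM fact_sales GROUP BY sku").fetchall())
```

Most of the effort sits in transform, which is where the integration difficulty of differing formats and units shows up.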
Benefits of a Data Warehouse:
  • Since it provides better access to information, better decisions can be made based on it  
  • Tighter control of the data and better security
  • Timely access to information
  • Provides the ability to quickly analyze data
  • High query success
Limitations of Data Warehouse:
  • Data comes in various forms and is stored on different systems. Integrating it is difficult and time-consuming, and it requires intensive manual processing
  • Unstructured data cannot be stored, and there is no method to store real-time data
  • No central place available to view the data  
  • No automated way to build reports based on the data
  • Data is static and dated
  • Limited flexibility for different types of users, as separate data marts are needed for each type of user
  • Additional time and high costs are associated with adding new data sets  
  • Security is a major issue as data owners lose control over their data
  • High initial implementation costs 

Future of Data Warehousing:
Data warehousing is a very common technique that organizations use to get insights from data. It helps them integrate data from multiple sources and allows millions of rows of data to be processed. Along with its advantages, it also has several disadvantages, and these should be mitigated in the future.
In an era where everything is cloud-based, the future of data warehousing lies in integrating with the cloud. Cloud-based warehousing will allow data analytics to be provided through private or public clouds. This will not only give organizations more storage space but also let them customize their data storage needs. A cloud-based model will also give organizations access to analytics from anywhere.
Cloud based Data Warehouse
Data warehousing should also be able to incorporate real-time unstructured data in order to make sense of it and help organizations make better decisions. In an age where everything is going digital, it is critical for companies to capture this digital data, so data warehousing should support Big Data and social media analytics.
