Sunday, October 30, 2016



One of the best experiences of my life: GHC 2016!!!






Day 0 and Day 4



One of the best experiences of my life: the Grace Hopper Conference for Women in Technology 2016. I can't wait for October 2017 and to go back to Florida for the conference.


Some of the highlights of my trip:


I got my badge with ribbons and registered as a Hopper. I went to a networking event organized by PROS, made a new friend there, and saw ten little crabs race one another. Day 1 started a little early for me, as I was volunteering as a Hopper for the first keynote. Dr. Latanya Sweeney; Ginni Rometty, CEO of IBM; and Megan Smith, CTO of the USA, were a pleasure to listen to.


Things I learned from:



Latanya:

1) We live in technocracy; every value is up for grabs

2) We can harness technology for public interest, and we can save the world

Ginni:


1) You never let someone decide who you are; you define who you are

2) Growth and comfort never coexist. It's true for people, countries, and companies
3) Work on something bigger than yourself 
4) Data is the new natural resource 

Megan:


1) If you want to go quickly, go alone. If you want to go far, go together.



I gave a couple of interviews, interacted with executives from various companies, and enjoyed every bit of it: learning about what they do, getting career advice, and swag! I built some incredible relationships and took a mentor back home with me.


Empowered Data is a superpower that I gladly embrace!


#ghc16 #ellermis #uofa #beardown 


Thank you GHC and Eller MIS, University of Arizona, for giving me this opportunity!













Thursday, March 31, 2016




 


 A wise investment in your big data future






 “Data visualization is the presentation of data in a pictorial or graphical format. It enables decision makers to see analytics presented visually, so they can grasp difficult concepts or identify new patterns. With interactive visualization, you can take the concept a step further by using technology to drill down into charts and graphs for more detail, interactively changing what data you see and how it’s processed”

In this blog post I will talk about various industries where data visualization is used.

1) Healthcare: This is one of the biggest industries making use of big data. Healthcare organizations need to come up with innovative ideas to solve real-world problems, and to stay informed they need to be able to view all the information they have in a clearer way. As an example, I will take the Donor - Recipient relationship and the purpose for which matches were found. This matters to healthcare organizations because they need to understand exactly which Donor - Recipient pair is the best combination for a given purpose.

Healthcare Visualization


The graph above is one way to look at this data, though there are other ways to view it too, such as with the help of a:
  • Bar Graph
  • Line Graph



2) Telecommunication and Broadband Industry: Customers would like to know about the various broadband services and prices that are available in the market. Companies would like to know in which parts of the world their services are being offered and how they can differentiate their service in one country from another. The best way to compare prices across the world is on a map.



Usage of broadband services across the world


3) E-commerce: This is a booming industry that has really picked up now that everything is going digital. From people selling products to people selling services, everyone now relies on online shopping. It has become more convenient and also reduces the cost of having physical stores and offices.

Data visualization makes it very easy for employees to track the performance of their product. E-commerce has various aspects to it, from maintaining a warehouse to managing a well-working UI to taking care of delivery. All these fields are related to one another, and hence it is essential to be able to view this information on one screen. This is where dashboards come into play: employees can look at exactly the information that will be helpful in making decisions.

Sales Dashboard for E-commerce



In conclusion, data visualization is now essential for organizations to be able to use their information in an effective manner. With the increase in data it is important to be able to make use of that data, and this is exactly where data visualization comes into play.






Thursday, March 3, 2016

Big Data Types and Data Warehousing 

The only constant is change, and the change that is sweeping our generation is the move to digital. With the increase in digitization there is a major increase in the data available. In this post I will talk about the two types of data that people are accustomed to using, structured and unstructured data, and take a closer look at the differences between them.

Structured vs. Unstructured
1)    Structured Data: In my understanding structured data is organized. It can be displayed in columns and rows, which makes it easy to analyze. This type is easy for machines to read and understand, and it is very easy to search with basic algorithms. It is very important for businesses and acts as a backbone for business insights. Examples are sensor data (GPS data, manufacturing sensors, medical devices); point of sale data (credit card information, location of sale, product information); call detail records (time of call, caller and recipient information); web server logs (page requests, other server activity); and any data entered into a computer, such as age, zip code, or gender. Structured Query Language (SQL) is the most common method used to query this data. Operations like insert, delete and update can be performed.

2)    Unstructured Data: This, on the other hand, is not organized and does not follow any structured data model. It cannot be displayed in a particular format and hence cannot be analysed very easily. The primary task with this data is to make sense out of it, or in other words to give it structure, to help businesses make better decisions. Since this type makes up roughly 85% of the data available, it is very important to make use of it. Its biggest source is social media. Examples include emails, text documents (Word docs, PDFs, etc.), social media posts, videos, audio files, and images. NoSQL databases and Hadoop can be used to handle unstructured data.
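To make the contrast concrete, here is a minimal sketch in Python using only the standard library. The `sales` table, its columns, and the sample posts are made up for illustration. The first half shows structured rows being inserted, updated, deleted and queried with SQL; the second half shows that free text has no columns to query until some structure (here, hashtags) is pulled out of it.

```python
import re
import sqlite3

# Structured data: a fixed schema of rows and columns.
# The table and its values are hypothetical point-of-sale records.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (id INTEGER PRIMARY KEY, product TEXT, city TEXT, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales (product, city, amount) VALUES (?, ?, ?)",
    [("Burger", "Tucson", 4.50), ("Fries", "Tucson", 2.00), ("Burger", "Phoenix", 4.75)],
)
# Update and delete work because every row follows the same model.
conn.execute("UPDATE sales SET amount = 5.00 WHERE product = 'Burger' AND city = 'Phoenix'")
conn.execute("DELETE FROM sales WHERE product = 'Fries'")
structured_rows = conn.execute(
    "SELECT product, city, amount FROM sales ORDER BY id"
).fetchall()

# Unstructured data: free text with no schema (made-up social media posts).
posts = [
    "Loved the new burger in Tucson! #yum #lunch",
    "Waited 20 minutes for fries... #slow",
]
# Giving the text some structure: extract the hashtags from each post.
hashtags = [re.findall(r"#(\w+)", post) for post in posts]

print(structured_rows)
print(hashtags)
```

The hashtag step is only one simple way of imposing structure on text; real social media analytics would go much further.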
Sources of Data

Types of Data:
There are many types of data available to use by an organization. In this post I will be listing down a few types:
  • Spatial Data: This is data that has several dimensions. It includes geospatial and structo-spatial data. It is data where location is a benefit, but the location does not have to be geographical.
  • Integrated Operational Data: This consists of operational data sets and covers a business. It is subject-oriented, integrated and time-current.
  • Redundant Data: This is duplicate data that is stored at multiple data sites. It has to be taken care of to make sure that information quality is maintained.
  • Integrated Historical Data: Historical data is important to keep. It is composed of many different data types and comes from different sources, hence it is very important to integrate and maintain it.
  • Foredata: This is data that is developed beforehand. It consists of data about objects and events, and covers all the data that any official interacts with.
  • Legacy Data: This comes from virtually anywhere and supports legacy systems. It includes hierarchical, XML, network and object data and is also called disparate data.
  • Demographic Data: This deals with human population data. It represents identification, location, gender and other factors.
Growth of Data over time

A Data Warehouse is a traditional method of integrating data. Data is extracted, transformed and loaded into the data warehouse. Managing data from different sources is extremely difficult, so a Data Warehouse has both benefits and limitations:


Data Warehouse Structure
Benefits of a Data Warehouse:
  • Since it provides better access to information, better decisions can be made based on it  
  • Tighter control of the data and better security
  • Timely access to information
  •  Provides the ability to quickly analyze data
  • High query success
Limitations of Data Warehouse:
  • Data comes in various forms and is stored on different systems. Integrating the data is both difficult and time consuming, and it requires intensive manual processing
  • Unstructured data cannot be stored and also there are no methods to store real time data
  • No central place available to view the data  
  • No automated way to build reports based on the data
  • Data is static and dated
  • Limited flexibility for different types of users as it requires separate data marts for different types of users
  • Additional time and high costs are associated with adding new data sets  
  • Security is a major issue as data owners lose control over their data
  • High initial implementation costs 
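The extract, transform and load flow mentioned earlier, and the integration effort listed among the limitations, can be sketched roughly as follows. This is an illustrative sketch only: the two sources, their field names and the `warehouse_sales` table are assumptions, and SQLite stands in for a real warehouse.

```python
import csv
import io
import sqlite3

# Hypothetical source 1: a CSV export with amounts in dollars.
csv_source = io.StringIO(
    "store,sale_date,amount_usd\n"
    "Tucson,2016-03-01,12.50\n"
    "Phoenix,2016-03-01,8.00\n"
)
# Hypothetical source 2: application records with amounts in cents
# and inconsistent store names.
app_source = [{"store": "tucson ", "date": "2016-03-02", "cents": 950}]


def extract():
    """Pull raw rows from each source, tagged with where they came from."""
    rows = [("csv", r) for r in csv.DictReader(csv_source)]
    rows += [("app", r) for r in app_source]
    return rows


def transform(tagged_rows):
    """Bring both sources to one shape: (store, sale_date, amount_usd)."""
    clean = []
    for origin, r in tagged_rows:
        if origin == "csv":
            clean.append((r["store"].strip().title(), r["sale_date"], float(r["amount_usd"])))
        else:
            clean.append((r["store"].strip().title(), r["date"], r["cents"] / 100))
    return clean


def load(conn, rows):
    """Write the cleaned rows into the (stand-in) warehouse table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS warehouse_sales (store TEXT, sale_date TEXT, amount_usd REAL)"
    )
    conn.executemany("INSERT INTO warehouse_sales VALUES (?, ?, ?)", rows)


conn = sqlite3.connect(":memory:")
load(conn, transform(extract()))
totals = conn.execute(
    "SELECT store, SUM(amount_usd) FROM warehouse_sales GROUP BY store ORDER BY store"
).fetchall()
print(totals)
```

Even with two tiny sources, the transform step has to reconcile names, date fields and units, which hints at why integration is time consuming at scale.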

Future of Data Warehousing:
Data Warehousing is a very common technique used by organizations to get insights from data. It helps them integrate data from multiple sources and allows processing of millions of data rows. Along with these advantages there are several disadvantages as well, and in the future these disadvantages should be mitigated.
In this era where everything is cloud based, the future of data warehousing should be integration with the cloud. Cloud based warehousing will allow data analytics to be provided through various private or public clouds. This will not only give organizations bigger data storage space but also let them customize their data storage needs. This cloud based model will also allow organizations to access analytics from anywhere.
Cloud based Data Warehouse
Data warehousing should also be able to incorporate real time unstructured data in order to make sense out of it and help organizations make better decisions. In this age where everything is going digital it is critical for companies to capture this digital data, and hence data warehousing should support Big Data and Social Media Analytics.






Thursday, February 18, 2016

Dimension Modelling for McDonald's



McDonald's is the largest hamburger fast food joint in the world and aims to deliver hamburgers, chicken, French fries, breakfast meals, soft drinks, and desserts at a valued price.

Currently, McDonald's operates as a franchise, affiliation, or a food service restaurant. Restaurants are operated by the company, by independent entrepreneurs, or by an affiliate. McDonald's has over 36,000 locations in over 100 countries, serving approximately 69 million customers per day.

McDonald’s key strategies are to strengthen its alliance with its company, franchisees, and suppliers. By doing so, McDonald's will continue to create new innovations to bring customer satisfaction. Overall, McDonald's aims to continue its modern burger brand by continuing to deliver high-quality food with a world-class experience. Hence, people around the world will continue to say, “I’m Lovin’ It”.

When it comes to tracking performance, the main things that the CEO of McDonald's would want to look at are how much of each food item each store sells and the revenue that is generated from these sales. This will help them determine the demand in each area and make decisions like where they should open their next store. Also, in terms of food items, McDonald's customizes its menu according to states and countries, with specials that are served only in a particular region. It is important for them to see whether these specials generate enough revenue.

Hence, to answer these questions, it is a good business decision for them to invest in creating a dimensional model to systematically store data. Now let’s discuss a step-by-step approach to creating a Dimension Model for McDonald’s.

Step 1: Identifying the Business Process

Each company has various business processes but in this scenario we are trying to determine the quantity of each food item sold on a particular day and the revenue that was made from each food item.

Step 2: Identifying Grain

What is grain?
Grain is the level of detail at which one wants to see the data. Hence if one wants to see a company’s sales figures on a daily basis then the granularity is daily, but in the case where monthly stats are required the granularity is monthly.

In this case the grain is:
For a particular store located in a particular city, the number of food items sold on a particular date and the amount generated by selling each of these items.
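As a small illustration of what choosing a grain means, the sketch below rolls the same sales up to a daily grain and to a monthly grain. The transactions, store name and amounts are made up for the example, and `roll_up` is a hypothetical helper, not part of any library.

```python
from collections import defaultdict

# Hypothetical individual sales:
# (store, city, date, food_item, quantity, amount)
transactions = [
    ("Store 1", "Tucson", "2016-02-18", "Big Mac", 2, 8.00),
    ("Store 1", "Tucson", "2016-02-18", "Big Mac", 1, 4.00),
    ("Store 1", "Tucson", "2016-02-18", "Fries", 3, 6.00),
    ("Store 1", "Tucson", "2016-02-19", "Big Mac", 1, 4.00),
]


def roll_up(rows, key):
    """Sum quantity and amount for every distinct value of key(row)."""
    totals = defaultdict(lambda: [0, 0.0])
    for row in rows:
        bucket = totals[key(row)]
        bucket[0] += row[4]  # quantity
        bucket[1] += row[5]  # amount
    return {k: tuple(v) for k, v in totals.items()}


# Daily grain: one row per store, city, date and food item.
daily = roll_up(transactions, key=lambda r: (r[0], r[1], r[2], r[3]))
# Monthly grain: the same data, but the date is cut down to year and month.
monthly = roll_up(transactions, key=lambda r: (r[0], r[1], r[2][:7], r[3]))

print(len(daily), len(monthly))
```

The finer daily grain can always be summed up to monthly figures later, but monthly rows cannot be split back into days, which is why the grain is fixed before the dimensions and measures.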

Step 3: Identifying Dimensions

Dimensions are the objects or things that are being spoken about. We will start by creating a separate table for each dimension. In this scenario the Dimensions that I have identified are Menus, Food_Items, Employee, Employee Position, Location and, most important, Date.

Step 4: Identifying Measures

Measures are values that are estimated in a process. They are quantifiable and mostly numeric. They are stored in a fact table. In this scenario the Cost of the total meal and Quantity are the measures that we will be recording in the Fact Table. In this scenario we will be using a Transaction Fact Table. 

"A row in a transaction fact table corresponds to a measurement event at a point in space and time. Transaction fact tables may be dense or sparse because rows exist only if measurements take place. The measured numeric facts must be consistent with the transaction grain" - as explained by Kimball and Ross
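The four steps above can be sketched as a small star schema. This is a simplified sketch under my own assumptions: only three of the dimensions (Date, Location, Food_Items) are included, all table and column names are hypothetical, the sample rows are invented, and SQLite is used purely so the example runs anywhere.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Dimension tables: one table per dimension (names are assumptions).
CREATE TABLE dim_date (date_key INTEGER PRIMARY KEY, full_date TEXT);
CREATE TABLE dim_location (location_key INTEGER PRIMARY KEY, store_name TEXT, city TEXT);
CREATE TABLE dim_food_item (food_item_key INTEGER PRIMARY KEY, item_name TEXT);

-- Transaction fact table: foreign keys to the dimensions plus the two measures.
CREATE TABLE fact_sales (
    date_key INTEGER REFERENCES dim_date(date_key),
    location_key INTEGER REFERENCES dim_location(location_key),
    food_item_key INTEGER REFERENCES dim_food_item(food_item_key),
    quantity INTEGER,
    cost REAL
);

-- Made-up sample rows.
INSERT INTO dim_date VALUES (1, '2016-02-18');
INSERT INTO dim_location VALUES (1, 'Store 1', 'Tucson'), (2, 'Store 2', 'Phoenix');
INSERT INTO dim_food_item VALUES (1, 'Hamburger'), (2, 'French fries');
INSERT INTO fact_sales VALUES (1, 1, 1, 3, 9.00), (1, 1, 2, 2, 4.00), (1, 2, 1, 5, 15.00);
""")

# Which store sells how much of each food item, and the revenue from it.
revenue_by_store_item = conn.execute("""
    SELECT l.store_name, f.item_name, SUM(s.quantity), SUM(s.cost)
    FROM fact_sales AS s
    JOIN dim_location AS l ON l.location_key = s.location_key
    JOIN dim_food_item AS f ON f.food_item_key = s.food_item_key
    GROUP BY l.store_name, f.item_name
    ORDER BY l.store_name, f.item_name
""").fetchall()

for row in revenue_by_store_item:
    print(row)
```

The query at the end is the kind of question the business process in Step 1 asks; the remaining dimensions (Menus, Employee, Employee Position) would be added as further key columns on the fact table in the same way.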

This Diagram was created using Visio

I would also like to add some Fun Facts about McDonald's

Taken from Business Insider





Thursday, February 4, 2016




Business Intelligence & Analysis Products Scan & Evaluation


Before analyzing the tools I would like to give a simple explanation of what Business Intelligence is. It is one of those buzzwords that everyone uses but not everyone clearly understands.

The best definition that I found on the internet is:

Computer-based techniques used in spotting, digging-out, and analyzing 'hard' business data, such as sales revenue by products or departments or associated costs and incomes. Objectives of a BI exercise include (1) understanding of a firm's internal and external strengths and weaknesses, (2) understanding of the relationship between different data for better decision making, (3) detection of opportunities for innovation, and (4) cost reduction and optimal deployment of resources.


1. The steps to achieve Business Intelligence



The best way to learn a new term is to use it in a sentence, so here it is in one:

The man used his business intelligence to build up his business's profits and stocks while also bringing down other business competition. 
You could try too!

So, getting back to tools, I am going to analyze 5 tools based on the following criteria. Before going into the comparison let me give a brief description of each of these criteria:

  1. Usability & visualization: This criterion covers the ease of use of the tool. It mainly describes how a user can navigate through the tool to get his/her work done. The questions that need to be answered to check whether the tool is easy to use are: Is it easy to learn? Does it support mobile intelligence? What types of graphs and visualizations can be used? For a tool to be good enough, a good score on this criterion is a necessity.
  2. Scalability: This is extremely important in terms of identifying how much data the tool can handle. In this world of Big Data it is an essential factor in picking the right tool.
  3. Data Integration: It is essential for a BI tool to be able to integrate data from different sources. There is a whole lot of unstructured data which needs to be structured and then used to make better decisions. Hence this criterion needs to have a good score.
  4. Cost Effectiveness: This is one of the major drivers behind the decision of which product to use, depending on the type of business and the budget the organization has set for spending on these tools. The cost score will be based on the basic and enterprise versions of the tool.
  5. Customer Support: The score for this criterion identifies the level of customer service the product provides. This could be in the form of answering customer questions, documentation of the product and the methods to use it, quality of support and the rate at which the company provides services. In this era of the internet it also includes blogs and online support.
  6. Infrastructure & Architecture: This category measures how well the tool supports IT infrastructure, mainly which operating systems and which server platforms are supported. It also takes into account important factors like re-usability, caching, zero-footprint, load balancing, fail-over and in-memory techniques.
  7. Predictive Analysis & Data Mining: This criterion examines if and how data and text mining are supported and to what degree. Data mining is widely used to predict the behavior of customers, vendors and web visitors, whereas text mining is used, for example, to mine Twitter messages.
  8. Search & Alerting: This functionality is necessary in order to search large data sets and metadata, along with the ability of a tool to alert in case of anomalies or changing trends.

 A recommended tool should provide the following features to be acceptable:
    Reporting (KPIs, metrics)
    Ad hoc querying
    OLAP (cubes, slice & dice, drilling)
    Dashboards/scorecards
    Operational/real-time BI
    Automated monitoring/alerting
   
    These are some well-known tools used in the industry today:
2. Tools available in the market 



    Out of these I will be further analyzing the following tools:


3. Weighted Matrix for the tools
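For readers who want to build a weighted matrix like this themselves, here is a minimal sketch of the calculation. The weights, the tool names ("Tool A" and so on) and the scores are hypothetical placeholders, not the values from my matrix, and only three of the eight criteria are shown to keep it short.

```python
# Hypothetical weights per criterion, in percent (they add up to 100).
weights = {"usability": 40, "scalability": 30, "cost": 30}

# Hypothetical scores out of 10 for each tool on each criterion.
scores = {
    "Tool A": {"usability": 9, "scalability": 7, "cost": 6},
    "Tool B": {"usability": 7, "scalability": 8, "cost": 5},
    "Tool C": {"usability": 5, "scalability": 9, "cost": 9},
}


def weighted_score(tool_scores, weights):
    """Multiply each score by its weight, add them up, and scale back to 0-10."""
    total = sum(weights[criterion] * tool_scores[criterion] for criterion in weights)
    return total / sum(weights.values())


# Rank the tools from the highest weighted score to the lowest.
ranking = sorted(scores, key=lambda tool: weighted_score(scores[tool], weights), reverse=True)

for tool in ranking:
    print(tool, weighted_score(scores[tool], weights))
```

Changing the weights changes the ranking, so the weights should reflect what matters most to the organization choosing the tool.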




4. Graphical Representation of the Matrix



It is known for its intuitiveness, which makes it very easy to use regardless of a person's technological know-how. The platform is compared to Excel in terms of its ease of use, yet it is very feature rich. It creates sharable dashboards and interactive reports and has flexible features and scalability. It is used across most industries and is the smartest tool of all, which is exactly why it’s ranked first.

           
This product is very useful for companies with major BI needs. The enterprise edition integrates Oracle’s other tools and products. Hence this product can be used for any visualization, reporting or information manipulation. This is ranked second as it understands BI needs of an organization at a higher level.
   
     Fun Fact: The University of Arizona uses this tool

         
This product falls under IBM’s contribution to BI. It is a web based solution to company analytics.  It can be used for any business. It provides the capabilities of creating Dashboards, reports, detailed analysis and scorecarding to allow automation of business metrics to help companies get a better insight into their data.

           
This product is known for easily converting unstructured big data to structured data. This tool is very easy to use and can be used by anyone in an organization with beginner level database skills to make instant visualizations and reports. It also provides location based analysis, which is a great asset to e-commerce websites.


This tool is also known for its user friendliness, which bridges the gap between techy BI tools and traditional productivity apps. It has a clean interface and offers most of the features a BI tool should have, with visually appealing dashboards in an easy to understand format.





