IS

Tuesday 10 June 2014

Case Study: Big Data, Big Rewards

          


            

INTRODUCTION

Big Data is a term applied to data sets whose size is beyond the capability of commonly used software tools to capture, manage, and process. The sheer size of the data, combined with the complexity of analysis and the commercial imperative to create value from it, has led to a new class of technologies and tools to tackle it. The term Big Data tends to be used in multiple ways, often referring both to the type of data being managed and to the technology used to store and process it.

For the most part, these technologies originated from companies such as Google, Amazon, Facebook and LinkedIn, where they were developed for each company’s own use in order to analyse the massive amounts of social media data they were dealing with. Due to the nature of these companies, the emphasis was on low-cost, scale-out commodity hardware and open source software.


The world of Big Data is increasingly being defined by the four Vs. These ‘Vs’ serve as a reasonable test of whether a Big Data approach is the right one to adopt for a new area of analysis. The Vs are:

Volume.
The size of the data. With technology it’s often very limiting to talk about data volume in any absolute sense. As technology marches forward, numbers quickly become outdated, so it’s better to think about volume in a relative sense instead. If the volume of data you’re looking at is an order of magnitude or more larger than anything previously encountered in your industry, then you’re probably dealing with Big Data. For some companies this might be tens of terabytes; for others it may be tens of petabytes.
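The relative test for volume can be sketched in a few lines of Python. The function name, the default factor of 10 and the sample figures are illustrative assumptions, not part of any standard definition.

```python
def looks_like_big_data(new_volume_tb: float,
                        largest_previous_tb: float,
                        factor: float = 10.0) -> bool:
    """Return True if the new data set is at least `factor` times larger
    than the largest data set handled before (a hypothetical rule of thumb)."""
    if largest_previous_tb <= 0:
        raise ValueError("largest_previous_tb must be positive")
    return new_volume_tb >= factor * largest_previous_tb


# Sample figures are invented for illustration.
print(looks_like_big_data(50.0, 4.0))   # 50 TB against a previous maximum of 4 TB
print(looks_like_big_data(50.0, 20.0))  # 50 TB against a previous maximum of 20 TB
```

The point of the sketch is that the threshold depends on what the organization has handled before, not on a fixed number of terabytes.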

Velocity.
The rate at which data is being received and has to be acted upon is becoming much more real-time. While it is unlikely that any real analysis will need to be completed in the same time period, delays in execution will inevitably limit the effectiveness of campaigns, limit interventions or lead to sub-optimal processes. For example, a discount offer sent to a customer based on their location is less likely to be successful if they have already walked some distance past the store.

Variety.
There are two aspects of variety to consider: syntax and semantics. In the past these have determined the extent to which data could be reliably structured into a relational database and content exposed for analysis. While modern ETL tools are very capable of dealing with data arriving in virtually any syntax, they are less able to deal with semantically rich data such as free text. Because of this, most organizations have restricted the data coverage of IM systems to a narrow range of data. It follows that, by being more inclusive, an organization may create additional value, and this is perhaps one of the major appeals of the Big Data approach (Information Management and Big Data, A Reference Architecture).

Value.
We need to consider what commercial value any new sources and forms of data can add to the business. Or, perhaps more appropriately, to what extent can the commercial value of the data be predicted ahead of time so that ROI can be calculated and a project budget acquired? ‘Value’ offers a particular challenge to IT in the current harsh economic climate. It is difficult to attract funds without certainty about the ROI and payback period. The tractability of the problem is closely related to this issue, as problems that are inherently more difficult to solve carry greater risk, making project funding more uncertain.
SWOT ANALYSIS

STRENGTHS
  • Supports research-oriented analytics and inquiry across domains such as science, medicine and history.
  • Academic excellence, opening new areas of statistical research and BI.
  • Great support from industry all over the world.
  • Microsoft has joined hands with the open source community, launching Hadoop on Azure.
  • The open source community will continue to prevail with Apache Mahout on Hadoop.
  • Buzzword created by tech firms.
  • Moore’s Law: two years ago, storing 1 TB of data required a huge investment; now this is easily achieved through cloud computing.

WEAKNESSES
  • Lack of technology to support all formats; current implementations have complex logic.
  • Lots of unstructured data is present on platforms such as social media.
  • Human conversations are messy, hard to process and currently unpredictable.
  • Requires excessive human interpretation to process.
  • Continuous monitoring is required.
OPPORTUNITIES
  • People appear receptive to this paradigm shift.
  • Customers see Big Data services as a probable future opportunity (Morgan Stanley has shown its interest).
  • Huge opportunity for processing rich data such as audio, video and images.
  • Opportunities for online retailers, storage companies, networking companies, software product companies, health industries and service companies.

THREATS
  • Cyber threats are always present.
  • Incorrect predictions due to garbage data; social analytics, for example, cannot predict the human mindset.
  • Analytics on private, confidential data may prove hazardous; data need to be prioritized.

    QUESTION 1 : Describe the kinds of big data collected by the organizations described in this case.

    There are mainly four kinds of big data collected by the organizations described in this case.
    1.      British Library
    • IBM BigSheets helps the British Library handle huge quantities of data and extract useful knowledge.
    • The British Library is responsible for preserving British Web sites that no longer exist but need to be preserved for historical purposes.
    • Examples include the Web sites of past politicians.
    • IBM BigSheets helps the British Library process large amounts of data quickly and efficiently.

    2.      New York City Police Department (NYPD)
    • City Crime and Criminal Data
    • State and federal law enforcement agencies are analyzing big data to discover hidden patterns in criminal activity. The Real Time Crime Center data warehouse contains millions of data points on city crime and criminals.
    • IBM and the New York City Police Department (NYPD) worked together to create the warehouse, which contains data on over 120 million criminal complaints, 31 million crime records and 33 billion public records.

    3.      Vestas
    • Turbine location and wind data for organizations that want to go green.
    • The Vestas wind library currently stores data on prospective turbine locations and global weather systems.
    • Vestas implemented a solution consisting of IBM InfoSphere BigInsights software running on a high-performance IBM System x iDataPlex server.

    4.      Hertz
    • Consumer sentiment data
    • The car rental company Hertz uses a big data solution to analyze consumer sentiment from Web surveys, emails, text messages, Web site traffic patterns and data generated at all of Hertz’s 8300 locations in 146 countries.
    • Hertz was able to reduce the time spent processing data and improve the company’s response time to customer feedback and changes in sentiment.

    QUESTION 2 : List and describe the business intelligence technologies described in this case.

    1.      IBM BigSheets
    • IBM BigSheets is a cloud application used to perform ad hoc analytics at web scale on unstructured and structured content.
    • IBM BigSheets is an insight engine that helps extract, annotate, and visually analyze vast amounts of unstructured Web data, delivering the results via a Web browser. For example, users can see search results in a pie chart.
    • State and federal law enforcement agencies are analyzing big data to discover hidden patterns in criminal activity, such as correlations between time, opportunity, and organizations, or non-obvious relationships between individuals and criminal organizations that would be difficult to uncover in smaller data sets.
    • IBM BigSheets is built atop the Hadoop framework, so it can process large amounts of data quickly and efficiently.

    2.      Real Time Crime Center (RTCC)

    • The Real Time Crime Center (RTCC) is a centralized technology center for the New York (NYPD) and Houston Police Departments.
    • The RTCC data warehouse contains millions of data points on city crime and criminals and billions of public records.
    • The system’s search capabilities allow the NYPD to quickly obtain data from any of these data sources.
    • Information on criminals, such as a suspect’s photo with details of past offences or addresses with maps, can be visualized in seconds on a video wall or instantly relayed to officers at a crime scene.

    3.      IBM InfoSphere BigInsights
    • IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. Apache Hadoop is an open source software framework used to reliably manage large volumes of structured and unstructured data.
    • Vestas increased the size of its wind library and is able to manage and analyze location and weather data with models that are much more powerful and precise.
    • It implemented a solution consisting of IBM InfoSphere BigInsights software running on a high-performance IBM System x iDataPlex server.

    QUESTION 3 : Why did the companies described in this case need to maintain and analyze big data? What business benefits did they obtain?

    1.      The British Library
    The British Library needed to maintain and analyze big data because :
    ·         Traditional data management methods proved inadequate to archive billions of Web pages and legacy analytics tools couldn’t extract useful knowledge from such quantities of data.

    2.      New York Police Department (NYPD)
    The NYPD needs to maintain and analyze big data because :
    ·         It allows the NYPD to respond quickly to crimes as they occur.
    ·         It helps the NYPD obtain information on suspects, such as a suspect’s photo, past offences or addresses with maps, which can be visualized in seconds on a video wall.

    3.      Vestas
    Vestas needs to maintain and analyze big data because :
    ·         Vestas is the world’s largest wind energy company.
    ·         Location data are important to Vestas so that it can accurately place its turbines.
    ·         Areas without enough wind will not generate the necessary power.
    ·         Areas with too much wind may damage the turbines.
    ·         Therefore, Vestas relies on location-based data to determine the best spots to install its turbines.
    ·         The Vestas wind library currently stores 2.8 petabytes of data.
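    The siting rule described above (enough wind to generate power, but not so much that it damages the turbines) can be sketched as a simple filter. The thresholds, the `Site` type and the sample wind speeds are hypothetical and chosen only for illustration; they are not Vestas figures.

```python
from dataclasses import dataclass

# Hypothetical thresholds in metres per second; real limits depend on the turbine model.
MIN_USEFUL_WIND = 4.0
MAX_SAFE_WIND = 25.0


@dataclass
class Site:
    name: str
    mean_wind_speed: float  # m/s, invented sample values


def suitable_sites(sites, low=MIN_USEFUL_WIND, high=MAX_SAFE_WIND):
    """Keep the names of sites whose mean wind speed is neither too low nor too high."""
    return [s.name for s in sites if low <= s.mean_wind_speed <= high]


candidates = [Site("A", 2.5), Site("B", 8.0), Site("C", 31.0)]
print(suitable_sites(candidates))
```

    A real siting model would weigh many more variables than a single mean wind speed; the sketch only shows the two-sided constraint.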

    4.      Hertz
    Car rental giant Hertz needs to maintain and analyze big data because :
    ·         It reduces the time spent processing data.
    ·         It improves the company’s response time to customer feedback.
    ·         Hertz was able to determine that delays were occurring for returns in Philadelphia during specific times of the day.
    ·         This enhanced Hertz’s performance and increased customer satisfaction.
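    The kind of analysis behind the Philadelphia finding can be illustrated by grouping return records by hour and flagging the hours with long average waits. This is a minimal sketch, not Hertz’s actual method: the records, the 15-minute threshold and the function name are all invented for the example.

```python
from collections import defaultdict
from statistics import mean

# Invented sample records: (hour of day, minutes a customer waited at return).
returns = [(9, 4), (9, 6), (12, 18), (12, 22), (17, 5), (12, 20)]


def slow_hours(records, threshold_minutes=15):
    """Return the hours whose average wait exceeds the threshold, in ascending order."""
    waits = defaultdict(list)
    for hour, wait in records:
        waits[hour].append(wait)
    return sorted(h for h, w in waits.items() if mean(w) > threshold_minutes)


print(slow_hours(returns))
```

    Hours flagged this way are the ones where adjusting staffing levels would be considered.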

    What business benefits did they obtain?

    The business benefits of maintaining and analyzing big data are as follows :
    1.      Competitive advantage
    2.      Performance enhancement
    3.      Increased customer satisfaction
    4.      More customers and more revenue
    5.      Improved decision making (faster and more accurate)
    6.      Operational excellence
    7.      Reduced cost and time spent

    QUESTION 4 : Identify three decisions that were improved by using big data.

    1.      Optimal use of resources and operational time

    By using big data, companies can make optimal use of their resources to enhance performance. Vestas can forecast optimal turbine placement in 15 minutes instead of three weeks, saving a month of development time for a turbine site.
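    As a rough check on the scale of the Vestas improvement, the sketch below compares the two durations. It assumes the three weeks are counted as continuous elapsed time, which the case does not specify; the variable names are illustrative.

```python
MINUTES_PER_WEEK = 7 * 24 * 60  # 10,080 minutes in a week

before_minutes = 3 * MINUTES_PER_WEEK  # three weeks, assumed continuous elapsed time
after_minutes = 15

speedup = before_minutes / after_minutes
print(f"{before_minutes} minutes -> {after_minutes} minutes: about {speedup:.0f}x faster")
```

    Under that assumption the forecast runs roughly two thousand times faster; counting only working hours would give a smaller but still very large factor.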

    2.      Quick and effective decision making

    Decision making becomes quicker and more effective with big data. Visitors to the British Library can search archived Web site data quickly and effectively, and the NYPD can make faster decisions by gathering a suspect’s details through the Real Time Crime Center.

    3.      Reduced operational and other related costs

    Companies can quickly make the right decision and so avoid the cost of wrong ones. For example, Hertz was able to quickly adjust staffing levels at its Philadelphia office during peak times, ensuring a manager was present to resolve any issues.

    QUESTION 5 : What kinds of organizations are most likely to need big data management and analytical tools? Why?

    1.      Organizations responsible for storing huge amounts of information, such as national libraries, registration departments and income tax authorities, because these organizations typically serve as information sources for the government and the public.
    2.      Law enforcement and border authorities, such as police, customs and immigration departments, because they need to store large amounts of data about criminals and the public for the safety of society.
    3.      Organizations going green, which need big data about weather and location because these data help companies make accurate decisions.
    In this case, Vestas needed data about location and wind to site its turbines.

    CONCLUSION

    Big Data is varied; it's growing; it's moving fast; and it's very much in need of smart management. Data, cloud and engagement are energizing organizations across multiple industries and present an enormous opportunity to make organizations more agile, more efficient and more competitive. In order to capture that opportunity, organizations require a modern Information Management architecture.

    IBM’s big data platform is helping enterprises across all industries. IBM understands the business challenges and dynamics of your industry and we can help you make the most of all your information. When companies can analyze all of their available data, rather than a subset, they gain a powerful advantage over their competition. IBM has the technology and the expertise to apply big data solutions in a way that addresses your specific business problems and delivers rapid return on investment.
