INTRODUCTION
Big Data is a term
applied to data sets whose size is beyond the capability of commonly used
software tools to capture, manage, and process. The sheer size of the data,
combined with complexity of analysis and commercial imperative to create value
from it, has led to a new class of technologies and tools to tackle it. The
term Big Data tends to be used in multiple ways, often referring to both the
type of data being managed as well as the technology used to store and process
it.
For the most part
these technologies originated from companies such as Google, Amazon, Facebook
and LinkedIn, where they were developed for each company’s own use in order to
analyse the massive amounts of social media data they were dealing with. Due to
the nature of these companies, the emphasis was on low-cost, scale-out commodity
hardware and open source software.
The world of Big Data is increasingly being defined by the 4 Vs; that is, these ‘Vs’ become a reasonable test as to whether a Big Data approach is the right one to adopt for a new area of analysis. The Vs are:
Volume.
The size of the data. With technology it’s
often very limiting to talk about data volume in any absolute sense. As
technology marches forward, numbers get quickly outdated so it’s better to think
about volume in a relative sense instead. If the volume of data you’re looking
at is an order of magnitude or more larger than anything previously encountered
in your industry, then you’re probably dealing with Big Data. For some companies
this might be tens of terabytes, for others it may be tens of petabytes.
Velocity.
The rate at which data is being received and
has to be acted upon is becoming much more real-time. While it is unlikely that
any real analysis will need to be completed in the same time period, delays in
execution will inevitably limit the effectiveness of campaigns, limit
interventions or lead to sub-optimal processes. For example, some kind of
discount offer to a customer based on their location is less likely to be
successful if they have already walked some distance past the store.
Variety.
There are two aspects of variety to consider:
syntax and semantics. In the past these have determined the extent to which
data could be reliably structured into a relational database and content
exposed for analysis. While modern ETL tools are very capable of dealing with
data arriving in virtually any syntax, they are less able to deal with
semantically rich data such as free text. Because of this, most organizations
have restricted the data coverage of Information Management (IM) systems to a
narrow range of data. It follows then that by being more inclusive, additional
value may be created by an organization, and this is perhaps one of the major
appeals of the Big Data approach. (Source: Information Management and Big Data,
A Reference Architecture.)
Value.
We need to consider what commercial value any new sources and forms of data can
add to the business. Or, perhaps more appropriately, to what extent can the
commercial value of the data be predicted ahead of time so that ROI can be
calculated and project budget acquired? ‘Value’ offers a particular challenge
to IT in the current harsh economic climate: it is difficult to attract funds
without certainty of the ROI and payback period. The tractability of the
problem is closely related to this issue, as problems that are inherently more
difficult to solve will carry greater risk, making project funding more
uncertain.
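As a rough illustration of the ROI and payback-period figures mentioned above, the following Python sketch uses the common textbook definitions (ROI as net gain divided by cost, payback as upfront cost divided by annual net benefit). The function names and all figures are hypothetical and are not taken from the case.

```python
def simple_roi(total_benefit: float, total_cost: float) -> float:
    """Return ROI as a fraction: (benefit - cost) / cost."""
    if total_cost <= 0:
        raise ValueError("total_cost must be positive")
    return (total_benefit - total_cost) / total_cost


def payback_period_years(upfront_cost: float, annual_net_benefit: float) -> float:
    """Return the years needed for cumulative net benefit to cover the upfront cost."""
    if annual_net_benefit <= 0:
        raise ValueError("annual_net_benefit must be positive for payback to occur")
    return upfront_cost / annual_net_benefit


# Hypothetical project figures, for illustration only.
cost = 500_000.0            # assumed upfront project cost
annual_benefit = 200_000.0  # assumed net benefit per year

roi_3y = simple_roi(total_benefit=3 * annual_benefit, total_cost=cost)
payback = payback_period_years(cost, annual_benefit)
print(f"3-year ROI: {roi_3y:.0%}, payback: {payback:.1f} years")
```

The difficulty described above is that, for a Big Data project, the benefit inputs to such a calculation are hard to estimate ahead of time, so the result carries the same uncertainty.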
SWOT ANALYSIS
STRENGTH
- Supports research-oriented analytics and inquiry across domains such as science, medicine and history.
- Drives academic excellence, opening new areas of statistical research and BI.
- Great support from industry all over the world.
- Microsoft has joined hands with the open source community, launching Hadoop on Azure.
- The open source community will continue to prevail with Apache Mahout on Hadoop.
- Buzzword created by tech firms.
- Moore’s Law: two years ago, storing 1 TB of data required a huge investment; now this is easily achieved through cloud computing.
WEAKNESS
- Lack of technology to support all formats; current implementations rely on complex logic.
- Large amounts of unstructured data on platforms such as social media.
- Human conversations are messy, hard to process and currently unpredictable.
- Requires excessive human interpretation to process.
- Continuous monitoring required.
OPPORTUNITY
- People appear to be adapting to this paradigm shift.
- Customers are looking towards Big Data services as a probable future opportunity (Morgan Stanley has shown interest).
- Huge opportunity for processing rich data such as audio, video and images.
- Opportunity for online retailers, storage companies, networking companies, software product companies, health industries and service companies.
THREATS
- Cyber threats are always present.
- Incorrect predictions due to garbage data, as in the case of social analytics, which cannot predict the human mindset.
- Analytics on private, confidential data may prove hazardous; data needs to be prioritized.
QUESTION
1 : Describe the kinds of
big data collected by the organizations described in this case.
There are mainly
four kinds of big data collected by the organizations described in this case.
1. British Library
- IBM BigSheets helps the British Library handle huge quantities of data and extract useful knowledge.
- The British Library is responsible for preserving British Web sites that no longer exist but need to be preserved for historical purposes.
- Examples include the Web sites of past politicians.
- IBM BigSheets helps the British Library process large amounts of data quickly and efficiently.
2. New York City Police Department (NYPD)
- City crime and criminal data.
- State and federal law enforcement agencies are analyzing big data to discover hidden patterns in criminal activity. The Real Time Crime Center data warehouse contains millions of data points on city crime and criminals.
- IBM and the New York City Police Department (NYPD) worked together to create the warehouse, which contains data on over 120 million criminal complaints, 31 million crime records and 33 billion public records.
3. Vestas
- Turbine location and wind data, for organizations looking to go green.
- Vestas’s wind library currently stores data on prospective turbine locations and global weather systems.
- Vestas implemented a solution consisting of IBM InfoSphere BigInsights software running on a high-performance IBM System x iDataPlex server.
4. Hertz
- Consumer sentiment data.
- Car rental company Hertz uses a big data solution to analyze consumer sentiment from Web surveys, emails, text messages, Web site traffic patterns and data generated at all of Hertz’s 8,300 locations in 146 countries.
- Hertz was able to reduce the time spent processing data and improve the company’s response time to customer feedback and changes in sentiment.
QUESTION
2 : List and describe the
business intelligence technologies described in this case.
1. IBM BigSheets
- IBM BigSheets is a cloud application used to perform ad hoc analytics at web scale on unstructured and structured content.
- IBM BigSheets is an insight engine that helps extract, annotate, and visually analyze vast amounts of unstructured Web data, delivering the results via a Web browser. For example, users can see search results in a pie chart.
- State and federal law enforcement agencies are analyzing big data to discover hidden patterns in criminal activity, such as correlations between time, opportunity, and organizations, or non-obvious relationships between individuals and criminal organizations that would be difficult to uncover in smaller data sets.
- IBM BigSheets is built atop the Hadoop framework, so it can process large amounts of data quickly and efficiently.
2. Real Time Crime Center (RTCC)
- The Real Time Crime Center (RTCC) is a centralized technology center for the New York (NYPD) and Houston Police Departments.
- The RTCC data warehouse contains millions of data points on city crime and criminals and billions of public records.
- The system’s search capabilities allow the NYPD to quickly obtain data from any of these data sources.
- Information on criminals, such as a suspect’s photo with details of past offences or addresses with maps, can be visualized in seconds on a video wall or instantly relayed to officers at a crime scene.
3. IBM InfoSphere BigInsights
- IBM InfoSphere BigInsights brings the power of Hadoop to the enterprise. Apache Hadoop is an open source software framework used to reliably manage large volumes of structured and unstructured data.
- Vestas increased the size of its wind library and is able to manage and analyze location and weather data with models that are much more powerful and precise.
- Vestas implemented a solution consisting of IBM InfoSphere BigInsights software running on a high-performance IBM System x iDataPlex server.
QUESTION
3 : Why did the companies
described in this case need to maintain and analyze big data? What business
benefits did they obtain?
1. The British Library
The British Library needed to maintain and analyze big data because:
- Traditional data management methods proved inadequate to archive billions of Web pages, and legacy analytics tools couldn’t extract useful knowledge from such quantities of data.
2. New York Police Department (NYPD)
The NYPD needed to maintain and analyze big data because:
- It allows the NYPD to respond quickly to crimes that have occurred.
- It helps the NYPD obtain information on suspects, such as a suspect’s photo, past offences or addresses with maps, which can be visualized in seconds on a video wall.
3. Vestas
Vestas needed to maintain and analyze big data because:
- Vestas is the world’s largest wind energy company.
- Location data are important to Vestas so that it can accurately place its turbines.
- Areas without enough wind will not generate the necessary power.
- Areas with too much wind may damage the turbines.
- Therefore, Vestas relies on location-based data to determine the best spots to install its turbines.
- Vestas’s Wind Library currently stores 2.8 petabytes of data.
4. Hertz
Car rental giant Hertz needed to maintain and analyze big data because:
- It reduces the time spent processing data.
- It improves the company’s response time to customer feedback.
- Hertz was able to determine that delays were occurring for returns in Philadelphia during specific times of the day.
- It enhanced Hertz’s performance and increased customer satisfaction.
What business benefits did they obtain?
The business benefits of maintaining and analyzing big data are as follows:
1. Competitive advantage
2. Performance enhancement
3. Increased customer satisfaction
4. More customers attracted and more revenue generated
5. Improved decision making (faster and more accurate)
6. Operational excellence
7. Reduced cost and time spent
QUESTION
4 : Identify three decisions
that were improved by using big data.
1. Optimal use of resources and operational time
By using big data, companies can make optimal use of their resources to
enhance performance. Vestas can forecast optimal turbine placement in 15
minutes instead of three weeks, saving a month of development time for a
turbine site.
2. Quick and effective decision making
Decision making becomes quicker and more effective with big data. Visitors to
the British Library can quickly and effectively search data from the British
Library Web sites. The NYPD can make faster decisions by gathering a suspect’s
details through the Real Time Crime Center.
3. Reduced operational and other related costs
Companies can quickly make the right decisions and so avoid wrong ones. For
example, Hertz was able to quickly adjust staffing levels at its Philadelphia
office during the times of day when return delays were occurring, ensuring a
manager was present to resolve any issues.
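The case does not describe how Hertz identified the times of day when returns were delayed. As a purely hypothetical sketch of that kind of analysis, the Python below groups return delays by hour and flags the hours whose average delay exceeds a threshold. The function name, the 15-minute threshold and the sample records are all assumptions made for illustration.

```python
from collections import defaultdict
from statistics import mean


def peak_delay_hours(returns, threshold_minutes=15.0):
    """Return the hours of day whose average return delay exceeds the threshold.

    `returns` is an iterable of (hour_of_day, delay_minutes) pairs.
    """
    delays_by_hour = defaultdict(list)
    for hour, delay in returns:
        delays_by_hour[hour].append(delay)
    return sorted(
        hour
        for hour, delays in delays_by_hour.items()
        if mean(delays) > threshold_minutes
    )


# Made-up sample records: (hour of day, delay in minutes).
sample_returns = [(9, 5), (9, 8), (12, 3), (17, 30), (17, 22), (18, 25), (18, 4)]
print(peak_delay_hours(sample_returns))
```

Hours flagged in this way are the kind of signal that could prompt a staffing adjustment such as the one described above.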
QUESTION
5 : What kinds of
organizations are most likely to need big data management and analytical tools?
Why?
1. Organizations responsible for storing huge amounts of information, such as
national libraries, registration departments and income tax offices, because
these organizations typically serve as sources of information for the
government and the public.
2. Authorities such as police, customs and immigration departments, because
they need to store large amounts of data about criminals and the public for
use in keeping society safe.
3. Organizations looking to go green, which need big data about weather and
location because these data are very useful for making accurate decisions. In
this case, Vestas needed data about location and wind to site its turbines.
CONCLUSION
Big Data is varied; it's growing; it's moving fast, and it's very much in need
of smart management.
Data, cloud and engagement are energizing organizations across multiple
industries and present an enormous opportunity to make organizations more
agile, more efficient and more competitive. In order to capture that opportunity,
organizations require a modern Information Management architecture.
IBM’s big data
platform is helping enterprises across all industries. IBM understands the
business challenges and dynamics of your industry and we can help you make the
most of all your information. When companies can analyze all of their available
data, rather than a subset, they gain a powerful advantage over their
competition. IBM has the technology and the expertise to apply big data
solutions in a way that addresses your specific business problems and delivers
rapid return on investment.