What is big data?
There are many definitions of the term ‘big data’ but most suggest something like the following:
'Extremely large collections of data (data sets) that may be analysed to reveal patterns, trends,
and associations, especially relating to human behaviour and interactions.'
In addition, many definitions also state that the data sets are so large that conventional methods
of storing and processing the data will not work.
Sources of big data
The main sources of big data can be grouped under three headings: social (human), machine
(sensor) and transactional.
Social (human) – this source is becoming more and more relevant to organisations. This source
includes all social media posts, videos posted etc.
Machine (sensor) – this data comes from what can be measured by the equipment used.
Transactional – this comes from the transactions which are undertaken by the organisation. This
is perhaps the most traditional of the sources.
Characteristics of big data
The characteristics of big data, known as the 5Vs, are:
Volume
Variety
Velocity
Veracity
Value
These characteristics have been generally adopted as the essential qualities of big data.
Volume
The volume of big data held by large companies such as Walmart (supermarkets), Apple and
eBay is measured in multiple petabytes. A petabyte is one million gigabytes, so if a typical disc
on a personal computer (PC) holds a gigabyte, the big data repositories of these companies hold
at least the data that could typically be held on 1 million PCs, perhaps even 10 to 20 million PCs.
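The comparison with PCs rests on a petabyte being one million gigabytes (10^15 bytes against 10^9 bytes). A minimal sketch of that arithmetic follows. It assumes decimal (SI) units and the one-gigabyte disc used in the text; the function name pcs_needed is purely illustrative.

```python
# Decimal (SI) storage units, expressed in bytes.
GIGABYTE = 10**9
PETABYTE = 10**15


def pcs_needed(petabytes, pc_disc_bytes=GIGABYTE):
    """Return how many PC discs of the given size hold this many petabytes.

    The one-gigabyte default mirrors the assumption made in the text.
    """
    return petabytes * PETABYTE // pc_disc_bytes


# Illustrative holdings of 1, 10 and 20 petabytes.
for pb in (1, 10, 20):
    print(f"{pb} PB = {pcs_needed(pb):,} PC discs of one gigabyte")
```

With these assumptions, 1 petabyte corresponds to 1 million PC discs and 10 to 20 petabytes to 10 to 20 million. A larger assumed disc size reduces the count proportionally.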
The scale of this is difficult to comprehend. It is probably more useful to consider the types of
data that large companies will typically store.
Retailers
Via loyalty cards being swiped at checkouts: details of all purchases you make, when, where,
how you pay, use of coupons.
Via websites: every product you have ever looked at, every page you have visited, every
product you have ever bought.
Social media (such as Facebook and Twitter)
Friends and contacts, postings made, your location when postings are made, photographs (that
can be scanned for identification), any other data you might choose to reveal to the universe.
Mobile phone companies
Numbers you ring, texts you send (which can be automatically scanned for key words), every
location your phone has ever been whilst switched on (to an accuracy of a few metres), your
browsing habits and voice mails.
Internet providers and browser providers
Every site and every page you visit. Information about all downloads and all emails (again these
are routinely scanned to provide insights into your interests). Search terms which you enter.
Banking systems
Every receipt and payment, credit card information (amount, date, retailer, location), location of
ATMs used.
Variety
Some of the variety of information can be seen from the examples listed above. In particular, the
following types of information are held:
Browsing activities: sites, pages visited, membership of sites, downloads, searches
Financial transactions
Interests
Buying habits