Download a full-length movie to your computer and you'll need about 700 megabytes of space. Now multiply that digital mass by 2.7 trillion.
That's roughly how much digital data the world will spit out this year -- a mind-boggling pile of real-time information, which, when processed and analyzed, holds the key to faster service at retailers, better customer care at banks and speedier diagnosis when you visit the doctor.
Market research firm IDC, which pegged 2008 global digital output at 180 exabytes -- 1 exabyte is a little over 1 billion gigabytes -- expects that to grow tenfold this year, to about 1,800 exabytes, and to reach 35,000 exabytes by 2020.
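The figures above can be checked with a quick back-of-envelope calculation. The sketch below assumes binary units (1 gigabyte = 1,024 megabytes, and so on), which matches the article's own conversions but is not confirmed as IDC's convention; the variable names are illustrative only.

```python
# Back-of-envelope check of the data volumes quoted above.
# Assumption: binary units, consistent with "1 exabyte is a little over
# 1 billion gigabytes". IDC's own convention is not stated in the article.
MB = 2**20
GB = 2**30
EB = 2**60

MOVIE_BYTES = 700 * MB            # one full-length movie
MULTIPLIER = 2_700_000_000_000    # 2.7 trillion


def to_exabytes(n_bytes):
    """Convert a byte count to exabytes (binary convention)."""
    return n_bytes / EB


# Gigabytes in one exabyte: "a little over 1 billion".
gb_per_eb = EB // GB

# The "700 megabytes times 2.7 trillion" estimate, in exabytes.
movies_eb = to_exabytes(MOVIE_BYTES * MULTIPLIER)

# IDC's 2008 figure of 180 exabytes, grown tenfold.
idc_this_year_eb = 180 * 10

# Implied growth from this year's figure to the 2020 projection.
growth_to_2020 = 35_000 / idc_this_year_eb

print(f"Gigabytes per exabyte: {gb_per_eb:,}")
print(f"Movie estimate:        {movies_eb:,.0f} exabytes")
print(f"IDC tenfold estimate:  {idc_this_year_eb:,} exabytes")
print(f"Growth to 2020:        about {growth_to_2020:.0f}x")
```

Under these assumptions the two estimates land within a few percent of each other, which is consistent with the article's "roughly".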
The tech industry calls it Big Data.
"Think of a stack of DVDs reaching from Earth to the moon and back," said David Thompson, Chief Information Officer at Symantec, a firm that manages, searches and secures large data sets.
Every phone call made, web page clicked, e-mail sent and online game played leaves data crumbs. These are now being analyzed in real time to predict trends, spot potential problems and help find quick solutions.
Tibco Software and Informatica, bigger names including Salesforce.com Inc and IBM, and private firms such as Cloudera and Splunk are looking to cash in on this data tsunami -- betting on the potential returns from scouring and analyzing the data, and so helping customers improve their businesses.
Traditionally, data is stored in vast data warehouses, formatted into something the database can understand, and analyzed at the end of the week or month, if at all.
"Because of the volumes and the ways in which data is generated today, centralized database systems are no longer adequate," said Cloudera CEO Mike Olson.
"Computers are getting faster at a pretty good pace, but data is getting big at a much faster pace."
Businesses, seeking a competitive edge, are no longer satisfied with hindsight analysis. They need swift on-the-go insight to learn more, faster, and to pre-empt problems. And they're turning to companies that can offer that.
"The promise and scope of Big Data is that within all that data lies the answer to just about everything," Tibco CEO Vivek Ranadivé told Reuters.
Every month, Reliance Communications, India's No.2 mobile carrier by subscribers, and a Tibco client, adds some 3 million customers, but loses 1 million.
Tibco's analysis of available data found that for every six calls dropped by Customer Care, one customer switched to a rival service provider. Armed with that finding, Reliance offered free text messages to those customers before they switched, and saw a dramatic drop in churn, Ranadivé said.
Customer engagement is tracked very closely and, by applying sophisticated analytics, companies can see what aspects of a product customers like and what they don't, BofA Merrill Lynch said in a recent note.
Tibco's revenue jumped 21 percent in fiscal 2010, to $754 million, and it increased its R&D spend by 15 percent. Its shares have risen fivefold in two years to their highest since the dotcom bubble a decade ago, but have dropped 16 percent in the last four weeks in a broad market sell-off.
VOLUME, VELOCITY, VARIETY
"Big Data is more than just volume. It's also about velocity of information and variety of information," said James Markarian, Chief Technology Officer at Informatica.
His company sees the concept as the coming together of three trends -- a deluge of transaction data such as orders, payments, storage and travel records; interaction data from social media sites; and warehousing to handle and store the data.
Informatica's products help integrate data in various formats from different data sources, making it easier to get into, and search, masses of data.
The soaraway popularity of smartphones and mobile devices, legions of newcomers to the Internet, rapid growth in online shopping and the spread of Facebook and Twitter are big drivers of Big Data.
"Maybe I'm collecting tweets and doing sentiment analysis about my brand; maybe I want to understand what users do when they visit my site. Where do they go? What do they look at? What links do they click?" said Olson at Cloudera, which builds software applications on clients' databases.
Healthcare, too, is undergoing massive digitalization, as medical records are created and stored, and medical imaging is transferred to digital form in databases.
Last quarter, non-profit U.S. healthcare provider Ochsner used Informatica to improve the quality of data in its legacy system, migrate that to a new system, and protect confidential medical records.
The potential value from real-time data mining in the U.S. healthcare sector alone could be worth more than $300 billion a year, much of that just from cutting costs by about 8 percent, according to a report by market researcher McKinsey.
U.S. oil company Chevron Corp accumulates some 2 terabytes of data a day -- 1 terabyte is 1,024 gigabytes -- while the Large Hadron Collider, the world's largest particle accelerator, can generate data at 40 terabytes a second, the McKinsey report said.
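To put those two rates on the same scale, here is a rough sketch using only the figures quoted above. It treats the Large Hadron Collider's 40 terabytes a second as if it were sustained for a full day, which is an assumption for illustration; the report describes what the collider can generate, not what it stores.

```python
# Rough comparison of the two data rates quoted above.
# Assumption: the collider's rate is treated as sustained for a whole day,
# purely for scale. Variable names are illustrative.
SECONDS_PER_DAY = 24 * 60 * 60
GB_PER_TB = 1024                  # conversion given in the article

chevron_tb_per_day = 2
lhc_tb_per_second = 40

# Chevron's daily volume in gigabytes, and its yearly volume in terabytes.
chevron_gb_per_day = chevron_tb_per_day * GB_PER_TB
chevron_tb_per_year = chevron_tb_per_day * 365

# What the collider's rate would amount to over a full day.
lhc_tb_per_day = lhc_tb_per_second * SECONDS_PER_DAY

# How long the collider would need to match one Chevron day.
lhc_seconds_per_chevron_day = chevron_tb_per_day / lhc_tb_per_second

# Ratio of the two daily volumes.
ratio = lhc_tb_per_day // chevron_tb_per_day

print(f"Chevron: {chevron_gb_per_day:,} GB a day, {chevron_tb_per_year:,} TB a year")
print(f"Collider: {lhc_tb_per_day:,} TB a day at the quoted rate")
print(f"Collider matches a Chevron day in {lhc_seconds_per_chevron_day} seconds")
print(f"Ratio of daily volumes: {ratio:,} to 1")
```

On these numbers, the collider at its quoted rate would produce Chevron's daily volume in a twentieth of a second.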
Informatica increased its R&D spend last year by more than a third to over $100 million, and its revenue rose 31 percent. Its shares have also risen strongly and last week hit a life high of $59.98.
PEERING INTO DARK CAVES?
"Some of the most valuable data is also some of the most challenging to take advantage of, and that's called machine data," said Steve Sommer, Chief Marketing Officer at Splunk, a machine data management company inspired by spelunking, a U.S. term for caving, or potholing.
Sommer said much of this data can't be analyzed through traditional approaches as it doesn't have a standardized format -- for example, the many different types of Facebook posts.
Customers want to be able to constantly add, work with and understand terabytes of data in real time, he noted.
And they need the flexibility to add, at a moment's notice, any format of data, and they want a system that can immediately understand the data and come up with insights for them.
Splunk, which has Salesforce.com, Facebook and the newly public LinkedIn Corp among its clients, likes to think it's the 'Google for data' -- looking in customers' hardware, databases, servers and networks to understand the infrastructure and see how this can be parsed to help the business.
McKinsey has warned of a shortage of analytical talent to make the most of Big Data, saying the United States alone needs 140,000-190,000 people with the necessary analytical skills.
"Big Data will become a larger liability instead of an asset for most organizations," said Symantec's Thompson.
"Data needs to be handled intelligently and this is an opportunity for IT departments to really get their head around what type of data they have, what they retain long term and what's of true value to the corporation.
"With the useful life of data dropping dramatically, so is the amount of time to do something about it."
"The answer is not to try to build the mother of all databases. The answer is to extract the right data at the right time and put it in the right context," said Tibco's Ranadivé.
"You don't need Big Data if you have the right little data."
(Reporting by Sayantani Ghosh in Bangalore, Editing by Ian Geoghegan)