Big Data | Nick Beim

The Barbell Effect of Machine Learning

June 3

If there is one technology that promises to change the world more than any other over the next several decades, it is arguably machine learning. By enabling computers to learn certain things more efficiently than humans and to discover certain things that humans cannot, machine learning promises to bring increasing intelligence to software everywhere and to give computers a steadily expanding set of capabilities – from driving cars to diagnosing disease – that were previously thought impossible.

While most of the core algorithms that drive machine learning have been around for decades, what has magnified its promise so dramatically in recent years is the extraordinary growth of the two fuels that power these algorithms – data and computing power. Both continue to grow at exponential rates, suggesting that machine learning is at the beginning of a very long and productive run.

As revolutionary as machine learning will be, its impact will be highly asymmetric. While most machine learning algorithms, libraries and tools are in the public domain and computing power is a widely available commodity, data ownership is highly concentrated.

This means that machine learning will likely have a profound barbell effect on the technology landscape. On one hand, it will democratize basic intelligence through the commoditization and diffusion of services such as image recognition and translation into software broadly. On the other, it will concentrate higher-order intelligence in the hands of a relatively small number of incumbents that control the lion’s share of their industry’s data.

For startups seeking to take advantage of the machine learning revolution, this barbell effect is a helpful lens through which to look for the biggest business opportunities. While machine learning will enable many new kinds of startups, the most promising will likely cluster around the incumbent end of the barbell.

Dataminr and the Science of Real-Time Information Discovery

March 17

Today Dataminr announced a $130m round of financing from a group of leading financial institutions and prominent financial thought leaders including John Mack, Vikram Pandit, Tom Glocer and Noam Gottesman.

A number of friends have asked me about the company and what I find most interesting about it. This seemed like a good opportunity to highlight a few thoughts.

What I find most interesting about Dataminr is that in addition to building a business, it is pioneering a new science. The science is real-time information discovery: sifting through the ever-growing tidal wave of real-time public data to identify breaking events by their nascent digital signatures, and to determine their significance, as they happen. Sometimes these events are clearly signposted, for example by someone who witnesses an event and tweets about it, with others providing corroboration. Sometimes they aren't, and the algorithms must work out what is happening by piecing together thousands of facets of something larger. The company has a deep strategic partnership with Twitter that makes this kind of discovery possible.

This new science is, without a doubt, very cool. It enables one to discover news before it’s news and market-moving information before markets move. It provides a kind of X-ray vision into what is going on in the world in real-time with a filter for what is significant, and to whom. All on the basis of publicly available data.

Within five months, Dataminr has become the real-time wire service used almost universally by major news organizations, beating the next best service by over an hour and surfacing troves of unknown unknowns that would otherwise never have come to light. It has also been adopted by the lion’s share of leading financial institutions, giving them access to the frontier of breaking information in real time.

What’s also interesting is how Dataminr will change the world. In my view, most industries that rely on real-time information, an ever-increasing number, will be influenced by it, and some will be transformed by it. The wave of change began in finance, news and public safety, and I think it will move quickly to risk management, security and PR, and undoubtedly to other verticals in ways that are difficult to predict. I am particularly excited about what the company and its technology can do to help save lives in public safety and humanitarian assistance.

Dataminr is in the early days of a long journey, but it is already impacting the world in significant ways, and it’s exciting to be a part of.

The Big Data Revolution in News

January 30

Yesterday Dataminr, a big data startup based in New York, announced something pretty extraordinary: that it would become the news discovery platform for CNN. This seems like one of those watershed moments in the history of the news industry that could change the industry’s dynamics fundamentally, like the advent of news agencies or the launch of CNN itself.

How can a technology startup become the news discovery platform for the world’s leading news organization? Because today, breaking events typically leave discoverable digital signatures before they become news, and Dataminr discovers these signatures as soon as they become algorithmically recognizable.

Most of these signatures are on Twitter, since Twitter has become the natural place that hundreds of millions of people post things they deem interesting, important, surprising, funny, scary, scandalous, or otherwise worth sharing – anything, in short, they deem newsworthy. No matter how effective any company’s news-gathering organization, it simply can’t beat the scale of this discovery system.

Most interestingly, Dataminr discovers, qualifies, categorizes and communicates breaking events algorithmically, in real time, as they happen. This is an extremely difficult technological feat to pull off. Twitter carries half a billion to a billion tweets per day, and Dataminr’s algorithms process this stream of data and its associated metadata in real time to detect even the smallest micro-events and determine their significance, relevance and actionability.
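To give a sense of the scale involved, the daily volumes above can be converted into average per-second arrival rates. This is simple arithmetic that assumes tweets are spread evenly across the day; real traffic is bursty, so peak rates can run well above these averages.

```python
# Back-of-the-envelope throughput for a stream of 0.5 to 1 billion tweets per day.
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400


def tweets_per_second(tweets_per_day: int) -> float:
    """Average arrival rate, assuming tweets are spread evenly over the day."""
    return tweets_per_day / SECONDS_PER_DAY


low = tweets_per_second(500_000_000)
high = tweets_per_second(1_000_000_000)
print(f"{low:,.0f} to {high:,.0f} tweets per second on average")
# → 5,787 to 11,574 tweets per second on average
```

Every one of those messages has to be read, scored and either discarded or escalated within seconds for the alerts to count as real time.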

How Well Does It Work?
In short, very well, both because there is so much signal on Twitter and because Dataminr has developed and honed its algorithms with an outstanding team of data scientists over the past three years.

One particularly memorable example of the kind of event discovery Dataminr excels at is the assassination of Osama bin Laden. Dataminr’s algorithms discovered the news on the basis of 19 tweets in a 5-minute period on May 1, 2011. The algorithms used signal pattern recognition, linguistic analysis, sentiment classification and cross-referencing with third-party data sources to identify the news. Dataminr alerted its clients to the news at 10:20pm. At 10:24pm, Keith Urbahn, the former Chief of Staff to Defense Secretary Donald Rumsfeld (not the country music singer), provided partial confirmation in his own tweet: “So I’m told by a reputable person they have killed Osama bin Laden. Hot damn.” The first move in S&P Futures caused by the news occurred at 10:39pm, and Bloomberg and the New York Times began reporting the news at 10:43pm. The news quickly spiraled into one of the most viral events in Twitter’s history, with messages increasing from 19 in a 5-minute period to 20,000 per minute 30 minutes later.
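As a rough illustration of what the "signal pattern recognition" step could involve, the sketch below flags a topic once enough distinct accounts mention it inside a sliding five-minute window. It is a hypothetical toy, not Dataminr's method: the class name `BurstDetector`, the keyword matching and the threshold of 15 distinct authors are all assumptions, and it covers none of the linguistic analysis, sentiment classification or cross-referencing mentioned above.

```python
from collections import deque
from dataclasses import dataclass
from typing import Iterable


@dataclass
class Message:
    timestamp: float  # seconds since an arbitrary start time
    author: str
    text: str


class BurstDetector:
    """Hypothetical burst detector: flags a topic when enough distinct authors
    mention it within a sliding time window. Window length, threshold and
    keyword matching are illustrative assumptions, not Dataminr's parameters."""

    def __init__(self, keywords: Iterable[str], window_seconds: float = 300.0,
                 min_distinct_authors: int = 15):
        self.keywords = [k.lower() for k in keywords]
        self.window_seconds = window_seconds
        self.min_distinct_authors = min_distinct_authors
        self.window = deque()  # matching messages, oldest first

    def _matches(self, msg: Message) -> bool:
        # Naive case-insensitive substring match; a real system would do far more.
        text = msg.text.lower()
        return any(k in text for k in self.keywords)

    def observe(self, msg: Message) -> bool:
        """Consume one message (in timestamp order) and report whether the
        number of distinct authors in the current window meets the threshold."""
        if self._matches(msg):
            self.window.append(msg)
        # Drop matches older than window_seconds relative to the newest message.
        while self.window and msg.timestamp - self.window[0].timestamp > self.window_seconds:
            self.window.popleft()
        distinct_authors = {m.author for m in self.window}
        return len(distinct_authors) >= self.min_distinct_authors


# Usage: 19 matching messages from different accounts, 15 seconds apart,
# all inside one five-minute window.
detector = BurstDetector(["bin laden"])
alerts = [
    detector.observe(Message(timestamp=i * 15.0, author=f"user{i}",
                             text="Hearing reports about Bin Laden"))
    for i in range(19)
]
print(alerts.index(True))  # → 14 (the 15th distinct author trips the threshold)
```

Counting distinct authors rather than raw messages is one simple way to approximate corroboration: a single account repeating itself should not look like a burst, while many independent accounts saying the same thing in a short span should.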

Through its use of very sophisticated event discovery technology, Dataminr beat major news sources to the punch by 23 minutes on the biggest story of the year, and one of the biggest of the decade. Pretty cool stuff.
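The lead times in this timeline can be checked directly from the clock times given in the post. Because all of them fall on the same evening, no dates or time zones are needed; the same figures also imply a 19-minute lead over the first S&P futures move.

```python
from datetime import datetime

# Clock times as given in the post, all on the evening of May 1, 2011.
timeline = {
    "Dataminr alert": "22:20",
    "Keith Urbahn tweet": "22:24",
    "First S&P futures move": "22:39",
    "Bloomberg / NYT reports": "22:43",
}


def minutes_between(start: str, end: str) -> int:
    """Whole minutes from start to end, both given as HH:MM on the same day."""
    fmt = "%H:%M"
    delta = datetime.strptime(end, fmt) - datetime.strptime(start, fmt)
    return int(delta.total_seconds() // 60)


alert = timeline["Dataminr alert"]
for event, t in timeline.items():
    if event != "Dataminr alert":
        print(f"{event}: {minutes_between(alert, t)} min after the alert")

# Message-rate growth: 19 messages in 5 minutes vs. 20,000 per minute.
initial_rate = 19 / 5   # 3.8 messages per minute
later_rate = 20_000     # messages per minute
print(f"Growth factor: about {later_rate / initial_rate:,.0f}x")
```

This prints lead times of 4, 19 and 23 minutes, and a growth factor of about 5,263x in message volume over those 30 minutes.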