The drive to deliver the latest news first, in any format, sometimes labelled ‘horserace journalism’, is probably as old as the trade itself. With the advent of online news, the margin between competing releases has only grown smaller, right down to milliseconds.

“Not only are reporters always up against deadlines, but they are constantly scrambling to make sure they break the news before the competition. Even if they hit the newswires a mere split-second before rival news services, it matters. It’s bragging rights, if nothing else,” writes ‘Big Data Evangelist’ James Kobielus on his blog.

In such a scenario, how does the final consumer, the reader, decide whose scoop is the freshest, so to speak? Kobielus describes how he stumbled across a 2013 blog post, “How to spot first stories on Twitter using Storm”, in which the author describes methods of detecting first stories “as they happen” using Twitter on top of Storm.

The post summarizes a research project that its author, Michael Vogiatzis, conducted while earning an advanced degree in computer science; it was not built as a “scoop certification” tool for an online news service (though, clearly, it could serve that purpose). He describes in detail, with programming code, diagrams and equations, how he built a program that does “first story detection” on tweets using Storm, Twitter’s stream-processing infrastructure. “Specifically,” says Vogiatzis, “I try to identify the first document in a stream of documents, which discusses about a specific event.”
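
To make the idea of “first story detection” concrete, here is a minimal sketch in Python. It is not Vogiatzis’s implementation and does not use Storm: it assumes a plain brute-force comparison of word-count vectors by cosine similarity, and the function names, the sample tweets and the 0.5 threshold are all hypothetical choices made for illustration. A document is flagged as a first story when nothing seen earlier in the stream resembles it closely enough.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two sparse term-count vectors (Counters)."""
    dot = sum(count * b[term] for term, count in a.items() if term in b)
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def detect_first_stories(stream, threshold=0.5):
    """Return the documents that look like the first report of an event.

    A document counts as a first story when its most similar predecessor
    scores below `threshold` (a hypothetical cut-off chosen for this sketch).
    """
    seen = []           # vectors of every document processed so far
    first_stories = []  # documents with no close-enough predecessor
    for text in stream:
        vector = Counter(tokenize(text))
        best = max((cosine(vector, old) for old in seen), default=0.0)
        if best < threshold:
            first_stories.append(text)
        seen.append(vector)
    return first_stories


# Made-up sample stream, for illustration only.
tweets = [
    "Earthquake hits the city centre",
    "Huge earthquake hits city centre this morning",
    "New phone model launched today",
    "Earthquake hits the city centre, buildings shaking",
]
print(detect_first_stories(tweets))
```

Run on the sample stream, the sketch keeps the first earthquake tweet and the phone launch tweet and treats the other two as follow-ups. Comparing every new document against all earlier ones gets slower as the stream grows, which is one reason a real-time system would need a faster approximate search and a distributed platform such as Storm.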

The post provides the technical detail one needs to understand the intricacies of the process Vogiatzis describes. Other stream computing platforms, such as IBM InfoSphere Streams, can also be used.

The lack of precision in determining who published a story first online has far-reaching implications. Readers tend to remain uninterested in the true source, since only the news itself holds their attention, and so no single news organisation earns the pedestal for excellence.

“If the scoop culture endures in the era of real-time streaming journalism, who can have legitimate bragging rights over ‘first-to-tweet’? Will news outlets start bragging about their real-time scoop analytics tools?” asks Kobielus.

Read more here.
(Image credit: Kevin Harber)
