Melanie Mueller, a data scientist at Harvard University, gives us an overview of her visit to the Big Data TechCon conference in Boston. Melanie attended the conference from March 31 to April 2.
Big Data practitioners converged on Boston to attend Big Data TechCon, a Big Data training conference organized by BZ Media. Topics covered both tools to deal with Big Data, such as current database and parallelization solutions, and analytics to derive insight from it.
For tools, hands-on tutorials were particularly popular and often overcrowded, revealing the most coveted techniques. Classes on analyzing social media streams were filled to the last chair, and a one-day crash course on Hadoop was booked out weeks before the conference.
In his tool-focused keynote, Sunil Venkayala from HP Vertica talked about Distributed R, open source software that is still under development. Distributed R marries R and Hadoop, and promises to allow analysis of data that is too large for vanilla R. To this end, the Vertica team is rewriting R routines to provide scalable, high-performance distributed processing across multiple nodes, while allowing users to keep using familiar GUIs and packages from R.
Todd Cioffi from RapidMiner raised the question: Why do we start coding every time we get a new data problem? He emphasized that we shouldn’t confuse the tools with the process, and that we need to focus on the questions rather than the tools in order to gain valuable insights. To ask the right questions, we need to move on from traditional business intelligence, with its query- and dashboard-based reporting, to modern advanced analytics with descriptive and predictive modeling. For example, a business intelligence question such as ‘Which packages are on which truck?’ should be replaced by an advanced analytics question like ‘How can I optimally assign packages to trucks to minimize delivery time and usage of trucks?’
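To make the contrast between the two kinds of question concrete, here is a minimal Python sketch. It is not from the talk: the package weights, the truck capacity and the function name are made up for illustration, and it uses a simple first-fit decreasing heuristic, which only tries to keep the number of trucks low. It ignores delivery time and does not guarantee an optimal assignment.

```python
# Hypothetical data: package weights in kg and one shared truck capacity.
packages = {"A": 40, "B": 70, "C": 30, "D": 60, "E": 50}
TRUCK_CAPACITY = 100


def assign_packages(weights, capacity):
    """First-fit decreasing heuristic: take packages heaviest first and
    put each one on the first truck that still has room for it."""
    trucks = []  # one list of package names per truck
    loads = []   # current load of each truck, same order as trucks
    by_weight = sorted(weights.items(), key=lambda kv: kv[1], reverse=True)
    for name, weight in by_weight:
        if weight > capacity:
            raise ValueError(f"package {name} does not fit on any truck")
        for i, load in enumerate(loads):
            if load + weight <= capacity:
                trucks[i].append(name)
                loads[i] += weight
                break
        else:
            # No existing truck has room, so open a new one.
            trucks.append([name])
            loads.append(weight)
    return trucks


# Advanced analytics style question: how should packages be assigned?
assignment = assign_packages(packages, TRUCK_CAPACITY)

# Business intelligence style question: which truck is each package on?
truck_of = {name: i for i, truck in enumerate(assignment) for name in truck}

print(assignment)
print(truck_of["C"])
```

The lookup in `truck_of` only reports the current state, while `assign_packages` decides what that state should be. A real delivery problem would add routes, time windows and costs, and would more likely be handed to a dedicated optimization solver.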
Scott Sokoloff from TEL and Will Ford from Alpine Data Labs illustrated common pitfalls when trying to leverage Big Data for business decisions. They emphasized that it is not enough to just follow the patterns in the data and find actionable insights. The insights must also align with the company’s interests, and they must be actionable in the company’s particular environment. For example, proposing a strategy that will bring down product prices sounds good, but it might not go down well with the sales team if their bonuses are based on the total dollar amount sold. Scott Sokoloff phrased his take-home message as: Big data analytics is not about who is the smartest, it’s a relationship business.
Apart from the presentations, Big Data TechCon offered ample networking opportunities for the attendees. Indeed, Big Data is a relationship business!
Melanie is a postdoc at Harvard University, where she is creating and munching data from biological experiments. Most of the time she is trying to figure out what yeast cells have done while growing on a Petri dish. It turns out that automated data analysis with Matlab and Python can help a lot in this process! Before moving to Harvard University, Melanie obtained a PhD in Physics from the Max Planck Institute of Colloids and Interfaces in Potsdam, Germany, where she used mathematical modeling, computer simulations and analysis of experimental data to understand how molecular motors transport cargoes in cells.