Data-driven workflows are the lifeblood of big data professionals everywhere: data scientists, data analysts, and data engineers. We perform all types of data functions within these workflows: archive, discover, access, visualize, mine, manipulate, fuse, integrate, transform, feed models, learn models, validate models, deploy models, and more. It is a dizzying day’s work. We start our workflow development manually, identifying what needs to happen at each stage of the process, what data are needed, when they are needed, where they need to be staged, what the inputs and outputs are, and more. If we are really good, we can improve our efficiency in performing these workflows manually, but not substantially. A better path to success is to employ a workflow platform that is scalable (to larger data), extensible (to more tasks), more efficient (shorter time-to-solution), more effective (better solutions), adaptable (to different user skill levels and business requirements), comprehensive (providing a wide scope of functionality), and automated (to break the time barrier of manual workflow activities).
A workflow platform that performs a few of those data functions for a specific application is nothing new – you can find solutions that deliver workflows for business intelligence reporting, or analytic processing, or real-time monitoring, or exploratory data analysis, or for predictive analytics deployments. However, when you find a unified big data orchestration platform that can do all of those things – that brings all the rivers of data into one confluence (like the confluence of the Allegheny and Monongahela Rivers that merge to form the Ohio River in the eastern United States) – then you have a powerful enterprise-level big data orchestration capability for numerous applications, users, requirements, and data functions. The good news is that there is a company that offers such a platform: Apervi is that company, and Conflux is that confluence.
From Apervi’s comprehensive collection of product documentation, you learn about all of the features and benefits of their Conflux product. For example, the system has several components: Designer, Monitor, Dashboard, Explorer, Scheduler, and Connector Pack.
- The Conflux Designer is an intuitive HTML5 user interface for designing, building, and deploying workflows, using simple drag-and-drop interactivity. Workflows can be shared with other users across the business.
- The Conflux Monitor keeps track of job progress, with key statistics available in real-time, from any device, any browser, anywhere. Drilldown capabilities empower exploratory analysis of any job, enabling rapid response and troubleshooting.
- The Conflux Dashboard provides rich visibility into KPIs and job stats, on a fully customizable screen that includes a variety of user-configurable alert and notification widgets. The extensible dashboard framework can also integrate custom dashboard widgets.
- The Conflux Explorer puts search, discovery, and navigation powers into the hands of the data scientist, enabling that functionality across multiple data sources simultaneously. A mapping editor allows the user to locate and extract the relevant, valuable, and interesting information nuggets within targeted data streams.
- The Conflux Scheduler is a flexible, intuitive scheduling and execution tool, which is extensible and can be integrated with third party products.
- The Conflux Connector Pack is perhaps the single most important piece of the workflow puzzle: it efficiently integrates and connects data that are streaming from many disparate, heterogeneous sources. Apervi provides several prebuilt connectors for specific industry segments, such as Telecom, Healthcare, and Electronic Data Interchange (EDI).
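To make the orchestration idea behind components like these concrete: at its core, any workflow engine treats tasks plus their dependencies as a directed acyclic graph (DAG), running each task only after its upstream tasks have completed. The minimal sketch below is a generic illustration of that pattern only; it is not Apervi’s Conflux API, and all names in it (`run_workflow`, the example task names) are hypothetical.

```python
# Generic sketch of DAG-based workflow execution -- NOT the Conflux API.
# Tasks are callables; deps maps each task name to its upstream task names.
from graphlib import TopologicalSorter

def run_workflow(tasks, deps):
    """Run tasks in dependency order, passing upstream results downstream.

    tasks: dict of name -> callable(upstream_results_dict)
    deps:  dict of name -> set of upstream task names
    """
    order = TopologicalSorter(deps)          # raises CycleError on a cyclic graph
    results = {}
    for name in order.static_order():        # topological order: upstream first
        upstream = {d: results[d] for d in deps.get(name, ())}
        results[name] = tasks[name](upstream)
    return results

# Example pipeline: ingest -> transform -> model -> report
tasks = {
    "ingest":    lambda up: [3, 1, 2],                    # stand-in for a source connector
    "transform": lambda up: sorted(up["ingest"]),         # stand-in for a transform step
    "model":     lambda up: sum(up["transform"]),         # stand-in for a modeling step
    "report":    lambda up: f"total={up['model']}",       # stand-in for a report sink
}
deps = {
    "transform": {"ingest"},
    "model":     {"transform"},
    "report":    {"model"},
}

results = run_workflow(tasks, deps)
print(results["report"])  # -> total=6
```

A real platform layers scheduling, monitoring, retries, and distributed execution on top of this same graph model, which is why the Designer, Scheduler, and Monitor components all operate over the same underlying workflow definitions.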
Apervi provides detailed white papers, datasheets, product documentation, case studies, and infographics on their website at http://www.apervi.com/. For more immediate gratification, if you are at the O’Reilly Strata Conference in New York City this week, Apervi is demo’ing their Conflux product there. You can find them and check them out at booth P6. (I hear that they also have some exciting giveaways, so it is definitely worth drilling down into this product line.)
Kirk is a data scientist, top big data influencer, and professor of astrophysics and computational science at George Mason University. He spent nearly 20 years supporting NASA projects, including NASA’s Hubble Space Telescope as Data Archive Project Scientist, NASA’s Astronomy Data Center, and NASA’s Space Science Data Operations Office. He has extensive experience in large scientific databases and information systems, including expertise in scientific data mining. He is currently working on the design and development of the proposed Large Synoptic Survey Telescope (LSST), for which he is contributing in the areas of science data management, informatics and statistical science research, galaxies research, and education and public outreach. His writing and data reflections can be found at Rocket-Powered Data Science.
(Image credit: Flickr)