Data science trailblazer Netflix has announced Surus, a new Netflix-OSS project through which it will open-source some of its Hadoop data analysis tools, based on “internal user defined functions (UDF’s) that have broad adoption across Netflix.”
The UDFs are slated for release over the next year and cover use cases such as scoring predictive models, outlier detection, and pattern matching, with the overall aim of enhancing big data analytical capabilities, according to the blog post announcing the release.
The first function released as part of Surus is ScorePMML, which enables efficient scoring of predictive models in Apache Pig using the Predictive Model Markup Language (PMML).
“ScorePMML aligns Netflix predictive modeling capabilities around the open-source PMML standard,” Netflix explains. “By using the same PMML representation of the predictive model at each step in the modeling process, we save time/money by reducing both the risk and cost of custom code.”
Data scientists at Netflix have begun adopting ScorePMML because it lets them iterate on and deploy models more effectively than the legacy approach.
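To give a rough sense of what scoring from a PMML file involves, here is a minimal Python sketch. It is not ScorePMML and does not use Pig or any Netflix code; the PMML document, the field names (`hours_watched`, `days_inactive`, `score`) and the `score` function are all hypothetical, made up for illustration. It assumes a PMML 4.2 file containing a single linear regression model, and reads the intercept and coefficients straight from the XML rather than from hand-written model code.

```python
import xml.etree.ElementTree as ET

# Hypothetical, hand-written PMML document for illustration only (not a
# Netflix model). It encodes:
#   score = 1.5 + 2.0 * hours_watched - 0.5 * days_inactive
PMML_DOC = """<PMML version="4.2" xmlns="http://www.dmg.org/PMML-4_2">
  <Header description="toy linear regression"/>
  <DataDictionary numberOfFields="3">
    <DataField name="hours_watched" optype="continuous" dataType="double"/>
    <DataField name="days_inactive" optype="continuous" dataType="double"/>
    <DataField name="score" optype="continuous" dataType="double"/>
  </DataDictionary>
  <RegressionModel functionName="regression">
    <MiningSchema>
      <MiningField name="hours_watched"/>
      <MiningField name="days_inactive"/>
      <MiningField name="score" usageType="target"/>
    </MiningSchema>
    <RegressionTable intercept="1.5">
      <NumericPredictor name="hours_watched" coefficient="2.0"/>
      <NumericPredictor name="days_inactive" coefficient="-0.5"/>
    </RegressionTable>
  </RegressionModel>
</PMML>
"""

# PMML 4.2 elements live in this XML namespace.
NS = {"pmml": "http://www.dmg.org/PMML-4_2"}


def score(pmml_text, record):
    """Score one record against the first RegressionTable in a PMML document.

    Only handles the simplest case: a linear regression with numeric
    predictors (exponent 1) and no transformations.
    """
    root = ET.fromstring(pmml_text)
    table = root.find("pmml:RegressionModel/pmml:RegressionTable", NS)
    total = float(table.get("intercept"))
    for predictor in table.findall("pmml:NumericPredictor", NS):
        coefficient = float(predictor.get("coefficient"))
        total += coefficient * record[predictor.get("name")]
    return total


if __name__ == "__main__":
    print(score(PMML_DOC, {"hours_watched": 4.0, "days_inactive": 2.0}))
```

Because the model lives entirely in the PMML file, swapping in a retrained model in this sketch means swapping the XML, with no change to the scoring code. That is the kind of saving on custom code the Netflix quote describes.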
(Image credit: Seth Anderson, via Flickr)