Data Science and Machine Learning Platforms
This list evaluates platforms that data, IT and e-commerce teams can use to source data, build models, and operationalize machine learning. It will help C-level executives make the right choice for their business from a crowded field in a maturing DSML platform market that continues to show rapid product development.
A data science and machine learning (DSML) platform is a core product and supporting portfolio of coherently integrated products, components, libraries, and frameworks (including proprietary, partner-sourced and open-source). Its primary users are data science professionals, including expert data scientists, citizen data scientists, data engineers, application developers and machine learning (ML) specialists. The core product and supporting portfolio:
Are sufficiently well-integrated to provide a consistent “look and feel.”
Create a user experience in which all components are reasonably interoperable in support of an analytics pipeline.
The DSML platform:
Offers a mixture of basic and advanced functionality essential for building DSML solutions (primarily predictive and prescriptive models).
Supports the incorporation of these solutions into business processes, surrounding infrastructure, products, and applications.
Supports the sustainable consumption of insights derived from the platform and offers functionality to quantify and track the value of data science projects.
Supports variously skilled data science professionals (“data scientist” is an inconsistently applied job title and professional distinction) — a DSML platform’s user base is often made up of professionals with diverse technical and business backgrounds.
Supports multiple tasks across the data science life cycle, including:
Problem and business context understanding
Model creation and training
Data and model governance
Explainable artificial intelligence (XAI)
Business value tracking
BEST CLOUD PROVIDER
Figure 1: Best Cloud Provider for Data Science and Machine Learning Platforms
Vendor Strengths and Cautions
Alibaba Cloud is a Niche Player in this Best Cloud Provider list. It provides two software products that together make up its core DSML platform: Platform for AI (PAI) Studio and Data Science Workshop.
In addition to China, Alibaba has a strong customer base in the South and Southeast Asian markets, but it does not have many clients elsewhere. Its current platform focuses on applications for the retail, internet and data service sectors.
Alibaba Cloud’s product and roadmap are well-suited to expert data scientists and data engineers in sectors such as internet technology, data services, retail and government. Alibaba Cloud emphasizes support for augmentation of certain tasks in the DSML workflow, but its platform lacks functionality and ease of use for citizen data scientists, which slows its adoption by less mature organizations.
Strong community built in China: Alibaba Cloud showcases its community’s strength with the Tianchi platform, a Kaggle-like platform for collaboration, competition and knowledge sharing. The platform is widely adopted within the Chinese market.
Advanced use-case modeling: Alibaba provides strong solutions for advanced use cases such as image labeling, image recognition and segmentation, and recommendation engines, which can be useful to expert data scientists.
Seamless integration that creates coherence: Alibaba provides a coherent platform that integrates well with its other offerings for data preparation, exploration, ML, augmentation and delivery. It offers drag-and-drop interactive modeling features across its platforms, which can be used by expert data scientists to support the ML pipeline.
Geographic strategy: Although Alibaba Cloud has offices and service locations in many countries, the clients it serves are mostly in Asia/Pacific. Prospective customers should ensure they are satisfied with the vendor’s presence and support in their region.
Product vision: Given the current pace of development by other vendors, Alibaba Cloud will have to be swift and agile, as this market is likely to remain highly competitive. Some key themes of, and items on, its product roadmap are already available as standard features from many other vendors.
Narrow usage and lack of citizen data scientist support: The current PAI Studio and Data Science Workshop offerings provide only limited support for advanced ML and analytics capabilities such as agent-based modeling, discrete-event modeling, Monte Carlo simulation, generative adversarial networks and self-supervised learning. Currently, the platform is suitable for advanced users but may not be a good choice for citizen data scientists or business analysts.
Altair is a Niche Player in this Best Cloud Provider list. It offers a suite of products called Altair Knowledge Works, and the core product considered in this Best Cloud Provider evaluation is Altair Knowledge Studio. The Knowledge Works suite also includes Knowledge Studio for Apache Spark, Knowledge Hub, Panopticon and Monarch.
Altair’s operations are geographically diversified, and the vendor maintains strong offerings for service-centric industries (particularly banking and financial services). It also offers various simulation and high-performance computing solutions that appeal to customers in the automotive, aerospace and manufacturing sectors and to other asset-based organizations.
Knowledge Studio’s capabilities for automated ML (AutoML), XAI and enhanced open-source integration have strengthened, but it is still catching up with other vendors’ products in terms of native capabilities for delivery and deployment and model management.
Ease of use: Altair Knowledge Studio offers an intuitive, easy-to-use interface for both coders and noncoders. Additionally, by exposing and allowing editing of the underlying open-source code, it enables expert data scientists and data engineers to augment the platform’s standard functionality.
Tools for building strong data pipelines: Altair Knowledge Studio offers strong capabilities for augmented data preparation. Integration with advanced data preparation tools (Monarch and Knowledge Hub) enables semistructured data to be easily extracted and included in ML modeling. Knowledge Hub also offers strong data governance and metadata management capabilities.
Operations and customer experience: Altair customers report high satisfaction with the vendor’s operations, including in the areas of deployment, service and support. Altair has a team of product design experts who have a keen understanding of customer needs and processes for simulation-based design activities.
Gaps in current offering: Altair has added new features in areas such as augmented DSML, MLOps and XAI. However, other advanced analytics and delivery capabilities are weaknesses for this vendor. Altair also needs to strengthen its decision modeling and composite artificial intelligence (AI) capabilities.
Limited resonance across industries: Although Altair has a strong focus on both service- and asset-centric industries, its product marketing needs improvement to resonate with a wider set of client needs. Existing and prospective customers need to work with the vendor to understand its full suite of products, which are applicable to a range of use cases.
Comparatively slow growth: Adoption of Altair’s core Knowledge Works products has been slow, compared with competitors’ offerings. Several competitors are sustaining extremely strong growth and offering market-leading products.
Alteryx is a Challenger in this Best Cloud Provider list. It has repositioned its offering by introducing Analytics Process Automation (APA) technology to provide building blocks for automating the analytics process and integrating with applications and robotic process automation (RPA). The platform includes Alteryx Designer, Alteryx Intelligence Suite, Alteryx Server, Alteryx Connect and Alteryx Promote. Alteryx Analytics Hub provides an environment for workflow automation and scheduling, collaboration, multitenancy and data connection management.
Alteryx’s operations are geographically diversified, and this vendor has clients in most domains and industries. Top verticals include manufacturing, financial services, consumer packaged goods, retail, healthcare and government.
Alteryx’s broad revamping is a work in progress. The newly introduced Alteryx Analytics Hub provides a centralized approach to orchestrating workflow and collaboration when managing analytics and data connection environments.
Ease of use for diverse personas: A collaborative user experience leveraging code-free and expert modes contributes to ease of use by all personas. Alteryx also provides line-of-business (LOB) and industry solution templates and jump-start kits to accelerate onboarding and use.
Go-to-market strategy: With APA, Alteryx emphasizes the creation of analytic content and progression from insight to action. Strong channel and independent software vendor partnerships and a verticalized go-to-market strategy, including Alteryx-developed and joint partner-developed solutions, create momentum and increase visibility.
Customer experience and operational support: Alteryx has consistently delivered excellent functionality and support, judging by feedback from customers. Customers generally respond very positively when asked about their overall experience with Alteryx.
Changing product portfolio: With the introduction of APA, Alteryx is making many changes to its portfolio. Customers should seek clarification and verify that the evolving APA framework is a good fit for their DSML strategy and users.
Perceived high cost: Pricing is commonly identified as a concern by Alteryx customers. They report good value for money, but often also evaluate less costly alternatives as their data science initiatives develop.
Innovation: Although Alteryx has delivered some good innovation with RPA integration, augmentation and a multipersona approach, other vendors are leading the way in terms of cutting-edge ML and key areas such as streaming, the Internet of Things (IoT) and XAI.
Amazon Web Services (AWS)
Amazon Web Services (AWS) is a Visionary in this Best Cloud Provider list. Its vision is for data science teams to use the entire breadth of the AWS portfolio and ML stack, with Amazon SageMaker at its core. Many of the supporting AWS components and services were considered in evaluating AWS’s offering. These included the SageMaker Studio IDE (which includes Autopilot, Notebooks, Model Monitor, Experiments and Debugger), Amazon EMR (including S3), AWS Glue, Amazon SageMaker Neo, Amazon SageMaker Ground Truth, Amazon SageMaker Clarify, Amazon SageMaker Data Wrangler, Amazon SageMaker Pipelines, AWS CloudWatch, AWS CloudTrail and others.
AWS is geographically diversified, and its client base spans many industries and business functions.
Amazon SageMaker continues to demonstrate formidable market traction, with a powerful ecosystem and considerable resources behind it.
Breadth and depth of cloud platform: Users can directly leverage AWS’s prepackaged AI services (such as Amazon Lex, Polly and Transcribe). SageMaker is also natively integrated with AWS’s many cloud data and analytics tools. Additionally, SageMaker provides extensive support for a broad range of popular and niche open-source software (OSS) libraries and frameworks.
Performance, scalability and granularity of control: Amazon SageMaker and its supporting portfolio offer best-in-class performance and scalability. The platform supports a significant selection of hardware options optimized for various ML and deep learning frameworks, and features a pay-as-you-go pricing model with no minimum fees or upfront commitment, thus encouraging experimentation.
Data labeling and human-in-the-loop capabilities: Amazon SageMaker Ground Truth supports labeling of training data, and Amazon’s Augmented AI (Amazon A2I) helps build optimal workflows for human review of deployed models. AWS connects customers with third-party marketplace vendors and the Amazon Mechanical Turk (MTurk) workforce for human labeling of data.
Evolving citizen data science appeal: AWS has made its platform more accessible, mainly through Autopilot, Data Wrangler, Pipelines and continued development of the SageMaker Studio IDE. Still, the platform is more popular among coders — it is not as intuitive for nontechnical users, compared with leading tools for citizen data scientists.
Rapid pace of development needed to match competitors’ functionality: AWS’s flurry of new components and services is filling important gaps in its platform. However, these new capabilities are neither as proven nor as strong as other vendors’ capabilities for data preparation, user interfaces, collaboration and coherence.
Maturing on-premises, hybrid and multicloud support: The majority of Amazon SageMaker customers operate in purely cloud environments. Some capabilities within the AWS portfolio change or become more complicated in hybrid, multicloud or on-premises environments. Multicloud support is evolving, however, and today most customers manage data, models and ML workloads within AWS.
Anaconda is a Niche Player in this Best Cloud Provider list. It offers Anaconda Enterprise, a data science development environment based on the interactive notebook concept that supports use of open-source Python and R-based packages. (This evaluation excludes the Anaconda Individual Edition, formerly known as Anaconda Distribution Version.)
Anaconda is geographically diversified. The majority of its users are in the financial services sector, but it is also used in sectors such as energy and utilities, healthcare, manufacturing and retail.
Anaconda has made noteworthy innovations in the areas of model governance and scalability. It has partnerships with vendors such as Google, IBM and Microsoft to drive DSML innovation with the use of open-source technologies.
Trusted and flexible platform: Anaconda offers a popular and trusted platform within the coding community, one with options for both beginners and experts. The GUI is intuitive, gives access to all R and Python libraries, and offers users the flexibility to work on several IDEs of choice, including Jupyter and RStudio.
Optimization of open-source technologies: To optimize open-source technologies and support scalability, Anaconda provides upscaling options using GPUs, managed within the Anaconda environment. Users can also use Apache Hadoop, Apache Hadoop YARN and Kubernetes clusters, on-premises or in the cloud.
Culture of collaboration and accompanying features: The Anaconda community supports Python open-source contributions, thus fostering a culture of code integrity and integration with other open-source data science projects. Anaconda Cloud provides ways for data scientists to collaborate, share deployments and exchange code libraries. It also enables developers to explore and accelerate model development and deployment.
Focus on technical audience: Anaconda targets a technical audience that prefers to code in R or Python languages for data science. The platform lacks features that enable citizen data scientists to take advantage of it.
Lack of some critical model operationalization capabilities: Anaconda’s platform lacks model management capabilities such as dependency management, explainability and bias detection, as well as model inventory features. Anaconda does, however, provide some model-monitoring and governance features, such as scheduling of deployments, user information and resource consumption for ML models (via the scheduler UI).
Stability: Anaconda users highlight compatibility and runtime issues with the platform. Nonexpert users often find it challenging to keep their projects coherent when new platform or package updates are released.
Cloudera is a Niche Player in this Best Cloud Provider list. It has a core ML product, Cloudera Machine Learning (CML), supported by Cloudera Data Engineering (CDE) and Cloudera Data Visualization (CDV). These products are interconnected and delivered as services on top of the Cloudera Data Platform (CDP). CML has replaced and extended Cloudera’s previous on-premises DSML platform, Cloudera Data Science Workbench (CDSW), to provide hybrid and multicloud capabilities.
Cloudera is geographically diversified, and its client base spans many industries and various business functions.
Cloudera’s heritage as a big data company is reflected in its ML offering being part of the CDP. The vendor’s vision focuses on unifying ML workflows across data warehousing, data engineering, DSML and operationalization.
Native use of Spark on Kubernetes: With CDE and CML, Cloudera aims to overcome the overhead associated with managing Spark clusters and dependencies by maintaining containerized, repeatable workflows that can be scaled on demand. CML enables data science teams to use a variety of ML runtimes without prescribing underlying frameworks.
Processing of complex data workloads: CDP is designed for creating and managing high-volume data integration and preparation processes across hybrid and multicloud environments. CML and CDE are part of CDP, and thus provide control over data processing infrastructure and ML execution environments from a single platform.
Metadata management for DataOps and MLOps: The central framework that enables the building of scalable and repeatable DSML pipelines is Cloudera’s Shared Data Experience (SDX), based on Apache Atlas, which stores metadata on each step of execution. An MLOps SDK enables programmatic interaction with SDX.
Code-first focus: The majority of DSML tasks undertaken in CML require coding and use of open-source libraries in Python, R, Scala and similar languages with no visual workflow interface. There is little augmentation in the platform to help citizen data scientists build their own models.
Coherence of product offerings: CDP is the platform on which CML and CDE are offered. CDP can also include Cloudera Data Hub (CDH), Cloudera Data Warehouse, Cloudera Operational Database and Cloudera DataFlow. These services may be used to migrate on-premises deployments to the cloud. Even with centralized access to these components from CDP, the learning curve may be steep, even for experts.
Domain-specific solutions: The prototypes provided by Cloudera’s Fast Forward Labs in the form of Applied ML Prototypes (AMPs) are still small in number. The goal of taking cutting-edge ML from research and applying it to enterprise environments in a packaged form has great potential. However, organizations with limited in-house expertise will have to rely on Cloudera’s professional services.
Databricks is a Leader in this Best Cloud Provider list. Its Unified Data Platform, available in multiple clouds and with an emphasis on scalability, spans data science, ML, analytics and data engineering.
Databricks is geographically diversified, and its client base spans many industries and various business functions.
The company is evolving beyond its perception as merely the leader of the Apache Spark community, as is reflected by the renaming of its Spark + AI Summits as Data + AI Summits. Databricks keeps contributing to the open-source community — for example, by leading the Delta Lake and MLflow projects. It has also extended its offering with the acquisition of Redash, which enables users to query and visualize data more easily, using SQL.
Multicloud performance at scale: Databricks enables its customers to experiment and train their models fast and then to scale them quickly. It offers automanaged and scalable CPU and GPU clusters on multiple cloud platforms, preconfigured with the most popular ML frameworks, with built-in optimizations. MLflow offers flexibility to deploy models to different cloud environments.
Empowerment of mature data scientists: Databricks’ notebook-centric vision and optimization of OSS appeals to expert data scientists who demand high performance and early access to the latest innovative ML technology. This appeal is enhanced by an extensive collection of training materials, other documentation and access to a large community of knowledgeable users.
Execution and expansion: Databricks has sustained strong revenue growth, catalyzed by its successful partnerships with Microsoft (Azure), AWS and hundreds of other organizations across the world. The company has a well-executed vertical sales strategy, with strong commitment to customer value creation.
Citizen data science support: Databricks still targets a mainly technical audience of data engineers and data scientists with a coding background. Its platform offers collaboration support and recently also gained new SQL analytics capabilities aimed at data analysts. However, the platform is not well-suited to citizen data scientists and other low/no-code users.
Governance and responsible AI: Databricks offers support for General Data Protection Regulation (GDPR) and California Consumer Privacy Act (CCPA) compliance, and has embedded open-source techniques for bias mitigation and explainability. However, the company’s vision and messaging should pay more explicit attention to the responsible, ethical and trustworthy use of DSML, including the need for governance, risk management and compliance with emerging laws, policies and guidelines.
Growing competition from cloud partners: Databricks has a cloud-first strategy, offering its customers a choice of a growing number of cloud platforms for scalable ML. However, the company faces growing competition from its key cloud partners (and others), which all have their own DSML offerings and visions that overlap with its own.
Dataiku is a Leader in this Best Cloud Provider list. Its core product is Data Science Studio (DSS), which provides one platform for all DSML tasks, with a focus on multidisciplinary data science teams, collaboration and ease of use.
Dataiku is geographically diversified, and its client base spans many industries and various business functions.
The company announced a Series D funding round of $100 million in August 2020. It has also formed partnerships with global system integrators and vendors including Tableau, Snowflake and UiPath. It has a strong roadmap and vision in the areas of responsible AI, collaboration and business applications, one that points to continued growth and innovation.
Understanding of citizen data scientists: Dataiku has added augmented functionality to every stage of the DSML cycle. Citizen data scientists are well-supported with everything from detailed information on data quality and profiling to guided control over AutoML and explainability features. Users who want to build models in a no-code manner have a wealth of tools at their disposal.
Focus on business value: Dataiku understands the need for performance metrics that go beyond model accuracy and provides the ability to create custom business metrics optimized to deliver a particular business benefit and to monitor concept drift. A newly formed professional services team that focuses on business value is also a testament to Dataiku’s vision.
Increasing market traction: Dataiku remains on an impressive growth trajectory. The company continues to expand its ecosystem of partners in order to build targeted industry and function-specific analytic solutions. This expansion includes an increase in its OEM and managed service provider (MSP) programs.
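The "Focus on business value" point above, about metrics that go beyond model accuracy, can be illustrated with a minimal Python sketch. It is not Dataiku's API: the `Payoffs` class, the `business_value` function and the payoff figures are hypothetical and chosen only for illustration. The sketch scores binary predictions by the monetary payoff of each outcome, so two models with identical accuracy can be ranked by the benefit they deliver.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payoffs:
    """Hypothetical monetary value of each prediction outcome."""
    true_positive: float
    false_positive: float
    true_negative: float
    false_negative: float


def accuracy(y_true, y_pred):
    """Share of predictions that match the true labels."""
    correct = sum(1 for t, p in zip(y_true, y_pred) if t == p)
    return correct / len(y_true)


def business_value(y_true, y_pred, payoffs):
    """Total payoff of binary (0/1) predictions under the given payoffs."""
    total = 0.0
    for t, p in zip(y_true, y_pred):
        if t == 1 and p == 1:
            total += payoffs.true_positive
        elif t == 0 and p == 1:
            total += payoffs.false_positive
        elif t == 0 and p == 0:
            total += payoffs.true_negative
        else:  # t == 1 and p == 0
            total += payoffs.false_negative
    return total


if __name__ == "__main__":
    # Assumed payoffs: a missed positive costs far more than a false alarm.
    payoffs = Payoffs(true_positive=80.0, false_positive=-10.0,
                      true_negative=0.0, false_negative=-100.0)
    y_true = [1, 0, 0, 0, 1, 0, 0, 0, 0, 0]
    model_a = [0] * 10                          # never flags a positive
    model_b = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]    # catches both, two false alarms
    for name, pred in (("A", model_a), ("B", model_b)):
        print(name, accuracy(y_true, pred), business_value(y_true, pred, payoffs))
```

Both example models are 80% accurate, yet under the assumed payoffs one loses money and the other earns it, which is the kind of distinction a custom business metric is meant to surface.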
Heavy use of extensions and plugins: Deep learning support within DSS is primarily code-driven using Keras and TensorFlow, while deep learning support for visual users is achieved via an extension to DSS. Extensions are also needed for time-series processing, connecting to certain data sources, signal processing and mining. Navigating and installing these features increases platform management overhead and the complexity of containerized deployments.
Emerging vision for unifying XOps: Dataiku is working on better capabilities for full model management that can be used by multiple operations teams (those for data, ML, models and platforms). The personas the platform is built for will need to expand, as will associated functionality for deployment in complex heterogeneous environments.
Pricing model for smaller teams: DSS is available in several versions that have increasing levels of functionality. The prices of versions that do not offer full enterprise capabilities for model scalability and deployment are higher than those of other vendors’ offerings that do have these capabilities.
DataRobot is a Visionary in this Best Cloud Provider list. The DataRobot Enterprise AI Platform consists of Paxata Data Preparation, Automated Machine Learning, Automated Time Series, MLOps and AI applications. Its augmented approach enables both citizen and expert data scientists to productively use data science.
DataRobot’s operations are geographically diversified. The vendor has a strong presence in the banking, insurance, other financial services, manufacturing, retail, life sciences and healthcare sectors.
DataRobot delivers trusted AI through its Humble AI initiative, which enables prediction quality management. The vendor prioritizes measuring business value with its Use Case Value Tracker, a centralized hub for managing ROI. A Series F funding round in November 2020 and additional investments in December raised $320 million.
Sales strategy and execution: DataRobot’s simplified pricing structure and self-service trial drive consistent addition of new customers and growth within its existing client base. In addition, a relationship with Boston Consulting Group (BCG) enables joint sales motion and co-development of industry-specific applications.
High-touch customer service: DataRobot’s AI Success Plan provides a proven structured delivery approach focused on providing business value. The Customer-Facing Data Science and AI Success teams, repeatable playbooks and user training accelerate delivery of a pipeline of prioritized use cases and help customers achieve value quickly.
Acquisitions and key developments to fill gaps and link to business value: The acquisition of Paxata, a company focused on data preparation, is the most recent example of DataRobot’s use of acquisitions to fill functionality gaps in its platform. Paxata and all previous acquisitions were quickly incorporated. AI applications and the Use Case Value Tracker help track the value achieved by models.
Increasing complexity of product portfolio: DataRobot’s offering now consists of multiple components, including those for data preparation, model development and MLOps. Understanding and using the various components, the capabilities included within each, and their interoperability becomes increasingly complex as the portfolio grows.
Resource-heavy onboarding: DataRobot’s AI Success Plan and Customer-Facing Data Science and AI Success teams provide an approach that some believe is difficult to scale. In addition, this approach must specifically address and demonstrate the ability to lead clients to self-sufficiency quickly. DataRobot’s new self-service program offers customers a simpler way to get started.
Capability gaps: Although it is strengthening its capabilities in areas such as model management and data access, DataRobot still has gaps in other advanced analytics areas, including decision modeling and precanned solutions.
Domino is a Niche Player in this Best Cloud Provider list. Its core product is the Domino Data Science Platform, which is supported by Domino Model Monitor to provide end-to-end DSML capabilities in the cloud or on-premises.
Domino’s operations are primarily in North America and EMEA. The vendor has a significant presence in the banking, financial services, manufacturing and life sciences sectors, but its platform is used in most industries.
The introduction of Domino Model Monitor in 2020 shows this vendor’s clear commitment to enterprise MLOps. Domino’s market positioning is deliberate and R&D for its platform will remain focused on large, code-first data science teams.
Support for large, expert data science teams: Domino’s focus on large, code-first data science teams gives it a deep understanding of emerging enterprise needs and challenges. The platform is well liked by chief data and analytics officers — roles that many organizations will add or expand in the near future to set up systems that can orchestrate and govern an entire firm. Collaboration capabilities remain among the best.
MLOps capabilities and maturity of vision: Domino released Domino Model Monitor in 2020 to supplement the MLOps capabilities of its core platform. The product supports enterprise MLOps, where models are monitored across different teams, deployment infrastructures and languages. The platform’s highly auditable workflows enable rapid and targeted responses to deteriorating model health.
On-premises, hybrid and multicloud support for modern ML deployment: Domino Data Science Platform is the rare code-focused platform that offers a first-class experience both on-premises and in the cloud. Domino is among the best options for supporting complex hybrid and multicloud model development and deployment. Domino provides support for all major cloud providers’ versions of Kubernetes, and the vision for the platform’s future is cloud-agnostic.
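The model monitoring described under "MLOps capabilities and maturity of vision" above typically relies on a statistical check that flags when production data has drifted away from the training baseline. The following is a minimal sketch of one widely used check, the Population Stability Index (PSI). It does not represent Domino Model Monitor's implementation: the function name `psi`, the bin edges and the sample values are assumptions made for illustration.

```python
import math


def psi(expected, actual, edges, eps=1e-6):
    """Population Stability Index between a baseline and a current sample.

    `edges` are ascending bin boundaries; values below the first edge fall in
    the first bin and values at or above the last edge fall in the last bin.
    Empty bins are floored at `eps` so the logarithm stays defined.
    """
    def shares(values):
        counts = [0] * (len(edges) + 1)
        for v in values:
            # Bin index = number of edges the value has reached or passed.
            counts[sum(1 for e in edges if v >= e)] += 1
        return [max(c / len(values), eps) for c in counts]

    base, curr = shares(expected), shares(actual)
    return sum((c - b) * math.log(c / b) for b, c in zip(base, curr))


if __name__ == "__main__":
    baseline = [0.1, 0.2, 0.3, 0.4, 0.5, 0.6, 0.7, 0.8, 0.9]
    shifted = [0.6, 0.7, 0.8, 0.9, 0.95, 0.97, 0.99, 0.85, 0.75]
    edges = [0.33, 0.66]  # assumed bin boundaries
    print("no drift:", psi(baseline, baseline, edges))
    print("drifted: ", psi(baseline, shifted, edges))
```

A commonly cited rule of thumb treats PSI below 0.1 as stable and above 0.25 as a significant shift, although appropriate thresholds depend on the use case.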
Support for small and immature data science teams: An organization that does not plan to expand past 20 data scientists on its staff should not consider Domino. Typical Domino deployments involve more than 25 data scientists and ML engineers. The platform is designed to support a highly interactive community of data scientists, not just a few loosely connected individuals or people who depend on augmented approaches to ML.
Low visibility: Judging from client inquiry levels and Applant.com searches, interest in Domino is consistently low in this market. Additionally, requests for Domino experience appear in only a small number of data science job postings, compared with competing vendors.
Increasing flexibility and openness of most vendors: Domino helped transform the definition of DSML platforms into what it is today, with vendors providing curation and optimization of open-source technologies in addition to proprietary functionality. However, a highly flexible, OSS-fueled collaboration hub is now a common platform vision. Domino still receives a top score for flexibility and openness, but differentiation in this area is narrowing.
Google is a Visionary in this Best Cloud Provider list. It offers the Google Cloud AI Platform as its core DSML platform. The platform has an expanded suite of components that includes Cloud Data Fusion, Cloud AutoML, BigQuery ML, AI Platform Notebooks and TensorFlow. Google will launch its unified AI Platform in the first quarter of 2021 (after the cut-off date for evaluation in this Best Cloud Provider list). Key features and services that will be released with this new platform include AutoML Tables, XAI, AI Platform Pipelines and other MLOps services.
Google is geographically diversified and its client base spans many industries and various business functions.
Google’s Completeness of Vision is boosted by thought leadership in ML research and responsible AI, as well as by the roadmap for its unified AI Platform. The coherence of, and learning curve for, Google’s platform are key aspects to monitor in the coming year.
Responsible AI vision and capabilities: Google has taken a clear thought leadership position in the area of AI explainability and responsibility. Google shares and productizes its learnings on these subjects through responsible AI practices, fairness best practices, technical references and other materials.
Research contributions and impact: Google’s leadership in AI research includes the prominent work of Google Research, Google Brain and DeepMind, as well as ongoing significant contributions to scholarship, open-source projects and communities — TensorFlow, Kubernetes/Kubeflow and Kaggle stand out.
Consolidation, cohesion and simplification: Google has made a significant effort to reorganize and redesign not just its DSML platform, but also the way it releases software. The unified AI Platform will seek to address past issues of coherence, interoperability and ease of use. Google has also introduced simplified New Product Introduction (NPI) stages to provide more predictability and transparency about launch timelines.
Transition of portfolio: Google is developing capabilities for data science professionals at a rapid pace. This means a period of transition and learning for the market in general and adopters of its unified AI Platform in particular. Google’s new product release standards and timelines will be put to the test in 2021.
Steepness of learning curve: Although Google has made improvements in terms of accessibility and augmentation, its platform presents a steep learning curve and requires technical expertise. Supplementary tools for citizen data scientists and developers new to ML may be necessary.
Maturing on-premises, hybrid and multicloud support: The majority of Cloud AI Platform customers operate in purely cloud environments. Some capabilities of the Cloud AI Platform change and may become more complicated in hybrid, multicloud or on-premises environments. Multicloud support is evolving, and today most customers manage data, models and ML workloads within Google Cloud. New services like BigQuery Omni for viewing data across clouds are indicative of Google’s next steps in the multicloud field.
H2O.ai is a Visionary in this Best Cloud Provider. H2O Driverless AI is this vendor’s commercial product, for which there are additional modules such as MLOps and AutoDoc. H2O.ai also offers open-source products with optional enterprise support, such as the H2O 3 platform and AutoML for ML, Sparkling Water for Spark integration, and Wave for app development. H2O Driverless AI can be extended and customized with open-source or custom-made “recipes.”
H2O.ai is geographically diversified. About one-third of its customers can be found in the financial services sector. Other industries are represented more or less equally among the company’s client base.
H2O.ai’s roadmap and innovation earned it the highest overall score for Completeness of Vision. H2O.ai is a thought leader in the automation and augmentation of DSML, including time-series analysis.
Vision for value creation: In addition to its vision for democratizing AI through automation and augmentation, H2O.ai has extended its offering with Wave, an open-source product for building AI apps. Wave appeals to the corporate developer community and integrates with H2O AI Hybrid Cloud, Driverless AI and other components. This reflects a strong vision for streamlining value creation with AI, and this vision is further highlighted by H2O.ai’s contributions to AI for Good and investments in responsible AI capabilities.
Extensive augmentation (automation): H2O Driverless AI eases the adoption of DSML by offering augmentation in multiple areas: in addition to augmented feature engineering, model selection and parameter tuning, the company stands out for its sophisticated automation of time-series modeling. In the past year, H2O.ai has invested significantly in augmentation and automation for innovative natural language and image processing.
Rich XAI: H2O.ai offers multiple explainability capabilities throughout the ML life cycle, not just for modeling but also for feature engineering. Supported methods include K-LIME, LIME-SUP, Shapley, decision tree surrogates, causal graphs, NLP explainability and more.
Lack of certain data access and preparation capabilities: H2O.ai has room for improvement in terms of data access and aspects of data preparation. These include data refresh, data lineage, access governance, metadata management and data catalogs.
OEM partner strategy: H2O Driverless AI’s capabilities for augmentation and automation depend on OEM partnerships with other DSML vendors. If those vendors’ platforms outperform H2O’s in terms of capabilities for data preparation, for example, then potential customers may be less inclined to select H2O.ai’s.
Collaboration and cohesion: Expert data scientists, citizen data scientists, developers and other personas may all use different products from H2O’s growing portfolio of commercial and open-source products and modules. Despite shared projects and a common recipe catalog, the platform could benefit from more attention to cross-product, multipersona collaboration and a more cohesive portfolio structure.
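The Shapley method named among H2O.ai's explainability capabilities above can be illustrated with a small, vendor-neutral sketch. This is not H2O.ai's implementation. It computes exact Shapley values for a hypothetical three-feature toy model by enumerating every feature subset, treating a feature outside the subset as held at an assumed all-zero baseline. The names `shapley_values`, `toy_model`, `instance` and `baseline` are illustrative assumptions.

```python
from itertools import combinations
from math import factorial


def shapley_values(predict, instance, baseline):
    """Exact Shapley attribution of predict(instance) - predict(baseline).

    Enumerates all feature subsets, so it is only practical for a
    handful of features.
    """
    n = len(instance)
    features = range(n)

    def value(subset):
        # Features in `subset` take the instance's values; the rest stay at baseline.
        x = [instance[i] if i in subset else baseline[i] for i in features]
        return predict(x)

    phi = [0.0] * n
    for i in features:
        others = [j for j in features if j != i]
        for size in range(n):
            # Shapley weight for a coalition of this size that excludes feature i.
            weight = factorial(size) * factorial(n - size - 1) / factorial(n)
            for subset in combinations(others, size):
                s = set(subset)
                phi[i] += weight * (value(s | {i}) - value(s))
    return phi


def toy_model(x):
    # Hypothetical model: two linear terms plus one interaction term.
    return 2.0 * x[0] + 1.0 * x[1] + 0.5 * x[0] * x[2]


instance = [1.0, 2.0, 4.0]
baseline = [0.0, 0.0, 0.0]  # assumed reference point
phi = shapley_values(toy_model, instance, baseline)
```

The attributions sum to the difference between the prediction for the instance and the prediction for the baseline, and the interaction term is split evenly between the two features it involves. Production tools typically approximate these values rather than enumerate subsets.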
IBM is a Leader in this Best Cloud Provider. Its core product for this evaluation is IBM Watson Studio on IBM Cloud Pak for Data, a modular, open and extensible platform for data and AI that combines a broad set of descriptive, diagnostic, predictive and prescriptive capabilities.
IBM is geographically diversified, and its client base spans many industries and various business functions.
Revamping its offering has taken several years, and competition will remain fierce for IBM. Still, IBM now delivers a modern and comprehensive solution that draws on its roots in SPSS, ILOG CPLEX Optimization Studio and earlier products, and that benefits from a stream of innovations from IBM Research. These reflect a well-rounded vision.
Multipersona support: IBM Watson Studio offers a visual workflow interface or “graphic canvas,” as well as a choice of notebooks, thus enabling data engineers, expert data scientists and citizen data scientists to work together on the same project. ML pipeline activities, from data acquisition to operations, are supported by AutoAI and collaboration, including a catalog for sharing and reusing (meta)data and models.
Composite AI vision: The modular structure of the IBM Watson Studio platform contains, or can be extended by, multiple components for decision augmentation or automation. These components include several ML and other AI frameworks, optimization features, spatio-temporal and graph analytics, natural language features and video/image/audio analysis (in batch or streaming mode). In addition, by including IBM Decision Optimization, the platform supports decision modeling and decision management or rules processing.
Comprehensive attention to responsible AI and governance: IBM offers extensive support for explainability, bias, fairness, accuracy and drift monitoring, synthetic data and differential privacy. Its platform also provides strong governance (and optional risk management) support, with lineage, policies and rules in its catalog, as well as adversarial security.
AutoAI scope: IBM Watson Studio offers automation and augmentation of multiple activities in the ML pipeline, including data selection, imputation, visualization, feature transformation and modeling. However, a few competitors also augment time-series analysis by, for example, using recurrent neural networks and long short-term memory models.
Brand restoration: With its improved Watson Studio, IBM has caught up with and, in some cases, even surpassed its competitors. Nevertheless, data and analytics leaders may still find their ML experts skeptical about the innovativeness of Watson Studio and IBM’s ability to keep pace in a dynamic and competitive market.
Product-bundling clarity: Although the cohesion of the modular Watson Studio on IBM Cloud Pak for Data has improved, there remains confusion among potential customers as to which products and licenses are needed for which configurations. This increases concerns about licensing costs.
KNIME is a Visionary in this Best Cloud Provider. Its open-source offering, the KNIME Analytics Platform, focuses on the authoring of DSML workflows and projects. A commercial product, KNIME Server, focuses on automation, deployment and orchestration capabilities.
KNIME is globally diversified, with a strong presence in Europe and the U.S. Its client base spans all industries and company sizes.
KNIME continues to evolve and to develop its vision for bridging the gap between development and production and offering new ways for data scientists and end users to collaborate.
Breadth and depth of DSML capabilities: KNIME has been incrementally building its product for over 10 years, and this shows in the wide range of capabilities provided by the platform. It has almost 4,000 nodes for connecting to different types of data source, transforming and preparing data, ML and other advanced techniques. Very few DSML tasks are not supported by KNIME’s platform.
Commitment to open-source platform: The KNIME Analytics Platform is not a limited or restricted version of a full product. Most of the component library is available for use in the platform at no cost. This provides an ideal way to experiment with DSML projects, to test and learn, without upfront investment in a particular technology. Scalability can then be achieved through use of the KNIME Server product.
Coherence of visual workflow: The basic building blocks within the KNIME Analytics Platform are nodes, components and workflows. Everything within the platform, including AutoML, data visualization, interactive apps and deployment models, is built using these blocks and can be broken down into individual components and nodes, with associated metadata, for full transparency.
Limited customer support for enterprise deployments: KNIME has not expanded as aggressively as other DSML vendors. Although there is an active community answering questions about functionality, enterprise deployments typically require specialist services to increase adoption and ensure the product meets expectations. KNIME relies on partners to deliver these services.
Vision for responsible AI: KNIME provides a plethora of components for XAI, such as SHapley Additive exPlanations (SHAP), Partial Dependence Plots (PDP) and Individual Conditional Expectation (ICE). However, frameworks, guidance, best practices and research that can be applied by all disciplines within a data science team are lacking.
Low market traction and sales innovation: The visibility of KNIME to DSML platform buyers remains low. Prospective customers may therefore shortlist newer vendors with more prominent sales and marketing campaigns.
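For readers unfamiliar with the partial dependence (PDP) and Individual Conditional Expectation (ICE) methods mentioned in the KNIME assessment above, the following is a minimal, vendor-neutral sketch. It is not KNIME's implementation. An ICE curve varies one feature over a grid for a single row while holding that row's other values fixed, and the partial dependence curve is the average of the ICE curves. The model, data and function names are hypothetical.

```python
def ice_curves(predict, rows, feature, grid):
    """Return one ICE curve per row: predictions as `feature` sweeps over `grid`."""
    curves = []
    for row in rows:
        curve = []
        for v in grid:
            x = list(row)   # copy so the input row is not modified
            x[feature] = v  # override only the feature under study
            curve.append(predict(x))
        curves.append(curve)
    return curves


def partial_dependence(curves):
    """Average the ICE curves point by point to get the partial dependence curve."""
    n = len(curves)
    return [sum(column) / n for column in zip(*curves)]


def toy_model(x):
    # Hypothetical model whose response to x[0] depends on x[1].
    return x[0] * x[1]


rows = [[0.0, 1.0], [0.0, 3.0]]
grid = [0.0, 1.0, 2.0]
curves = ice_curves(toy_model, rows, feature=0, grid=grid)
pdp = partial_dependence(curves)
```

In this toy case the two ICE curves have different slopes, which the averaged partial dependence curve hides. That is the usual reason for inspecting both together.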
MathWorks is a Leader in this Best Cloud Provider. Its two major products are MATLAB and Simulink, but only MATLAB met the inclusion criteria for this Best Cloud Provider.
MathWorks is geographically diversified. Its clients are primarily engineering and asset-centric organizations.
MathWorks demonstrates a clear vision and thought leadership in asset-centric industries. Its innovations are applied, at scale, for large use cases intended to solve real-world problems. MathWorks is one of the few vendors in the DSML market that can handle large, distributed, real-time IoT implementations with a continuous environment from the edge to the cloud, and from development to simulation and operationalization and back.
Robust composite AI capabilities: MATLAB is among the most advanced DSML platforms for developing, integrating and deploying ensembles of AI techniques within a single solution (an approach that Applant calls composite AI). MathWorks combines these techniques in a flexible infrastructure that supports largely distributed environments, from the edge to the data center and the cloud.
Integrated domain knowledge: MathWorks benefits from deep domain expertise, which it integrates into its DSML platform. From predictive maintenance to fleet analytics, manufacturing process analytics and risk management, the company handles domain-specific idiosyncrasies within its platform, while developing technologies and application-specific toolboxes.
Verifiable and reliable ML: Safety is typically critical in the asset-centric domains in which MathWorks is active — they have no tolerance for unreliable operations. Beyond interpretability, MathWorks enables engineers to interact with models through either web applications or simulation environments.
Interface democratization: MATLAB remains the preserve of data-science-initiated engineers and scientists, who essentially use notebooks to develop models. To widen the appeal of its powerful platform to citizen data scientists and business and operations specialists, MathWorks will have to modernize its UI and provide visual development features.
Interpretable AI: MathWorks remains behind many of its competitors, especially those in the Leaders Cloud, when it comes to model interpretability and fairness management. Even its asset-centric audience will soon require better capabilities, so the company will have to start focusing on this issue.
Augmented DSML capabilities: Despite progress in 2020, MathWorks remains behind many of its competitors when it comes to expanding its augmented DSML functions, particularly for feature engineering and deployment optimization.
Microsoft is a Visionary in this Best Cloud Provider. The core product considered in this Best Cloud Provider is Azure Machine Learning (Azure ML). The supporting portfolio of products for Azure ML includes Azure Data Factory, Azure Data Catalog, Azure HDInsight, Azure Databricks, Azure DevOps, Power BI and other components.
Microsoft is geographically diversified, and its client base spans many industries and various business functions.
Microsoft earns the highest Ability to Execute score of the large cloud providers. Microsoft has a strong combination of vision and tailored functionality for the full spectrum of data science professionals who contribute to multifunctional teams.
Azure stack support for enterprise DSML: Azure ML and its supporting portfolio offer strong capabilities for the needs of enterprise data science. MLOps capabilities include a registry of packages and models and support for streamlined creation of reproducible ML pipelines. Azure ML comes with differentiated security and governance capabilities and, combined with Azure Cloud management services, supports compute quota and cost management capabilities.
Multipersona vision and offering: Microsoft’s vision and current offering for multipersona data science is stronger than those of its closest competitors. Azure ML provides augmented DSML and a drag-and-drop designer for citizen data scientists, and flexible notebook and SDK options for expert data scientists. Microsoft’s suite of ancillary products provides a strong environment in which data engineers, ML engineers and architects, corporate developers and others can contribute to the DSML workflow.
Openness and partnerships: Microsoft goes beyond widespread support for popular OSS by investing in and contributing to a number of prominent projects (such as Open Neural Network Exchange [ONNX], InterpretML and MLflow). The Azure Databricks product and partnership has been successful for both partners. Azure will also be the preferred cloud provider for the SAS Viya platform, which will be integrated with Azure and Azure ML services.
Requirement for Azure services commitment and expertise: Azure ML relies on a variety of Azure services and modules and can work with data from any source. Azure ML customers typically use Azure Data Factory for integration and transformation, Azure Data Catalog for governance, and any DevOps system (often Azure DevOps or GitHub) for integration within web services and other services. Supporting this portfolio requires significant technical expertise and understanding of the Azure ecosystem.
Evolving on-premises, hybrid and multicloud support: The majority of Azure ML customers operate in purely cloud environments. Some capabilities within the Azure ML portfolio change or become more complicated in hybrid, multicloud or on-premises environments. Microsoft’s multicloud support is evolving, however, and most of its customers manage data, models and ML workloads within Azure.
Augmented DSML: Microsoft delivers solid support for citizen data scientists and likely lower total cost of ownership (TCO) for augmentation capabilities, but still has room to improve, compared with vendors that focus solely on data science. Organizations seeking to broaden their data science talent base need to understand how much augmentation is offered by the visual designer in Azure ML, as opposed to the SDKs and Power BI.
RapidMiner is a Visionary in this Best Cloud Provider. RapidMiner Studio is the vendor’s primary model development tool and is available as both a free edition and a commercial edition. For the enterprise, offerings can be extended through the RapidMiner AI Hub, which includes collaboration and governance capabilities, as well as RapidMiner Go and RapidMiner Notebooks, which are model development experiences for novices and coders respectively. Turbo Prep, Auto Model and Automated Model Ops are augmented features of the platform, while the RapidMiner AI Cloud offers flexible, cloud-based deployment options.
RapidMiner is geographically diversified and has a strong presence in many industries, but especially manufacturing, life sciences, banking, insurance, energy, business services, government and education.
RapidMiner’s latest capabilities and roadmap exemplify key market trends, such as multipersona collaboration, XAI and model governance.
Multipersona collaboration: RapidMiner makes it easy for expert data scientists and citizen data scientists to work on its platform collaboratively and to manage end-to-end data science pipelines. The vendor offers a certification program through its RapidMiner Academy to help those who are not data scientists understand the product, model development, operationalization and governance.
Clear vision and delivery of aligned features: RapidMiner has made significant changes to its product portfolio during the past year. Particularly strong new capabilities are FeatureMart and Feature Catalog, which enable users to perform automated feature engineering, and share and store features across an organization, thus enhancing reusability and reproducibility.
Explainable, governed and secured AI: RapidMiner provides features that enable users to explain and govern their models in development and production, thus giving them greater transparency and more control over insights. Additionally, features such as single sign-on and strong identity and access management capabilities help secure the AI pipeline.
Growth rate and outreach: RapidMiner has grown slowly, relative to other vendors with comparable value propositions and the overall market. Although RapidMiner’s retention rate remains competitive, existing and prospective customers should check that RapidMiner continues to match the relentless pace of innovation in this market.
Market-standard advanced analytics capabilities: RapidMiner has market-standard functionalities built in for use cases involving reinforcement learning, generative adversarial networks, small data ML, geospatial analytics and agent-based modeling.
Perception as an academic platform: RapidMiner’s strong presence in the academic world continues to cultivate a large user community, with many young and talented data scientists attracted to the free version of its platform. Prospective enterprise customers should not overlook RapidMiner or dismiss its capabilities as an enterprise platform provider in favor of newer vendors that market products solely to enterprises.
Samsung SDS is a Niche Player in this Best Cloud Provider. Brightics AI is the end-to-end analytics and data science platform evaluated for this Best Cloud Provider. Samsung SDS offers Brightics Standard and Enterprise editions and an open-source tool, Brightics Studio. The Standard edition is a lightweight version of the Enterprise edition, with support for only Python. The Enterprise edition offers support for Python and Spark and enables distributed processing of ML workloads.
Samsung SDS has global operations. Its customer base is concentrated in Asia, especially in the manufacturing and financial services industries.
Brightics AI is an easy-to-use platform for both experts and citizen data scientists. Its focus on data management also enables other roles, such as data engineers and industrial users, to work with it.
Comprehensive ecosystem vision: The Brightics AI platform represents one of Samsung SDS’s five key technology areas — AI, blockchain, cloud, data analytics and security (ABCDS) — which comprise its Digital Transformation Framework. The vendor aims to provide a holistic solution by complementing Brightics AI with other Samsung SDS offerings, such as Samsung Cloud and Brightics IoT.
Data capabilities: Samsung SDS helps clients achieve more value through its focus on the data life cycle. Data access, preparation and visualization are strengths of the Brightics AI platform. It offers good support for semistructured and unstructured data, with capabilities like automatic data labeling and an automatic schema builder.
Ease of use and collaboration: Samsung SDS’s platform provides an intuitive, easy-to-use interface for both coders and noncoders. It supports multipersona collaboration through a wizard for data scientists, apps for business users and APIs for application developers. It provides a container-based personal sandbox environment for each user that allocates server resources for experimentation.
Need for expansion into new markets: Although Samsung SDS has offices and global delivery centers in many countries, the clients it serves currently are mostly in Asia. Prospective customers should closely examine whether Brightics AI would be a good choice for the parts of the world where they operate.
Gaps in product vision: Brightics AI is behind competing offerings in areas like composite AI and decision intelligence. Although Samsung SDS has increased its focus in areas such as collaboration and augmented DSML, current capabilities and planned innovations need strengthening to respond to the demands of a rapidly evolving market.
Limited support for ModelOps and explainability: Brightics AI lacks key capabilities like A/B testing, rollback automation and certain model telemetry features. It currently offers low model explainability, and support for certain key features such as Local Interpretable Model-Agnostic Explanations (LIME) and SHAP is still on the roadmap. Given the market’s focus on robust operationalization of ML models and XAI capabilities, prospective customers should look for clear improvements in these areas.
SAS is a Leader in this Best Cloud Provider. SAS Visual Data Mining and Machine Learning (VDMML) is the core product evaluated for this Best Cloud Provider. As part of the SAS Viya portfolio, VDMML is included in various product bundles on SAS Viya, namely SAS Visual Machine Learning, SAS Visual Data Science, SAS Data Science Programming and SAS Visual Data Decisioning.
SAS is geographically diversified, and its client base spans many industries and various business functions.
SAS is the longest-standing Leader in this Best Cloud Provider. It maintains a strong and adaptive position, given its keen understanding of the market and its thought leadership in key areas such as composite AI, MLOps and decision intelligence. The company recently announced a partnership with Microsoft to support closer integration with Azure.
Market understanding and presence: SAS’s long standing and experience in this market have earned customers’ trust. SAS offers enterprise-grade platform capabilities and support, coupled with a robust vision for key market trends, including composite AI, decision intelligence and MLOps. The domain expertise embedded in its products and consulting services enables customers to derive value from the entire analytics life cycle.
Cloud-native architecture and open-source integration: The latest release of SAS Viya offers a fully cloud-native approach. SAS customers can now leverage all Viya capabilities in a flexible container-based architecture that runs in the cloud. SAS offers innate integrations with popular open-source tools and languages for data, modeling and model management.
Automated feature engineering and modeling: SAS provides differentiated automated feature engineering and automated modeling capabilities through automated pipeline generation. Experimentation is supported through utilities such as the Data Science Pilot Action Set and other modules. For automated hyperparameter tuning, Model Composer uses a patented hybrid search strategy.
Perceived high costs: SAS’s pricing remains a concern for many customers, who therefore investigate less-costly alternatives. VDMML has historically been priced by the core, but a new pricing model has eliminated the core capacity restriction and pricing is now based on type of user. SAS customers should work with the vendor to determine whether the new pricing model is more suitable for their requirements.
Product bundling: Although SAS has streamlined its product portfolio, with more “fit for use” product bundling replacing “a la carte” selection, SAS Viya’s full suite of products and add-ons remains complex for users to navigate. However, to make navigation easier, SAS VDMML is now part of product bundles that offer programming-only interfaces, as well as bundles that offer both programming and visual, drag-and-drop interfaces.
Marketing strategy: SAS needs to work on the perception of its product portfolio. Despite clear modernization, SAS is still frequently perceived as a vendor of legacy software and traditional advanced analytics. Small and midsize companies should explore case studies from customers in similar segments to understand current usage of SAS products.
TIBCO Software is a Leader in this Best Cloud Provider. After tightly stitching together various data and analytics software and platforms, TIBCO is fulfilling its “Connected Intelligence” vision. That vision is embodied at its core in TIBCO’s Data Science platform, along with TIBCO Spotfire and TIBCO Streaming and a robust data and process infrastructure.
TIBCO is geographically diversified and present in many industries, but has a stronger presence in asset-centric industries, given its science and engineering focus, especially on edge computing.
The company’s origins in the middleware sector give TIBCO an edge when it comes to model deployment and production, in any environment, centralized or distributed, across a wide variety of use cases.
Leading-edge DSML capabilities: From innovations like dynamic learning on event streams to integration with popular edge platforms like those of Microsoft and AWS, TIBCO delivers leading IoT capabilities on its TIBCO Data Science platform. Through its TIBCO LABS program, the company has launched initiatives like Project Air, streamlining IoT solutions from the edge to the cloud.
Hyperconvergence and integration: TIBCO extends its Data Science platform from both an infrastructure perspective (in relation to edge analytics, for example) and an analytical angle, through its business intelligence (BI) and strong visualization capabilities. TIBCO has a leading vision for the colliding worlds of data science and analytics.
Support for collaboration and applied analytics: TIBCO is a strong choice for analytical teams that span a wide range of functions across an organization. That strength extends beyond the integrated technical environment where analytical assets can be shared to the capture of domain expertise — the results of collaboration with subject matter experts can then be embedded within integrated applications.
End-to-end ModelOps capabilities: TIBCO has made important progress toward achieving robust ModelOps capabilities with functions such as those of TIBCO Artifact Management Server and improvements to its ML pipeline capabilities. But it needs to provide a more comprehensive and approachable ModelOps capability in order to manage the full life cycle of AI models.
Citizen data science support: Despite the capacity of TIBCO’s portfolio to unify technical and operational talents, the Data Science platform still needs a more approachable interface for citizen data scientists. The canvas interface, combined with simplified AutoML functions, forms a solid base, but TIBCO Data Science is still aimed at data scientists with significant ML experience — support for citizen data science continues to rely on other parts of the portfolio.
Financial growth in 2020: Like many organizations, TIBCO had a challenging 2020 in terms of licensing revenue, while the difficult economic conditions impacted its subscription business. The company will need to make strong and continuous development investments in this fast-moving market in order to stay ahead of well-funded competitors. Before investing in TIBCO’s technology, organizations should compare their requirements with the vendor’s technology roadmap.
Vendors Added and Dropped
We review and adjust our inclusion criteria for Best Cloud Providers as markets change. As a result of these adjustments, the mix of vendors in any Best Cloud Provider may change over time. A vendor's appearance in a Best Cloud Provider one year and not the next does not necessarily indicate that we have changed our opinion of that vendor. It may be a reflection of a change in the market and, therefore, changed evaluation criteria, or of a change of focus by that vendor.
Amazon Web Services
Inclusion and Exclusion Criteria
Best Cloud Providers identify and analyze the most relevant providers in a market. By default, an upper limit of 20 vendors is imposed to enable identification of the most relevant providers. On some specific occasions, however, this upper limit may be raised when the Best Cloud Provider’s value to clients would otherwise be diminished.
The inclusion criteria represent the specific attributes necessary for inclusion in this Best Cloud Provider. They were applied progressively, in sequence and in cumulative fashion to aid identification of the most relevant providers.
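The progressive, cumulative application of criteria described above can be sketched as a chain of filters, where each criterion is evaluated only on the vendors that passed the previous ones. This is an illustrative sketch, not the evaluators' actual procedure. The vendor fields, the `relevance` score used to enforce the upper limit, and the function name `apply_criteria` are all assumptions.

```python
def apply_criteria(vendors, criteria, limit=20, rank_key="relevance"):
    """Apply criteria in sequence; each one sees only the previous survivors."""
    remaining = list(vendors)
    for criterion in criteria:
        remaining = [v for v in remaining if criterion(v)]
    # Assumed tie-breaker: keep the highest-ranked vendors if more than `limit` remain.
    remaining.sort(key=lambda v: v[rank_key], reverse=True)
    return remaining[:limit]


# Hypothetical vendor records; the field names are illustrative only.
vendors = [
    {"name": "A", "has_dsml_platform": True, "offers_common_license": True, "relevance": 0.90},
    {"name": "B", "has_dsml_platform": True, "offers_common_license": False, "relevance": 0.80},
    {"name": "C", "has_dsml_platform": False, "offers_common_license": True, "relevance": 0.70},
    {"name": "D", "has_dsml_platform": True, "offers_common_license": True, "relevance": 0.95},
]

criteria = [
    lambda v: v["has_dsml_platform"],      # stands in for Inclusion Criterion 1
    lambda v: v["offers_common_license"],  # stands in for part of Inclusion Criterion 2
]

shortlist = [v["name"] for v in apply_criteria(vendors, criteria)]
capped = [v["name"] for v in apply_criteria(vendors, criteria, limit=1)]
```

Because the filters are cumulative, a vendor that fails an early criterion is never assessed against later ones.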
Inclusion Criterion 1: Data Science and Machine Learning Platform
As noted earlier in the Market Definition/Description, a vendor’s DSML platform had to:
Offer a mixture of the basic and advanced functionality essential for building DSML solutions (primarily predictive and prescriptive models).
Support the incorporation of these solutions into business processes, surrounding infrastructure, products and applications.
Support the sustainable consumption of insights derived from the platform and offer functionality to quantify and track the value of data science projects.
Support variously skilled data science professionals (“data scientist” is an inconsistently applied job title and professional distinction — a DSML platform’s user base is often made up of professionals with diverse technical and business backgrounds).
Support multiple tasks across the data science life cycle, including:
Problem and business context understanding
Model creation and training
Data and model governance
Explainable AI (XAI)
Business value tracking
Additionally, a vendor had to be able to provide technical support for its DSML platform directly and/or via commercial support partners.
Inclusion Criterion 2: Revenue and Growth
A vendor’s core product had to offer one or more common license models:
Perpetual license model
SaaS subscription model
Consumption-based model or other type of model