The data management platform combining Cloudera and Hortonworks technologies is arriving, notably with data warehouse and machine learning services delivered as cloud services and available on AWS.
A year after acquiring Hortonworks, Cloudera is launching the integrated data platform on which the two teams have been working together since the beginning of the year. In March, the vendor outlined this CDP (Cloudera Data Platform), and the first integrations between the two offerings, at the DataWorks Summit 2019 event in Barcelona. The main new features are data warehouse and machine learning services designed for the cloud, and Data Hub, which is used to migrate on-premise workloads to the cloud. As expected, the platform brings multi-function capabilities, offered “as a service”, to analytics workloads across a wide range of use cases, including AI-based and back-end applications (at the edge). CDP provides five sets of services for data mining: flow and streaming management tools, data engineering tools, a cloud-native data warehouse, an operational database and a cloud-native machine learning service.
For data protection and governance, CDP integrates the SDX (Shared Data Experience) technologies, which are meant to make it possible to create a secure data lake regardless of the cloud used. These functions can be implemented “in a few hours rather than in terms of days or weeks,” said Arun C. Murthy, chief product officer of Cloudera. In the multicloud era, the platform, which is available on-premise and in hybrid form, can be deployed on different clouds with shared metadata and shared security and governance functions. CDP’s Data Warehouse, Machine Learning and Data Hub services are available on the AWS public cloud.
The metadata and security tools of the CDP are shared across all services. Among the users of the CDP, the pharmaceutical group GlaxoSmithKline intends to use the SDX capabilities to manage its metadata and governance information centrally.
Automatic scaling of the data warehouse
The data warehouse service relies on open source engines such as Impala, Hive LLAP and Hive on Tez, and on tools such as Hue and Workload XM. Cloudera highlights the service's ability to scale automatically to accommodate hundreds of additional users, and the fact that the different analytical services can run simultaneously while sharing metadata and SDX functions. On-premise workloads for reporting, dashboards, ad hoc queries or advanced analytics can be moved to the cloud and accessed on a self-service basis. “Hundreds of users can provision their own resources in one click and analyze the data at the same time, whether on-premise or in the cloud,” says Anupam Singh, general manager for the data warehouse at Cloudera.
The cloud data warehouse is integrated with the machine learning tools and the Data Hub service. The latter lets users migrate on-premise data management and analytics workloads to the cloud, where they can implement new workloads with a choice of cluster types, workloads, infrastructures, and configuration and customization options. With this offering, Cloudera will compete with Snowflake, a pure player in cloud data warehousing that has won customers at an accelerating pace in recent years thanks to its strong scaling capacity and its fast processing of large volumes, based on a columnar database. It will also face Google BigQuery,
Data warehouse and ML services billed by the hour
On the machine learning side, CDP enables data scientists and developers to deploy workspaces for their projects in just a few clicks. Data scientists can use the tools of their choice and replicate data sets in a hybrid environment, again while maintaining security and governance controls. CDP provides a user experience encompassing data processing, model training, experiment monitoring, and production model deployment and management. Added to this is the ability to bring data into different environments “without creating disconnected silos” and without changing the experience of data scientists, in order to create end-to-end ML processes, says Cloudera. There is no shortage of competition in machine learning either, with cloud offerings from Amazon Web Services, Google, Microsoft Azure and IBM, to name only the main ones. Cloudera will emphasize the benefits of an integrated platform that can be deployed in a hybrid environment.
The Data Warehouse, Machine Learning and Data Hub cloud services are available on the AWS public cloud and are billed by the hour, with rates depending on instance type and on CPU, GPU and RAM options. An on-premise option, CDP Data Center, is currently available in preview to a limited number of Cloudera customers. It will be delivered worldwide by the end of the year and sold as an annual subscription starting at $10,000 per node. Further pricing details are provided on the vendor’s website.