Public Cloud AI Review - Public Cloud Review

The objective of this post is to analyze the Artificial Intelligence (AI) services offered by the four main public cloud providers: AWS, GCP, Azure and Alibaba. AI is one of the most promising services in the cloud and is growing faster than most other services. AI benefits especially from the cloud because of the compute and storage power needed during training and the need to absorb peaks in demand once a model goes into production.

To be fair, I should include IBM in the comparison, but since the previous comparisons on this blog included only Amazon, Google, Azure and Alibaba (the truly public clouds), I will not include it, although I will add some considerations in the summary.

AI Services

Unlike with other services, I have been able to find a blueprint from a cloud provider that reflects the structure of the AI services. The blueprint I’m going to use to compare AI services among cloud providers is based on Google’s vision and has the following layers:

AI Solutions
AI Predefined Trained Model Services
AI Predefined Untrained Model Services
AI Machine Learning Foundation Platform
AI Frameworks
AI Hardware Accelerators

Let’s go through the layers from the bottom up:

AI Hardware Accelerators

AI HW accelerators are specialized chips, including application-specific integrated circuits (ASICs), used mainly to speed up the training of AI models. To date there are three relevant AI HW accelerators:

GPUs (Graphical Processing Units)
Created by NVIDIA, the GPU was originally designed to improve the performance of graphics applications that need high parallel computing power. However, the algorithm used to train supervised Machine Learning models is usually based on gradient descent, which requires multiplying very large matrices of numbers. These multiplications can be done in parallel, since the order in which the individual products are computed does not matter. GPUs can therefore also be applied to training ML models.
TPUs (Tensor Processing Units)
Created by Google to speed up TensorFlow. Cloud TPU enables you to run your machine learning workloads on Google’s TPU accelerator hardware using TensorFlow.
TPUs allow training up to 17 times faster than GPUs with a cost saving of up to 38%. Unfortunately, Google’s TPUs are proprietary and are not commercially available.
FPGA (Field-Programmable Gate Array)
FPGA is an integrated circuit designed to be configured by a customer or a designer after manufacturing. You can reconfigure FPGAs for different types of machine learning models. This flexibility makes it easier to accelerate the applications based on the most optimal numerical precision and memory model being used.
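
To make the point about GPUs more concrete, here is a minimal pure-Python sketch of gradient descent for a least-squares model. It is only an illustration under my own assumptions (the toy data, the learning rate and the function names are invented for the example): it shows that each output row of a matrix product is an independent computation, so the rows can be visited in any order, which is exactly the kind of work a GPU spreads across thousands of cores.

```python
import random

def matmul_row(a_row, b):
    """Compute one output row; every entry is an independent dot product."""
    cols = len(b[0])
    return [sum(a_row[k] * b[k][j] for k in range(len(b))) for j in range(cols)]

def matmul(a, b, row_order=None):
    """Multiply a (n x m) by b (m x p), visiting the output rows in any order."""
    order = list(range(len(a))) if row_order is None else row_order
    out = [None] * len(a)
    for i in order:
        out[i] = matmul_row(a[i], b)
    return out

def gradient_step(x, y, w, lr):
    """One gradient descent step for least squares: w -= lr * X^T (Xw - y) / n."""
    n = len(x)
    pred = matmul(x, [[wi] for wi in w])            # X w, shape n x 1
    err = [[pred[i][0] - y[i]] for i in range(n)]   # residuals Xw - y
    xt = [list(col) for col in zip(*x)]             # X^T
    grad = matmul(xt, err)                          # X^T (Xw - y)
    return [w[j] - lr * grad[j][0] / n for j in range(len(w))]

# Toy data generated from y = 2*x1 + 3*x2 (hypothetical values).
X = [[1.0, 2.0], [2.0, 1.0], [3.0, 4.0], [4.0, 3.0]]
Y = [2 * a + 3 * b for a, b in X]

w = [0.0, 0.0]
for _ in range(2000):
    w = gradient_step(X, Y, w, lr=0.05)

# Visiting the rows in a shuffled order gives exactly the same product.
shuffled = list(range(len(X)))
random.Random(0).shuffle(shuffled)
ones = [[1.0], [1.0]]
same_result = matmul(X, ones, shuffled) == matmul(X, ones)
```

The weights end up close to the values used to generate the data, and the shuffled multiplication matches the ordered one. A real framework does the same work with optimized kernels instead of Python loops.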

AI Frameworks

All the cloud providers use AI frameworks to offer their services. A framework reduces the effort of creating a Machine Learning model through features such as:

Implement multiple Algorithms and Model structures
Computational graph abstraction
Help to train and evaluate the model
Integrate with multiple data sources
Deploy in multiple machines and platforms
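
As an illustration of what "computational graph abstraction" means, the following is a minimal sketch of a graph that records operations and then computes gradients by walking the graph backwards (reverse-mode automatic differentiation). The `Node` class and `backward` function are hypothetical names written for this example and are not the API of TensorFlow, PyTorch or any other framework.

```python
class Node:
    """A value in a computational graph that remembers how it was produced."""

    def __init__(self, value, parents=()):
        self.value = value
        self.parents = parents   # pairs of (parent_node, local_gradient)
        self.grad = 0.0

    def __add__(self, other):
        # d(a + b)/da = 1 and d(a + b)/db = 1
        return Node(self.value + other.value, ((self, 1.0), (other, 1.0)))

    def __mul__(self, other):
        # d(a * b)/da = b and d(a * b)/db = a
        return Node(self.value * other.value,
                    ((self, other.value), (other, self.value)))


def backward(output):
    """Propagate gradients from the output back to every input node."""
    order, seen = [], set()

    def visit(node):
        if id(node) in seen:
            return
        seen.add(id(node))
        for parent, _ in node.parents:
            visit(parent)
        order.append(node)       # parents are appended before their children

    visit(output)
    output.grad = 1.0
    for node in reversed(order):             # children before parents
        for parent, local in node.parents:
            parent.grad += node.grad * local  # chain rule


x, w, b = Node(3.0), Node(2.0), Node(1.0)
y = w * x + b        # building the expression builds the graph
backward(y)
```

After `backward(y)`, `w.grad` holds the derivative of `y` with respect to `w` (the value of `x`), and `x.grad` holds the value of `w`. Frameworks build on this same idea to train models with millions of parameters.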

The main Frameworks used by the cloud Providers are:

End to end ML frameworks

Tensorflow
Open source library for Machine Learning and high-performance numerical computation originally developed by Google.
Language: C++ or Python
PyTorch
Open Source Machine Learning library inspired by Torch. It has primarily been developed by Facebook’s artificial intelligence research group.
Language: C++ or Python
Apache MXNet
Open Source Machine Learning software framework, used to train, and deploy deep neural networks.
Language: C++, Python, Julia, Matlab, JavaScript, Go, R, Scala, Perl, and Wolfram
Caffe
Caffe is a deep learning framework made with expression, speed, and modularity in mind. Caffe supports many different types of deep learning architectures geared towards image classification and image segmentation. It supports CNN, RCNN, LSTM and fully connected neural network designs. Caffe supports GPU- and CPU-based acceleration computational kernel libraries such as NVIDIA cuDNN and Intel MKL
Chainer
Open Source flexible Python-based framework for neural networks. It was developed by Preferred Networks, a startup based in Japan. This framework allows writing complex architectures simply and intuitively.
Theano
Open Source software developed by the Montreal Institute for Learning Algorithms at the University of Montreal. It is a python library and optimizing compiler for manipulating and evaluating Tensor operations.
Microsoft Cognitive Toolkit (CNTK)
Open Source toolkit for commercial-grade distributed deep learning. It describes neural networks as a series of computational steps via a directed graph.
Language: C++, C# or Python

Specific ML services Framework

Scikit-learn
Scikit-learn provides a wide selection of supervised and unsupervised learning algorithms.
ONNX
Open Neural Network Exchange (ONNX) is a new standard format for exchanging deep learning models. It promises to make deep learning models portable, thus preventing vendor lock-in. Currently there is native support in ONNX for PyTorch, CNTK, MXNet, and Caffe2, and there are also converters for TensorFlow and CoreML.
Keras
A high-level API to build and train deep learning models. Written in Python and capable of running on top of TensorFlow, CNTK, or Theano.
Horovod
Open Source distributed deep learning training framework for TensorFlow, Keras, PyTorch, and Apache MXNet, created by Uber. It is intended to make distributed deep learning projects easy to develop and fast to run.
Gluon
Gluon is an open source deep learning library jointly created by AWS and Microsoft (a move to position themselves against Google in the AI race) that helps developers build, train and deploy machine learning models in the cloud. Gluon provides a clear, concise API for defining machine learning models using a collection of pre-built, optimized neural network components. At the moment, Gluon provides support for Apache MXNet and CNTK.
JupyterLab
A web-based interactive development environment for working with notebooks, code and data. JupyterLab has full support for Jupyter notebooks and enables you to use text editors, terminals, data file viewers, and other custom components side by side with notebooks in a tabbed work area. JupyterLab is flexible: you can configure and arrange the user interface to support a wide range of workflows in data science, scientific computing, and machine learning.

As we will see, TensorFlow is the most widely supported framework among cloud providers.

AI Machine Learning Foundation Platform

All cloud providers offer a Machine Learning platform based on a set of the frameworks described above.

The basic aim of the AI ML Foundation Platform is to simplify and accelerate the deployment of an ML solution, based on a set of frameworks, by covering the following tasks:

Prepare Data
Build and define a Model
Training
Deployment
Predict
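
The tasks above can be sketched end to end in a few lines of plain Python. This is only a toy, under my own assumptions: the data is invented, the "model" is a one-variable least-squares fit, and "deployment" is reduced to serializing the model to JSON as a stand-in for uploading it to a hosting service.

```python
import json

def prepare(raw):
    """Prepare data: drop incomplete rows and convert the rest to floats."""
    return [(float(x), float(y)) for x, y in raw if x is not None and y is not None]

def train(rows):
    """Build and train: fit y = slope * x + intercept by ordinary least squares."""
    n = len(rows)
    mean_x = sum(x for x, _ in rows) / n
    mean_y = sum(y for _, y in rows) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in rows)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in rows)
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def deploy(model):
    """Deploy: serialize the trained model (stand-in for a hosting service)."""
    return json.dumps(model)

def predict(artifact, x):
    """Predict: load the deployed artifact and apply it to new data."""
    model = json.loads(artifact)
    return model["slope"] * x + model["intercept"]

raw = [(1, 3), (2, 5), (None, 7), (3, 7), (4, 9)]   # one incomplete row
artifact = deploy(train(prepare(raw)))
prediction = predict(artifact, 10)
```

The ML platforms described in this post wrap each of these steps in a managed service, but the shape of the workflow is the same.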

In addition, all Cloud providers offer built-in ML Algorithms that help you train models for a variety of use cases:

Two-class classification
Multi-class classification
Regression
Clustering
Ranking
Anomaly detection

AI Predefined Untrained Model Services

The future of AI services in the cloud lies in offering trained or untrained models ready to perform a specific task.

The AI Predefined Untrained Model Services are AI models (algorithms and structure) ready to perform a specific task, but they need to be trained with your own training data to be tailored to your business needs.
The objective is to enable developers with limited machine learning expertise to train high-quality models specific to their business needs.

The AI Predefined Untrained Model Services can be grouped in:

Sight
- Vision (Image classification)
- Video (Video classification)
Language
- Natural Language (Reveal the structure and meaning of text)
- Translation (translate between languages)
Data Forecast (provide predictive insight from structured data)

AI Predefined Trained Model Services

The Predefined Trained Model Services are AI trained Models ready to perform a specific task. This service is offered through a Rest API or SDK ready to use without any training.
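
As a rough illustration of what "ready to use through a REST API" means in practice, the sketch below builds (but does not send) an HTTPS request to a pre-trained vision service. The endpoint URL, the JSON field names and the API key header are hypothetical placeholders, not the real API of any provider.

```python
import base64
import json
import urllib.request

def build_label_request(image_bytes, api_key):
    """Build a POST request asking a (hypothetical) vision service to label an image."""
    payload = {
        # Binary image data is usually sent as base64 text inside the JSON body.
        "image": base64.b64encode(image_bytes).decode("ascii"),
        "features": ["LABELS"],   # hypothetical feature name
    }
    return urllib.request.Request(
        url="https://vision.example.com/v1/annotate",   # placeholder endpoint
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json", "X-Api-Key": api_key},
        method="POST",
    )

request = build_label_request(b"fake image bytes", api_key="YOUR_KEY")
body = json.loads(request.data)
# Sending it would be urllib.request.urlopen(request); that call is left out here.
```

No model, training data or infrastructure appears anywhere in the client code, which is the whole point of this layer.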

The AI Predefined Trained Model Services can be grouped in:

Sight
- Vision (Image classification, OCR, face detection, explicit content, …)
- Video (Video classification, OCR, object tracking, explicit content, …)
Language
- Natural Language (Reveal the structure and meaning of text, Sentiment analysis, Syntax analysis, …)
- Translation (translate between languages, language detection,…)
Conversation
- Speech-to-Text
- Text-to-Speech
- Natural Conversation
- Speaker recognition
Data Forecast
- Time series dataset correlations & anomaly detection
- Recommendations/personalization
- Data Warehouse ML

These are only the first AI services offered by cloud providers; the offering will grow in the near future.

AI Solutions

AI Solutions are prepackaged business solutions based on AI. Again this is a high growth area and relies on the services of the lower AI layers.

The target business processes for AI Solutions are:

Intelligent Contact Center
CRM
Human Resource and Intelligent Recruitment
Document digitalization
Data Analytics
Cybersecurity
Sales/revenue forecasting
Digital Marketing
Personalization and recommendation
Gaming
Robotics & smart cars (with 5G)
Financial Trading
Healthcare
Logistics and Delivery
And more…..

The cloud providers have started to offer AI Solutions to increase their penetration at the business level.

AWS AI Services

AI Hardware Accelerators

Amazon offers two kinds of HW accelerators:

GPUs (Graphical Processing Units)
NVIDIA Tesla V100 (up to 8 GPUs), K80 (up to 16 GPUs) and M60 (up to 4 GPUs)
AWS also offers accelerated inference through AWS Inferentia (a custom machine learning inference chip) and Amazon Elastic Inference
FPGA (Field-Programmable Gate Array)
Amazon EC2 F1 instances (Virtex UltraScale+) use FPGAs to enable delivery of custom hardware accelerations with a development kit. Up to 8 FPGAs (with 64 vCPUs) and 976 GiB of memory.

AI Frameworks

Amazon manages or integrates the following frameworks out of the box (AWS Deep Learning AMIs):

TensorFlow
PyTorch
Apache MXNet
Chainer
Microsoft Cognitive Toolkit
Gluon
Keras
Horovod
Theano
JupyterLab

AI Machine Learning Foundation Platform

Amazon offers Amazon SageMaker to build, train, and deploy machine learning models quickly. It supports TensorFlow and Apache MXNet out of the box.

The overall picture of Amazon SageMaker is the following:

Prepare Data (covered by the Data Products of Amazon)
- Ingestion: Amazon S3 & Transfer Service
- Data preparation & preprocessing: XGBoost Algorithm & AWS Marketplace solutions
Build and define a Model (covered by SageMaker)
- Automated data labeling with Amazon SageMaker Ground Truth
- AWS Deep Learning AMIs
- AWS Deep Learning Containers
- Amazon SageMaker Jupyter notebook
- Amazon SageMaker Built-in Algorithms
- Amazon SageMaker RL
Training, Test & Analyze (covered by SageMaker)
- Amazon Training Job
- Amazon SageMaker automatic model tuning
- Amazon SageMaker Neo
Deployment & Predict (covered by SageMaker)
- HTTPS endpoint
- Batch Transform
- Inference Pipeline

Let’s look at the main Amazon SageMaker products & services:

Data Labeling Service: Amazon SageMaker Ground Truth

Amazon SageMaker Ground Truth helps you build highly accurate training datasets for machine learning quickly. SageMaker Ground Truth offers easy access to public and private human labelers and provides them with built-in workflows and interfaces for common labeling tasks. Additionally, SageMaker Ground Truth can lower your labeling costs by up to 70% using automatic labeling, which works by training Ground Truth from data labeled by humans so that the service learns to label data independently.
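
The automatic labeling idea can be sketched as confidence-based routing: a model trained on human labels handles the items it is confident about and sends the rest to human labelers. The threshold, the keyword-counting toy model and all names below are my own assumptions for illustration; this is not Ground Truth’s actual algorithm.

```python
def route_for_labeling(items, model, threshold=0.9):
    """Split items into machine-labeled results and a human-review queue."""
    auto_labeled, needs_human = [], []
    for item in items:
        label, confidence = model(item)
        if confidence >= threshold:
            auto_labeled.append((item, label))   # trusted machine label
        else:
            needs_human.append(item)             # sent to human labelers
    return auto_labeled, needs_human

def toy_model(text):
    """Hypothetical stand-in for a model trained on human-labeled examples."""
    positive = sum(word in text for word in ("great", "love", "good"))
    negative = sum(word in text for word in ("bad", "awful", "hate"))
    total = positive + negative
    if total == 0:
        return "unknown", 0.0
    label = "positive" if positive >= negative else "negative"
    return label, max(positive, negative) / total

reviews = [
    "great phone, love it",
    "awful battery",
    "good screen but bad sound",
    "arrived on tuesday",
]
auto, manual = route_for_labeling(reviews, toy_model)
human_share = len(manual) / len(reviews)
```

In this toy run only the ambiguous and the uninformative reviews reach a human, which is where the cost saving comes from.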

AWS Deep Learning AMIs

AWS Deep Learning AMIs provide machine learning practitioners and researchers with the infrastructure and tools to accelerate deep learning in the cloud, at any scale. You can quickly launch Amazon EC2 instances pre-installed with popular deep learning frameworks and interfaces such as TensorFlow, PyTorch, Apache MXNet, Chainer, Gluon, Horovod, and Keras to train sophisticated, custom AI models, experiment with new algorithms, or to learn new skills and techniques.

AWS Deep Learning Containers

AWS Deep Learning Containers (AWS DL Containers) are Docker images pre-installed with deep learning frameworks to make it easy to deploy custom machine learning (ML) environments quickly by letting you skip the complicated process of building and optimizing your environments from scratch.

Amazon SageMaker notebook

An Amazon SageMaker notebook instance is a fully managed ML compute instance running the Jupyter Notebook App. Amazon SageMaker manages creating the instance and related resources. Use Jupyter notebooks in your notebook instance to prepare and process data, write code to train models, deploy models to Amazon SageMaker hosting, and test or validate your models. SageMaker also provides fully managed hosting with auto scaling.

Amazon SageMaker Built-in Algorithms

Amazon SageMaker provides several built-in machine learning algorithms that you can use for a variety of problem types.

Amazon SageMaker RL

Amazon SageMaker RL supports reinforcement learning in addition to traditional supervised and unsupervised learning. SageMaker now has built-in, fully managed reinforcement learning algorithms.

Amazon Training Job

With a Training Job you can train a model in Amazon SageMaker.

Amazon SageMaker automatic model tuning

Amazon SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and ranges of hyperparameters that you specify. It then chooses the hyperparameter values that result in a model that performs the best, as measured by a metric that you choose.
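
The core loop of hyperparameter tuning can be sketched as follows: run one training job per candidate configuration and keep the one with the best metric. This example uses an exhaustive grid search, which is the simplest strategy and not necessarily what SageMaker uses internally; the `train_and_score` function is a hypothetical stand-in for a real training job.

```python
import itertools

def train_and_score(learning_rate, epochs):
    """Hypothetical training job: fit w in y = w * x, return the validation error."""
    train = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
    valid = [(4.0, 8.0), (5.0, 10.0)]
    w = 0.0
    for _ in range(epochs):
        grad = sum(2 * (w * x - y) * x for x, y in train) / len(train)
        w -= learning_rate * grad
    return sum((w * x - y) ** 2 for x, y in valid) / len(valid)

def tune(search_space):
    """Run one job per combination of values and keep the lowest metric."""
    names = list(search_space)
    best = None
    for values in itertools.product(*(search_space[n] for n in names)):
        params = dict(zip(names, values))
        score = train_and_score(**params)
        if best is None or score < best[1]:
            best = (params, score)
    return best

best_params, best_score = tune({
    "learning_rate": [0.001, 0.01, 0.1],   # ranges you specify
    "epochs": [5, 50],
})
```

A managed tuning service adds the parts that are tedious to build yourself: running the jobs in parallel, choosing the next candidates more cleverly than a grid, and stopping bad runs early.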

Amazon SageMaker Neo

Amazon SageMaker Neo enables developers to train machine learning models once and run them anywhere in the cloud and at the edge. Amazon SageMaker Neo optimizes models to run up to twice as fast, with less than a tenth of the memory footprint, with no loss in accuracy.

HTTPS endpoint

Amazon SageMaker provides an HTTPS endpoint where your machine learning model is available to provide inferences.

Batch Transform

Use Batch Transform to get inferences for an entire dataset.

Inference Pipeline

You use an inference pipeline to define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers.
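
Conceptually, an inference pipeline is a chain in which the output of each step becomes the input of the next. The sketch below models that with plain Python functions; in the real service each step would be a container, and the step names and scoring rule here are invented for the example.

```python
def make_pipeline(*steps):
    """Chain steps so that each one receives the previous step's output."""
    def run(payload):
        for step in steps:
            payload = step(payload)
        return payload
    return run

def preprocess(record):
    """Custom step: turn raw text into a numeric feature."""
    return {"length": len(record["text"].split())}

def model(features):
    """Stand-in for a built-in algorithm: a fixed linear score."""
    return {"score": 0.1 * features["length"]}

def postprocess(prediction):
    """Custom step: map the score to a human-readable label."""
    return "long" if prediction["score"] >= 0.5 else "short"

pipeline = make_pipeline(preprocess, model, postprocess)
result = pipeline({"text": "one two three four five six"})
```

The caller sends one request and gets one answer, without knowing how many steps run behind the endpoint.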

AI Predefined Untrained Model Services

Amazon has decided to offer only pre-trained model services. If you want an untrained model, you need to work with Amazon SageMaker to build, train, and deploy machine learning models.

It might be a good idea to have some models already built that only need training for your specific needs, but AWS has bet on trained systems.

AI Predefined Trained Model Services

Amazon offers a set of pre-trained Model Services that are available through an API or JavaScript library, ready to use without any training.

The Amazon AI Pre-trained Model Services can be grouped in:

Sight
- Amazon Rekognition: Identify objects, people, text, scenes, and activities, as well as detect any inappropriate content of any image or video.
Language
- Amazon Comprehend: Natural language processing to extract insights and relationships from unstructured text.
- Amazon Textract: Extracts text and data from scanned documents.
- Amazon Translate: translate texts into more than 25 languages.
Conversation
- Amazon Lex: build conversational agents to improve customer service and increase contact center efficiency.
- Amazon Polly: Turn text into lifelike speech to give voice to your applications.
- Amazon Transcribe: convert audio to text. Easily add high-quality speech-to-text capabilities to your applications and workflows.
Data Forecast
- Amazon Personalize: Combine user interaction data with contextual data to generate high-quality recommendations.
- Amazon Forecast: Accurate time-series forecasting service.

AI Solutions

Amazon has created the Amazon ML Solutions Lab, which pairs the client’s team with Amazon machine learning experts to prepare data, build and train models, and put models into production.

Amazon has also defined 2 learning tools:

AWS DeepRacer
AWS DeepRacer is a fully autonomous 1/18th-scale race car designed to help you learn about reinforcement learning through autonomous driving.
AWS DeepLens
AWS DeepLens is the world’s first deep learning-enabled video camera for developers. Integrated with Amazon SageMaker and many other AWS services, it allows you to get started with deep learning in less than 10 minutes through sample projects with practical, hands-on examples.

Finally, Machine Learning in AWS Marketplace offers four kinds of solutions:

Algorithms & Models – NEW
Data Solutions
Machine Learning Solutions
Intelligent Solutions

Google AI Services

AI Hardware Accelerators

Google offers two kinds of HW accelerators:

GPUs (Graphical Processing Units)
NVIDIA Tesla K80 (up to 8 GPUs), P100 (up to 4 GPUs), P4 (up to 4 GPUs), T4 (up to 4 GPUs), and V100 (up to 8 GPUs) GPUs
TPUs (Tensor Processing Units)
Currently, there are four Cloud TPU configurations:
- v2 single device
  Up to 180 teraflops and 64 GB High Bandwidth Memory (HBM)
- v2 Pod (Beta) (a collection of TPU devices connected together with high-speed interfaces)
  Up to 11.5 petaflops and 4 TB HBM
- v3 single device
  Up to 420 teraflops and 128 GB HBM
- v3 Pod (Beta) (a collection of TPU devices connected together with high-speed interfaces)
  Up to 100+ petaflops and 32 TB HBM

Google recommends the following guidelines:

GPUs
- Models that are not written in TensorFlow or cannot be written in TensorFlow
- Models for which source does not exist or is too onerous to change
- Models with a significant number of custom TensorFlow operations that must run at least partially on CPUs
- Models with TensorFlow ops that are not available on Cloud TPU (see the list of available TensorFlow ops)
- Medium-to-large models with larger effective batch sizes
TPUs
- Models dominated by matrix computations
- Models with no custom TensorFlow operations inside the main training loop
- Models that train for weeks or months
- Larger and very large models with very large effective batch sizes
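
These guidelines can be encoded as a small decision helper. This is my own simplification of the list above, not an official Google tool: the field names are invented, model size and training time are left out, and defaulting to GPU when a model is not dominated by matrix computations is an assumption on my part.

```python
from dataclasses import dataclass

@dataclass
class Workload:
    written_in_tensorflow: bool
    source_can_be_changed: bool
    custom_ops_in_training_loop: bool
    uses_ops_unsupported_on_tpu: bool
    matrix_dominated: bool

def suggest_accelerator(w):
    """Return "GPU" or "TPU" following a simplified reading of the guidelines."""
    if not w.written_in_tensorflow or not w.source_can_be_changed:
        return "GPU"
    if w.custom_ops_in_training_loop or w.uses_ops_unsupported_on_tpu:
        return "GPU"
    # Assumption: anything not dominated by matrix math stays on GPU.
    return "TPU" if w.matrix_dominated else "GPU"

tf_model = Workload(True, True, False, False, True)
legacy_model = Workload(False, False, False, False, True)
custom_ops_model = Workload(True, True, True, False, True)

choices = [suggest_accelerator(m) for m in (tf_model, legacy_model, custom_ops_model)]
```

In practice the final choice also depends on cost and on how long the model trains, so treat the helper as a first filter only.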

AI Frameworks

Google manages or integrates the following frameworks out of the box:

TensorFlow
PyTorch
Scikit-learn
Keras
XGBoost (an optimized distributed gradient boosting library )
Kubeflow (makes deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable)
JupyterLab

AI Machine Learning Foundation Platform

Google offers the “AI Platform” to train your machine learning models at scale, to host your trained model in the cloud, and to use your model to make predictions about new data.

The overall picture of the Google Machine Learning end-to-end development cycle is the following:

Prepare Data (covered by the Data Products of Google)
- Ingestion: Cloud Storage & Transfer Service
- Data preparation & preprocessing: Cloud Dataprep, Cloud Dataflow, Cloud Dataproc, BigQuery
Build and define a Model (covered by AI Platform)
- Data Labeling Service
- Deep Learning VM Image
- AI Platform Notebooks
Training, Test & Analyze (covered by AI Platform)
- AI Platform Training
- Kubeflow
- TFX Tools
Deployment & Predict (covered by AI Platform)
- AI Platform Prediction
- Kubeflow

In addition, Google offers Cloud Datalab, a powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models on Google Cloud Platform.

Let’s see the main Google AI platform products & services:

Data Labeling Service (beta)

The Data Labeling Service enables you to submit the representative samples to human labelers who annotate them with the “right answers” and return the dataset in a format suitable for training a machine learning model.

Deep Learning VM Image

AI Platform Deep Learning VM Image makes it easy and fast to instantiate a VM image containing the most popular deep learning and machine learning frameworks on a Google Compute Engine instance. You can launch Compute Engine instances pre-installed with popular ML frameworks like TensorFlow, PyTorch, or scikit-learn

AI Platform Notebooks

AI Platform Notebooks is a managed service that offers an integrated JupyterLab environment in which machine learning developers and data scientists can create instances running JupyterLab that come pre-installed with the latest data science and machine learning frameworks in a single click. Notebooks is integrated with BigQuery, Cloud Dataproc, and Cloud Dataflow, making it easy to go from data ingestion to preprocessing and exploration, and eventually model training and deployment.

AI Platform Training

AI Platform runs your training job on computing resources in the cloud. You can train a built-in algorithm (beta) against your dataset without writing a training application. If built-in algorithms do not fit your use case, you can create a training application to run on AI Platform.

AI Platform Prediction

The AI Platform prediction service manages computing resources in the cloud to run your models. You can request predictions from your models and get predicted target values for them.

Kubeflow & Kubeflow pipelines

The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Kubeflow Pipelines are a new component of Kubeflow that can help you compose, deploy, and manage end-to-end (optionally hybrid) machine learning workflows.
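
The idea behind a pipeline of this kind is that a workflow is a graph of steps with dependencies, executed in an order that respects them. The sketch below shows that idea with the standard library only; it does not use the Kubeflow Pipelines SDK, and the step names are hypothetical.

```python
from graphlib import TopologicalSorter

# Each step maps to the set of steps it depends on (hypothetical names).
workflow = {
    "ingest": set(),
    "preprocess": {"ingest"},
    "train": {"preprocess"},
    "evaluate": {"train"},
    "deploy": {"evaluate"},
}

def run_workflow(steps):
    """Execute steps in dependency order and return the order that was used."""
    executed = []
    for name in TopologicalSorter(steps).static_order():
        # A real pipeline engine would launch a container for the step here.
        executed.append(name)
    return executed

order = run_workflow(workflow)
```

A pipeline engine adds scheduling on a cluster, retries, and tracking of the artifacts passed between steps on top of this basic ordering.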

TFX Tools

TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines

AI Predefined Untrained Model Services

Google offers Cloud AutoML as a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs.

Google AI Predefined Untrained Model Services can be grouped in:

Sight
- AutoML Vision: Derive insights from images in the cloud or at the edge.
- AutoML Video Intelligence: Enable powerful content discovery and engaging video experiences.
Language
- AutoML Natural Language: Reveal the structure and meaning of text through machine learning.
- AutoML Translation: Dynamically detect and translate between languages.
Data Forecast (provide predictive insight of structure data)
- AutoML Tables: Automatically build and deploy state-of-the-art machine learning models on structured data.

AI Predefined Trained Model Services

Google offers a set of Predefined Trained Model Services that are available through REST and RPC APIs, ready to use without any training.

The Google AI Predefined Trained Model Services can be grouped in:

Sight
- Vision API: classify images and detect faces, printed and handwritten text, places and more.
- Video Intelligence API: recognize a vast number of objects, places, and actions in stored and streaming video.
Language
- Natural Language API: Natural language understanding features including sentiment analysis, entity analysis, entity sentiment analysis, content classification, and syntax analysis.
- Translation API: translate texts into more than one hundred languages.
Conversation
- Dialogflow: an end-to-end development suite for creating conversational interfaces for websites, mobile applications, popular messaging platforms, and IoT devices.
- Cloud Text-to-Speech: converts text into human-like speech in more than 100 voices across 20+ languages and variants.
- Cloud Speech-to-Text: convert audio to text. The API recognizes 120 languages and variants to support your global user base.
Data Forecast
- Recommendations AI: Deliver highly personalized product recommendations at scale.
- Cloud Inference API: Quickly run large-scale correlations over typed time-series datasets.

AI Solutions

Google offers the following Prepackaged solutions:

Contact Center AI
This AI model works with your existing telephony and customer support technology and can easily be trained to engage with customers via speech or text, freeing your reps to handle more complex issues and provide a better customer experience.
Cloud Talent Solution
Cloud Talent Solution powers a company’s job and profile search, matching job seekers to open positions more accurately, and includes APIs that help talent technology providers and enterprise hiring companies attract great candidates and make the right hires.
Document Understanding AI
Helps your organization analyze documents efficiently. By automatically classifying, extracting, and enriching this information, Document Understanding AI can save time and resources while unlocking insights that improve your decision-making.

Google also offers Google Cloud’s AI Hub, a hosted repository of plug-and-play AI components, including end-to-end AI pipelines and out-of-the-box algorithms.

Azure AI Services

AI Hardware Accelerators

Azure offers two kinds of HW accelerators:

GPUs (Graphical Processing Units)
NVIDIA Tesla V100 (up to 8 GPUs) , K80 (up to 4 GPUs), P100 (up to 4 GPUs), P40 (up to 4 GPUs) and M60 (up to 4 GPUs) GPUs
FPGA (Field-Programmable Gate Array)
FPGAs on Azure are based on Intel’s FPGA devices, which data scientists and developers use to accelerate real-time AI calculations. They are offered as a service for the following models:
- ResNet 50
- ResNet 152
- DenseNet-121
- VGG-16
- SSD-VGG

AI Frameworks

Azure manages or integrates the following frameworks out of the box:

TensorFlow
PyTorch
Scikit-learn
Apache MXNet
ONNX

AI Machine Learning Foundation Platform

Azure offers Azure Machine Learning service that provides SDKs and services to quickly prep data, train, and deploy machine learning models based on open-source Python frameworks, such as PyTorch, TensorFlow, scikit-learn and MXNet.

Azure also offers Machine Learning Studio that is a collaborative, drag-and-drop visual workspace where you can build, test, and deploy machine learning solutions without needing to write code. It uses prebuilt and preconfigured machine learning algorithms and data-handling modules as well as a proprietary compute platform.

The comparison based on Azure documentation is the following:

	Machine Learning Studio	Azure Machine Learning service: Visual interface
Availability	Generally available (GA)	In preview
Modules for interface	Many	Initial set of popular modules
Training compute targets	Proprietary compute target, CPU support only	Supports Azure Machine Learning compute, GPU or CPU. (Other computes supported in SDK)
Deployment compute targets	Proprietary web service format, not customizable	Enterprise security options & Azure Kubernetes Service. (Other computes supported in SDK)
Automated model training and hyperparameter tuning	No	Not yet in visual interface. (Supported in the SDK and Azure portal.)

Clearly Azure has decided to move from a proprietary environment (Machine Learning Studio) to a more open and standard approach based on Python and TensorFlow (Azure Machine Learning service).

The overall picture of Azure Machine Learning service is the following

Prepare Data
- Ingestion; Azure Machine Learning datasets (preview)
- Data preparation & preprocess; azureml-datasets package
Build, Training, Test & Deploy Models
- Azure Machine Learning SDK for Python
- Visual interface (preview) for Azure Machine Learning service
- Azure Machine Learning CLI
- Azure Machine Learning Notebooks

Let’s see the main Azure Machine Learning Services:

Azure Machine Learning datasets (preview)

With managed datasets, you can:

Easily access data during model training without reconnecting to underlying stores
Ensure data consistency & reproducibility using the same pointer across experiments: notebooks, automated ml, pipelines, visual interface
Share data & collaborate with other users
Explore data & manage lifecycle of data snapshots & versions
Compare data in training to production

azureml-datasets package (under the Azure ML SDK)

The Dataset class is a foundational resource for exploring, transforming and managing data within Azure Machine Learning.
You can explore your data with summary statistics and transform it using intelligent transforms. When you’re ready to use the data for training, you can save the Dataset to your AML workspace to get versioning and reproducibility capabilities.

Azure Machine Learning SDK for Python

Azure Machine Learning SDK for Python allows you to build and run machine learning workflows with the Azure Machine Learning service.

The most important classes and packages in the SDK are:

Workspace, the top-level resource for Azure Machine Learning service. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning service.
Experiment that represents a collection of trials (individual model runs)
Run that represents a single trial of an experiment.
Model is used for working with cloud representations of machine learning models. You can use model registration to store and version your models in the Azure cloud, in your workspace.
ComputeTarget, RunConfiguration, and ScriptRunConfig for creating and managing compute targets, setting their type and size, attaching the compute target configuration, and specifying the path/file of the training script.
AutoMLConfig to configure parameters for automated machine learning training.
Image for packaging models into container images that include the runtime environment and dependencies.
Webservice for creating and deploying web services for your models.
ML pipelines to define reusable machine learning workflows that can be used as templates for your machine learning scenarios.

Visual interface (preview) for Azure Machine Learning service

The visual interface (preview) for Azure Machine Learning service enables you to prep data, train, test, deploy, manage, and track machine learning models without writing code.

Azure Machine Learning CLI

The Azure Machine Learning CLI is an extension to the Azure CLI, a cross-platform command-line interface for the Azure platform. This extension provides commands for working with the Azure Machine Learning service. It allows you to automate your machine learning activities. The following list provides some example actions that you can do with the CLI extension:

Run experiments to create machine learning models
Register machine learning models for customer usage
Package, deploy, and track the lifecycle of your machine learning models

Azure Machine Learning Notebooks repository

The Azure Machine Learning Notebooks repository includes the latest Azure Machine Learning Python SDK samples. These Jupyter notebooks are designed to help you explore the SDK and serve as models for your own machine learning projects.

AI Predefined Untrained Model Services

Azure offers a set of AI Predefined Untrained Model Services that require some custom training before use. Both trained and untrained services are grouped under “Azure Cognitive Services”.

Azure Predefined Untrained Model Services can be grouped in:

Sight
- Custom Vision: To build, deploy and improve your own image classifiers.
Conversation
- Speech Services (custom options): that is the unification of speech-to-text, text-to-speech, and speech-translation into a single Azure subscription.
Language
- QnA Maker: that enables you to create a knowledge-base(KB) from your semi-structured content such as Frequently Asked Question (FAQ) URLs, product manuals, support documents and custom questions and answers.
- Custom Translator: an extension of the Translator Text API which allows you to build neural translation systems.

AI Predefined Trained Model Services

Azure offers a set of pre-trained Model Services under “Azure Cognitive Services” that are available through a REST API or SDK, ready to use without any training.

In addition, Azure has announced that you can deploy Azure Cognitive Services to the edge, on premises and in the cloud using containers.

The Azure AI Pre-trained Model Services can be grouped in:

Sight
- Computer Vision API: that analyzes images to detect and provide insights about their visual features and characteristics.
- Face API: that detects human faces in an image and returns the rectangle coordinates of their locations. Optionally, face detection can extract a series of face-related attributes. Examples are head pose, gender, age, emotion, facial hair, and glasses.
- Video Indexer: that enables you to extract the insights from your videos.
- Form Recognizer (preview): to identify and extract key/value pairs and table data from form documents.
- Ink Recognizer (preview): to analyze and recognize digital ink content. Unlike services that use Optical Character Recognition (OCR), the API requires digital ink stroke data as input. Digital ink strokes are time-ordered sets of 2D points (X,Y coordinates) that represent the motion of input tools such as digital pens or fingers.
- Content Moderator API: that checks text, image, and video content for material that is potentially offensive, risky, or otherwise undesirable.
Language
- Language Understanding (LUIS): to analyze a user’s conversational, natural language text to predict overall meaning, and pull out relevant, detailed information.
- Text Analytics API: that provides advanced natural language processing over raw text, and includes four main functions: sentiment analysis, key phrase extraction, language detection, and entity recognition.
- Translator Text: to translate text in near real-time.
- Immersive Reader (Preview): to improve reading comprehension for emerging readers, language learners, and people with learning differences such as dyslexia.
Conversation
- Speech Services: that is the unification of speech-to-text, text-to-speech, and speech-translation into a single Azure subscription.
- Speaker Recognition: that provides the most advanced algorithms for speaker verification and speaker identification.
Data Forecast
- Personalizer: to discover what action to rank highest in a context.
- Anomaly Detector API: that enables you to monitor and detect abnormalities in your time series data.
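As a rough illustration of what "detecting abnormalities in time series data" means, the sketch below flags points that sit far from the mean of the preceding window. This is a generic rolling z-score, chosen by me for brevity; it is not the algorithm behind the Anomaly Detector API, and the window size and threshold are arbitrary assumptions.

```python
from statistics import mean, stdev


def flag_anomalies(series, window=5, threshold=3.0):
    """Return indices whose value is far from the mean of the preceding window."""
    anomalies = []
    for i in range(window, len(series)):
        history = series[i - window:i]
        mu = mean(history)
        sigma = stdev(history)
        if sigma == 0:
            # Flat history: any change at all counts as a deviation.
            if series[i] != mu:
                anomalies.append(i)
            continue
        if abs(series[i] - mu) / sigma > threshold:
            anomalies.append(i)
    return anomalies


# Hypothetical sensor readings with one obvious spike at index 6.
readings = [10.0, 10.2, 9.9, 10.1, 10.0, 10.1, 25.0, 10.0, 9.8, 10.2]
print(flag_anomalies(readings))  # [6]
```

A managed service adds what this toy lacks, such as handling seasonality and trends, which is the reason to call an API rather than hand-roll the check.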

Azure also includes the Bing Search API as part of Azure Cognitive Services. However, neither Amazon nor Google considers its search services to be part of its AI services, so I will not include it in the AI services comparison. In fact, the Azure Bing Search API documentation makes no mention of Machine Learning implementations.

Azure also provides access to the Cognitive Services Labs, with an early look at emerging Cognitive Services technologies. Early adopters who do not need market-ready technology can discover, try and provide feedback on new Cognitive Services technologies before they are generally available. Labs are not Azure services.

Finally, Azure is in the process of migrating its Statistical Parametric implementations to Neural Network implementations to improve its cognitive services.

AI Solutions

Azure offers the following Solutions based on the Cognitive Services:

Azure Bot Service
Develop intelligent, enterprise-grade bots that let you maintain control of your data. Build any type of bot, from a Q&A bot to your own branded virtual assistant. Use an open-source SDK and tools to easily connect your bot across popular channels and devices. Give your bot the ability to speak, listen, and understand your users with native integration of Azure Cognitive Services.
Voice-first virtual assistants (preview)
Custom virtual assistants using Azure Speech Services to create natural, human-like conversational interfaces for their applications and experiences.
Azure Databricks
Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform and integrated with Azure Machine Learning service to quickly identify suitable algorithms and hyperparameters.

Azure also offers the Azure AI Gallery portal to quickly build Azure AI Solutions from solution templates, reference architectures and design patterns. Make them your own with the included instructions or with a featured partner.

Alibaba AI Services

AI Hardware Accelerators

Alibaba offers two kinds of HW accelerators:

GPUs (Graphical Processing Units)
AMD FirePro S7150 (up to 4 GPUs), NVIDIA Tesla M40 (up to 2 GPUs), NVIDIA Tesla P100 (up to 8 GPUs), NVIDIA Tesla P4 (up to 2 GPUs), and NVIDIA Tesla V100 (up to 8 GPUs).
FPGA (Field-Programmable Gate Array)
Intel Arria 10 GX 1150 FPGA, up to 2 FPGAs (with 56 vCPUs) and 224 GiB memory
Xilinx 16nm Virtex UltraScale+ VU9P, up to 4 FPGAs (with 64 vCPUs) and 256 GiB memory

AI Frameworks

Alibaba supports the following frameworks (beta testing):

TensorFlow
Apache MXNet
Caffe

AI Machine Learning Foundation Platform

Alibaba Cloud Machine Learning Platform for AI (PAI) provides an all-in-one machine learning service that demands little technical skill from users yet delivers high-performance results. On the Machine Learning Platform for AI, you can quickly establish and deploy machine learning experiments to achieve seamless integration between algorithms and your business. Machine Learning Platform for AI is built on the full-fledged algorithm application system of Alibaba Group, and is now serving tens of thousands of developers and enterprise users. You can quickly build services such as product recommendation, financial risk control, image identification, and voice recognition based on Machine Learning Platform for AI to implement artificial intelligence.

The Alibaba Cloud ML Platform is a proprietary platform that has the following features:

User interface: You can quickly and easily build machine learning experiments using the drag-and-drop method, and the computing results of the entire machine learning process can be visually displayed.
Rich algorithm components: Provides more than 100 algorithm modules for regression, classification, clustering, text analysis, relationship mining, and many other models. Supports preprocessing tools and software, feature engineering, analysis systems, application areas, common machine learning algorithms, financial algorithms.
All-in-one service: Provides a comprehensive service experience by helping users implement data cleansing, feature engineering, machine learning algorithms, evaluation, online prediction, and offline scheduling on the same platform.

However, the deep learning feature is currently in beta testing. Three deep learning frameworks are supported: TensorFlow, Caffe, and MXNet.

For ingestion and data processing, Alibaba recommends DataWorks (beta), which provides a full solution for data aggregation, data processing, data governance, and data services. The features involved include data integration, data development, data quality, data protection, and data services.

AI Predefined Untrained Model Services

Like AWS, Alibaba has decided to offer only pre-trained AI Services. If you want an untrained model, you need to work with Alibaba Cloud Machine Learning Platform for AI to build, train, and deploy machine learning models.

AI Predefined Trained Model Services

Alibaba offers a very limited set of Pre-trained Model Services that are available through an API and SDK libraries and are ready to use without any training.

The Alibaba AI Pre-trained Model Services can be grouped in:

Sight
- Alibaba Image Search: to obtain information about products that are similar or identical to the product in your input image and find images containing subjects or elements that are similar or identical to your input image.
Language
- Intelligent Robot (Beta): a dialogue platform that enables smart dialogue (based on natural language processing) through a range of dialogue-enabling clients.
- Alibaba Machine Translation: to provide an e-commerce machine translation engine and a multi-language technological infrastructure for enterprises’ multi-language services.

AI Solutions

Alibaba offers the following AI solutions as a starting point:

Product Recommendation
Discover the features that influence shopping behaviour and provide customer recommendations that increase product sales.
Financial Risk Management
Calculate the capability of clients to settle their credit card debt. Risk indexes help financial institutions manage risks effectively.
News Classification
Text analysis components automatically classify documents in a short period of time.

Recently, Alibaba has launched ET Brain, an ultra-intelligent AI Platform for solving complex business and social problems:

ET City Brain
Utilizing comprehensive real-time city data, ET City Brain holistically optimizes urban public resources by instantly correcting defects in urban operations.
ET Industrial Brain
Empowering smart manufacturing with data and machine intelligence
ET Medical Brain
Alibaba Cloud is committed to applying data intelligence to help doctors and nurses offer better healthcare services to patients and ultimately save more lives.
ET Environment Brain
Data-driven green development for a smart and ecological civilization

AI HW Accelerators Comparison

AI Framework Comparison

Built-in Algorithms Comparison

AI Machine Learning Foundation Platform Comparison

AI Predefined Untrained Model Services Comparison

AI Predefined Trained Model Services Comparison

AI Solutions Comparison

Conclusion

AI services are the most promising cloud services and all cloud providers are focused on growing their offer. Broadly speaking, the four main public cloud providers follow a similar approach, dividing their services into four layers:

AI Solutions
AI Predefined Model Services (trained or untrained)
AI Machine Learning Foundation Platform based on well-known AI Frameworks
AI Hardware Accelerators

Let’s see the strategies and approaches of each provider in each layer from bottom to top:

AI Hardware Accelerators

All providers (Amazon, Google, Azure and Alibaba) offer GPUs to accelerate the learning process of AI models. Working with a GPU means moving all the data to the GPU and then processing it. GPUs are good for doing high-latency computation in batches, but they consume a lot of power, and their design was not originally intended for machine learning processes (although it is closer than that of traditional CPUs). In addition, GPUs were designed for graphics and high-performance computing systems, where safety is not a requirement.

All providers have realized that they need to go one step further in the field of HW accelerators to offer more power and security with lower power consumption for increasingly complex AI models.

In this case there are two strategies:

Google has opted to build a specialized processor (ASIC) for Tensorflow: the TPU (Tensor Processing Unit). Google launched the first version of the TPU in 2016 and is already on the third generation. It is a well-proven technology, and Google has recently released Cloud TPU Pod, which can include more than 1,000 individual TPU chips connected by an ultra-fast, two-dimensional toroidal mesh network.
Amazon, Azure and Alibaba have opted for FPGA (Field-Programmable Gate Array) processors, which are more flexible than TPUs but need specific programming and are less efficient for Tensorflow tasks (compared to TPUs).
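The "two-dimensional toroidal mesh" mentioned for Cloud TPU Pod can be pictured as a grid whose edges wrap around, so every chip has exactly four direct neighbours. The sketch below only illustrates that topology; the 32 x 32 layout is a hypothetical arrangement of 1,024 chips that I have assumed, not a documented pod layout.

```python
def torus_neighbors(row, col, rows, cols):
    """Return the four neighbours of (row, col) on a rows x cols 2D torus."""
    return [
        ((row - 1) % rows, col),  # up, wrapping from the top edge to the bottom
        ((row + 1) % rows, col),  # down, wrapping from the bottom edge to the top
        (row, (col - 1) % cols),  # left, wrapping from the left edge to the right
        (row, (col + 1) % cols),  # right, wrapping from the right edge to the left
    ]


# Hypothetical 32 x 32 layout (1,024 chips); the real pod layout may differ.
ROWS, COLS = 32, 32
print(torus_neighbors(0, 0, ROWS, COLS))
# [(31, 0), (1, 0), (0, 31), (0, 1)]
```

Because of the wrap-around links, a corner chip is no worse connected than one in the middle, which keeps the distance between any two chips short during training.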

Google has made a very strong bet with its custom TPU chips, assuming that the Tensorflow framework is a standard accepted by all developers and suppliers, as indeed it seems to be. Amazon, Azure and Alibaba prefer FPGA solutions, a more conservative and flexible option that does not tie them to a specific framework, but in my opinion these are less powerful for Tensorflow models and are 1-2 years behind Google’s TPU developments.

However, like any single-source solution, TPUs can be overtaken by the next generations of FPGAs developed by Intel (Altera) and Xilinx. In addition, Amazon and Azure are working to make FPGAs more accessible and easier to program and use.

AI Machine Learning Foundation Platform based on well-known AI Frameworks

In this case, the strategy of the four providers is clear:

Leverage on standard AI frameworks (mostly open source), where Tensorflow is currently the leading framework.
Define a platform that simplifies and accelerates the deployment of Machine Learning solutions covering the tasks of:
1. Data Preparation
2. Building and training an ML model
3. Deployment and prediction
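The three tasks above can be sketched end to end in a few lines of plain Python. This is a toy of my own, assuming a one-variable linear model fitted by gradient descent on made-up data; the cloud platforms run the same stages as managed services on real frameworks, not with this code.

```python
def prepare(rows):
    """1. Data preparation: drop incomplete rows, split features from labels."""
    clean = [(x, y) for x, y in rows if x is not None and y is not None]
    xs = [x for x, _ in clean]
    ys = [y for _, y in clean]
    return xs, ys


def train(xs, ys, learning_rate=0.05, epochs=2000):
    """2. Build and train: fit y = w * x + b by batch gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(epochs):
        errors = [(w * x + b) - y for x, y in zip(xs, ys)]
        # Gradients of the mean squared error with respect to w and b.
        grad_w = 2 * sum(e * x for e, x in zip(errors, xs)) / n
        grad_b = 2 * sum(errors) / n
        w -= learning_rate * grad_w
        b -= learning_rate * grad_b
    return w, b


def predict(model, x):
    """3. Deployment and prediction: apply the trained parameters to new input."""
    w, b = model
    return w * x + b


# Made-up data following y = 2x + 1, with one incomplete row to clean out.
raw = [(0.0, 1.0), (1.0, 3.0), (None, 4.0), (2.0, 5.0), (3.0, 7.0)]
xs, ys = prepare(raw)
model = train(xs, ys)
print(round(predict(model, 4.0), 2))  # close to 9.0 for this toy data
```

Keeping the stages as separate functions mirrors how the platforms split them: each stage can be swapped or scaled independently, for example training on accelerators while prediction runs elsewhere.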

Leverage on standard AI frameworks

Tensorflow (Open source library for Machine Learning and high-performance numerical computation) and PyTorch (Open Source Machine Learning library) are the frameworks supported by the four Cloud providers. It also seems clear that Python is the reference language in all AI frameworks.

MXNet is the alternative to Tensorflow supported by Amazon, Azure and Alibaba (alongside Tensorflow). However, even if MXNet, with its high-performance imperative API, could surpass Tensorflow, Google’s exclusive bet on Tensorflow tilts the balance.

Scikit-learn (provides a wide selection of supervised and unsupervised learning algorithms) and Keras (high-level API to build and train deep learning models) are the next most used frameworks.

Amazon is the provider that supports the greatest variety of frameworks. Google and Azure focus on the frameworks required for each task. Finally, Alibaba started with a proprietary approach and is evolving (currently in beta) towards a framework-based approach (Tensorflow, MXNet and Caffe).

Define a platform that simplifies and accelerates the deployment of Machine Learning solutions

Here the approach differs for each provider, and this is clearly one of the areas with potential for improvement.

Amazon offers Amazon SageMaker, a set of tools to build, train, and deploy machine learning models quickly, which supports TensorFlow and Apache MXNet out of the box.
Google launched the AI Platform to train your machine learning models at scale, to host your trained model in the cloud, and to use your model to make predictions about new data, based on Tensorflow, PyTorch and Scikit-learn.
Azure offers the Azure Machine Learning service, which provides SDKs and services to quickly prep data, train, and deploy machine learning models based on open-source Python frameworks such as PyTorch, Scikit-learn and MXNet.
Alibaba offers Alibaba Cloud Machine Learning Platform for AI (PAI), which provides an all-in-one machine learning service with low technical skill requirements for users. It is a proprietary environment that is evolving towards TensorFlow, Caffe, and MXNet.

All providers claim to offer a fully managed service that covers the entire machine learning workflow to build, train, and deploy machine learning models quickly but with different approaches:

Azure and Alibaba offer a visual interface for modeling, while Amazon and Google are more code oriented (more flexible but more complex).
All providers use their own storage service as the input for training and testing, but Google offers more options for data preparation (Cloud Dataprep, Cloud Dataflow, Cloud Dataproc, BigQuery) and also offers Cloud Datalab, a powerful interactive tool created to explore, analyze, transform and visualize data and build machine learning models.
Amazon SageMaker and the Alibaba ML platform don’t allow you to train or deploy the model outside the cloud, while Azure and Google allow you to train or deploy the model in the cloud or on premises (for some features).
All providers are starting to train and deploy ML models in containers. Google started an open source project called Kubeflow that combines the best of TensorFlow and Kubernetes to train and deploy ML models in containers.
All providers offer built-in ML algorithms that help you train models for a variety of use cases:
- Two-class classification
- Multi-class classification
- Regression
- Ranking
- Anomaly detection (Google offers a service instead)
- Clustering (under BigQuery ML in Google)

Google offers a limited set of built-in algorithms (only 3) compared to its peers, partly because it already covers them in its other services or products.

JupyterLab (an interactive development environment for working with notebooks, code and data) is becoming the standard UI for ML and data science.

In general, these AI Platforms are very recent and are evolving rapidly to:

Simplify as much as possible the process of training and deploying ML models (and their lifecycle), incorporating DevOps capabilities
Expand the available ML algorithms
Define a more intelligible price structure
Align with market standards
Increase deployment/training flexibility
Allow deployment/training at the edge

AI Predefined Model Services (trained or untrained)

All suppliers have decided to focus on increasing the AI Predefined Model Services that can be grouped in:

Sight
- Vision (Image classification, OCR, Face detection, explicit content, …)
- Video (Video classification, OCR, object tracking, explicit content, …)
Language
- Natural Language (Reveal the structure and meaning of text, Sentiment analysis, Syntax analysis, …)
- Translation (translate between languages, language detection,…)
Conversation
- Speech-to-Text
- Text-to-Speech
- Natural Conversation
Data Forecast
- Time Series dataset correlations & anomaly detection
- Recommendations/personalization
- Data Warehouse ML

With the following aspects to point out:

In addition to a full set of AI Predefined Trained Model Services, Google offers a set of untrained model services, Cloud AutoML (Sight, Language and Data Forecast), to support more customization specific to the client’s business needs. Azure also offers this customization for some Sight and Language services.
Google allows you to deploy vision machine learning models at the edge. Azure allows you to deploy vision (face recognition and OCR) and language (Key Phrase Extraction, Language Detection, Sentiment Analysis and Language Understanding) machine learning models at the edge.
AWS offers Greengrass to bring local compute, messaging, data caching, sync, and ML inference capabilities to edge devices (a pseudo edge deployment that in the end requires a connection with AWS).
The AI Predefined Model Services are very similar among Amazon, Azure and Google. The differences lie in the features of each service (see the comparison tables). For instance, Google recognizes 120 languages and variants, Azure 60 and Amazon 25. However, if you count the features, Azure has the best score (mainly in Video), followed by Google and Amazon.
Alibaba clearly trails its peers.

AI Solutions

Regarding AI Solutions, the cloud providers are following two approaches:

Create Prepackaged AI Solutions to solve specific business needs.
Create an AI Portal and Hub to share experiences and other AI components.

At the moment, Prepackaged AI Solutions are very limited, and only Google and Azure offer valuable (but limited) solutions.

The business areas in which Prepackaged AI Solutions should grow are:

Intelligent Contact Center
CRM
Human Resource and Intelligent Recruitment
Document digitalization
Data Analytics
Cybersecurity
Sales/revenue forecasting
Digital Marketing
Personalization and recommendation
Gaming
Robotics & smart cars (with 5G)
Financial Trading
Healthcare
Logistics and Delivery

Summary

In short, Google seems to be ahead in AI services, partly because of its integration with data services and its experience with Tensorflow, although it still has room to simplify its processes by adding more visual and DevOps capabilities.

Amazon and Azure are evolving very quickly, and will probably reach the level of Google in the coming years. In fact, Azure has the richest set of AI predefined Model services and a promising visual AI Platform.

IBM, with IBM Watson Machine Learning, IBM Watson Studio and IBM Watson services, has a robust AI offering close to that of Amazon and Azure, with a powerful visual interface. The key question is: will IBM be able to keep up with the rapid evolution of the Amazon, Google and Azure AI services?

Alibaba is at the bottom and has a long way to go to reach its peers, except for some specific e-commerce needs.

The great differentiation will come when the supply of the upper layers (AI Solutions and Predefined AI Models) increases, so that the gap between scarce data/AI scientists and functional teams is reduced.