Public Cloud AI Review

The objective of this post is to analyze the Artificial Intelligence (AI) services offered by the four main public cloud providers: AWS, GCP, Azure and Alibaba. AI is one of the most promising cloud services, and it is growing far faster than other services. AI benefits especially from the cloud because of the compute and storage capacity needed for training and the peaks of computing power needed when a model is put into production. To be fair, I should include IBM in the comparison. However, in the previous comparisons on this blog I have only included Amazon, Google, Azure and Alibaba (the truly public clouds), so I will not include it here, although I will add some considerations about it in the summary.

Table of Contents
- AI Services
  - AI Hardware Accelerators
  - AI Frameworks
  - AI Machine Learning Foundation Platform
  - AI Predefined Untrained Model Services
  - AI Predefined Trained Model Services
  - AI Solutions
- AWS AI Services
  - AI Hardware Accelerators
  - AI Frameworks
  - AI Machine Learning Foundation Platform
    - Data Labeling Service: Amazon SageMaker Ground Truth
    - AWS Deep Learning AMIs
    - AWS Deep Learning Containers
    - Amazon SageMaker notebook
    - Amazon SageMaker Built-in Algorithms
    - Amazon SageMaker RL
    - Amazon Training Job
    - Amazon SageMaker automatic model tuning
    - Amazon SageMaker Neo
    - HTTPS endpoint
    - Batch Transform
    - Inference Pipeline
  - AI Predefined Untrained Model Services
  - AI Predefined Trained Model Services
  - AI Solutions
- Google AI Services
  - AI Hardware Accelerators
  - AI Frameworks
  - AI Machine Learning Foundation Platform
    - Data Labeling Service (beta)
    - Deep Learning VM Image
    - AI Platform Notebooks
    - AI Platform Training
    - AI Platform Prediction
    - Kubeflow & Kubeflow pipelines
    - TFX Tools
  - AI Predefined Untrained Model Services
  - AI Predefined Trained Model Services
  - AI Solutions
- Azure AI Services
  - AI Hardware Accelerators
  - AI Frameworks
  - AI Machine Learning Foundation Platform
    - Azure Machine Learning datasets (preview)
    - azureml-datasets package (under the Azure ML SDK)
    - Azure Machine Learning SDK for Python
    - Visual interface (preview) for Azure Machine Learning service
    - Azure Machine Learning CLI
    - Azure Machine Learning Notebooks repository
  - AI Predefined Untrained Model Services
  - AI Predefined Trained Model Services
  - AI Solutions
- Alibaba AI Services
  - AI Hardware Accelerators
  - AI Frameworks
  - AI Machine Learning Foundation Platform
  - AI Predefined Untrained Model Services
  - AI Predefined Trained Model Services
  - AI Solutions
- AI HW Accelerators Comparison
- AI Framework Comparison
- Built-in Algorithms Comparison
- AI Machine Learning Foundation Platform Comparison
- AI Predefined Untrained Model Services Comparison
- AI Predefined Trained Model Services Comparison
- AI Solutions Comparison
- Conclusion
  - AI Hardware Accelerators
  - AI Machine Learning Foundation Platform based on well-known AI Frameworks
    - Leverage standard AI frameworks
    - Define a platform that simplifies and accelerates the deployment of Machine Learning solutions
  - AI Predefined Model Services (trained or untrained)
  - AI Solutions
- Summary

AI Services

Unlike with other services, I have been able to find a blueprint from a cloud provider that reflects the structure of the AI services. The blueprint I'm going to use to compare AI services among cloud providers is based on Google's vision and has the following layers:
- AI Solutions
- AI Predefined Trained Model Services
- AI Predefined Untrained Model Services
- AI Machine Learning Foundation Platform
- AI Frameworks
- AI Hardware Accelerators

Let's go through the layers from the bottom up.

AI Hardware Accelerators

AI HW accelerators are specialized processors used mainly to speed up the training of AI models. They range from massively parallel general-purpose chips (GPUs) to application-specific integrated circuits (ASICs) such as TPUs and reconfigurable chips (FPGAs). To date there are three relevant AI HW accelerators:

GPUs (Graphics Processing Units)
Popularized by NVIDIA, GPUs were originally designed to accelerate graphics applications, which need a lot of computing power in parallel. However, the algorithm used to train supervised machine learning models is usually based on gradient descent, which requires multiplying very large matrices of numbers. This can be done in parallel, because each element of the result can be computed independently of the others, in any order.
Therefore, GPUs are also well suited to training ML models.

TPUs (Tensor Processing Units)
Created by Google to speed up TensorFlow, TPUs let you run your machine learning workloads on Google's TPU accelerator hardware using TensorFlow. TPUs can train models up to 17 times faster than GPUs, at up to 38% lower cost. Unfortunately, Google's Cloud TPUs are proprietary and are not sold as standalone hardware; they can only be used through Google Cloud.

FPGA (Field-Programmable Gate Array)
An FPGA is an integrated circuit designed to be configured by a customer or a designer after manufacturing. You can reconfigure FPGAs for different types of machine learning models. This flexibility makes it easier to accelerate applications using the numerical precision and memory model that best suit the model being run.

AI Frameworks

All the cloud providers use AI frameworks to offer their services. A framework simplifies the effort of creating a machine learning model because it:
- implements multiple algorithms and model structures
- provides a computational graph abstraction
- helps train and evaluate the model
- integrates with multiple data sources
- deploys to multiple machines and platforms

The main frameworks used by the cloud providers are:

End-to-end ML frameworks

TensorFlow
Open source library for machine learning and high-performance numerical computation, originally developed by Google. Languages: C++ or Python.

PyTorch
Open source machine learning library inspired by Torch. It has primarily been developed by Facebook's artificial intelligence research group. Languages: C++ or Python.

Apache MXNet
Open source machine learning software framework used to train and deploy deep neural networks. Languages: C++, Python, Julia, MATLAB, JavaScript, Go, R, Scala, Perl and Wolfram.

Caffe
Caffe is a deep learning framework made with expression, speed and modularity in mind. Caffe supports many different types of deep learning architectures geared towards image classification and image segmentation.
It supports CNN, RCNN, LSTM and fully connected neural network designs. Caffe supports GPU- and CPU-based acceleration through computational kernel libraries such as NVIDIA cuDNN and Intel MKL.

Chainer
Open source, flexible, Python-based framework for neural networks, developed by Preferred Networks, a startup based in Japan. It lets you write complex architectures simply and intuitively.

Theano
Open source software developed by the Montreal Institute for Learning Algorithms at the University of Montreal. It is a Python library and optimizing compiler for manipulating and evaluating tensor operations.

Microsoft Cognitive Toolkit (CNTK)
Open source toolkit for commercial-grade distributed deep learning. It describes neural networks as a series of computational steps via a directed graph. Languages: C++, C# or Python.

Frameworks for specific ML services

Scikit-learn
Scikit-learn provides a wide selection of supervised and unsupervised learning algorithms.

ONNX
The Open Neural Network Exchange (ONNX) format is a new standard for exchanging deep learning models. It promises to make deep learning models portable, thus preventing vendor lock-in. Currently ONNX has native support for PyTorch, CNTK, MXNet and Caffe2, and there are also converters for TensorFlow and Core ML.

Keras
A high-level API to build and train deep learning models. Written in Python and capable of running on top of TensorFlow, CNTK or Theano.

Horovod
Distributed training framework for TensorFlow, Keras, PyTorch and Apache MXNet.

Gluon
Gluon is an open source deep learning library jointly created by AWS and Microsoft (to position themselves against Google in the AI race) that helps developers build, train and deploy machine learning models in the cloud. Gluon provides a clear, concise API for defining machine learning models using a collection of pre-built, optimized neural network components.
At the moment, Gluon supports Apache MXNet and CNTK.

JupyterLab
A web-based interactive development environment for working with notebooks, code and data. JupyterLab has full support for Jupyter notebooks and lets you use text editors, terminals, data file viewers and other custom components side by side with notebooks in a tabbed work area. JupyterLab is flexible: you can configure and arrange the user interface to support a wide range of workflows in data science, scientific computing and machine learning.

Note that Horovod, listed above, was originally developed by Uber as an open source distributed deep learning framework for TensorFlow; its goal is to make distributed deep learning projects easy to develop and fast to run.

As we will see, TensorFlow is the most widely supported framework among the cloud providers.

AI Machine Learning Foundation Platform

All cloud providers offer a machine learning platform based on a set of the frameworks described above. The basic aim of the AI ML Foundation Platform is to simplify and accelerate the deployment of an ML solution, covering the following tasks:
- Prepare data
- Build and define a model
- Training
- Deployment
- Prediction

In addition, all cloud providers offer built-in ML algorithms that help you train models for a variety of use cases:
- Two-class classification
- Multi-class classification
- Regression
- Clustering
- Ranking
- Anomaly detection

AI Predefined Untrained Model Services

The future of cloud AI services lies in offering trained or untrained models ready to perform a specific task. AI Predefined Untrained Model Services are AI models (algorithms and structure) ready to perform a specific task, but they need to be trained with your own data to tailor them to your business needs. The objective is to enable developers with limited machine learning expertise to train high-quality models specific to their business needs.
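To make the idea of a predefined but untrained model concrete, here is a minimal sketch in plain Python; it is not any provider's API. The class `PredefinedBinaryClassifier` is hypothetical: its structure (logistic regression) and its training algorithm (batch gradient descent, the matrix-heavy procedure mentioned in the GPU section) are fixed in advance, and only the weights are learned from the customer's own data. The learning rate, number of epochs and toy dataset are illustrative assumptions.

```python
import math


class PredefinedBinaryClassifier:
    """Hypothetical 'predefined, untrained' model: the structure (logistic
    regression) and the training algorithm (batch gradient descent) are fixed;
    only the weights are learned from the customer's own data."""

    def __init__(self, n_features, learning_rate=0.5, epochs=1000):
        self.weights = [0.0] * n_features
        self.bias = 0.0
        self.learning_rate = learning_rate
        self.epochs = epochs

    def predict_proba(self, x):
        # Dot product of one example with the weight vector; over a whole
        # batch this is a matrix-vector product, which GPUs parallelize well.
        z = sum(w * xi for w, xi in zip(self.weights, x)) + self.bias
        return 1.0 / (1.0 + math.exp(-z))

    def predict(self, x):
        return 1 if self.predict_proba(x) >= 0.5 else 0

    def fit(self, X, y):
        n = len(X)
        for _ in range(self.epochs):
            # The error of each example does not depend on the others,
            # so this step could be computed in parallel.
            errors = [self.predict_proba(x) - t for x, t in zip(X, y)]
            # Average gradient of the log-loss with respect to each weight.
            for j in range(len(self.weights)):
                grad_j = sum(e * x[j] for e, x in zip(errors, X)) / n
                self.weights[j] -= self.learning_rate * grad_j
            self.bias -= self.learning_rate * sum(errors) / n
        return self


# Usage: the customer supplies the training data; the model itself is predefined.
X = [[0.0], [0.1], [0.2], [0.8], [0.9], [1.0]]
y = [0, 0, 0, 1, 1, 1]
model = PredefinedBinaryClassifier(n_features=1).fit(X, y)
print([model.predict(x) for x in X])
```

Managed untrained-model services hide this loop behind an upload-and-train workflow; the sketch only illustrates the split of responsibilities (provider supplies the model, customer supplies the data), not how any provider implements it.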
The AI Predefined Untrained Model Services can be grouped into:
- Sight
  - Vision (image classification)
  - Video (video classification)
- Language
  - Natural Language (reveal the structure and meaning of text)
  - Translation (translate between languages)
- Data
  - Forecast (provide predictive insights from structured data)

AI Predefined Trained Model Services

Predefined Trained Model Services are trained AI models ready to perform a specific task. They are offered through a REST API or SDK, ready to use without any training. The AI Predefined Trained Model Services can be grouped into:
- Sight
  - Vision (image classification, OCR, face detection, explicit content, ...)
  - Video (video classification, OCR, object tracking, explicit content, ...)
- Language
  - Natural Language (reveal the structure and meaning of text, sentiment analysis, syntax analysis, ...)
  - Translation (translate between languages, language detection, ...)
- Conversation
  - Speech-to-Text
  - Text-to-Speech
  - Natural conversation
  - Speaker recognition
- Data
  - Forecast
  - Time series dataset correlations and anomaly detection
  - Recommendations/personalization
  - Data warehouse ML

These are only the first AI services offered by the cloud providers, and the offer is likely to grow in the near future.

AI Solutions

AI Solutions are prepackaged business solutions based on AI. Again, this is a high-growth area, and it relies on the services of the lower AI layers. The target business processes for AI Solutions are:
- Intelligent contact center
- CRM
- Human resources and intelligent recruitment
- Document digitalization
- Data analytics
- Cybersecurity
- Sales/revenue forecasting
- Digital marketing
- Personalization and recommendation
- Gaming
- Robotics and smart cars (with 5G)
- Financial trading
- Healthcare
- Logistics and delivery
- And more...

The cloud providers have started to offer AI Solutions to increase their penetration at the business level.
AWS AI Services

AI Hardware Accelerators

Amazon offers two kinds of HW accelerators:

GPUs (Graphics Processing Units)
NVIDIA Tesla V100 (up to 8 GPUs), K80 (up to 16 GPUs) and M60 (up to 4 GPUs). For inference, AWS also offers Amazon Elastic Inference, which attaches GPU-powered acceleration to an instance, and AWS Inferentia, its own inference chip.

FPGA (Field-Programmable Gate Array)
Amazon EC2 F1 instances use FPGAs (Xilinx Virtex UltraScale+) to deliver custom hardware acceleration, together with a development kit. They offer up to 8 FPGAs, 64 vCPUs and 976 GiB of memory.

AI Frameworks

Amazon manages or integrates the following frameworks out of the box (AWS Deep Learning AMIs):
- TensorFlow
- PyTorch
- Apache MXNet
- Chainer
- Microsoft Cognitive Toolkit
- Gluon
- Keras
- Horovod
- Theano
- JupyterLab

AI Machine Learning Foundation Platform

Amazon offers Amazon SageMaker to build, train and deploy machine learning models fast; it supports TensorFlow and Apache MXNet out of the box. The overall picture of Amazon SageMaker is the following:
- Prepare data (covered by Amazon's data products)
  - Ingestion: Amazon S3 and Transfer Service
  - Data preparation and preprocessing: XGBoost algorithm and AWS Marketplace solutions
- Build and define a model (covered by SageMaker)
  - Automated data labeling with Amazon SageMaker Ground Truth
  - AWS Deep Learning AMIs
  - AWS Deep Learning Containers
  - Amazon SageMaker Jupyter notebook
  - Amazon SageMaker built-in algorithms
  - Amazon SageMaker RL
- Training, test and analysis (covered by SageMaker)
  - Amazon training job
  - Amazon SageMaker automatic model tuning
  - Amazon SageMaker Neo
- Deployment and prediction (covered by SageMaker)
  - HTTPS endpoint
  - Batch Transform
  - Inference Pipeline

Let's look at the main Amazon SageMaker products and services:

Data Labeling Service: Amazon SageMaker Ground Truth
Amazon SageMaker Ground Truth helps you build highly accurate training datasets for machine learning quickly. SageMaker Ground Truth offers easy access to public and private human labelers and provides them with built-in workflows and interfaces for common labeling tasks.
Additionally, SageMaker Ground Truth can lower your labeling costs by up to 70% using automatic labeling, which works by training Ground Truth on data labeled by humans so that the service learns to label data independently.

AWS Deep Learning AMIs
AWS Deep Learning AMIs provide machine learning practitioners and researchers with the infrastructure and tools to accelerate deep learning in the cloud, at any scale. You can quickly launch Amazon EC2 instances pre-installed with popular deep learning frameworks and interfaces such as TensorFlow, PyTorch, Apache MXNet, Chainer, Gluon, Horovod and Keras to train sophisticated custom AI models, experiment with new algorithms, or learn new skills and techniques.

AWS Deep Learning Containers
AWS Deep Learning Containers (AWS DL Containers) are Docker images pre-installed with deep learning frameworks. They make it easy to deploy custom machine learning (ML) environments quickly by letting you skip the complicated process of building and optimizing your environments from scratch.

Amazon SageMaker notebook
An Amazon SageMaker notebook instance is a fully managed ML compute instance running the Jupyter Notebook App. Amazon SageMaker manages creating the instance and related resources. Use Jupyter notebooks in your notebook instance to prepare and process data, write code to train models, deploy models to Amazon SageMaker hosting, and test or validate your models. SageMaker hosting is fully managed and supports auto scaling.

Amazon SageMaker Built-in Algorithms
Amazon SageMaker provides several built-in machine learning algorithms that you can use for a variety of problem types.

Amazon SageMaker RL
Amazon SageMaker RL supports reinforcement learning in addition to traditional supervised and unsupervised learning.
SageMaker now has built-in, fully managed reinforcement learning algorithms.

Amazon Training Job
With a training job you can train a model in Amazon SageMaker.

Amazon SageMaker automatic model tuning
Amazon SageMaker automatic model tuning, also known as hyperparameter tuning, finds the best version of a model by running many training jobs on your dataset using the algorithm and the ranges of hyperparameters that you specify. It then chooses the hyperparameter values that produce the best-performing model, as measured by a metric that you choose.

Amazon SageMaker Neo
Amazon SageMaker Neo enables developers to train machine learning models once and run them anywhere in the cloud and at the edge. According to AWS, Neo optimizes models to run up to twice as fast, with less than a tenth of the memory footprint and no loss in accuracy.

HTTPS endpoint
Amazon SageMaker provides an HTTPS endpoint where your machine learning model is available to provide inferences.

Batch Transform
Use Batch Transform to get inferences for an entire dataset.

Inference Pipeline
An inference pipeline lets you define and deploy any combination of pretrained Amazon SageMaker built-in algorithms and your own custom algorithms packaged in Docker containers.

AI Predefined Untrained Model Services
Amazon has decided to offer only pre-trained model services. If you want an untrained model, you need to work with Amazon SageMaker to build, train and deploy machine learning models. It could be a good idea to have some models already built that only need to be trained for your specific needs, but AWS has bet on trained services.

AI Predefined Trained Model Services
Amazon offers a set of pre-trained model services that are available through an API or a JavaScript library, ready to use without any training. The Amazon AI pre-trained model services can be grouped into:
- Sight
  - Amazon Rekognition: identifies objects, people, text, scenes and activities, and detects inappropriate content, in any image or video.
- Language
  - Amazon Comprehend: natural language processing to extract insights and relationships from unstructured text.
  - Amazon Textract: extracts text and data from scanned documents.
  - Amazon Translate: translates texts into more than 25 languages.
- Conversation
  - Amazon Lex: builds conversational agents to improve customer service and increase contact center efficiency.
  - Amazon Polly: turns text into lifelike speech to give voice to your applications.
  - Amazon Transcribe: converts audio to text, making it easy to add high-quality speech-to-text capabilities to your applications and workflows.
- Data (forecast)
  - Amazon Personalize: combines user interaction data with contextual data to generate high-quality recommendations.
  - Amazon Forecast: accurate time-series forecasting service.

AI Solutions
Amazon has created the Amazon ML Solutions Lab, which pairs your team with Amazon machine learning experts to prepare data, build and train models, and put models into production. Amazon has also created two learning tools:

AWS DeepRacer
AWS DeepRacer is a fully autonomous 1/18th-scale race car designed to help you learn about reinforcement learning through autonomous driving.

AWS DeepLens
AWS describes DeepLens as the world's first deep learning-enabled video camera for developers. Integrated with Amazon SageMaker and many other AWS services, it lets you get started with deep learning in less than 10 minutes through sample projects with practical, hands-on examples.
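DeepRacer is meant to teach reinforcement learning, where an agent learns a policy from rewards instead of from labeled examples. As a rough illustration only, here is a minimal tabular Q-learning sketch on a hypothetical one-dimensional "track"; it does not reproduce DeepRacer's environment, API or training algorithm. The track length, rewards and hyperparameters (`alpha`, `gamma`, `epsilon`) are assumptions chosen for the example.

```python
import random

# Hypothetical toy "track": positions 0..4; the car starts at 0 and the
# finish line is position 4. Actions: 0 = stay, 1 = move forward.
N_STATES = 5
ACTIONS = (0, 1)
GOAL = N_STATES - 1


def step(state, action):
    """Toy environment: return (next_state, reward, done)."""
    next_state = min(state + action, GOAL)
    if next_state == GOAL:
        return next_state, 1.0, True
    return next_state, -0.01, False  # small penalty for every non-final step


def train_q_table(episodes=500, alpha=0.5, gamma=0.9, epsilon=0.1, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            # Epsilon-greedy: usually exploit the best known action, sometimes explore.
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)
            else:
                action = max(ACTIONS, key=lambda a: q[state][a])
            next_state, reward, done = step(state, action)
            # Q-learning update towards reward + discounted best future value.
            target = reward if done else reward + gamma * max(q[next_state])
            q[state][action] += alpha * (target - q[state][action])
            state = next_state
    return q


q_table = train_q_table()
greedy_policy = [max(ACTIONS, key=lambda a: q_table[s][a]) for s in range(GOAL)]
print(greedy_policy)
```

After training, the greedy policy should choose "move forward" in every position, because the discounted reward of reaching the finish line outweighs staying put; the agent discovered this only from rewards, which is the core idea DeepRacer lets you explore at a much larger scale.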
Finally, Machine Learning in AWS Marketplace offers four kinds of solutions:
- Algorithms and models
- Data solutions
- Machine learning solutions
- Intelligent solutions

Google AI Services

AI Hardware Accelerators

Google offers two kinds of HW accelerators:

GPUs (Graphics Processing Units)
NVIDIA Tesla K80 (up to 8 GPUs), P100 (up to 4 GPUs), P4 (up to 4 GPUs), T4 (up to 4 GPUs) and V100 (up to 8 GPUs).

TPUs (Tensor Processing Units)
Currently, there are four Cloud TPU configurations:
- v2 single device: up to 180 teraflops and 64 GB of High Bandwidth Memory (HBM)
- v2 Pod (beta), a collection of TPU devices connected together with high-speed interfaces: up to 11.5 petaflops and 4 TB of HBM
- v3 single device: up to 420 teraflops and 128 GB of HBM
- v3 Pod (beta), a collection of TPU devices connected together with high-speed interfaces: up to 100+ petaflops and 32 TB of HBM

Google recommends the following guidelines:
- GPUs
  - Models that are not written in TensorFlow or cannot be written in TensorFlow
  - Models for which source does not exist or is too onerous to change
  - Models with a significant number of custom TensorFlow operations that must run at least partially on CPUs
  - Models with TensorFlow ops that are not available on Cloud TPU (see the list of available TensorFlow ops)
  - Medium-to-large models with larger effective batch sizes
- TPUs
  - Models dominated by matrix computations
  - Models with no custom TensorFlow operations inside the main training loop
  - Models that train for weeks or months
  - Larger and very large models with very large effective batch sizes

AI Frameworks

Google manages or integrates the following frameworks out of the box:
- TensorFlow
- PyTorch
- Scikit-learn
- Keras
- XGBoost (an optimized distributed gradient boosting library)
- Kubeflow (to make deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable)
- JupyterLab

AI Machine Learning Foundation Platform

Google offers the "AI Platform" to train your machine learning models at scale, to host your trained
model in the cloud, and to use your model to make predictions about new data. The overall picture of Google's end-to-end machine learning development cycle is the following:
- Prepare data (covered by Google's data products)
  - Ingestion: Cloud Storage and Transfer Service
  - Data preparation and preprocessing: Cloud Dataprep, Cloud Dataflow, Cloud Dataproc, BigQuery
- Build and define a model (covered by AI Platform)
  - Data Labeling Service
  - Deep Learning VM Image
  - AI Platform Notebooks
- Training, test and analysis (covered by AI Platform)
  - AI Platform Training
  - Kubeflow
  - TFX tools
- Deployment and prediction (covered by AI Platform)
  - AI Platform Prediction
  - Kubeflow

In addition, Google offers Cloud Datalab, a powerful interactive tool created to explore, analyze, transform and visualize data and to build machine learning models on Google Cloud Platform.

Let's look at the main Google AI Platform products and services:

Data Labeling Service (beta)
The Data Labeling Service lets you submit representative samples to human labelers, who annotate them with the "right answers" and return the dataset in a format suitable for training a machine learning model.

Deep Learning VM Image
AI Platform Deep Learning VM Image makes it easy and fast to instantiate a VM image containing the most popular deep learning and machine learning frameworks on a Google Compute Engine instance. You can launch Compute Engine instances pre-installed with popular ML frameworks like TensorFlow, PyTorch or scikit-learn.

AI Platform Notebooks
AI Platform Notebooks is a managed service that offers an integrated JupyterLab environment, in which machine learning developers and data scientists can create, in a single click, instances running JupyterLab that come pre-installed with the latest data science and machine learning frameworks. Notebooks is integrated with BigQuery, Cloud Dataproc and Cloud Dataflow, making it easy to go from data ingestion to preprocessing and exploration, and eventually to model training and deployment.
AI Platform Training
AI Platform runs your training job on computing resources in the cloud. You can train a built-in algorithm (beta) against your dataset without writing a training application. If the built-in algorithms do not fit your use case, you can create a training application to run on AI Platform.

AI Platform Prediction
The AI Platform prediction service manages computing resources in the cloud to run your models. You can request predictions from your models and get predicted target values for them.

Kubeflow & Kubeflow pipelines
The Kubeflow project is dedicated to making deployments of machine learning (ML) workflows on Kubernetes simple, portable and scalable. Kubeflow Pipelines is a new component of Kubeflow that can help you compose, deploy and manage end-to-end (optionally hybrid) machine learning workflows.

TFX Tools
TensorFlow Extended (TFX) is an end-to-end platform for deploying production ML pipelines.

AI Predefined Untrained Model Services
Google offers Cloud AutoML, a suite of machine learning products that enables developers with limited machine learning expertise to train high-quality models specific to their business needs. Google's AI Predefined Untrained Model Services can be grouped into:
- Sight
  - AutoML Vision: derive insights from images in the cloud or at the edge.
  - AutoML Video Intelligence: enable powerful content discovery and engaging video experiences.
- Language
  - AutoML Natural Language: reveal the structure and meaning of text through machine learning.
  - AutoML Translation: dynamically detect and translate between languages.
- Data (forecast: predictive insights from structured data)
  - AutoML Tables: automatically build and deploy state-of-the-art machine learning models on structured data.

AI Predefined Trained Model Services
Google offers a set of Predefined Trained Model Services that are available through REST and RPC APIs, ready to use without any training.
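To show what "ready to use through a REST API, without any training" looks like from the client side, here is a minimal sketch that builds (but does not send) a label-detection request for the Cloud Vision API's `images:annotate` method, using only the Python standard library. Authentication (an API key or an OAuth 2.0 access token) is deliberately left out, and the image bytes are a placeholder rather than a real image.

```python
import base64
import json
import urllib.request

# REST endpoint of the Cloud Vision API's images:annotate method.
VISION_ENDPOINT = "https://vision.googleapis.com/v1/images:annotate"


def build_label_detection_request(image_bytes, max_results=5):
    """Return a urllib Request for a label-detection call.

    The request is only constructed, not sent; credentials are omitted."""
    body = {
        "requests": [
            {
                # Inline images are sent base64-encoded in the "content" field.
                "image": {"content": base64.b64encode(image_bytes).decode("ascii")},
                "features": [{"type": "LABEL_DETECTION", "maxResults": max_results}],
            }
        ]
    }
    return urllib.request.Request(
        VISION_ENDPOINT,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_label_detection_request(b"placeholder image bytes")
print(request.get_method(), request.full_url)
```

Actually calling the service would require adding credentials and passing the request to `urllib.request.urlopen`; the point is that the client only describes the input and the desired feature, while the trained model stays entirely on the provider's side.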
The Google AI Predefined Trained Model Services can be grouped into:
- Sight
  - Vision API: classify images and detect faces, printed and handwritten text, places and more.
  - Video Intelligence API: recognize a vast number of objects, places and actions in stored and streaming video.
- Language
  - Natural Language API: natural language understanding features including sentiment analysis, entity analysis, entity sentiment analysis, content classification and syntax analysis.
  - Translation API: translate texts into more than one hundred languages.
- Conversation
  - Dialogflow: an end-to-end development suite for creating conversational interfaces for websites, mobile applications, popular messaging platforms and IoT devices.
  - Cloud Text-to-Speech: converts text into human-like speech in more than 100 voices across 20+ languages and variants.
  - Cloud Speech-to-Text: converts audio to text; the API recognizes 120 languages and variants to support your global user base.
- Data (forecast)
  - Recommendations AI: delivers highly personalized product recommendations at scale.
  - Cloud Inference API: quickly runs large-scale correlations over typed time-series datasets.

AI Solutions

Google offers the following prepackaged solutions:

Contact Center AI
This solution works with your existing telephony and customer support technology and can easily be trained to engage with customers via speech or text, freeing your agents to handle more complex issues and provide a better customer experience.

Cloud Talent Solution
Cloud Talent Solution powers a company's job and profile search, matching job seekers to open positions more accurately, and includes APIs that help talent technology providers and enterprise hiring companies attract great candidates and make the right hires.

Document Understanding AI
Helps your organization analyze documents efficiently.
By automatically classifying, extracting and enriching the information in documents, Document Understanding AI can save time and resources while unlocking insights that improve your decision-making.

Google also offers Google Cloud's AI Hub, a hosted repository of plug-and-play AI components, including end-to-end AI pipelines and out-of-the-box algorithms.

Azure AI Services

AI Hardware Accelerators

Azure offers two kinds of HW accelerators:

GPUs (Graphics Processing Units)
NVIDIA Tesla V100 (up to 8 GPUs), K80 (up to 4 GPUs), P100 (up to 4 GPUs), P40 (up to 4 GPUs) and M60 (up to 4 GPUs).

FPGA (Field-Programmable Gate Array)
FPGAs on Azure are based on Intel's FPGA devices, which data scientists and developers use to accelerate real-time AI calculations. They are offered as a service for the following models:
- ResNet 50
- ResNet 152
- DenseNet-121
- VGG-16
- SSD-VGG

AI Frameworks

Azure manages or integrates the following frameworks out of the box:
- TensorFlow
- PyTorch
- Scikit-learn
- Apache MXNet
- ONNX

AI Machine Learning Foundation Platform

Azure offers Azure Machine Learning service, which provides SDKs and services to quickly prepare data, train and deploy machine learning models based on open-source Python frameworks such as PyTorch, TensorFlow, scikit-learn and MXNet.

Azure also offers Machine Learning Studio, a collaborative, drag-and-drop visual workspace where you can build, test and deploy machine learning solutions without writing code. It uses prebuilt and preconfigured machine learning algorithms and data-handling modules, as well as a proprietary compute platform.
The comparison, based on the Azure documentation, is the following:

Clearly, Azure has decided to move from a proprietary environment (Machine Learning Studio) to a more open and standard approach based on Python and TensorFlow (Azure Machine Learning service).

The overall picture of Azure Machine Learning service is the following:
- Prepare data
  - Ingestion: Azure Machine Learning datasets (preview)
  - Data preparation and preprocessing: azureml-datasets package
- Build, train, test and deploy models
  - Azure Machine Learning SDK for Python
  - Visual interface (preview) for Azure Machine Learning service
  - Azure Machine Learning CLI
  - Azure Machine Learning Notebooks

Let's look at the main Azure Machine Learning services:

Azure Machine Learning datasets (preview)
With managed datasets, you can:
- easily access data during model training without reconnecting to underlying stores
- ensure data consistency and reproducibility by using the same pointer across experiments: notebooks, automated ML, pipelines and the visual interface
- share data and collaborate with other users
- explore data and manage the lifecycle of data snapshots and versions
- compare training data with production data

azureml-datasets package (under the Azure ML SDK)
The Dataset class is a foundational resource for exploring, transforming and managing data within Azure Machine Learning. You can explore your data with summary statistics and transform it using intelligent transforms. When you're ready to use the data for training, you can save the Dataset to your Azure ML workspace to get versioning and reproducibility capabilities.

Azure Machine Learning SDK for Python
The Azure Machine Learning SDK for Python lets you build and run machine learning workflows with the Azure Machine Learning service. The most important classes and packages in the SDK are:
- Workspace: the top-level resource for Azure Machine Learning service. It provides a centralized place to work with all the artifacts you create when you use Azure Machine Learning service.
- Experiment: represents a collection of trials (individual model runs).
- Run: represents a single trial of an experiment.
- Model: used for working with cloud representations of machine learning models. You can use model registration to store and version your models in the Azure cloud, in your workspace.
- ComputeTarget, RunConfiguration and ScriptRunConfig: create and manage compute targets, set their type and size, attach the compute target configuration, and specify the path/file of the training script.
- AutoMLConfig: configures parameters for automated machine learning training.
- Image: packages models into container images that include the runtime environment and dependencies.
- Webservice: creates and deploys web services for your models.
- ML pipelines: define reusable machine learning workflows that can be used as templates for your machine learning scenarios.

Visual interface (preview) for Azure Machine Learning service
The visual interface (preview) for Azure Machine Learning service lets you prepare data, train, test, deploy, manage and track machine learning models without writing code.

Azure Machine Learning CLI
The Azure Machine Learning CLI is an extension to the Azure CLI, a cross-platform command-line interface for the Azure platform. This extension provides commands for working with the Azure Machine Learning service and lets you automate your machine learning activities. Some example actions you can perform with the CLI extension:
- run experiments to create machine learning models
- register machine learning models for customer usage
- package, deploy and track the lifecycle of your machine learning models

Azure Machine Learning Notebooks repository
The Azure Machine Learning Notebooks repository includes the latest Azure Machine Learning Python SDK samples.
These Jupyter notebooks are designed to help you explore the SDK and serve as models for your own machine learning projects.

AI Predefined Untrained Model Services
Azure offers a set of AI Predefined Untrained Model Services that require some custom training before use. Both trained and untrained services are grouped under "Azure Cognitive Services". The Azure Predefined Untrained Model Services can be grouped into:
- Sight
  - Custom Vision: build, deploy and improve your own image classifiers.
- Conversation
  - Speech Services (custom options): the unification of speech-to-text, text-to-speech and speech translation into a single Azure subscription.
- Language
  - QnA Maker: create a knowledge base (KB) from your semi-structured content, such as Frequently Asked Questions (FAQ) URLs, product manuals, support documents and custom questions and answers.
  - Custom Translator: an extension of the Translator Text API that lets you build neural translation systems.

AI Predefined Trained Model Services
Azure offers a set of pre-trained model services under "Azure Cognitive Services", available through a REST API or SDK and ready to use without any training. In addition, Azure has announced that you can deploy Azure Cognitive Services to the edge, on premises and in the cloud using containers. The Azure AI pre-trained model services can be grouped into:
- Sight
  - Computer Vision API: analyzes images to detect and provide insights about their visual features and characteristics.
  - Face API: detects human faces in an image and returns the rectangle coordinates of their locations. Optionally, face detection can extract a series of face-related attributes, such as head pose, gender, age, emotion, facial hair and glasses.
  - Video Indexer: extracts insights from your videos.
  - Form Recognizer (preview): identifies and extracts key/value pairs and table data from form documents.
Ink Recognizer (preview): analyzes and recognizes digital ink content. Unlike services that use Optical Character Recognition (OCR), the API requires digital ink stroke data as input. Digital ink strokes are time-ordered sets of 2D points (X,Y coordinates) that represent the motion of input tools such as digital pens or fingers.
Content Moderator API: checks text, image, and video content for material that is potentially offensive, risky, or otherwise undesirable.
Language
Language Understanding (LUIS): applies machine learning to a user's conversational, natural language text to predict overall meaning and pull out relevant, detailed information.
Text Analytics API: provides advanced natural language processing over raw text and includes four main functions: sentiment analysis, key phrase extraction, language detection, and entity recognition.
Translator Text: translates text in near real time.
Immersive Reader (preview): improves reading comprehension for emerging readers, language learners, and people with learning differences such as dyslexia.
Conversation
Speech Services: the unification of speech-to-text, text-to-speech, and speech translation into a single Azure subscription.
Speaker Recognition: provides algorithms for speaker verification and speaker identification.
Data Forecast
Personalizer: discovers which action to rank highest in a given context.
Anomaly Detector API: enables you to monitor and detect abnormalities in your time series data.

Azure also includes the Bing Search APIs as part of Azure Cognitive Services. However, neither Amazon nor Google considers its search services part of its AI services, so I will not include them in the AI services comparison. In fact, the Azure Bing Search API documentation does not mention any machine learning implementation. Azure also provides access to Cognitive Services Labs, which offers an early look at emerging Cognitive Services technologies.
Early adopters who do not need market-ready technology can discover, try, and provide feedback on new Cognitive Services technologies before they are generally available. Labs are not Azure services.
Finally, Azure is in the process of migrating statistical parametric implementations to neural network implementations to improve its cognitive services.

AI Solutions
Azure offers the following solutions based on the Cognitive Services:
Azure Bot Service
Develop intelligent, enterprise-grade bots that let you maintain control of your data. Build any type of bot, from a Q&A bot to your own branded virtual assistant. Use an open-source SDK and tools to connect your bot across popular channels and devices. Give your bot the ability to speak, listen, and understand your users with native integration of Azure Cognitive Services.
Voice-first virtual assistants (preview)
Custom virtual assistants that use Azure Speech Services to create natural, human-like conversational interfaces for applications and experiences.
Azure Databricks
Azure Databricks is an Apache Spark-based analytics platform optimized for the Microsoft Azure cloud services platform and integrated with Azure Machine Learning service to quickly identify suitable algorithms and hyperparameters.
Azure also offers the Azure AI Gallery portal to quickly build Azure AI solutions from solution templates, reference architectures, and design patterns, which you can make your own with the included instructions or with a featured partner.

Alibaba AI Services
AI Hardware Accelerators
Alibaba offers two kinds of HW accelerators:
GPUs (Graphical Processing Units)
AMD FirePro S7150 (up to 4 GPUs), NVIDIA Tesla M40 (up to 2 GPUs), NVIDIA Tesla P100 (up to 8 GPUs), NVIDIA Tesla P4 (up to 2 GPUs), and NVIDIA Tesla V100 (up to 8 GPUs).
FPGA (Field-Programmable Gate Array)
Intel Arria 10 GX 1150: up to 2 FPGAs (with 56 vCPUs) and 224 GiB of memory.
Xilinx 16nm Virtex UltraScale+ VU9P: up to 4 FPGAs (with 64 vCPUs) and 256 GiB of memory.

AI Frameworks
Alibaba supports the following frameworks (in beta testing): TensorFlow, Apache MXNet, and Caffe.

AI Machine Learning Foundation Platform
Alibaba Cloud Machine Learning Platform for AI (PAI) provides an all-in-one machine learning service that requires little technical skill from users while delivering high performance. On the Machine Learning Platform for AI, you can quickly set up and deploy machine learning experiments to integrate algorithms seamlessly with your business. Machine Learning Platform for AI is built on the full-fledged algorithm application system of Alibaba Group and now serves tens of thousands of developers and enterprise users. You can quickly build services such as product recommendation, financial risk control, image identification, and voice recognition on top of it.
The Alibaba Cloud ML Platform is a proprietary platform with the following features:
User interface: you can quickly and easily build machine learning experiments using drag-and-drop, and the computing results of the entire machine learning process can be displayed visually.
Rich algorithm components: more than 100 algorithm modules for regression, classification, clustering, text analysis, relationship mining, and many other models. It supports preprocessing tools and software, feature engineering, analysis systems, application areas, common machine learning algorithms, and financial algorithms.
All-in-one service: a comprehensive experience that lets users perform data cleansing, feature engineering, machine learning algorithms, evaluation, online prediction, and offline scheduling on the same platform.
However, the deep learning feature is currently in beta testing; three deep learning frameworks are supported: TensorFlow, Caffe, and MXNet.
For ingestion and data processing, Alibaba recommends DataWorks (beta), which provides a full solution for data aggregation, data processing, data governance, and data services. Its features include data integration, data development, data quality, data protection, and data services.

AI Predefined Untrained Model Services
Like AWS, Alibaba has decided to offer only pre-trained AI services. If you want an untrained model, you need to work with Alibaba Cloud Machine Learning Platform for AI to build, train, and deploy machine learning models.

AI Predefined Trained Model Services
Alibaba offers a very limited set of Pre-trained Model Services, available through an API and SDK libraries and ready to use without any training. The Alibaba AI Pre-trained Model Services can be grouped into:
Sight
Alibaba Image Search: obtains information about products that are similar or identical to the product in your input image, and finds images containing subjects or elements that are similar or identical to your input image.
Language
Intelligent Robot (beta): a dialogue platform that enables smart dialog (based on natural language processing) through a range of dialogue-enabling clients.
Alibaba Machine Translation: provides an e-commerce machine translation engine and a multi-language technological infrastructure for enterprises' multi-language services.

AI Solutions
Alibaba offers the following AI solutions as a starting point:
Product Recommendation: discover the features that influence shopping behavior and provide customer recommendations that increase product sales.
Financial Risk Management: calculate the capability of clients to settle their credit card debt. Risk indexes help financial institutions manage risks effectively.
News Classification: text analysis components automatically classify documents in a short period of time.
Recently Alibaba has launched ET Brain, an ultra-intelligent AI platform for solving complex business and social problems:
ET City Brain: using comprehensive real-time city data, ET City Brain holistically optimizes urban public resources by instantly correcting defects in urban operations.
ET Industrial Brain: empowering smart manufacturing with data and machine intelligence.
ET Medical Brain: Alibaba Cloud is committed to applying data intelligence to help doctors and nurses offer better healthcare services to patients and ultimately save more lives.
ET Environment Brain: data-driven green development for a smart and ecological civilization.

AI HW Accelerators Comparison
AI Framework Comparison
Built-in Algorithms Comparison
AI Machine Learning Foundation Platform Comparison
AI Predefined Untrained Model Services Comparison
AI Predefined Trained Model Services Comparison
AI Solutions Comparison

Conclusion
AI services are the most promising cloud services, and all cloud providers are focused on growing their offerings. Broadly speaking, the four main public cloud providers follow a similar approach, dividing their services into four layers:
AI Solutions
AI Predefined Model Services (trained or untrained)
AI Machine Learning Foundation Platform based on well-known AI Frameworks
AI Hardware Accelerators
Let's look at the strategies and approaches of each provider in each layer, from bottom to top.

AI Hardware Accelerators
All providers (Amazon, Google, Azure, and Alibaba) offer GPUs to accelerate the training of AI models. Working with a GPU means moving all the data to the GPU and then processing it there, so GPUs are well suited to high-throughput, latency-tolerant batch computation. However, they consume a lot of power, and their design was not originally intended for machine learning workloads (although it is closer to them than traditional CPUs). In addition, GPUs were designed for graphics and high-performance computing systems where safety is not a requirement.
All providers have realized that they need to go one step further with HW accelerators to offer more power and security with lower power consumption for increasingly complex AI models. There are two strategies:
Google has opted to build a specialized processor (ASIC) for TensorFlow: the TPU (Tensor Processing Unit). Google launched the first TPU version in 2016 and is already on the third generation. It is a well-proven technology, and Google has recently released Cloud TPU Pods, which can include more than 1,000 individual TPU chips connected by an ultra-fast, two-dimensional toroidal mesh network.
Amazon, Azure, and Alibaba have opted for FPGAs (Field-Programmable Gate Arrays), which are more flexible than TPUs but require specialized programming and are less efficient than TPUs on TensorFlow workloads.
Google has made a very strong bet with its custom TPU chips, assuming that the TensorFlow framework is a standard accepted by all developers and suppliers, as indeed it seems to be. Amazon, Azure, and Alibaba prefer a more conservative and flexible option, FPGAs, that does not tie them to a specific framework; in my opinion, however, FPGAs are less powerful for TensorFlow models, and these providers are one to two years behind Google's TPU developments. Still, like any single-source solution, TPUs could be overtaken by the next generations of FPGAs developed by Intel (Altera) and Xilinx. In addition, Amazon and Azure are working to make FPGAs more accessible and easier to program and use.

AI Machine Learning Foundation Platform based on well-known AI Frameworks
Here the strategy of the four providers is clear:
Leverage on standard AI frameworks (mostly open source), where TensorFlow is currently the winning framework.
Define a platform that simplifies and accelerates the deployment of Machine Learning solutions, covering the tasks of data preparation, building and training an ML model, and deployment and prediction.

Leverage on standard AI frameworks
TensorFlow (an open-source library for machine learning and high-performance numerical computation) and PyTorch (an open-source machine learning library) are the frameworks supported by all four cloud providers. It also seems clear that Python is the reference language in all AI frameworks.
MXNet is the alternative to TensorFlow supported by Amazon, Azure, and Alibaba (together with TensorFlow). Even if MXNet, with its high-performance imperative API, could overtake TensorFlow, Google's exclusive bet on TensorFlow tips the balance in TensorFlow's favor.
Scikit-learn (which provides a wide selection of supervised and unsupervised learning algorithms) and Keras (a high-level API to build and train deep learning models) are the next most used frameworks.
Amazon is the provider that supports the greatest variety of frameworks. Google and Azure focus on the frameworks required for each task, and Alibaba started with a proprietary approach and is evolving (currently in beta) toward a framework-based approach (TensorFlow, MXNet, and Caffe).

Define a platform that simplifies and accelerates the deployment of Machine Learning solutions
Here the approach differs for each provider, and this is clearly one of the areas with potential for improvement.
Amazon offers Amazon SageMaker, a set of tools to build, train, and deploy machine learning models quickly, which supports TensorFlow and Apache MXNet out of the box.
Google launched AI Platform to train your machine learning models at scale, host your trained model in the cloud, and use your model to make predictions about new data, based on TensorFlow, PyTorch, and Scikit-learn.
Azure offers Azure Machine Learning service, which provides SDKs and services to quickly prepare data and train and deploy machine learning models based on open-source Python frameworks such as PyTorch, Scikit-learn, and MXNet.
Alibaba offers Alibaba Cloud Machine Learning Platform for AI (PAI), an all-in-one machine learning service that requires little technical skill from users. It is a proprietary environment that is evolving toward TensorFlow, Caffe, and MXNet.
All providers claim to offer a fully managed service that covers the entire machine learning workflow to build, train, and deploy machine learning models quickly, but with different approaches:
Azure and Alibaba offer a visual interface for modeling, while Amazon and Google are more code oriented (more flexible but more complex).
All providers use their own storage service as the input for training and testing, but Google offers more options for data preparation (Cloud Dataprep, Cloud Dataflow, Cloud Dataproc, BigQuery) and also offers Cloud Datalab, a powerful interactive tool to explore, analyze, transform, and visualize data and build machine learning models.
Amazon SageMaker and the Alibaba ML platform don't allow you to train or deploy the model outside the cloud, while Azure and Google allow you to train and deploy the model in the cloud or on premises (for some features).
All providers are starting to train and deploy ML models in containers. Google started an open-source project called Kubeflow that combines the best of TensorFlow and Kubernetes to train and deploy ML models in containers.
All providers offer built-in ML algorithms that help you train models for a variety of use cases:
Two-class classification
Multi-class classification
Regression
Ranking
Anomaly detection (Google offers a service instead)
Clustering (under BigQuery ML in Google)
Google offers a limited set of built-in algorithms (only three) compared to its peers, partly because it already covers these use cases in its services or products.
JupyterLab (an interactive development environment for working with notebooks, code, and data) has become the standard ML and data science UI.
In general, these AI platforms are very recent and are evolving rapidly to:
Simplify as much as possible the process of training and deploying ML models (and their lifecycle), incorporating DevOps capabilities
Expand the available ML algorithms
Define a more intelligible price structure
Align with market standards
Increase deployment/training flexibility
Allow deployment/training at the edge

AI Predefined Model Services (trained or untrained)
All suppliers have decided to focus on expanding their AI Predefined Model Services, which can be grouped into:
Sight
Vision (image classification, OCR, face detection, explicit content, ...)
Video (video classification, OCR, object tracking, explicit content, ...)
Language
Natural Language (revealing the structure and meaning of text, sentiment analysis, syntax analysis, ...)
Translation (translation between languages, language detection, ...)
Conversation
Speech-to-Text
Text-to-Speech
Natural Conversation
Data Forecast
Time series dataset correlations and anomaly detection
Recommendations/personalization
Data Warehouse ML
The following aspects are worth pointing out:
In addition to a full set of AI Predefined Trained Model Services, Google offers a set of untrained model services, Cloud AutoML (Sight, Language, and Data Forecast), to support more customization for specific business needs. Azure also offers this customization for some Sight and Language services.
Google allows you to deploy vision machine learning models at the edge.
Azure allows you to deploy vision (face recognition and OCR) and language (key phrase extraction, language detection, sentiment analysis, and language understanding) machine learning models at the edge.
AWS offers Greengrass to bring local compute, messaging, data caching, sync, and ML inference capabilities to edge devices (a pseudo edge deployment that ultimately requires a connection to AWS).
The AI Predefined Model Services are very similar among Amazon, Azure, and Google. The differences lie in the features of each service (see the comparison tables). For instance, Google recognizes 120 languages and variants, Azure 60, and Amazon 25. However, if you count the features, Azure has the best score (mainly in Video), followed by Google and Amazon. Alibaba clearly lags behind its peers.

AI Solutions
For AI Solutions, the cloud providers are following two approaches:
Create prepackaged AI solutions to solve specific business needs.
Create an AI portal and hub to share experiences and other AI components.
At the moment, prepackaged AI solutions are very limited, and only Google and Azure offer valuable (but limited) solutions. The business areas where prepackaged AI solutions should grow are: intelligent contact center, CRM, human resources and intelligent recruitment, document digitalization, data analytics, cybersecurity, sales/revenue forecasting, digital marketing, personalization and recommendation, gaming, robotics and smart cars (with 5G), financial trading, healthcare, and logistics and delivery.

Summary
In short, Google seems to be ahead in AI services, partly because of its integration with data services and its experience with TensorFlow, although it still needs to simplify its processes by adding more visual and DevOps capabilities. Amazon and Azure are evolving very quickly and will probably reach Google's level in the coming years.
In fact, Azure has the richest set of AI Predefined Model Services and a promising visual AI platform.
IBM, with IBM Watson Machine Learning, IBM Watson Studio, and IBM Watson services, has a robust AI offering close to Amazon and Azure, with a powerful visual interface. The key question is: will IBM be able to keep up with the rapid evolution of Amazon, Google, and Azure AI services?
Alibaba is at the bottom and has a long way to go to reach its peers, except for some specific e-commerce needs.
The great differentiation will come when the offering in the upper layers (AI Solutions and Predefined AI Models) grows, reducing the gap between scarce data/AI scientists and functional teams.

Public Cloud Integration Services Review
The objective of this post is to analyze the Integration Services offered by the four main public cloud providers (AWS, GCP, Azure, and Alibaba) to connect IT components, such as front-to-back or back-to-back systems. In addition, the post will identify the use cases for each kind of Integration Service, taking into account factors such as latency, throughput, guarantee of delivery, and more.
Table of Contents
Integration Services: API Gateway or Web API; API Management; Message Queuing (Types of message queuing); Event Engine; Orchestration; Data Streaming; Real Time Data Replication
Integration Services Use Cases & Recommendations: Pros; Cons; Use Cases (API Gateway, API Management, Message Queuing, Event Engine, Orchestration, Data Streaming, Real Time Data Replication)
AWS Integration Services: API Gateway and API Management; Message Queueing; Event Engine; Orchestration (Workflow, Batch); Data Streaming; Data Replication; Others: Data Access
Azure Integration Services: API Gateway and API Management; Message Queueing; Event Engine; Orchestration (Workflow, Batch); Data Streaming; Data Replication
Google Integration Services: API Gateway and API Management; Message Queueing and Event Engine; Orchestration (Workflow, Batch); Data Streaming; Data Replication
Alibaba Integration Services: API Gateway and API Management; Message Queueing; Event Engine; Orchestration (Workflow, Batch); Data Streaming; Data Replication
API Gateway and API Management comparison
Message queuing comparison
Event Engine comparison
Workflow comparison
Batch comparison
Conclusion

Integration Services
Integration Services are the services that allow communication and orchestration among the IT components of an application. In a cloud environment, these services are grouped under integration Platform as a Service (iPaaS) solutions.
In general, AWS, GCP, Azure, and Alibaba have structured their Integration Services as follows:
API Gateway or Web API
API Management
Message Queuing
Event Engine
Orchestration Engine
Data Streaming
Real Time Data Replication

API Gateway or Web API
An API Gateway routes requests from clients to services. It supports stateful (WebSocket) and stateless (REST) APIs to call any web service or microservice. This is the basic communication service that all the cloud providers offer. An API Gateway is a server that acts as the single entry point into the system, and it may have other capabilities such as authentication, monitoring, load balancing, and caching.
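The gateway responsibilities just described (a single entry point that authenticates callers and routes each request to a service, balancing load across its instances) can be sketched in a few lines. This is a hypothetical in-process model, not any provider's API: `Gateway`, `add_route` and `handle` are illustrative names, backends are plain Python callables standing in for HTTP services, and API keys are checked against an in-memory set.

```python
from itertools import cycle
from typing import Callable, Iterator, List, Set, Tuple

# Hypothetical stand-in for an HTTP call to one instance of a backend service.
Backend = Callable[[str], str]


def _matches(prefix: str, path: str) -> bool:
    # Match whole path segments, so "/orders" covers "/orders/42" but not "/ordersX".
    prefix = prefix.rstrip("/")
    return path == prefix or path.startswith(prefix + "/")


class Gateway:
    """Toy single entry point: authenticate, route by path prefix, round-robin balance."""

    def __init__(self, api_keys: Set[str]) -> None:
        self.api_keys = api_keys
        self.routes: List[Tuple[str, Iterator[Backend]]] = []

    def add_route(self, prefix: str, instances: List[Backend]) -> None:
        if not instances:
            raise ValueError("a route needs at least one backend instance")
        # cycle() hands out the instances in turn: simple round-robin load balancing.
        self.routes.append((prefix, cycle(instances)))
        # Try longer prefixes first so the most specific route wins.
        self.routes.sort(key=lambda route: len(route[0]), reverse=True)

    def handle(self, path: str, api_key: str) -> Tuple[int, str]:
        # Authentication happens once, at the entry point, before any routing.
        if api_key not in self.api_keys:
            return 401, "unauthorized"
        for prefix, instances in self.routes:
            if _matches(prefix, path):
                return 200, next(instances)(path)
        return 404, "no route"
```

For example, with two instances registered under "/orders", consecutive calls to `handle("/orders/42", key)` and `handle("/orders/43", key)` are served by different instances. Real gateways add caching, monitoring, and TLS termination; the sketch keeps only routing, authentication, and round-robin balancing.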
An API Gateway can also be used for calls from one service to another; however, in this case you need low-latency calls, especially when both services are in the same network (internal API-based microservice-to-microservice communication). In this situation, direct calls such as gRPC (as Google offers) can improve latency. A good gateway service should support this option for internal calls between services of the same system.

API Management
API Management is a full-stack API communication platform, more sophisticated and complete than an API Gateway. With API Management, you get the following additional capabilities out of the box:
Internal and external API management
SLAs and quotas depending on the client
Security: authentication, authorization, and role-based access control
Rate limiting
Load balancing and high scalability
Traffic management
Monitoring (technical and functional)
Analytics and reports
API monetization
Full API development life cycle: a publishing portal for definition and publication, and a developer portal for onboarding
API Management is especially useful when you need to expose your services to the outside world or market. Some API Gateway implementations are very close to API Management, so the overlap between the two generates confusion.

Message Queuing
Message Queuing allows IT components to communicate in a loosely coupled way via asynchronous messaging. This service provides queues that hold messages until the receiver can pick them up. This lets applications and integration software communicate asynchronously, even across diverse technology platforms and protocols. The concept of Message Queuing was born out of the need to move away from point-to-point synchronous integration, which couples applications tightly and does not scale because it creates rigid dependencies between them.
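A toy in-memory broker makes the idea of queues that hold messages until a receiver picks them up concrete, and it also previews the two delivery styles (point-to-point and publish/subscribe) described in the types that follow. It is a hypothetical sketch, not the API of SQS, Service Bus, or any other product; `Broker`, `send`, `receive`, `subscribe` and `publish` are illustrative names, and a subscription is modeled simply as its own queue.

```python
from collections import defaultdict, deque
from typing import Deque, Dict, List, Optional


class Broker:
    """Toy in-memory broker contrasting the two messaging styles (hypothetical API)."""

    def __init__(self) -> None:
        self.queues: Dict[str, Deque[str]] = defaultdict(deque)
        # topic -> names of the subscription queues that receive a copy
        self.subscriptions: Dict[str, List[str]] = defaultdict(list)

    # Point-to-point: the sender must know the queue name; one receiver consumes each message.
    def send(self, queue: str, message: str) -> None:
        self.queues[queue].append(message)

    def receive(self, queue: str) -> Optional[str]:
        # Messages wait in the queue until a receiver pulls them (FIFO order).
        q = self.queues[queue]
        return q.popleft() if q else None

    # Publish/subscribe: the publisher only knows the topic; every subscription gets a copy.
    def subscribe(self, topic: str, subscription: str) -> None:
        self.subscriptions[topic].append(subscription)

    def publish(self, topic: str, message: str) -> int:
        for subscription in self.subscriptions[topic]:
            self.queues[subscription].append(message)
        # There may be many, one, or no interested subscribers.
        return len(self.subscriptions[topic])
```

Because `receive()` removes the message, a second consumer on the same point-to-point queue would not see it, whereas each subscription queue in the publish/subscribe case holds its own copy.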
Types of message queuing
Point-to-point
One message is placed on the queue and one application receives that message. Messages accumulate on queues until they are retrieved by the programs that service those queues. In point-to-point messaging, a sending application must know information about the receiving application (the name of the queue to send the information to) before it can send it a message.
Publish/Subscribe
A copy of each message published by a publishing application is delivered to every interested application. There might be many, one, or no interested applications. In publish/subscribe, an interested application is known as a subscriber, and the messages are queued on a queue identified by a subscription. The subject of the information is identified by its topic. Publish/subscribe messaging allows you to decouple the providers of information from the consumers of that information: the sending and receiving applications do not need to know anything about each other for the information to be sent and received.
Message Queuing provides the following capabilities:
Routing messages between queues
Data transformation
Event queuing and sequencing
FIFO ordering
Basic orchestration of messages
Monitoring and control of message routing between queues
Control of the deployment and versioning of message formats
Multiple physical protocols with protocol conversion
Scaling
Message consistency (transaction management)
Security

Event Engine
In a message queuing approach, the receiver of the message must pull (poll for) new messages. In the Event Engine approach, the receiver registers an event handler for the event source it is interested in.
The Event Engine then invokes that event handler when the specified event occurs (push, as opposed to the pull model of message queuing). So the Event Engine is an implementation of the publish/subscribe model in which subscriber services automatically perform work (push) in response to events triggered by publisher services. More sophisticated Event Engines support rules and Complex Event Processing (CEP), which lets you correlate events to identify patterns and take specific actions, all in real time.
In general, Event Engines do not guarantee FIFO ordering, and duplicate messages can occasionally reach the subscriber. In addition, all the cloud providers have a set of event sources associated with their cloud ecosystem of services. These events can start another process or cloud service, such as a function.

Orchestration
Orchestration automates a set of activities to perform a complex process. There are four kinds of orchestration software:
Configuration Management (to be covered in the Management post)
Maintains computer systems and software in a known, consistent state. It is based on tools such as Chef, Puppet, and Ansible.
Batch Processing
Batch jobs can run without any end-user interaction and can be scheduled to start at a specific time or whenever a condition occurs. The jobs are processed in parallel over a pool of compute nodes.
Workflow Management (or BPM)
Defines a workflow process model and executes instances of the model to generate tasks. A process model consists of a series of tasks and events from the start of the process to its termination points. Tasks can be assigned to a user or a group of users, or automated by a service, and can also be flagged with a due date or start date.
Workflow Management tools also provide a real-time report of the status and KPIs of all workflow instances.
Case Management
Case Management is an evolution of Workflow Management in which interactions between people, processes, data, and content can be dynamic, ad hoc, and unpredictable, unlike the Workflow Management model, which is perfect for linear processes.

Data Streaming
Data streaming is the process of continuously sending data records to a data lake, storage, or database system. It may include a wide variety of data sources, such as telemetry from connected devices, log files, e-commerce transactions, or information from social networks. It is usually an event engine specialized in IoT or data transformation and designed for a high flow of information.

Real Time Data Replication
Real Time Data Replication is the process of replicating data from a source to another site or database system as soon as it changes. It is useful for improving data availability and performance. It can also be used to move legacy data to a new cloud data model while both platforms coexist.
The cloud providers offer real-time replication for the databases in their cloud. For databases outside the cloud, each provider offers a partial solution covering only a subset of databases. On the other hand, companies like Attunity (https://www.attunity.com/solutions/database/enterprise-data-replication/) offer a better portfolio of real-time database replication, but it is not managed by the cloud provider (at the moment). For cloud storage, however, almost all the cloud providers offer a solution for real-time synchronization with on-premises or other sites. In this post I will only analyze external real-time data replication (with third parties).

Integration Services Use Cases & Recommendations

AWS Integration Services
API Gateway and API Management
Amazon offers Amazon API Gateway for both API Gateway and API Management.
In reality, Amazon API Gateway was born as an API Gateway, and Amazon has since added API Management features, but it still does not qualify as a full API Management product: capabilities like an onboarding portal, API monetization, and real-time analytics are missing. It also lacks the ability to invoke internal services within the same network with low latency (such as gRPC).

Message Queueing
Amazon offers two Message Queueing implementations:
Amazon SQS: a cloud-native, fully managed message queuing implementation.
Amazon MQ: a managed message broker service for Apache ActiveMQ that simplifies connecting your current applications to Amazon MQ.
Amazon SQS manages two types of queues:
Standard queues, which provide at-least-once delivery.
FIFO queues, which provide exactly-once processing and strictly preserve the order in which messages are sent and received.
Amazon MQ provides compatibility with many popular message brokers, which makes it a good fit for migrating applications from existing message brokers that rely on APIs such as JMS or protocols such as AMQP, MQTT, OpenWire, and STOMP.

Event Engine
Amazon Simple Notification Service (SNS) is the serverless service for events. Amazon SNS can filter (with a subscription filter policy) and fan out (replicate and push to multiple endpoints) events to the following destinations to support event-driven computing use cases:
Amazon Simple Queue Service
AWS Lambda
Mobile push
Webhooks (HTTP/S)
Email or SMS
Almost all the internal AWS services can act as publishers. In addition, Amazon offers a specialized event service for telemetry, AWS IoT Events, a fully managed IoT service that makes it easy to detect and respond to events from IoT sensors and applications.

Orchestration
Workflow
Amazon offers two solutions for workflow orchestration:
AWS Step Functions lets you coordinate multiple AWS services into serverless workflows.
You define state machines (in JSON) that describe your workflow as a series of steps, their relationships, and their inputs and outputs.
Amazon SWF helps developers build, run, and scale background jobs that have parallel or sequential steps. In this case you need to code the logic.
As Amazon says: "You should consider using AWS Step Functions for all your new applications, since it provides a more productive and agile approach to coordinating application components using visual workflows. If you require external signals to intervene in your processes, or you would like to launch child processes that return a result to a parent, then you should consider Amazon Simple Workflow Service (Amazon SWF). With Amazon SWF, instead of writing state machines in declarative JSON, you write a decider program to separate activity steps from decision steps. This provides you complete control over your orchestration logic, but increases the complexity of developing applications. You may write decider programs in the programming language of your choice, or you may use the Flow framework to use programming constructs that structure asynchronous interactions for you."
In addition, Amazon offers a specialized workflow service, AWS Data Pipeline, which helps you reliably process and move data between different AWS compute and storage services, as well as on-premises data sources, at specified intervals.

Batch
Amazon offers AWS Batch for batch processing at any scale.

Data Streaming
Amazon offers two solutions for data streaming, covering IoT and data:
AWS IoT Core is a managed cloud service that lets connected devices easily and securely interact with cloud applications and other devices. AWS IoT Core can support billions of devices and trillions of messages, and can process and route those messages to AWS endpoints and to other devices reliably and securely.
Its message broker is a high-throughput pub/sub broker that securely transmits messages to and from all of your IoT devices and applications with low latency.
Amazon Kinesis provides two services. Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service. KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs and location-tracking events. The collected data is available in milliseconds, enabling real-time analytics use cases such as real-time dashboards, real-time anomaly detection and dynamic pricing. Amazon Kinesis Data Firehose is the easiest way to reliably load streaming data into data lakes, data stores and analytics tools. It can capture, transform and load streaming data into Amazon S3, Amazon Redshift, Amazon Elasticsearch Service and Splunk.
Data Replication
Amazon offers two solutions for real-time replication, covering storage and databases:
AWS DataSync is a data transfer service that makes it easy to automate moving data between on-premises storage and Amazon S3 or Amazon Elastic File System (Amazon EFS).
AWS Database Migration Service helps you migrate databases to AWS, and you can also use it for continuous data replication (change data capture, CDC). Continuous replication can run from your data center to databases in AWS or in reverse, from a database in AWS to a database in your data center. It can also run between homogeneous or heterogeneous databases. Supported CDC sources include SQL Server, Oracle and MySQL.
Others: Data Access
Amazon also offers an additional integration service, AWS AppSync, which simplifies application development by letting you create a flexible API to securely access, manipulate and combine data from one or more data sources.
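Kinesis Data Streams decides which shard receives a record from the record's partition key: AWS documents that an MD5 hash maps each partition key to a 128-bit integer, and each shard owns a contiguous range of that hash key space. The Python sketch below illustrates the idea under two assumptions: the shards split the space evenly (as for a freshly created stream), and the digest is read as a big-endian integer. Function names are illustrative, not part of any AWS SDK.

```python
import hashlib

HASH_KEY_SPACE = 2 ** 128  # MD5 produces 128-bit digests


def even_shard_ranges(shard_count):
    """Split the 128-bit hash key space into contiguous, equal-sized ranges (one per shard)."""
    if shard_count < 1:
        raise ValueError("need at least one shard")
    size = HASH_KEY_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * size
        # The last shard absorbs any remainder so the whole space is covered.
        end = HASH_KEY_SPACE - 1 if i == shard_count - 1 else start + size - 1
        ranges.append((start, end))
    return ranges


def hash_key(partition_key):
    """Map a partition key to a 128-bit integer via MD5 (assumed big-endian reading)."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest, "big")


def shard_for(partition_key, ranges):
    """Return the index of the shard whose hash key range contains the key's hash."""
    h = hash_key(partition_key)
    for index, (start, end) in enumerate(ranges):
        if start <= h <= end:
            return index
    raise ValueError("hash key not covered by any shard range")
```

Because the same partition key always hashes to the same shard, all records for one device or customer land on one shard, where Kinesis keeps them in order. The trade-off is that a very busy key can turn its shard into a hot spot.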
Azure Integration Services
API Gateway and API Management
Unlike Amazon, Azure offers a single API management solution, Azure API Management, that covers both the API gateway and API management. Azure is focused on developers and adds capabilities such as policy management, basic onboarding and SOAP support, but it lacks other required capabilities such as API monetization and real-time analytics.
Message Queueing
Azure offers Azure Service Bus as its message queueing implementation. In addition, Azure has Azure Service Bus Relay for hybrid deployments (connecting your existing on-premises systems to cloud solutions). Unlike Amazon, Azure offers a single product for both point-to-point and publish/subscribe scenarios. Azure Service Bus also has more advanced features than Amazon's offering, which let you solve more complex messaging problems. However, the Azure implementation seems less serverless than Amazon's: to improve throughput you may need to partition your traffic across multiple queues, although you can use partitioned queues or Azure autoscale. MQTT is not supported in Azure Service Bus, but it is covered by Azure IoT Hub.
Event Engine
Azure Event Grid is the serverless service for events. Azure Event Grid can filter events (through an event subscription filter) and fan them out (replicate and push them to multiple endpoints) to the following destinations to support event-driven use cases: Azure Automation, Azure Functions, Event Hubs, Hybrid Connections, Logic Apps, Microsoft Flow, Queue Storage and webhooks (HTTP/S). Almost all internal Azure services (though fewer than on AWS) can act as publishers. Like Amazon, Azure offers a specialized event service for telemetry under the Azure IoT Hub service.
Orchestration Workflow
Azure Logic Apps is a cloud service that helps you automate and orchestrate tasks, business processes and workflows when you need to integrate apps, data, systems and services across enterprises or organizations.
In addition, there is another product, Microsoft Flow, which is built on top of Logic Apps, focuses on SaaS and does not require an Azure subscription to build flows, whereas Azure Logic Apps focuses on enterprise integration. Finally, there is a third alternative: the WebJobs feature of App Service, which runs a script or code in the context of an App Service web app.
Batch
Azure offers Azure Batch to run large-scale parallel and high-performance computing (HPC) batch jobs.
Data Streaming
Azure offers a single solution for data streaming that covers both IoT and data: Event Hubs, a fully managed, real-time data ingestion service that is simple, trusted and scalable. It can stream millions of events per second from any source to build dynamic data pipelines and respond immediately to business challenges.
Data Replication
Azure offers multiple solutions for real-time replication, covering storage and databases:
SQL Data Sync, a service built on Azure SQL Database that lets you synchronize the data you select bi-directionally across multiple SQL databases and SQL Server instances.
For ongoing ingestion of storage and files:
Azure Data Factory: use Data Factory to scale out a transfer operation, and when you need orchestration and enterprise-grade monitoring capabilities. Use Azure Data Factory to set up a cloud pipeline that regularly transfers files between several Azure services, on-premises systems, or a combination of the two. Azure Data Factory lets you orchestrate data-driven workflows that ingest data from disparate data stores and automate data movement and data transformation.
Azure Data Box family for online transfers: Data Box Edge and Data Box Gateway are online network devices that can move data into and out of Azure. Data Box Edge uses AI-enabled edge compute to pre-process data before upload. Data Box Gateway is a virtual version of the device with the same data transfer capabilities.
Azure File Sync replicates files from your on-premises Windows Server to an Azure file share.
Unlike Amazon's, Azure's Database Migration Service does not support continuous replication.
Google Integration Services
API Gateway and API Management
Google offers the best API management solution, based on Apigee, with all the capabilities required of a good API manager. The main drawback is the price of the Apigee platform, which to date follows legacy market models (no pay-per-use model). The good news is that, unlike Azure, Google offers a cheaper alternative for API gateway use cases: Cloud Endpoints, which also supports gRPC for low-latency internal calls (mainly between containers and microservices).
Message Queueing and Event Engine
Google offers a single solution for both message queueing and events: Cloud Pub/Sub, which supports point-to-point and publish/subscribe with both pull and push delivery. Cloud Pub/Sub provides a very scalable (serverless) environment with low latency, but it has a major disadvantage: it does not support FIFO, and occasional duplicates are to be expected. As Google says, “Cloud Pub/Sub serves as a foundation for modern stream analytics pipelines”. In that use case the drawback is not an issue, but in other use cases you need to receive messages in the right order (for instance, any financial transaction). Google offers advice on ordering messages through some design patterns, but in my opinion ordering should be an optional capability of the service, as it is with Amazon and Azure, even if that mode has lower throughput. Like Amazon and Azure, Google offers a specialized event service for telemetry under the Cloud IoT Core service. It is surprising that Google has invested so heavily in Apigee to cover API management while its message queuing service does not offer FIFO as an option.
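Without FIFO in the service, ordering and de-duplication have to be handled by the subscriber. The Python sketch below shows one common subscriber-side pattern, not necessarily the one Google documents: it assumes the publisher stamps each message for a given ordering key with a consecutive integer sequence number starting at 1 (a hypothetical convention, not a Pub/Sub field). Early arrivals are buffered and redeliveries are dropped.

```python
class OrderedDeduplicatingReceiver:
    """Release messages for ONE ordering key strictly in sequence order, dropping duplicates.

    Assumes a hypothetical publisher convention: consecutive sequence numbers starting at 1.
    """

    def __init__(self):
        self.next_seq = 1   # the sequence number we are waiting to process next
        self.pending = {}   # seq -> payload for messages that arrived too early

    def receive(self, seq, payload):
        """Accept one delivery and return the payloads that can now be processed, in order."""
        if seq < self.next_seq or seq in self.pending:
            return []  # duplicate: already processed or already buffered
        self.pending[seq] = payload
        released = []
        # Drain the buffer for as long as the next expected message is present.
        while self.next_seq in self.pending:
            released.append(self.pending.pop(self.next_seq))
            self.next_seq += 1
        return released
```

A real subscriber would keep one receiver per ordering key, persist `next_seq` so it survives restarts, and acknowledge a message only after it has been processed. This extra state and buffering is the throughput cost that a built-in FIFO option would otherwise absorb.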
Orchestration Workflow
Google offers Cloud Composer, a fully managed workflow orchestration service built on the Apache Airflow open source project and operated using the Python programming language.
Batch
To date, Google does not have a general-purpose batch solution. However, for data transformation Google offers Cloud Dataflow for transforming and enriching data in stream (real-time) and batch modes.
Data Streaming
Google offers Cloud Dataflow as a fully managed service for transforming and enriching data in stream (real-time) and batch modes. Cloud Dataflow supports fast, simplified pipeline development via expressive SQL, Java and Python APIs in the Apache Beam SDK.
Data Replication
Google offers Cloud Data Transfer Service to move your data to the cloud quickly and securely. However, Cloud Data Transfer does not support continuous data replication from or to external databases. If you want real-time replication, you should use an external solution such as Attunity Replicate, which supports data replication, ingestion and streaming across a wide range of heterogeneous databases, data warehouses and big data platforms. In fact, Attunity is running a significant campaign on the Google platform.
Alibaba Integration Services
API Gateway and API Management
Like Amazon, Alibaba offers only an API gateway: Alibaba API Gateway. Unlike Amazon, Alibaba does not present its API gateway as an API management solution, but it is more than a basic API gateway, covering API lifecycle management services, including API publishing, management and maintenance.
Message Queueing
Alibaba offers two message queueing implementations:
Message Queue, a distributed message queue service based on RocketMQ that supports reliable message-based asynchronous communication among microservices, distributed systems and serverless applications.
Message Service, a message queuing and notification service that facilitates the smooth transfer of messages between applications.
Both are very similar.
Message Queue is more mature (it is used by Alibaba's largest e-commerce sites) and performs better, but Message Service seems to be a more modern service, adding events and notifications and supporting both point-to-point and pub/sub with push capabilities. This is why Message Service is the solution Alibaba currently promotes.
Event Engine
Alibaba Message Service also works as an event engine, supporting pub/sub with push capabilities.
Orchestration Workflow
To date, Alibaba does not have a workflow service.
Batch
Alibaba offers Batch Compute to support massive concurrent jobs.
Data Streaming
Alibaba offers Realtime Compute, a one-stop, high-performance platform for real-time big data processing based on Apache Flink. It is widely used in diverse scenarios, such as streaming data processing, offline data processing and data lake computing.
Data Replication
Alibaba offers one solution for real-time replication that covers storage: Cloud Storage Gateway (beta), a gateway service that can be deployed in an on-premises data center or in the cloud. It uses Alibaba Cloud OSS as back-end storage, supports industry-standard storage protocols (NFS/SMB/iSCSI) and provides low-latency performance by caching frequently accessed data locally. Cloud Storage Gateway supports file and block gateways. For databases, Data Transmission Service (DTS), which migrates data between storage types such as relational databases, NoSQL and OLAP, supports continuous data replication, but only among Alibaba RDS instances.
API Gateway and API Management comparison
Message queuing comparison
Event Engine comparison
Workflow comparison
Batch comparison
Conclusion
All the cloud providers in the analysis offer integration services in their portfolios, but with some differences in scope:
AWS offers the most consistent integration services, covering the whole integration architecture blueprint and even adding a data access service (AWS AppSync) that aggregates data from multiple sources.
However, AWS should improve its API manager so that it really covers all the expected capabilities, such as an onboarding portal, API monetization and real-time analytics. AWS should also support low-latency calls (such as gRPC) between internal services.
Azure, thanks to Microsoft's experience implementing corporate architectures in large companies, offers a more integrated approach focused on developers and hybrid deployments. Azure also offers the most complete external real-time replication solution. However, its API manager is also incomplete (missing API monetization and real-time analytics), and some services are not serverless because they were not originally born in the cloud.
Google offers a combination of very good products and ideas with incomprehensible shortcomings. The good parts are:
The purchase of Apigee, the best API management solution on the market
The possibility of low-latency calls based on gRPC
A serverless approach in all its services
Commitment to open source implementations
However, other decisions are hard to understand, such as:
Offering a legacy contract model for Apigee
Not supporting FIFO in the message queuing service
Not offering a general-purpose batch service
Not offering real-time external replication services (although it has a very good relationship with Attunity)
Alibaba tries to follow in AWS's footsteps, but it is still far from the maturity of the AWS integration services. It has shortcomings in API management, workflow and real-time external replication services.
None of the cloud services analyzed provides advanced integration functions such as:
Complex event processing (CEP) support in the message queue service
A case management service, which is currently replacing traditional workflows
Business activity monitoring (BAM) services
An advanced real-time data replication service such as the one provided by Attunity
Full API manager support (like Google's Apigee, but with a pay-per-use option)
Low-latency alternatives for calling microservices within the same network
A general-purpose rule engine service...
Public Cloud Storage and Database Review (May 2020 update)
The objective of this post is to analyze the database and storage services, including in-memory caches, offered by the four main public cloud providers: AWS, GCP, Azure and Alibaba. In addition, the post identifies the use cases for each kind of database and storage, taking into account factors such as latency, consistency, storage capacity and API access.
Table of Contents
Database and Storage Services: Relational Databases, Non Relational Databases, Object Storage, Block Storage, File Storage, In Memory Storage, Hybrid Cloud Storage, Analytics
Database and Storage Services Use Cases & Recommendations: In Memory, Object Storage, Block Storage, File Storage, Relational Database, Key-Value Database, Document Database, Columnar Database, Graph Database
GCP Database and Storage Services
AWS Database and Storage Services
Azure Database and Storage Services
Alibaba Database and Storage Services
Database & Storage Migration: Google Database & Storage Migration guides and utilities; Amazon Database & Storage Migration guides and utilities; Azure Database & Storage Migration guides and utilities; Alibaba Database & Storage Migration guides and utilities
Bonus – MongoDB Atlas
Conclusion
Database and Storage Services
Database and storage services are among the key services offered by public clouds to save and access data.
Unfortunately, the CAP theorem states that it is impossible for a distributed data store (such as those offered by the cloud vendors) to simultaneously provide more than two of the following three guarantees: consistency, availability and partition tolerance. So you are forced to choose the database and storage services that best fit the needs of your use cases. In practice, your main decision is between consistency (or transactionality) and availability, because partition tolerance is a must in a cloud environment. There are other considerations, such as latency, storage capacity, SLA, multi-region support and language access, that we review in the following chapters.
In general, AWS, GCP, Azure and Alibaba have structured their database and storage services as follows:
Databases: relational and non-relational
Storage: object, block and file
In-memory cache
Relational Databases
A relational database is based on the relational model of data (a collection of relations). The relational model is based on the idea that each table includes a primary key or identifier. Other tables use that identifier to provide "relational" data links and results. Most relational databases use SQL as their data definition and query language.
The cloud providers offer two kinds of managed relational databases:
Managed market relational databases such as MySQL, PostgreSQL, SQL Server and MariaDB, with limited scalability and size. The cloud providers offer different levels of managed database services that make it easy to set up, maintain, manage and administer these market databases in the cloud.
Proprietary cloud provider relational databases, designed to scale on the cloud provider's infrastructure.
A market relational database applies when you have legacy applications based on a market database and you don't want to modify the application code, or when you want to avoid lock-in with the cloud provider.
In both situations, you have to take into account the limited scalability the cloud providers offer for market databases. Proprietary cloud provider relational databases, on the other hand, offer better horizontal scalability at a lower price, plus other capabilities such as multi-region replication. The main trick used by proprietary cloud provider relational databases (and some market solutions) to scale horizontally is replication: a semi-synchronous replica provides a failover instance, and asynchronous replication provides multiple read-only instances. The scalability is therefore focused on read queries. If your use case requires many updates or inserts, it may be reasonable to change the data model. In addition, the relational databases managed by the cloud providers have a storage limit of 64 TB (100 TB for Azure SQL Database), so if you need to manage more data you should choose a non-relational database (key-value, for instance).
Regarding the billing model, you are charged for the following:
The number of nodes and/or the instance types.
The amount of storage that your tables and indexes use; some vendors also charge for I/O (AWS).
The amount of network bandwidth used (mainly egress traffic).
Geo-replication.
Finally, there are two more options for deploying a relational database in the cloud:
Unmanaged market relational databases such as DB2, Oracle and others, with limited scalability and size, where the cloud provider has no responsibility for managing the database.
Third-party managed market databases, where a third party takes responsibility for managing the database on the cloud provider's infrastructure and guarantees its portability across cloud providers (MongoDB Atlas, although it is a non-relational database, is a clear example of this model).
In this analysis, I focus only on the relational databases managed by the cloud providers.
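Because replication scales reads but not writes, applications usually route statements by type. The Python sketch below is a minimal illustration under stated assumptions: endpoint names are hypothetical placeholders for real connection strings, and classifying by the first SQL keyword is deliberately naive.

```python
import itertools


class ReadWriteRouter:
    """Send writes to the primary and spread reads round-robin over read replicas.

    Endpoint names are hypothetical placeholders for real connection strings.
    """

    def __init__(self, primary, replicas):
        self.primary = primary
        self.replicas = list(replicas)
        self._replica_cycle = itertools.cycle(self.replicas) if self.replicas else None

    def endpoint_for(self, sql):
        words = sql.split()
        first_word = words[0].upper() if words else ""
        # Only statements starting with SELECT go to replicas; all others go to the primary.
        if first_word == "SELECT" and self._replica_cycle is not None:
            return next(self._replica_cycle)
        return self.primary
```

Because the read replicas are fed asynchronously, a read served by a replica may briefly miss a write that just committed on the primary. A production router would therefore keep reads that must see their own writes, and anything inside a transaction, on the primary.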
Non Relational Databases
Non-relational databases can be categorized into four types:
Key-value database or hash table, where records are stored and retrieved using a key that uniquely identifies the record and is used to quickly find the data within the database.
Document database, a subclass of key-value database that stores all the information for a given object (or document) as a single instance in the database; every stored object can differ from every other (semi-structured data). In addition, a document database relies on the internal structure of the document to extract metadata.
Columnar database, which stores data tables by column rather than by row. A columnar database is optimized for fast retrieval of columns of data, typically in analytical applications that run highly complex queries over all the data.
Graph database, which uses graph structures with nodes, edges and properties to represent and store data and to answer semantic queries.
In general, non-relational databases offer better scalability, storage size and speed than relational databases at the cost of reduced consistency (remember the CAP theorem). In addition, the cloud providers have decided either to develop custom, proprietary implementations of non-relational databases or, in some cases, to deploy a very standard open source solution. They don't offer a managed service for third-party NoSQL databases.
Finally, there are some common trends:
There is some consensus on offering a document database with MongoDB compatibility (with the exception of Google).
Graph databases emerged in 2019, and Google and Alibaba, which don't have a custom development, offer JanusGraph deployments while waiting for a custom solution to be developed.
Amazon and Alibaba have decided to create specialized time series databases for IoT events and operational applications.
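The row-versus-column distinction above is easier to see with the same small table laid out both ways. The Python sketch below is a conceptual illustration only. Real columnar engines also compress each column and skip unneeded columns on disk, which is where the analytical speed-up actually comes from. The sample orders are made up for illustration.

```python
def to_columns(rows):
    """Pivot row records (dicts sharing the same keys) into one list per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def total_from_rows(rows, column):
    # Row layout: every whole record is visited just to pick out one field.
    return sum(row[column] for row in rows)


def total_from_columns(columns, column):
    # Column layout: only the single list holding that column is read.
    return sum(columns[column])


# Made-up sample data.
orders = [
    {"order_id": 1, "country": "ES", "amount": 120},
    {"order_id": 2, "country": "FR", "amount": 80},
    {"order_id": 3, "country": "ES", "amount": 40},
]
columnar = to_columns(orders)
```

Both functions return the same total. The difference is how much data each layout has to touch, which matters when a table has hundreds of columns and a query needs only one or two of them.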
Regarding the billing model, you are charged for the following:
The CPU required, based on one of the following options (depending on the cloud provider and database): the number of nodes or instances (AWS, Alibaba and Google); read/write requests (AWS, Alibaba and Google); request units, a combination of CPU, memory and IOPS (Azure); data scanned (warehouse option).
The amount of storage that your tables use; some vendors also charge for I/O (AWS).
The amount of network bandwidth used (mainly egress traffic).
Geo-replication and other additional functions such as cache accelerators.
As with relational databases, there are two additional options for deploying a non-relational database in the cloud:
Unmanaged market non-relational databases.
Third-party managed market non-relational databases, where MongoDB Atlas is a clear reference.
Object Storage
Object storage manages data as objects. Each object typically includes the data itself, a variable amount of metadata, a version and a globally unique identifier. In general, a single object can be up to 5 TB in size. Object storage systems allow the retention of massive amounts of unstructured data. Object storage is used for purposes such as storing media content, backups and archives, and as an integrated repository for analytics and machine learning. Objects are grouped in buckets (or containers).
The cloud providers offer different storage classes with specific SLAs and costs, such as:
High-frequency access, either multi-regional (objects are replicated across multiple regions to improve latency and availability) or regional.
Low-frequency access, where storage is cheaper if you access the data infrequently (less than once per month, for example).
Lowest-frequency access, usually for historical data and backups that don't need to be accessed more than once per year; it has the lowest storage cost.
They also offer versioning and lifecycle management.
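The storage class descriptions above amount to a simple rule based on expected access frequency. The Python sketch below encodes that rule using the rough thresholds from the text (less than once per month, at most once per year). The class names are generic labels, not any provider's product names.

```python
def choose_storage_class(accesses_per_year, multi_region=False):
    """Map an expected access frequency to a generic object storage class.

    Thresholds follow the rough examples in the text; class names are generic
    labels, not any provider's product names.
    """
    if accesses_per_year < 0:
        raise ValueError("access frequency cannot be negative")
    if accesses_per_year <= 1:
        return "lowest-frequency"  # at most once a year: archive/backup tier
    if accesses_per_year < 12:
        return "low-frequency"     # less than once a month
    return "high-frequency-multi-region" if multi_region else "high-frequency-regional"
```

In practice, lifecycle management rules apply this kind of decision automatically as objects age. Any real choice should also weigh the retrieval fees and minimum storage durations that the colder classes typically carry, because these can outweigh the cheaper storage price for data that turns out to be read more often than expected.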
Regarding the billing model, you are charged for the following:
Storage, depending on the class.
Operations (GET, PUT, create, DELETE,…), also depending on the class.
The amount of network bandwidth used (mainly egress traffic).
Geo-replication and other additional functions.
Block Storage
Block storage manages data as blocks within sectors and tracks. It is typically used in storage-area network (SAN) environments or attached to a VM, where data is stored in volumes, also referred to as blocks. Each block is assigned an arbitrary identifier by which it can be stored and retrieved, but it carries no metadata providing further context. File systems and databases are common uses for block storage because they require consistently high performance.
Regarding the billing model, you are charged for the following: volume type (SSD, HDD, Ultra SSD…), storage and snapshots.
File Storage
File storage manages data as a file hierarchy in a fully managed network-attached storage (NAS) service. File storage provides a centralized, highly accessible location for files, and it generally costs less than block storage. It uses metadata and directories to organize files, which makes it a convenient option for an organization that simply needs to store large amounts of data. File storage generally supports two protocols:
SMB 3.0 for Windows.
NFS v3 and v4 for Linux and others.
However, Google Filestore and Amazon EFS implement only the NFS protocol, so Windows machines can't use their native SMB protocol with them (on AWS, SMB is covered by the separate Amazon FSx for Windows File Server service).
Regarding the billing model, you are charged for the following: storage (per class, if any) and data transfer out.
In Memory Storage
In-memory storage is a storage system that relies primarily on main memory for data storage. Its main use is as a very fast cache for read-only data. The cloud providers deploy in-memory storage based on Redis and Memcached as fully managed in-memory data store services.
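The usual way to put such a cache in front of a database is the cache-aside (lazy loading) pattern: read from the cache first, fall back to the database on a miss, then populate the cache with an expiry time. In the Python sketch below, a plain dict stands in for Redis or Memcached, and `load` is a hypothetical function that reads from the primary database. The clock is injectable so that expiry can be tested.

```python
import time


class CacheAside:
    """Cache-aside (lazy loading) in front of a slower data source.

    A dict stands in for Redis/Memcached; `load` is a hypothetical database read.
    """

    def __init__(self, load, ttl_seconds, clock=time.monotonic):
        self.load = load
        self.ttl = ttl_seconds
        self.clock = clock
        self.entries = {}  # key -> (value, expires_at)
        self.misses = 0

    def get(self, key):
        now = self.clock()
        entry = self.entries.get(key)
        if entry is not None and entry[1] > now:
            return entry[0]  # cache hit, entry not yet expired
        # Cache miss or expired entry: read from the database and repopulate.
        self.misses += 1
        value = self.load(key)
        self.entries[key] = (value, now + self.ttl)
        return value

    def invalidate(self, key):
        """Drop a key, e.g. after the underlying row has been updated."""
        self.entries.pop(key, None)
```

The TTL bounds how stale a cached value can become. This is why the pattern suits the read-mostly data described above better than data that changes on every request.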
Redis is the main bet for all the cloud providers thanks to its advanced capabilities.
Regarding the billing model, you are charged for the following: service tier (cache node type and number of nodes), storage, data transfer out or between availability zones, and the region of the service.
Hybrid Cloud Storage
Additionally, cloud providers are beginning to offer hybrid storage solutions (their own or from third parties) for use cases such as moving tape backups to the cloud, reducing on-premises storage with cloud-backed file shares and providing low-latency access to data for on-premises applications, as well as various migration, archiving, processing and disaster recovery scenarios.
Analytics
The analytics platform (data computing, data visualization, data search and analytics, and data development) could also fall under storage and database services, but because it is a domain of its own and includes more than pure storage, it will be covered in a separate chapter.
Database and Storage Services Use Cases & Recommendations
GCP Database and Storage Services
GCP enables hybrid storage through a third-party product called Egnyte. Egnyte, a Google Cloud Technology Partner and a 2016 Gartner Magic Quadrant Leader for Enterprise File Synchronization and Sharing (EFSS), offers Google customers the ability to create a hybrid SaaS file sync and share infrastructure that combines the power and flexibility of Google Cloud services with the security and centralized IT administration of on-premises content management.
AWS Database and Storage Services
Amazon also offers some additional services:
Amazon Timestream: a fast, scalable, fully managed time series database service for IoT and operational applications that makes it easy to store and analyze trillions of events per day at 1/10th the cost of relational databases.
Amazon Quantum Ledger Database (QLDB): provides a high-performance, immutable, cryptographically verifiable ledger for applications where multiple parties work with a centralized, trusted authority to maintain a complete, verifiable record of transactions.
Amazon Keyspaces (for Apache Cassandra): a scalable, highly available, managed Apache Cassandra-compatible database service.
Amazon DynamoDB Accelerator (DAX): a fully managed, highly available, in-memory cache for DynamoDB that delivers up to a 10x performance improvement, from milliseconds to microseconds, even at millions of requests per second.
Amazon FSx for Lustre: a fully managed file system optimized for compute-intensive workloads, such as high-performance computing, machine learning and media data processing workflows, and seamlessly integrated with Amazon S3.
Amazon FSx for Windows File Server: a fully managed native Microsoft Windows file system built on Windows Server, so you can easily move Windows-based applications that require file storage to AWS, with full support for the SMB protocol, Windows NTFS, Active Directory (AD) integration and Distributed File System (DFS).
AWS Storage Gateway: a hybrid cloud storage service that gives you on-premises access to virtually unlimited cloud storage. The service provides three types of gateways: Tape Gateway, File Gateway and Volume Gateway.
Azure Database and Storage Services
Azure has the following additional database and storage services:
Azure SQL Database Edge (preview): a small-footprint, edge-optimized, AI-capable data engine built on the same code base as Microsoft Azure SQL Database and Microsoft SQL Server. Currently in preview, SQL Database Edge simplifies IoT infrastructure by offering streaming, storage and analytics in one platform.
Azure Storage Tables (replaced by Azure Cosmos DB Table API): a key-value database solution with rows and columns.
Tables store data as a collection of entities, where each entity has a set of properties. An Azure Table entity can have up to 255 properties (the equivalent of columns in a relational database), and the maximum entity size (the equivalent of a row in a relational database) is 1 MB.
Azure Queue storage: a service for storing large numbers of messages that can be accessed from anywhere in the world via authenticated calls using HTTP or HTTPS. A single queue message can be up to 64 KB in size, and a queue can contain millions of messages, up to the total capacity limit of a storage account.
Azure Data Lake Storage: a set of capabilities dedicated to big data analytics, built on Azure Blob storage.
Azure File Sync, HPC Cache & Data Box (hybrid storage): you can use Azure File Sync to centralize your organization's file shares in Azure Files while keeping the flexibility, performance and compatibility of an on-premises file server. Azure Data Box Gateway is a storage solution that enables you to seamlessly send data to Azure. Azure HPC Cache speeds up access to your data for high-performance computing (HPC) tasks; it can be used even for workflows where your data is stored across WAN links, such as in your local datacenter's network-attached storage (NAS) environment.
Avere vFXT for Azure & Azure FXT Edge Filer (hybrid storage cache): Avere vFXT for Azure is a filesystem caching solution for data-intensive HPC tasks. It lets you take advantage of cloud computing's scalability to make your data accessible when and where it's needed, even for data stored on your own on-premises hardware. Azure FXT Edge Filer is a hybrid storage caching appliance that provides fast file access and active archive for HPC tasks.
Finally, it is worth mentioning that, with Cosmos DB, Azure is the only cloud provider offering a multi-model database service that covers three of the non-relational database types (key-value, document and graph) with SQL API access.
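The Azure Table and Queue limits quoted above are easy to exceed by accident, so a client-side pre-check can fail fast before a request is rejected. The Python sketch below is a rough pre-flight check with two stated assumptions. First, JSON length is used as a proxy for entity size, which is not the formula the service uses. Second, because many client libraries can Base64-encode queue messages, the check optionally measures the encoded size.

```python
import base64
import json

MAX_TABLE_PROPERTIES = 255           # per entity, including PartitionKey, RowKey and Timestamp
MAX_ENTITY_BYTES = 1024 * 1024       # 1 MB per entity
MAX_QUEUE_MESSAGE_BYTES = 64 * 1024  # 64 KB per queue message


def entity_problems(entity):
    """Return a list of (approximate) limit violations for a Table entity given as a dict."""
    problems = []
    if len(entity) > MAX_TABLE_PROPERTIES:
        problems.append(f"too many properties: {len(entity)} > {MAX_TABLE_PROPERTIES}")
    # JSON size is only a rough proxy; the service computes entity size its own way.
    approx_size = len(json.dumps(entity).encode("utf-8"))
    if approx_size > MAX_ENTITY_BYTES:
        problems.append(f"entity too large: ~{approx_size} bytes")
    return problems


def fits_in_queue_message(payload: bytes, base64_encoded=True):
    """Check a payload against the 64 KB message limit, optionally after Base64 encoding."""
    size = len(base64.b64encode(payload)) if base64_encoded else len(payload)
    return size <= MAX_QUEUE_MESSAGE_BYTES
```

Base64 inflates data by a third, so an encoded message can carry only about 48 KB of raw payload. Larger payloads are usually stored in Blob storage, with only a reference placed on the queue.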
Alibaba Database and Storage Services
Alibaba also offers some additional services:
Time Series Database (TSDB): a highly reliable and cost-effective time series database that provides efficient data reads and writes, high-compression-ratio storage, and time series data interpolation and aggregation. TSDB has wide industrial applications, including Internet of Things (IoT) monitoring systems, enterprise energy management systems (EMS), production safety monitoring and electric power detection systems.
Cloud Storage Gateway: a gateway service that you can deploy on premises or in the cloud. It provides a seamless and secure connection between your on-premises IT infrastructure and cloud-based storage services at the back end.
Hybrid Cloud Storage Array (coming soon): a cost-effective, efficient and easy-to-manage hybrid cloud storage solution.
Hybrid Backup Recovery: an enterprise-level data backup and migration service that is secure, efficient, cost-effective and scalable. Hybrid Backup Recovery provides full protection for data stored in Alibaba Cloud and in on-premises data centers.
Database & Storage Migration
In general, there are two approaches to migrating your legacy database to any of the cloud database services:
Migrating to the same type of database:
You can lift and shift your database to the managed market database offered by the cloud provider.
You can lift and shift your database to a managed market database offered by a third party (MongoDB Atlas).
You can lift and shift your database to the IaaS offered by the cloud provider.
Migrating to a new type of database offered by the cloud provider. In this case, you also have to review and modify the database calls in the source code. It is an opportunity to return to the old concept of I/O modules, which isolate database access and make future migrations to new databases and cloud implementations easier.
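In current terms, the I/O module idea above is usually called the repository pattern. Business logic depends only on a small data-access interface, so moving to another database means writing one new implementation of that interface rather than touching every call site. The Python sketch below is minimal and all its names are hypothetical. An in-memory implementation stands in for a real database, and a cloud-specific class (for example, one backed by a managed key-value store) would implement the same two methods.

```python
from abc import ABC, abstractmethod


class CustomerRepository(ABC):
    """The only module allowed to talk to the database (hypothetical names)."""

    @abstractmethod
    def get(self, customer_id):
        """Return the customer record as a dict, or None if it does not exist."""

    @abstractmethod
    def save(self, customer_id, data):
        """Insert or replace the customer record."""


class InMemoryCustomerRepository(CustomerRepository):
    """Stand-in implementation; a database-backed class would replace it after migration."""

    def __init__(self):
        self._rows = {}

    def get(self, customer_id):
        row = self._rows.get(customer_id)
        return dict(row) if row is not None else None  # return a copy, like a fresh DB read

    def save(self, customer_id, data):
        self._rows[customer_id] = dict(data)


def rename_customer(repo, customer_id, new_name):
    """Business logic depends only on the repository interface, never on a database API."""
    customer = repo.get(customer_id)
    if customer is None:
        raise KeyError(customer_id)
    customer["name"] = new_name
    repo.save(customer_id, customer)
```

The trade-off is an extra layer of indirection. In return, the review of "database calls in the source code" that a migration requires shrinks to the repository classes.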
Related to storage, the approach is to offer a storage transfer service to move data from legacy storage or from another cloud provider's storage. Google Database & Storage Migration guides and utilities Google Cloud offers a migration assessment guide, migration tools (mainly data transfer tools), and collaboration with its partners to help manage the full life cycle of database migration https://cloud.google.com/db-migration/. Assessment Guides: MySQL to Google Cloud Platform & Cloud Spanner PostgreSQL to Google Cloud Platform & Cloud Spanner Oracle to Cloud Spanner DynamoDB to Cloud Spanner HBase to Cloud Bigtable And more https://cloud.google.com/solutions/database-migration/technical-resources Related to storage, Google offers a Storage Transfer Service that allows you to quickly import online data into Cloud Storage. You can also set up a repeating schedule for transferring data, as well as transfer data within Cloud Storage, from one bucket to another. Finally, for a hybrid approach, Egnyte, a Google Cloud Technology Partner, offers Google customers the ability to create a hybrid SaaS file sync and share infrastructure that combines the power and flexibility of Google Cloud services with the security and centralized IT administration of on-premises content management. https://cloud.google.com/solutions/partners/egnyte-enabling-hybrid-storage In addition, there is a guide on how to migrate from Amazon Simple Storage Service (Amazon S3) to Cloud Storage. Amazon Database & Storage Migration guides and utilities Amazon offers a more sophisticated approach to migrating legacy databases to the cloud (https://aws.amazon.com/dms/). In addition to guides and partners, it offers tools like: AWS Schema Conversion Tool A tool that makes heterogeneous database migrations predictable by automatically converting the source database schema and a majority of the database code objects, including views, stored procedures, and functions, to a format compatible with the target database.
AWS Database Migration Service (AWS DMS) A cloud service that makes it easy to migrate relational databases, data warehouses, NoSQL databases, and other types of data stores. You can use AWS DMS to migrate your data into the AWS Cloud, between on-premises instances (through an AWS Cloud setup), or between combinations of cloud and on-premises setups. Related to storage, Amazon offers a suite of tools to help you move data via networks, roads and technology partners. Amazon offers two kinds of migration tools: Unmanaged Cloud Data Migration Tools rsync. S3 command line interface. Glacier command line interface. Managed Cloud Data Migration Tools AWS Direct Connect AWS Snowball Edge AWS Snowmobile AWS Transfer Family Finally, AWS offers a hybrid approach for storage called AWS Storage Gateway, which connects an on-premises software appliance with cloud-based storage to provide seamless integration, with data security features, between your on-premises IT environment and the AWS storage infrastructure. AWS Storage Gateway offers file-based, volume-based, and tape-based storage solutions. Azure Database & Storage Migration guides and utilities Azure offers a Database Migration Service (https://azure.microsoft.com/en-us/services/database-migration/) that enables seamless migrations from multiple database sources to Azure data platforms with minimal downtime. The service uses the Data Migration Assistant to generate assessment reports that provide recommendations to guide you through the changes required before performing a migration. When you’re ready to begin the migration process, the Azure Database Migration Service performs all of the required steps.
The options covered by the service are: Migrate SQL Server to Azure SQL Database Migrate MySQL to Azure Database for MySQL Migrate PostgreSQL to Azure Database for PostgreSQL Migrate MongoDB to Azure Cosmos DB's API for MongoDB Related to storage, Azure, like AWS, offers a suite of tools to help you move data via networks, roads and technology partners. Unmanaged Cloud Data Migration Tools AzCopy Azure PowerShell Azure CLI Azure Storage REST APIs/SDKs. Managed Cloud Data Migration Tools Azure Data Box family for online transfers Azure Data Factory Azure Import/Export In addition, Azure has expanded its portfolio of hybrid storage approaches with: Azure File Sync, HPC Cache & Data Box (Hybrid Storage) You can use Azure File Sync to centralize your organization’s file shares in Azure Files, while keeping the flexibility, performance, and compatibility of an on-premises file server. Azure Data Box Gateway is a storage solution that enables you to seamlessly send data to Azure. Azure HPC Cache speeds up access to your data for high-performance computing (HPC) tasks. This service can be used even for workflows where your data is stored across WAN links, such as in your local datacenter network-attached storage (NAS) environment. Avere vFXT for Azure & Azure FXT Edge Filer (hybrid storage cache) Avere vFXT for Azure is a filesystem caching solution for data-intensive high-performance computing (HPC) tasks. It lets you take advantage of cloud computing’s scalability to make your data accessible when and where it’s needed, even for data that’s stored in your own on-premises hardware. Azure FXT Edge Filer is a hybrid storage caching appliance that provides fast file access and active archive for high-performance computing (HPC) tasks. Alibaba Database & Storage Migration guides and utilities Alibaba offers specific services to migrate databases to its cloud and also offers Alibaba Cloud Data Transmission Service (DTS) to perform object definition and data migration.
Object definition migration means migrating the definition syntax of structure objects, such as a table or a view, to the target database. In addition to the object definitions, DTS migrates the data stored in the tables to the target database. The options covered by DTS are: On-premises databases to RDS instances or ECS instance databases ECS instance databases to RDS instances Redis instances to Redis instances in classic networks or ECS instance databases Related to storage, Alibaba (apart from DTS) offers OssImport to migrate data from third-party storage products (or from another object storage source like Amazon S3) to OSS. Finally, Alibaba, like its peers, has started to develop hybrid approaches like: Cloud Storage Gateway A gateway service that you can deploy on premises or in the cloud. This service provides a seamless and secure connection between your on-premises IT infrastructure and cloud-based storage services at the back end. Hybrid Cloud Storage Array (coming soon) A cost-effective, efficient and easy-to-manage hybrid cloud storage solution. Hybrid Backup Recovery An enterprise-level data backup and migration service that is secure, efficient, cost-effective, and scalable. Hybrid Backup Recovery provides full protection for data stored in Alibaba Cloud and in on-premises data centers. Bonus: MongoDB Atlas MongoDB Atlas is the most relevant multi-cloud managed database. You can choose from over 60 cloud regions across Amazon Web Services (AWS), Microsoft Azure, and Google Cloud Platform (GCP). You can also move any deployment to MongoDB Atlas with minimal impact on your application using the live migration feature. Live migration works by keeping a cluster in MongoDB Atlas in sync with your source database until you’re ready to cut over.
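To make the "keep in sync until cutover" idea concrete, here is a conceptual Python model of the pattern. It takes a point-in-time copy of the source, keeps capturing and replaying changes while the application still writes to the source, and only switches over once the target has caught up. The class and method names are hypothetical, and this does not reflect how MongoDB Atlas implements live migration internally. It only illustrates the general technique.

```python
from typing import Dict, List, Tuple

# One captured change: (operation, key, value); value is ignored for deletes.
Change = Tuple[str, str, object]


def apply_change(store: Dict[str, object], change: Change) -> None:
    op, key, value = change
    if op == "upsert":
        store[key] = value
    elif op == "delete":
        store.pop(key, None)
    else:
        raise ValueError(f"unknown operation: {op}")


class LiveMigration:
    """Conceptual model: snapshot copy, then replay changes until cutover."""

    def __init__(self, source: Dict[str, object]) -> None:
        self.source = source
        self.target: Dict[str, object] = {}
        self.pending: List[Change] = []  # changes captured after the snapshot
        self.cut_over = False

    def initial_sync(self) -> None:
        self.target = dict(self.source)  # point-in-time copy

    def write(self, change: Change) -> None:
        """Application writes keep going to the source until cutover."""
        if self.cut_over:
            raise RuntimeError("source is read-only after cutover")
        apply_change(self.source, change)
        self.pending.append(change)      # captured, like an oplog entry

    def catch_up(self) -> int:
        """Replay captured changes on the target; returns how many were applied."""
        applied = len(self.pending)
        for change in self.pending:
            apply_change(self.target, change)
        self.pending.clear()
        return applied

    def cutover(self) -> Dict[str, object]:
        """Stop writes, drain the remaining changes and switch to the target."""
        self.cut_over = True
        self.catch_up()
        if self.target != self.source:
            raise RuntimeError("target diverged from source")
        return self.target
```

The key design point is that the application keeps running against the source during the whole copy, and downtime is limited to the final drain of pending changes.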
MongoDB Atlas takes care of: Automating infrastructure provisioning, setup, and deployment Ensuring high availability Handling backups Securing your data Monitoring and performance optimization Support for the cloud database platform and product usage inquiries SLA You can review a comparison of MongoDB Atlas with the managed services Amazon DocumentDB and Azure Cosmos DB's API for MongoDB: https://www.mongodb.com/cloud/atlas/compare Conclusion Clearly, cloud providers are converging on a common set of database and storage services: Relational Databases All providers offer a custom (or cloud-native) solution that has the advantage of better scalability, HA options and price, but generates lock-in. They also offer a managed service of market databases like MySQL, PostgreSQL, SQL Server and MariaDB to simplify the migration of legacy applications and avoid lock-in. The size limit is 64 TB (100 TB for Azure). Non Relational Databases (Key Value, Columnar, Document and Graph) In this case, all providers bet on developing only custom solutions for each kind of non-relational database. There is some consensus on offering a document database with MongoDB compatibility (with the exception of Google). Graph databases were an emerging trend in 2019, and Google and Alibaba, which don’t have a custom development, are offering JanusGraph deployments while waiting for a custom solution to be developed. Amazon and Alibaba have decided to create a specialized time-series database for IoT events and operational applications. Storage Services All the providers offer a similar portfolio of block, object and file storage services, even with similar SLAs (except Alibaba, whose SLA is lower). Hybrid approaches were the trending topic in the last years. In-memory Cache All providers have bet on a Redis implementation, but they have also started to include Memcached in their in-memory solutions.
However, there are some relevant differences: Relational Databases Google has developed additional techniques, like interleaved tables, to improve the performance of its custom solution (Cloud Spanner). Amazon offers more options of market managed databases and better tools for migrating databases from on-premises. Azure offers its well-known SQL Server as the custom solution with the largest size (100 TB) and good latency. The database sizes in Alibaba are very low (2-16 TB), and its SLA (99.9%) is also below its competitors'. Non Relational Databases (Key Value, Columnar, Document and Graph) Azure is the only cloud provider that has opted to implement a multi-model database with SQL access (Azure Cosmos DB) with homogeneous capabilities and SLA. Amazon is specializing its non-relational databases to cover very specific tasks currently demanded by the market, such as time-series databases and cryptographically verifiable ledgers (blockchain). In addition, Amazon is adding cache-specific solutions to its most used databases, like DynamoDB. Google offers a pricing model focused on storage and access, omitting the instances and technical resources required. Storage Services During 2019-2020, AWS and Azure have been making efforts to improve critical hybrid solutions for their customers. However, GCP, which is less sensitive to business needs, has hardly improved in this regard. In-memory Cache Amazon offers the best implementation of Redis, with a capacity of up to 250 TB and AZ replication. Finally, an option to consider for avoiding the lock-in of the cloud providers' implementations is to bet on a multi-cloud market managed database like MongoDB Atlas, where you can port your database to GCP, AWS or Azure with very low impact on the applications.
Public Cloud Compute Services Review (May 2020 update) The objective of this post is to analyze the Compute Services offered by the four main providers of public cloud: AWS, GCP, Azure and Alibaba. Public Cloud Compute Services We can define Public Cloud Compute Services as the cloud platform or engine that executes your business logic. OK, this is a very generic definition, but we can understand the services better if we go one level deeper. In general, AWS, GCP, Azure and Alibaba have structured their Compute Services into four types of services: Infrastructure as a Service (IaaS) Container as a Service (CaaS) Application as a Service (AaaS) Function as a Service (FaaS) These range from a model with a high level of configurability and access to the underlying infrastructure (IaaS) to a serverless model where the developer only has to take care of the application code (FaaS). IaaS (Infrastructure as a Service) IaaS was the first compute service offered by the public cloud providers, and today it is a commodity. IaaS provides the basic building blocks for cloud IT and typically provides access to networking features, computers (virtual or on dedicated hardware), and data storage space. Infrastructure as a Service provides you with the highest level of flexibility and management control over your IT resources and is most similar to the existing IT resources that many IT departments are familiar with today.
During the last years, all the public cloud providers have aligned their offerings to cover the following features: Predefined Virtual Machines with a wide range of vCPUs and memory depending on your type of workload: Standard or General Purpose High CPU Optimized High Memory Optimized Custom Virtual Machines where you can combine cores and memory to cover your specific needs Graphics processing units (GPUs) to accelerate specific workloads on your instances, such as machine learning and data processing Linux & Windows support SSD and magnetic storage on local or network disks Auto Scaling support Support for different billing models: On-Demand, Reserved or Preemptible Image and instance template management Custom or default virtual private network in which to deploy the VM They also offer different models of machine agreements: Dedicated Instances are instances that run in a VPC on hardware that’s dedicated to a single customer. Your Dedicated Instances are physically isolated at the host hardware level from instances that belong to other accounts. Dedicated Instances may share hardware with other instances from the same account that are not Dedicated Instances. You can pay for Dedicated Instances On-Demand, as Reserved Instances, or as Spot Instances. On-Demand Instances let you pay for compute capacity by unit of time with no long-term commitments or upfront payments. They are a good fit for: Users who want low cost and flexibility without any upfront payment or long-term commitment Applications with short-term, spiky, or unpredictable workloads that cannot be interrupted Applications being developed or tested for the first time Reserved Instances provide you with a capacity reservation and offer a significant discount on the hourly charge for an instance, with 1-year to 3-year terms. They are a good fit for: Applications with steady-state or predictable usage Applications that require reserved capacity Spot Instances: With Spot Instances, you can bid for unused capacity in a cloud vendor's data center.
You can save up to 90% of the cost compared to On-Demand Instances. However, if someone else bids higher than you, your instance will be taken away. They are a good fit for: Applications that have flexible start and end times Applications that are only feasible at very low compute prices Users with an urgent need for large amounts of additional computing capacity Dedicated Hosts are physical servers dedicated to your use. Dedicated Hosts can help you reduce costs by allowing you to use your existing server-bound software licenses. Useful for regulatory requirements that may not support multi-tenant virtualization. Great for licensing that does not support multi-tenancy or cloud deployments. Can be purchased On-Demand or Reserved. On Premises: extends the provider's fully managed IaaS solution to your premises under a hybrid approach. CaaS (Container as a Service) CaaS provides a managed environment for deploying, managing, and scaling your containerized applications. The trend today is to use Docker containers with Kubernetes, which was originally developed by Google. Kubernetes is open source software that allows you to deploy and manage containerized applications at scale. Kubernetes manages clusters of the public cloud's IaaS compute instances and runs containers on those instances, with processes for deployment, maintenance, and scaling. Using Kubernetes, you can run any type of containerized application using the same toolset on-premises and in the cloud. In the future, the vendors will also provide solutions that cover the whole life cycle, including Continuous Integration and Continuous Delivery customized for a Kubernetes/Docker environment. To date, they are offering approaches based on open source solutions. Finally, the trend is to start offering CaaS with Kubernetes in a serverless mode.
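The scaling that Kubernetes performs for containerized workloads can be illustrated with the replica formula documented for the Horizontal Pod Autoscaler. The formula is desired = ceil(current replicas * current metric / target metric), with no action taken when the ratio is within a tolerance (0.1 by default) of 1.0. The sketch below is a simplified model. The function name and the min/max defaults are hypothetical, and the real controller also accounts for pods that are not ready, missing metrics and stabilization windows.

```python
import math


def desired_replicas(current_replicas: int,
                     current_metric: float,
                     target_metric: float,
                     min_replicas: int = 1,
                     max_replicas: int = 10,
                     tolerance: float = 0.1) -> int:
    """Replica count suggested by the HPA formula, clamped to [min, max].

    desired = ceil(current_replicas * current_metric / target_metric),
    skipped when the ratio is within `tolerance` of 1.0.
    """
    if target_metric <= 0:
        raise ValueError("target_metric must be positive")
    ratio = current_metric / target_metric
    if abs(ratio - 1.0) <= tolerance:
        desired = current_replicas  # close enough to target: no scaling action
    else:
        desired = math.ceil(current_replicas * ratio)
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    # 4 pods averaging 90% CPU against a 60% target -> scale out.
    print(desired_replicas(4, 90, 60))
    # 4 pods at 30% against a 60% target -> scale in.
    print(desired_replicas(4, 30, 60))
```

The tolerance band keeps the cluster from flapping between sizes when the metric hovers around its target.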
AaaS (Application as a Service) AaaS is the next level of abstraction provided by the public cloud providers to simplify the work of deploying web and mobile applications, offering a fully managed platform that completely abstracts away infrastructure so you focus only on code. In addition to the infrastructure abstraction, AaaS also covers the life cycle management of the application, enabling more robust deployment workflows than deploying your website directly to production. Finally, AaaS also covers the batch engine, which allows you to run applications, long-running scripts, or heavy compute scripts without creating or managing the underlying infrastructure of a VM pool. AaaS seems the optimal solution for new applications; however, there are some issues: Each public cloud offers a different approach (there is no standard), which means strong lock-in with the vendor It seems that public cloud vendors have stopped betting on this initiative and are focusing on the CaaS and FaaS options There are limitations in the use of third-party products, languages and application architectures. FaaS (Function as a Service) FaaS is the maximum level of abstraction provided by the public cloud vendors to simplify the deployment of code. FaaS is a serverless execution environment for building cloud services encapsulated in functions. With FaaS you write simple, single-purpose functions that can be used in the following ways: As an event-driven compute service where the function runs in response to events. As a compute service that runs your function in response to HTTP requests. Functions are truly serverless and scale automatically. Although the technology used for FaaS differs between public cloud vendors, the interfaces and features are very similar, so a lightweight architecture allows you to avoid lock-in. The billing model is also very similar: you pay only while your code runs. However, FaaS is not the perfect solution for developing applications.
It has limitations that must be taken into account, such as: Limited execution timeout Latency to start the function Limited languages So, to date, functions are appropriate whenever you want to use serverless infrastructure to run code snippets that do not need a low-latency response. In addition, some providers are empowering the serverless model with the concept of Serverless Applications: a combination of functions and the rest of the resources required to run an application, like API interfaces, events, etc. Public Cloud Compute Services Use Cases & Recommendations IaaS AaaS CaaS FaaS AWS Compute Services Amazon Elastic Compute Cloud (Amazon EC2) Technology The technology behind AWS EC2 VMs is Xen (newer instance types run on the KVM-based Nitro hypervisor) SLA Monthly Uptime Percentage to Customer of at least 99.99% Machine Types A selection of instance types optimized to fit different use cases. Up to 96 vCPUs & 768 GB memory. GPU Up to 16 GPUs & 64 GB of GPU memory Machine Options Dedicated Instances are Amazon EC2 instances that run in a VPC on hardware that’s dedicated to a single customer. Your Dedicated Instances are physically isolated at the host hardware level from instances that belong to other AWS accounts. Dedicated Instances may share hardware with other instances from the same AWS account that are not Dedicated Instances. You can pay for Dedicated Instances On-Demand, save up to 70% by purchasing Reserved Instances, or save up to 90% by purchasing Spot Instances. On-Demand Instances: you pay for compute capacity per hour or per second, depending on which instances you run. No long-term commitments or upfront payments are needed. Reserved Instances provide you with a significant discount (up to 75%) compared to On-Demand instance pricing. In addition, when Reserved Instances are assigned to a specific Availability Zone, they provide a capacity reservation, giving you additional confidence in your ability to launch instances when you need them.
Spot Instances: Amazon EC2 Spot Instances allow you to request spare Amazon EC2 computing capacity for up to 90% off the On-Demand price. Dedicated Hosts: physical EC2 servers dedicated to your use. Dedicated Hosts can help you reduce costs by allowing you to use your existing server-bound software licenses. They can be purchased On-Demand (hourly) or as a Reservation for up to 70% off the On-Demand price. On Premises: AWS Outposts allows you to run AWS infrastructure and services on premises for a truly consistent hybrid experience. Disks (Block & File Devices) Amazon EC2 supports two types of block devices, instance store volumes (virtual devices whose underlying hardware is physically attached to the host computer for the instance) and EBS volumes (remote storage devices), plus file devices under Cloud File Storage. Instance store volumes An instance store provides temporary block-level storage for your instance. This storage is located on disks that are physically attached to the host computer. Instance store is ideal for temporary storage of information that changes frequently, such as buffers, caches, scratch data, and other temporary content, or for data that is replicated across a fleet of instances, such as a load-balanced pool of web servers. SSD (up to 60 TB) and Magnetic (up to 48 TB) Elastic Block Store (EBS) Amazon EBS allows you to create storage volumes and attach them to Amazon EC2 instances. Once attached, you can create a file system on top of these volumes, run a database, or use them in any other way you would use a block device. Amazon EBS volumes are placed in a specific Availability Zone, where they are automatically replicated within the same AZ to protect you from the failure of a single component. You can create EBS General Purpose SSD (gp2), Provisioned IOPS SSD (io1), Throughput Optimized HDD (st1), and Cold HDD (sc1) volumes up to 16 TiB in size.
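Volume size also drives EBS performance. General Purpose SSD (gp2) volumes get a baseline of 3 IOPS per GiB, with a floor of 100 IOPS and, at the time of writing, a ceiling of 16,000 IOPS. The helper below is a hypothetical sketch of that rule. It ignores the I/O credit bursting (up to 3,000 IOPS) available to smaller volumes, and the figures should be checked against the current EBS documentation.

```python
# gp2 baseline performance rule (figures assumed current as of 2020).
GP2_IOPS_PER_GIB = 3
GP2_MIN_IOPS = 100
GP2_MAX_IOPS = 16_000
GP2_MIN_SIZE_GIB = 1
GP2_MAX_SIZE_GIB = 16 * 1024  # 16 TiB, the maximum volume size quoted above


def gp2_baseline_iops(size_gib: int) -> int:
    """Baseline IOPS for a gp2 volume: 3 IOPS/GiB, clamped to [100, 16000]."""
    if not GP2_MIN_SIZE_GIB <= size_gib <= GP2_MAX_SIZE_GIB:
        raise ValueError(f"gp2 size must be between 1 and {GP2_MAX_SIZE_GIB} GiB")
    return max(GP2_MIN_IOPS, min(GP2_MAX_IOPS, GP2_IOPS_PER_GIB * size_gib))


if __name__ == "__main__":
    for size in (10, 100, 1000, 5334, 16384):
        print(size, "GiB ->", gp2_baseline_iops(size), "IOPS")
```

Under this rule, a gp2 volume reaches the 16,000 IOPS ceiling at about 5.2 TiB, so growing it further adds capacity but no extra baseline IOPS.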
Cloud File Storage Cloud file storage is a method for storing data in the cloud that provides servers and applications access to data through shared file systems. This compatibility makes cloud file storage ideal for workloads that rely on shared file systems and provides simple integration without code changes. Amazon cloud file storage systems can store petabytes of data. Auto Scaling AWS Auto Scaling monitors your applications and automatically adjusts capacity to maintain steady, predictable performance at the lowest possible cost. Features: Auto Scaling Plans Maintain current instance levels based on a periodic health check Manual Scaling, where you specify the change in the maximum, minimum or desired capacity Scheduled Scaling for predictable changes Dynamic Scaling based on a policy An Auto Scaling Group is a collection of AWS EC2 instances managed by the Auto Scaling service, with a minimum, maximum, and desired number of EC2 instances. Scaling policies can be associated with CloudWatch alarms. The cooldown period is a configurable setting for your Auto Scaling group that helps to ensure that it doesn’t launch or terminate additional instances before the previous scaling activity takes effect. Parallel Cluster AWS ParallelCluster is an AWS-supported open source cluster management tool (based on the CfnCluster project) that helps you to deploy and manage High Performance Computing (HPC) clusters in the AWS Cloud. Billing Model On Demand: Pricing is per instance-hour consumed for each instance, from the time an instance is launched until it is terminated or stopped. Each partial instance-hour consumed is billed per second for Linux instances and as a full hour for all other instance types.
Discounts: Spot Instances up to 90% Reserved Instances up to 75% Dedicated Hosts: savings depend on reusing your existing software licenses Other Linux & Windows support Public and custom image support Snapshot support Start & termination scripts Migration tools and methodology VMware Cloud on AWS is an integrated cloud offering jointly developed by AWS and VMware that allows organizations to seamlessly migrate and extend their on-premises VMware vSphere-based environments to the AWS Cloud running on Amazon EC2 bare metal infrastructure. Amazon Lightsail Lightsail is a lightweight, simplified product offering of AWS. Its disks are fixed-size EBS SSD volumes, instances are still billable when stopped, security group rules are much less flexible, and only a very limited subset of EC2 features and options is accessible. Lightsail has been created for customers who want a very simple hosting plan to host simple websites. EC2 Container Service (ECS) & Elastic Container Service for Kubernetes (EKS) AWS offers two options for CaaS: EC2 Container Service (ECS). This was the first version of CaaS. It is a highly scalable, fast container management AWS service that makes it easy to run, stop, and manage Docker containers on a cluster. Elastic Container Service for Kubernetes (EKS). Amazon EKS runs the Kubernetes management infrastructure. Applications running on any standard Kubernetes environment are fully compatible and can be easily migrated to Amazon EKS. The original AWS solution for CaaS was ECS; however, due to the market pressure of Kubernetes, AWS decided to release EKS, a managed Kubernetes service. Currently, the integration of EKS with the rest of the AWS services is not as complete as that of ECS, but it is a matter of time. Clearly, the winning bet is EKS, given its compatibility with other managed Kubernetes services and with on-premises implementations. Amazon EKS features: AWS Load-balancing integration.
Automatic scaling of your cluster’s node instance count Automatic upgrades for your cluster’s node software Hybrid Networking Workload Portability, on-premises and cloud Identity and Access Management Integration Logging and Monitoring Amazon ECR Registries allow you to host your images in a highly available and scalable architecture, so you can deploy containers reliably for your applications. You can use your registry to manage image repositories and Docker images. Each AWS account is provided with a single (default) Amazon ECR registry with the following additional features: Fine-grained access control. Existing CI/CD integrations You pay per hour for each Amazon EKS cluster that you create and for the AWS resources you create to run your Kubernetes worker nodes. AWS Elastic Beanstalk and AWS Batch AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS. Features: Wide Selection of Application Platforms: Java, .NET, Node.js, PHP, Ruby, Python, Go, and Docker to deploy your web applications. Variety of Application Deployment Options (Visual Studio and Eclipse) Monitoring, Logging, and Tracing Management and Updates Scaling AWS Resources Customization AWS Batch enables developers, scientists, and engineers to easily and efficiently run hundreds of thousands of batch computing jobs on AWS. AWS Batch dynamically provisions the optimal quantity and type of compute resources. Features: Support for multi-node parallel jobs Granular job definitions Simple job dependency modeling Support for popular workflow engines Dynamic compute resource provisioning and scaling Priority-based job scheduling Dynamic spot bidding Integrated monitoring and logging Fine-grained access control There is no additional charge for AWS Elastic Beanstalk and AWS Batch. You pay for AWS resources (e.g.
EC2 instances or S3 buckets) you create to store and run your application. AWS Lambda Language Runtimes AWS Lambda natively supports Java (8 and 11), Go 1.x, PowerShell, C# (.NET Core 3.1 and 2.1), Python (3.8, 3.7, 3.6 and 2.7), Node.js (12 and 10), and Ruby (2.7 and 2.5) code. In addition, you can implement an AWS Lambda runtime in any programming language. A runtime is a program that runs a Lambda function’s handler method when the function is invoked. Events and Triggers HTTP: HTTP requests using Amazon API Gateway or API calls made using AWS SDKs. Amazon S3 Amazon DynamoDB Amazon Kinesis Data Streams Amazon Simple Notification Service Amazon Simple Email Service Amazon Simple Queue Service Amazon Cognito AWS CloudFormation Amazon CloudWatch Logs Amazon CloudWatch Events AWS CodeCommit Scheduled Events (powered by Amazon CloudWatch Events) AWS Config Amazon Alexa Amazon Lex Amazon API Gateway AWS IoT Button Amazon CloudFront Amazon Kinesis Data Firehose Other Event Sources: Invoking a Lambda Function On Demand AWS Serverless Application Model (AWS SAM) and Serverless Application Repository An open-source framework that you can use to build serverless applications (combinations of Lambda functions, event sources, and other resources that work together to perform tasks), together with a repository for serverless applications. Timeout Function execution time is limited by the timeout duration, which you can specify at function deployment time. A function times out after 3 seconds by default, but you can extend this period up to 15 minutes. When function execution exceeds the timeout, an error status is immediately returned. SLA Monthly Uptime Percentage of at least 99.95% Billing Model Lambda counts a request each time it starts executing in response to an event notification or invoke call, including test invokes from the console. You are charged for the total number of requests across all your functions.
Duration is calculated from the time your code begins executing until it returns or otherwise terminates, rounded up to the nearest 100 ms. The price depends on the amount of memory you allocate to your function. Data transfer out to the internet is also charged. The Lambda free tier includes 1M free requests per month and 400,000 GB-seconds of compute time per month. GCP Compute Services Google Compute Engine Technology The technology behind Google Cloud’s VMs is KVM SLA Monthly Uptime Percentage to Customer of at least 99.99% Machine Types Predefined machine types Predefined machine types have a fixed collection of resources (up to 224 vCPUs & 896 GB memory). Custom machine types Up to 416 vCPUs, and the memory per vCPU of a custom machine type must be between 0.5 GB and 8 GB. If you require more memory, you must use one of the mega-memory machine types, which allow you to create instances with a total of 1.4 TB per VM instance. GPU Up to 8 GPUs & 96 GB of GPU memory Machine Options Dedicated Instances On-Demand Instances allow you to pay a fixed per-second rate with no commitment. Reserved Instances (Committed-use discounts) If your workload is stable and predictable, you can purchase a specific amount of vCPUs and memory at a discount off normal prices in return for committing to a usage term of 1 year or 3 years. The discount is up to 57% for most machine types or custom machine types, and up to 70% for memory-optimized machine types. Spot Instances (Preemptible VMs) An instance that you can create and run at a much lower price than normal instances. However, Compute Engine might terminate (preempt) these instances if it requires access to those resources for other tasks. Up to 79% discount Cannot live-migrate or automatically restart Run for at most 24 hours and are not covered by the SLA Not billed if preempted within the first 10 minutes of running When you attach GPUs to a preemptible instance, your GPU quota is used.
Compute Engine sends the VM a preemption signal 30 seconds before stopping it, and the average preemption rate varies between 5% and 15% per seven days per project.
Shielded VM
Shielded VM offers verifiable integrity of your Compute Engine VM instances, so you can be confident your instances haven’t been compromised by boot- or kernel-level malware or rootkits. This verifiable integrity is achieved through Secure Boot, virtual Trusted Platform Module (vTPM)-enabled Measured Boot, and integrity monitoring.
Dedicated Host (Sole-tenant nodes)
Sole-tenant nodes are physical Compute Engine servers dedicated to hosting only VM instances from your specific project.
On Premises: Anthos GKE on-prem
GKE on-prem is hybrid cloud software that brings GKE to on-premises data centers.
Disks (Block & File Devices)
By default, each Compute Engine instance has a single boot persistent disk that contains the operating system. When your applications require additional storage space, you can add one or more storage options to your instance.
Persistent Disk: network storage attached to the VM through the network interface. It is persistent and independent of the instance; zonal, or regional with synchronous replication across two zones in a region; usable as a boot disk; and supports snapshots. Types: standard (magnetic) up to 64 TB and SSD up to 64 TB. It can be resized dynamically, even while the instance is running, and attached to multiple VMs for read-only data. It is encrypted automatically, and you can choose your own key. Its performance is lower than that of Local SSD or a RAM disk.
Local Disk: local SSDs can be attached to the VM. They are ephemeral: data survives a restart, but not stopping or terminating the instance. They provide high IOPS that scale with disk size, up to 680K read and 360K write IOPS. You can attach a maximum of 24 local SSD partitions, for a total of 9 TB per instance.
Local SSDs cannot be live-migrated, use a SCSI or NVMe interface, and are not available for shared-core machine types.
File Server
A file server, also called a storage filer, provides a way for applications to read and update files that are shared across machines. Google Cloud supports multiple filer solutions such as Elastifile, Quobyte, Avere, and Panzura; Elastifile and Quobyte support petabytes of data.
RAM Disk
RAM disks share instance memory with your applications and use the RAM assigned to the VM instance. They are faster than any other disk option, but ephemeral: the data is lost when the instance is stopped, restarted, or terminated.
Auto Scaling
Autoscaling is a feature of managed instance groups. A managed instance group is a pool of homogeneous instances created from a common instance template. Although Compute Engine has both managed and unmanaged instance groups, only managed instance groups can be used with an autoscaler. The autoscaler adds or removes virtual machines from a managed instance group as load increases or decreases, which lets your applications handle traffic increases gracefully and reduces cost when fewer resources are needed. You define the autoscaling policy, and the autoscaler scales the group based on the measured load.
Autoscaling policy and target utilization
To create an autoscaler, you specify an autoscaling policy and a target utilization level that the autoscaler uses to decide when to scale the group. You can scale based on average CPU utilization, HTTP load balancing serving capacity (based on either utilization or requests per second), or Stackdriver Monitoring metrics.
Billing Model
All vCPUs, GPUs, and GB of memory are charged for a minimum of 1 minute. After 1 minute, instances are charged in 1-second increments.
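As a rough illustration of the Compute Engine billing rule above (a 1-minute minimum, then 1-second increments), the following Python sketch computes billable seconds and a cost estimate. The per-vCPU and per-GB hourly rates are hypothetical inputs, not real GCP prices; rounding a partial second up is an assumption; and sustained use or committed use discounts are not modeled.

```python
import math


def billable_seconds(runtime_seconds: float) -> int:
    """Seconds billed for a resource that ran for `runtime_seconds`.

    Follows the rule described above: a 60-second minimum, then 1-second
    increments. Rounding a partial second up is an assumption of this sketch.
    """
    if runtime_seconds <= 0:
        return 0
    return max(60, math.ceil(runtime_seconds))


def instance_cost(runtime_seconds: float, vcpus: int, memory_gb: float,
                  vcpu_hourly_rate: float, gb_hourly_rate: float) -> float:
    """Estimated cost of one VM; both hourly rates are hypothetical inputs."""
    hours = billable_seconds(runtime_seconds) / 3600
    hourly_rate = vcpus * vcpu_hourly_rate + memory_gb * gb_hourly_rate
    return hours * hourly_rate


if __name__ == "__main__":
    print(billable_seconds(12))    # a 12-second run is billed as 60 seconds
    print(billable_seconds(90.2))  # past the first minute: billed as 91 seconds
    print(instance_cost(3600, vcpus=2, memory_gb=8,
                        vcpu_hourly_rate=0.03, gb_hourly_rate=0.004))
```

The minimum only matters for very short-lived instances; beyond the first minute, the cost grows almost linearly with runtime.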
Discounts: sustained use discounts of up to 30% (applied when an instance uses a vCPU or a GB of memory for more than 25% of a month), committed use discounts of up to 70%, and preemptible VMs at up to 80% off.
Other
Linux and Windows support, public and custom images, managed and unmanaged instance groups, snapshots, start and termination scripts, and migration tools and methodology.
Google Kubernetes Engine & Registry (and Build)
Google Kubernetes Engine (GKE) provides a managed environment for deploying, managing, and scaling your containerized applications on Google infrastructure. A GKE environment consists of multiple machines (specifically, Google Compute Engine instances) grouped together to form a cluster. GKE clusters are powered by the open-source Kubernetes cluster management system, support Docker images, and offer the following features: Google Cloud Platform load balancing for Compute Engine instances; node pools that designate subsets of nodes within a cluster for additional flexibility; multi-zone or regional clusters; automatic scaling of the cluster’s node count; automatic upgrades of node software; node auto-repair to maintain node health and availability; hybrid networking; workload portability across on-premises and cloud; a dashboard for your project’s GKE clusters and their resources, where you can view, inspect, manage, and delete resources; Identity and Access Management integration; and logging and monitoring with Stackdriver for visibility into your cluster.
Google Container Registry is a private container image registry that runs on Google Cloud Platform. It supports the Docker Image Manifest V2 and OCI image formats and adds vulnerability analysis, fine-grained access control, and integration with existing CI/CD tools.
Google Cloud Build is a service that executes your builds on Google Cloud Platform’s infrastructure.
Cloud Build can import source code from a variety of repositories or cloud storage spaces, execute a build to your specifications, and produce artifacts such as Docker containers or Java archives.
GKE uses Google Compute Engine instances as cluster nodes, and you are billed for each of those instances according to Compute Engine pricing until the nodes are deleted.
GKE is also supported on premises through Anthos GKE on-prem, which brings Google Kubernetes Engine (GKE) to on-premises data centers. With GKE on-prem, you can create, manage, and upgrade Kubernetes clusters in your on-premises environment.
Google is also pushing the concept of serverless containers with Knative. Knative provides an open API and runtime environment that lets you run your serverless workloads anywhere you choose: fully managed on Google Cloud, on Anthos on Google Kubernetes Engine (GKE), or on your own Kubernetes cluster.
Batch on GKE (Batch) is the GCP solution for scheduling and managing batch workloads. With Batch, you can take advantage of the on-demand, flexible nature of the cloud, and because Batch is based on Kubernetes and containers, your jobs are portable.
Google App Engine
Google App Engine is a fully managed platform that completely abstracts away infrastructure so you can focus only on code. Google offers two environments:
App Engine Flexible Environment
App Engine lets developers focus on what they do best: writing code. Based on Google Compute Engine, the App Engine flexible environment automatically scales your app up and down while balancing the load. Microservices, authorization, SQL and NoSQL databases, traffic splitting, logging, versioning, security scanning, and content delivery networks are all supported natively. In addition, the flexible environment lets you customize the runtime and even the operating system of your virtual machine using Dockerfiles.
App Engine Standard Environment
The App Engine standard environment is based on container instances running on Google’s infrastructure. Containers are preconfigured with one of several available runtimes. Applications run in a secure, sandboxed environment, which lets the standard environment distribute requests across multiple servers and scale servers to meet traffic demands. Your application runs in its own secure, reliable environment that is independent of the hardware, operating system, or physical location of the server.
Applications running in the App Engine flexible environment are deployed to virtual machine types that you specify, and you are billed for each of those instances according to Compute Engine pricing. Applications running in the App Engine standard environment are deployed to instance classes that you specify, which have a cost per hour per instance.
General Features
A fully managed serverless application platform; a wide selection of application platforms (Java, PHP, Node.js, Python, C#, .NET, Ruby, Go, and Docker) for deploying your web applications; a variety of deployment options (Cloud Source Repositories, IntelliJ IDEA, Visual Studio); monitoring, logging, and diagnostics; application versioning; scaling; GCP resource customization; traffic splitting; and application security.
Google App Engine lets you schedule tasks with cron for Python. The App Engine Cron Service lets you configure regularly scheduled tasks that run at defined times or at regular intervals. This is a basic batch service; for more complex batch workloads you can use Google Cloud Dataflow. Cloud Dataflow is a fully managed service for transforming and enriching data in stream (real-time) and batch (historical) modes with equal reliability and expressiveness.
Google Cloud Dataflow features: based on Apache Beam (Java and Python), automated resource management, dynamic work rebalancing, and horizontal autoscaling.
Cloud Dataflow usage is billed in per-second increments, on a per-job basis.
Google Cloud Functions
Language Runtimes
Cloud Functions can be written using the JavaScript (Node.js 6, 8, and 10), Python (3.7.6), or Go (1.11, and 1.13 in beta) runtimes on Google Cloud Platform. The runtimes run on Ubuntu, except Node.js 6, which runs on Debian.
Events and Triggers
HTTP (invoke functions directly via HTTP requests), Cloud Storage, Cloud Pub/Sub, Cloud Firestore, Firebase (Realtime Database, Storage, Analytics, Auth), and Stackdriver Logging (forward log entries to a Pub/Sub topic by creating a sink, which can then trigger the function).
Timeout
Function execution time is limited by a timeout that you set when you deploy the function. The default is 1 minute, and you can extend it up to 9 minutes. When execution exceeds the timeout, an error status is returned immediately.
SLA
Monthly Uptime Percentage of at least 99.5%.
Billing Model
Invocations are charged at a per-unit rate, regardless of the outcome or duration of the function, excluding the first 2 million free invocations per month. Compute time is measured in 100 ms increments, rounded up to the nearest increment, and is priced according to the memory provisioned. Outbound data transfer (data transferred from your function to somewhere else) is measured in GB and charged at a flat rate.
Outbound data to other Google APIs in the same region is free, as is inbound data.
Azure Cloud Compute Services
Azure VM
Technology
Azure runs on a customized version of Hyper-V.
SLA
Monthly Uptime Percentage of at least 99.95%.
Machine Types
A selection of instance types optimized for different use cases, with up to 416 vCPUs and 11.4 TB of memory.
GPU: up to 8 GPUs and 96 GB of GPU memory.
Machine Options
Dedicated Instances
On-demand instances (pay as you go): pay for compute capacity by the second, with no long-term commitment or upfront payments. Increase or decrease compute capacity on demand, start or stop at any time, and pay only for what you use.
Reserved Virtual Machine Instances: an Azure Reserved Virtual Machine Instance is an advance purchase of a virtual machine for one or three years in a specified region. The commitment is made up front, and in return you get up to 72% savings compared to pay-as-you-go pricing.
Spot (low-priority VMs): low-priority VMs let you take advantage of Azure’s unused capacity, which can vary by size, region, time of day, and more. When you deploy low-priority VMs in VM scale sets, Azure allocates them if capacity is available, but there are no SLA guarantees: whenever Azure needs the capacity back, it evicts low-priority VMs. Low-priority Linux VMs come with an 80% discount, and Windows VMs with a 60% discount.
Dedicated Hosts (Isolated VMs): Azure Compute offers virtual machine sizes that are isolated to a specific hardware type and dedicated to a single customer. These sizes are best suited for workloads that require a high degree of isolation from other customers, such as those subject to compliance and regulatory requirements.
Customers can also further subdivide the resources of these isolated virtual machines by using Azure support for nested virtual machines.
On Premises: Azure Stack
The Azure Stack portfolio extends Azure so you can consistently build and run hybrid applications across datacenters, edge locations, remote offices, and the cloud. It offers choice and flexibility according to your solution needs: consistent hybrid cloud on premises with Azure Stack Hub, which can be connected to or disconnected from the public cloud; high-performance virtualization on premises with Azure Stack HCI; or an Azure-managed appliance that provides intelligent compute and AI at the edge with Azure Stack Edge.
Disks (Block & File Devices)
Azure VMs use three types of disk storage:
Operating System Disk (OS Disk): the C drive in Windows or /dev/sda on Linux. This disk is registered as a SATA drive, has a maximum capacity of 2048 gigabytes (GB), and is persistent and stored in Azure Storage.
Temporary Disk: the D drive in Windows or /dev/sdb on Linux. This disk is used for short-term storage by applications or the system. Because its data is stored on the local disk of the host, the data can be lost during a maintenance event or if the VM is moved to a different host.
Data Disk: registered as a SCSI drive. The number of data disks you can attach to a virtual machine depends on the VM instance size. Data disks have a maximum capacity of 32 TB per disk and are persistent and stored in Azure Storage.
Azure disks can be either managed or unmanaged.
Unmanaged disks: with unmanaged disks, you are responsible for correctly distributing your VM disks across storage accounts, for both capacity planning and availability. An unmanaged disk is also not a separate manageable entity, so you cannot use features such as role-based access control (RBAC) or resource locks at the disk level.
Managed disks: managed disks handle storage for you by automatically distributing your disks across storage accounts for capacity, and by integrating with Azure Availability Sets to isolate your storage just as availability sets isolate virtual machines. Managed disks also make it easy to switch between Standard and Premium storage (HDD and SSD) without writing conversion scripts. Azure managed disks currently offer four disk types: ultra solid-state drives (SSD) up to 65 TB, premium SSD, standard SSD, and standard hard disk drives (HDD), which support up to 32 TB.
Azure File Storage
Azure File Service is a fully managed file share service that offers endpoints for the Server Message Block (SMB) protocol, versions 2.1 and 3.0 (SMB is also known as the Common Internet File System, or CIFS). You can create one or more file shares in the cloud (up to 5 TB per share) and use them much as you would a regular Windows file server, for example as shared storage, or in new ways such as part of a lift-and-shift migration strategy.
Auto Scaling
An Azure virtual machine scale set can automatically increase or decrease the number of VM instances that run your application based on autoscale rules. Autoscale can make scaling decisions based on a time or schedule rule (to scale the number of VM instances at fixed times), a resource metric rule (CPU, memory, disk, and so on), or a custom metric rule based on metrics your applications emit. Besides scaling out or in, Azure autoscale can also send a notification or invoke a webhook.
Azure CycleCloud
An enterprise-friendly tool for orchestrating and managing High Performance Computing (HPC) environments on Azure. With CycleCloud, users can provision infrastructure for HPC systems, deploy familiar HPC schedulers, and automatically scale the infrastructure to run jobs efficiently at any scale.
Billing Model
Pay as you go: pay for compute capacity by the second, with no long-term commitment or upfront payments.
Increase or decrease compute capacity on demand, start or stop at any time, and pay only for what you use.
Discounts: Spot instances (low-priority VMs), with an 80% discount for Linux VMs and 60% for Windows VMs; Reserved Instances, up to 72%; and Dedicated Hosts, with savings that depend on how much you can reuse your existing software licenses.
Other
Linux and Windows support, public and custom images, snapshots, start and termination scripts, Elastic IP addresses, update and fault domains, and Azure Stack, a hybrid cloud platform that lets you provide Azure services from your datacenter.
Azure Kubernetes Service (AKS)
Like AWS, Azure has decided to evolve its container technology toward Kubernetes. In fact, the previous offering, Azure Container Service (ACS), will be retired on January 31, 2020, and is no longer recommended for new resources.
Azure Kubernetes Service (AKS) is a hosted Kubernetes service: Azure handles critical tasks such as health monitoring and maintenance for you. The Kubernetes masters are managed by Azure, and you only manage and maintain the agent nodes. As a managed Kubernetes service, AKS is free: you pay only for the agent nodes within your clusters, not for the masters.
A Kubernetes cluster is divided into two components: cluster master nodes, which provide the core Kubernetes services and orchestration of application workloads (in AKS, the cluster master is provided as a managed Azure resource abstracted from the user), and nodes, which run your application workloads. An AKS cluster has one or more nodes, each of which is an Azure virtual machine (VM) that runs the Kubernetes node components and the container runtime.
Azure AKS features include Azure load-balancing integration.
Other AKS features are automatic scaling of the cluster’s node count, coordinated application upgrades, hybrid networking, workload portability across on-premises and cloud, Identity and Access Management integration, and logging and monitoring.
Azure Container Registry simplifies container development by storing and managing container images for Azure deployments in a central registry, and adds geo-replication, fine-grained access control, and integration with existing CI/CD tools.
Azure Service Fabric is a distributed systems platform that makes it easy to package, deploy, and manage scalable and reliable microservices and containers. Service Fabric also addresses the significant challenges of developing and managing cloud-native applications, so developers and administrators can avoid complex infrastructure problems and focus on implementing mission-critical, demanding workloads that are scalable, reliable, and manageable. Service Fabric is Microsoft’s container orchestrator for deploying microservices across a cluster of machines. Microservices can be developed in many ways, from the Service Fabric programming models and ASP.NET Core to deploying any code of your choice.
Azure Container Instances offers the fastest and simplest way to run a container in Azure, without managing any virtual machines or adopting a higher-level service. However, for scenarios that need full container orchestration, including service discovery across multiple containers, automatic scaling, and coordinated application upgrades, the best option is Azure Kubernetes Service (AKS).
Azure Kubernetes Service (AKS) is a free container service: you pay only for the virtual machines and the associated storage and networking resources consumed.
Azure App Service, Azure Cloud Services and Azure Batch
Azure App Service lets you build and host web apps, mobile back ends, and RESTful APIs in the programming language of your choice without managing infrastructure. It has four components: Web Apps, to build and deploy web apps faster at scale; Web App for Containers, to deploy and run containerized web apps; Mobile Apps, to build mobile apps for any device; and API Apps, to easily build and consume APIs.
Features: a wide selection of application platforms (Java, .NET, Node.js, PHP, Python, and Docker) for deploying your web and mobile applications; autoscaling; high availability; support for both Windows and Linux; automated deployments from GitHub, Azure DevOps, or any Git repository; monitoring, logging, and tracing; and management and updates.
Azure Cloud Services is an example of a platform as a service (PaaS). Like Azure App Service, it is designed to support applications that are scalable, reliable, and inexpensive to operate, and like App Service it is hosted on virtual machines (VMs). However, Cloud Services gives you more control over the VMs: you can install your own software on them and access them remotely.
There are two types of Azure Cloud Services roles, which differ only in how your role is hosted on the VMs. A web role automatically deploys and hosts your app through IIS; a worker role does not use IIS and runs your app standalone.
Both App Service and Cloud Services provide many useful features and are a simple way to deploy your applications to the Microsoft Azure cloud. The primary differentiator is that Cloud Services offers access to the underlying Azure VMs, and App Service does not.
However, App Service is more convenient for these specific reasons: you can combine multiple applications to save money, deployment slots are free, and deployments are faster.
Azure Batch lets you run large-scale parallel and high-performance computing (HPC) batch jobs efficiently in Azure. Azure Batch creates and manages a pool of compute nodes (virtual machines), installs the applications you want to run, and schedules jobs to run on the nodes. There is no cluster or job scheduler software to install, manage, or scale; instead, you use Batch APIs and tools, command-line scripts, or the Azure portal to configure, manage, and monitor your jobs.
Features: support for multi-node parallel jobs, granular job definitions, simple job dependency modeling, support for popular workflow engines, dynamic compute resource provisioning and scaling, priority-based job scheduling, integrated monitoring and logging, and fine-grained access control.
Azure App Service is priced per hour, with a cost that depends on the plan: Free, Shared, Basic, Standard, Premium, or Isolated. Azure Cloud Services is priced per hour, with a cost that depends on the VM chosen. Azure Batch is also priced per hour depending on the VM chosen, and you can select low-priority VMs for larger discounts.
Azure Spring Cloud (preview)
Azure Spring Cloud makes it easy to deploy Spring Boot-based microservice applications to Azure with zero code changes. It provides lifecycle management with comprehensive monitoring and diagnostics, configuration management, service discovery, CI/CD integration, blue-green deployments, and more.
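To make the effect of low-priority VMs on an Azure Batch pool concrete, here is a minimal Python sketch. It assumes a flat, hypothetical per-node hourly rate and applies the low-priority discounts quoted earlier in this article (80% for Linux, 60% for Windows); real prices vary by VM size and region, and storage and networking costs are not modeled. The function name `batch_pool_cost` is illustrative, not part of any Azure SDK.

```python
# Low-priority discounts as quoted in this article; actual discounts vary
# by VM size and region.
LOW_PRIORITY_DISCOUNT = {"linux": 0.80, "windows": 0.60}


def batch_pool_cost(nodes: int, hours: float, hourly_rate: float,
                    os_type: str = "linux", low_priority: bool = False) -> float:
    """Estimated VM cost of a Batch pool (hypothetical helper).

    `hourly_rate` is an assumed per-node price; storage and networking
    are ignored.
    """
    if nodes < 0 or hours < 0 or hourly_rate < 0:
        raise ValueError("nodes, hours and hourly_rate must be non-negative")
    discount = LOW_PRIORITY_DISCOUNT[os_type.lower()] if low_priority else 0.0
    return nodes * hours * hourly_rate * (1 - discount)


if __name__ == "__main__":
    dedicated = batch_pool_cost(10, 2, 0.5)
    low_pri = batch_pool_cost(10, 2, 0.5, low_priority=True)
    print(f"dedicated: {dedicated:.2f}, low-priority Linux: {low_pri:.2f}")
```

The trade-off is that low-priority nodes can be evicted, so this saving suits jobs that tolerate interruption and can be retried.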
Azure Functions
Language Runtimes
Azure Functions natively supports C# and F# (.NET Framework 4.7, .NET Core 2.2 and 3.1), JavaScript (Node.js 6, 8, 10, and 12), Java 8, Python (3.6, 3.7, and 3.8), and PowerShell.
Events and Triggers
HTTP and webhooks, Blob Storage, Cosmos DB, Event Grid, Event Hubs, Microsoft Graph events, Queue Storage, Service Bus, and timers.
Timeout
Function execution time is limited by a timeout that you set when you deploy the function. In the Consumption plan, the default is 5 minutes and you can extend it up to 10 minutes; with the Premium and App Service plans you can have a timeout of up to 60 minutes. When execution exceeds the timeout, an error status is returned immediately.
SLA
Monthly Uptime Percentage of at least 99.95%.
Billing Model
The Azure Functions Consumption plan is billed based on per-second resource consumption and executions. Consumption plan pricing includes a monthly free grant of 1 million requests and 400,000 GB-s of resource consumption per subscription, in pay-as-you-go pricing, across all function apps in that subscription. The Azure Functions Premium plan provides enhanced performance and is billed per second based on the vCPU-s and GB-s your Premium functions consume. You can also run Functions within an App Service plan at regular App Service plan rates.
Alibaba Cloud Compute Services
Alibaba Elastic Compute Service (ECS)
Technology
Alibaba has been transitioning from Xen to KVM since 2014. ECS Bare Metal Instances use a custom hypervisor with nested virtualization.
SLA
Monthly Uptime Percentage of at least 99.95%.
Machine Types
A selection of instance types (families) optimized for different use cases, with up to 208 vCPUs and 3.8 TB of memory.
GPU: up to 8 GPUs and 256 GB of GPU memory.
Machine Options
Dedicated Instances
Pay as you go: a postpaid method in which you pay after using the instance. Instance usage is billed per minute, and the billing unit is USD/hour.
Reserved Virtual Machine Instances (subscription): a prepaid method that lets you use an instance only after you pay for it. Instance usage is billed monthly, and the billing unit is USD/month.
Spot (preemptible instances): you set a maximum hourly price to bid for a specified instance type. If your bid is higher than or equal to the current market price, your instance is created and billed at the current market price. You can hold a preemptible instance without interruption for at least one hour. After that hour, your bid is compared with the market price, and when the market price exceeds your bid or resource stock is insufficient, the instance is automatically released.
Dedicated Hosts: Dedicated Host (DDH) is a host service that lets a tenant use dedicated hardware resources based on Alibaba Cloud virtual hosting services. It enables enterprises to achieve custom deployment, bring your own license (BYOL), and security and regulatory compliance. DDH supports multiple types of ECS instances.
Disks (Block & File Devices)
Alibaba Disk Storage: cloud disks that can be attached to only one ECS instance in the same zone of the same region.
System disks have the same life cycle as the ECS instance on which they are mounted: a system disk is created and released at the same time as its instance. Shared access is not allowed. System disks can be up to 500 GB.
Data disks can be created separately or at the same time as ECS instances. A data disk created with an ECS instance has the same life cycle as the instance and is created and released along with it. Data disks created separately can be released independently or at the same time as the corresponding ECS instances. Shared access is not allowed.
Disk categories by performance: ESSD, SSD, ultra cloud disks, and basic cloud disks, up to 32 TB per disk.
Shared Block Storage is a block-level data storage service with strong concurrency, high performance, and high reliability.
It supports concurrent reads from and writes to multiple ECS instances and can be mounted to a maximum of 8 ECS instances. It is available as SSD and ultra cloud disks of up to 32 TB per disk.
Local disks are attached to the physical servers (host machines) on which ECS instances are hosted. They are designed for business scenarios that require high storage I/O performance, and provide local storage and access for instances with low latency, high random IOPS, high throughput, and cost-effective performance. Local SSDs go up to 8 × 1,788 GB, and local SATA HDDs up to 154 TB.
Alibaba NAS
A storage space designed to store massive amounts of unstructured data that can be accessed through standard file access protocols, such as the Network File System (NFS) protocol for Linux and the Common Internet File System (CIFS) protocol for Windows. You can set permissions that let different clients access the same file at the same time. NAS suits business scenarios such as file sharing across departments, non-linear file editing, high-performance computing, and containerization (for example, with Docker). It supports petabytes of data.
Auto Scaling
Auto Scaling automatically adjusts the volume of your elastic computing resources to meet changing business needs. Based on the scaling rules you set, it adds ECS instances as your business grows, so that you have sufficient computing capacity, and removes ECS instances when demand falls, to save costs. Auto Scaling also provides a health check function that automatically monitors the health of ECS instances within scaling groups, so the number of healthy instances in a scaling group does not fall below the minimum you set.
Billing Model
Pay as you go: instance usage is billed per minute, and the billing unit is USD/hour.
Discounts: Spot Instances (Preemptible instances).
The spot discount depends on your bid price (the maximum hourly price you are willing to pay) and is greater than 60% (around 80%). Reserved Virtual Machine Instances (monthly subscription) save up to 60%, and Dedicated Hosts save depending on how much you can reuse your existing software licenses.
Other
Linux and Windows support, public and custom images, snapshots, start and termination scripts, Elastic IP addresses, and a cloud migration tool.
Container Service, Container Service for Kubernetes & Elastic Container Instance (ECI)
Container Service for Kubernetes is a high-performance, scalable container application management service that lets you manage the lifecycle of enterprise-class containerized applications with Kubernetes. By simplifying cluster setup and capacity expansion, and by integrating with Alibaba Cloud virtualization, storage, network, and security capabilities, Container Service for Kubernetes provides an ideal cloud environment for running Kubernetes containers, in two modes:
Classic dedicated Kubernetes mode: you get more fine-grained control over cluster infrastructure and container applications; for example, you can select the host instance specification and operating system, specify the Kubernetes version, and configure custom Kubernetes attribute switches. Container Service for Kubernetes creates the underlying cloud resources for the cluster and automates upgrades and other operations, while you plan, maintain, and upgrade the server cluster. You can add servers to or remove them from the cluster manually or automatically.
Serverless Kubernetes mode: you do not need to create the underlying virtualization resources. To launch an application directly, you use Kubernetes commands to specify the application container image, the CPU and memory requirements, and how the service is exposed externally.
Dedicated Kubernetes cluster: you must create three master nodes and one or more worker nodes for the cluster.
In addition, you need to plan, maintain, and upgrade the cluster as needed. With this type of Kubernetes cluster, you can control the cluster infrastructure in a more fine-grained manner.
Managed Kubernetes cluster: you only need to create worker nodes; Container Service for Kubernetes creates and manages the master nodes for you. This type of cluster is easy to use, low-cost, and highly available, and you can focus on the services the cluster supports without operating and maintaining the master nodes.
Serverless Kubernetes cluster: you do not need to create or manage any master or worker nodes. You can use the Container Service console or the command-line interface to set container resources, specify container images for applications, set how services are exposed, and start applications.
Alibaba Container Service for Kubernetes features: Alibaba Cloud load-balancing integration, automatic scaling of the cluster’s node count, hybrid networking, workload portability across on-premises and cloud, Identity and Access Management integration, and logging and monitoring.
Container Registry lets you manage images throughout their lifecycle. It provides secure image management, stable image builds across global regions, and easy image permission management. It simplifies the creation and maintenance of the image registry and supports image management in multiple regions. Combined with other cloud services such as Container Service, Container Registry provides an optimized solution for using Docker in the cloud.
Alibaba Container Service is a high-performance, scalable container application management service that lets you manage the lifecycle of containerized applications using Docker and Kubernetes. Container Service provides multiple application release methods and continuous delivery capabilities, and supports microservice architectures.
By simplifying the setup of container cluster and integrating with the Alibaba Cloud abilities of virtualization, storage, network, and security, Container Service makes an ideal running cloud environment for containers. Elastic Container Instance (ECI) is an agile and secure serverless container instance service. You can easily run containers without managing servers. Also you only pay for the resources that have been consumed by the containers. ECI helps you focus on your business applications instead of managing infrastructure. You can quickly and easily deploy containers to the cloud through a two-step procedure. An ECI container group is similar in concept to a pod in Kubernetes. In any case, for scenarios where you need full container orchestration, including service discovery across multiple containers, automatic scaling, and coordinated application upgrades, the best option is Container Service for Kubernetes. Container Service is currently free of charge. Resources used in collaboration with Container Service (including Server Load Balancer and ECS) are charged separately. For ECI you incur charges based on the number of Elastic Container Instances (ECI) you use. Simple Application Server & Batch Compute Simple Application Server suits you well if all you need is a private virtual machine. It provides you the all-in-one solution to launch and manage your application, set up domain name resolution, and build, monitor, maintain your website with just a few clicks. It makes private server building much easier, and it is the best way for beginners to get started with cloud computing. 
Scenarios for Simple Application Server:
- Building a small website
- Building a personal blog
- Building a forum/community
- Building a knowledge or efficiency management tool
- Building a personal learning environment
- Building a small e-commerce website
- Building a development environment

Batch Compute is a distributed cloud service suited to processing massive volumes of data concurrently, and it supports massive numbers of concurrent jobs. The system automates resource management, job scheduling, and data loading, and it is billed on a Pay-As-You-Go basis. In non-technical terms, Batch Compute lets you submit any computing program to run on multiple Alibaba Cloud virtual machine (VM) instances; the results are then written to a persistent storage location you specify (such as Alibaba Cloud OSS or NAS), where you can view them.

Batch Compute features:
- Support for multi-node parallel jobs
- Granular job definitions
- Job scheduling
- Dynamic compute resource provisioning and scaling
- Integrated monitoring and logging
- Fine-grained access control

Simple Application Server provides a monthly package of resources at a fixed price and currently supports monthly and yearly prepayment. With Batch Compute, you pay for the compute and storage resources consumed by your jobs or clusters; there is no additional charge for resource management and job scheduling.

Function Compute
Language Runtimes: Java 8, Node.js 6 & 8, PHP 7.2, and Python 2.7 & 3.6
Events and Triggers: HTTP, Alibaba Cloud Object Storage Service (OSS), CDN events, Timer, MNS topic, Table Store, Log Service
Timeout: the default function timeout is 3 seconds; it can be set to any value between 1 and 600 seconds.
SLA: 99.95% Monthly Uptime Percentage
Billing Model: Alibaba Cloud Function Compute is billed on a Pay-As-You-Go basis.
The fee consists of three parts:
- The total number of function calls.
- Execution duration, measured from when your code starts running until the result is returned or execution is terminated, with a granularity of 100 milliseconds. The duration price depends on the memory size allocated to the function.
- Public network traffic.
Pricing includes a monthly free grant of 1 million requests and 400,000 GB-s of resource consumption.

Public Cloud Compute Services: IaaS Comparison (AWS, GCP, Azure, Alibaba)

Virtualization Technology
- AWS: Xen (newer instance types run on the Nitro hypervisor, which is based on KVM)
- GCP: KVM
- Azure: Customized version of Hyper-V
- Alibaba: Transitioning from Xen to KVM since 2014

Nested Virtualization
- AWS: Partial, on bare metal instances such as i3.metal
- GCP: Only for L1 VMs running on Haswell processors or later (KVM & Linux)
- Azure: Yes, Linux and Windows
- Alibaba: Yes, on ECS Bare Metal Instances

SLA (Monthly Uptime Percentage to Customer)
- All four providers: 99.99%

Machine Types and Sizes
- AWS: Up to 96 vCPU & 768 GB memory; up to 16 GPU & 54 GB of GPU memory
- GCP: Up to 416 vCPU & 1.4 TB memory (mega-memory machine types); up to 8 GPU & 96 GB of GPU memory
- Azure: Up to 416 vCPU & 11.4 TB memory; up to 8 GPU & 96 GB of GPU memory
- Alibaba: Up to 208 vCPU & 3.8 TB memory; up to 8 GPU & 256 GB of GPU memory

Machine Options
- AWS: Dedicated Instances: On Demand Instances (seconds & hourly), Reserved Instances (1-3 years), Spot Instances. Dedicated Hosts: On Demand Instances (seconds & hourly), Reserved Instances (1-3 years). On Premises: AWS Outposts, which lets you run AWS infrastructure and services on premises.
- GCP: Dedicated Instances: On Demand Instances (seconds), Reserved Instances (1-3 years), Spot Instances (Preemptible VM), Shielded VM. Dedicated Hosts (Sole-tenant nodes): On Demand Instances (seconds & hourly), Reserved Instances (1-3 years). On Premises: Anthos GKE on-prem (GKE on-prem), hybrid cloud software that brings GKE to on-premises data centers.
- Azure: Dedicated Instances: On Demand Instances (seconds & hourly), Reserved Instances (1-3 years), Spot Instances (low-priority VM). Dedicated Hosts (Isolated VM): On Demand Instances (seconds & hourly), Reserved Instances (1-3 years). On Premises: Azure Stack portfolio, an extension of Azure to consistently build and run hybrid applications across data centers, edge locations, remote offices, and the cloud.
- Alibaba: Dedicated Instances: On Demand Instances (minutes), Reserved Instances (monthly), Spot Instances (Preemptible VM). Dedicated Hosts: Reserved Instances (monthly).

Disks (Block & File Devices)
- AWS: Instance store volumes attached to the host computer for the instance: SSD (up to 60 TB) and magnetic (up to 48 TB). Elastic Block Store (EBS), attachable to any running instance in the same Availability Zone: SSD (up to 16 TB) and magnetic (up to 16 TB). Cloud file storage for access to data through shared file systems (petabytes of data).
- GCP: Local disk attached to the host computer for the instance: SSD (up to 9 TB). Persistent Disk, attachable to any running instance in the same zone or region: SSD (up to 64 TB) and magnetic (up to 64 TB), with an option to reach 257 TB. File server for access to data through shared file systems (petabytes of data). RAM disks that use the instance's memory.
- Azure: Azure Disk Storage, a virtual hard disk (VHD) attached to the instance: ultra solid-state drives (SSD) (preview) up to 65 TB, plus premium SSD, standard SSD, and standard hard disk drives (HDD) up to 32 TB. Azure File Storage for access to data through shared file systems (5 TB per share).
- Alibaba: Cloud Disks attached to the instance: ESSD, SSD, Ultra Cloud Disks, and Basic Cloud Disks, up to 32 TB per disk. Shared Block Storage, attachable to any running instance in the same Availability Zone: SSD and Ultra Cloud Disks, up to 32 TB per disk. Local disks attached to the physical servers (host machines) on which ECS instances are hosted: SSD up to 8 × 1.78 TB and SATA HDD up to 154 TB. Alibaba NAS for access to data through shared file systems (petabytes of data).

Autoscaling
- AWS: Scaling options: manual, schedule, dynamic policies, monitoring policies. Cooldown support, shutdown script, health check support, removal policy.
- GCP: Scaling options: dynamic policies, monitoring policies. Cooldown support, shutdown script, health check support.
- Azure: Scaling options: manual, schedule, dynamic policies, monitoring policies, application policies. Cooldown support, shutdown script (preview), health check support, notification & webhook support.
- Alibaba: Scaling options: manual, schedule, dynamic policies, monitoring policies. Cooldown support, shutdown script, health check support, removal policy.

Billing Model
- AWS: On Demand pricing is per instance-hour (each partial instance-hour is billed per second for Linux instances and as a full hour for all other instance types). Discounts: Spot Instances up to 90%; Reserved Instances (1-3 years) up to 75%.
- GCP: On Demand pricing is per instance-second (1-minute minimum). Discounts: Spot (preemptible) instances up to 80%; Reserved Instances (1-3 years) up to 70%; sustained use discounts (when an instance uses a vCPU for more than 25% of a month) up to 30%.
- Azure: On Demand pricing is per instance-second. Discounts: Spot (low-priority VM) instances up to 80% for Linux and 60% for Windows; Reserved Instances (1-3 years) up to 72%.
- Alibaba: On Demand pricing is per instance-minute. Discounts: Spot (preemptible) instances 60%-80%; Reserved Instances (monthly) up to 60%.

Other
- AWS: Linux & Windows support; public and custom image support; snapshot support; migration tool & methodology; lightweight version (Lightsail); VMware Cloud on AWS; parallel cluster management based on open source.
- GCP: Linux & Windows support; public and custom image support; snapshot support; migration tool & methodology; managed and unmanaged instance groups.
- Azure: Linux & Windows support; public and custom image support; snapshot support; migration tool & methodology; update and fault domains; Azure CycleCloud, an enterprise-friendly tool for orchestrating and managing High Performance Computing (HPC) environments on Azure.
- Alibaba: Linux & Windows support; public and custom image support; snapshot support; cloud migration tool.

Public Cloud Compute Services: CaaS Comparison (AWS, GCP, Azure, Alibaba)

Custom Container Service
- AWS: Elastic Container Service (ECS)
- Azure: Azure Container Service (ACS), to be retired on January 31, 2020
- Alibaba: Alibaba Container Service

Kubernetes Container Service
- AWS: Amazon Elastic Kubernetes Service (EKS): AWS load-balancing integration; automatic scaling of the cluster's node instance count; automatic upgrades of the cluster's node software; hybrid networking; workload portability, on-premises and cloud; Identity and Access Management integration; logging and monitoring.
- GCP: Google Kubernetes Engine (GKE): GCP load-balancing integration; node pools to designate subsets of nodes within a cluster for additional flexibility; multi-zone or regional clusters; automatic scaling of the cluster's node instance count; automatic upgrades of the cluster's node software; node auto-repair to maintain node health and availability; hybrid networking; workload portability, on-premises and cloud; dashboard for GKE clusters and their resources; Identity and Access Management integration; logging and monitoring.
- Azure: Azure Kubernetes Service (AKS): Azure load-balancing integration; automatic scaling of the cluster's node instance count; coordinated application upgrades; hybrid networking; workload portability, on-premises and cloud; Identity and Access Management integration; logging and monitoring.
- Alibaba: Alibaba Container Service for Kubernetes, with three options (dedicated, managed, and serverless Kubernetes cluster): Alibaba load-balancing integration; automatic scaling of the cluster's node instance count; hybrid networking; workload portability, on-premises and cloud; Identity and Access Management integration; logging and monitoring.

Registry Service
- AWS: Amazon ECR: fine-grained access control; integration with existing CI/CD tools.
- GCP: Google Container Registry: vulnerability analysis; fine-grained access control; integration with existing CI/CD tools.
- Azure: Azure Container Registry: geo-replication; fine-grained access control; integration with existing CI/CD tools.
- Alibaba: Alibaba Container Registry.

Billing Model
- AWS: Amazon EKS cluster (per hour), plus the AWS resources you create to run your Kubernetes worker nodes.
- GCP: Node instances according to VM, storage, and network pricing.
- Azure: Node instances according to VM, storage, and network pricing.
- Alibaba: Node instances according to VM, storage, and network pricing.

Other Services
- GCP: Google Cloud Build, which executes your builds on Google Cloud Platform infrastructure. GKE on premises with Anthos GKE on-prem, which brings Google Kubernetes Engine (GKE) to on-premises data centers. Knative, which provides an open API and runtime environment that lets you run your serverless workloads anywhere you choose.
- Azure: Service Fabric, Microsoft's container orchestrator for deploying microservices across a cluster of machines; microservices can be developed in many ways, from the Service Fabric programming models or ASP.NET Core to deploying any code of your choice. Azure Container Instances (ACI), which offers the fastest and simplest way to run a container in Azure without managing any virtual machines or adopting a higher-level service.
- Alibaba: Elastic Container Instance (ECI), an agile and secure serverless container instance service that lets you run containers without managing servers.

Public Cloud Compute Services: AaaS Comparison (AWS, GCP, Azure, Alibaba)

Web Apps
- AWS: AWS Elastic Beanstalk: wide selection of application platforms (Java, .NET, Node.js, PHP, Ruby, Python, Go, and Docker) for deploying web applications; variety of deployment options (Visual Studio and Eclipse); monitoring, logging, and tracing; management and updates; scaling; AWS resource customization.
- GCP: Google App Engine (standard and flexible environments), a fully managed serverless application platform: wide selection of application platforms (Java, PHP, Node.js, Python, C#, .NET, Ruby, Go, and Docker) for deploying web applications; variety of deployment options (Cloud Source Repositories, IntelliJ IDEA, Visual Studio); monitoring, logging, and diagnostics; application versioning; scaling; GCP resource customization; traffic splitting; application security.
- Azure: Azure App Service: wide selection of application platforms (Java, .NET, Node.js, PHP, Python, and Docker) for deploying web and mobile applications; auto-scaling; high availability; Windows and Linux support; automated deployments from GitHub, Azure DevOps, or any Git repository; monitoring, logging, and tracing; management and updates. Cloud Services, which offers access to the underlying Azure VMs. Azure Spring Cloud (preview), which makes it easy to deploy Spring Boot-based microservice applications to Azure with zero code changes.
- Alibaba: Simple Application Server, an all-in-one solution to launch and manage your application, set up domain name resolution, and build, monitor, and maintain your website with just a few clicks; aimed at beginners getting started with cloud computing.

Batch Apps
- AWS: AWS Batch: support for multi-node parallel jobs; granular job definitions; simple job dependency modeling; support for popular workflow engines; dynamic compute resource provisioning and scaling; priority-based job scheduling; dynamic spot bidding; integrated monitoring and logging; fine-grained access control.
- GCP: App Engine Cron Service (basic batch, scheduling tasks only). Batch on GKE, a cloud-native solution for scheduling and managing batch workloads that leverages the on-demand, flexible nature of the cloud; because it is based on Kubernetes and containers, your jobs are portable. Cloud Dataflow, based on Apache Beam (Java & Python): automated resource management; dynamic work rebalancing; horizontal auto-scaling.
- Azure: Azure Batch: support for multi-node parallel jobs; granular job definitions; simple job dependency modeling; support for popular workflow engines; dynamic compute resource provisioning and scaling; priority-based job scheduling; integrated monitoring and logging; fine-grained access control.
- Alibaba: Batch Compute: support for multi-node parallel jobs; granular job definitions; job scheduling; dynamic compute resource provisioning and scaling; integrated monitoring and logging; fine-grained access control.

Billing Model
- AWS: You pay only for the AWS resources (e.g. EC2 instances or S3 buckets) you create to store and run your application.
- GCP: With the App Engine flexible environment, you pay only for the resources allocated. App Engine standard environment apps are deployed to instance classes that you specify, with a cost per hour per instance. Cloud Dataflow usage is billed in per-second increments, on a per-job basis.
- Azure: Azure App Service pricing is per hour, with a cost depending on the plan. Azure Cloud Services pricing is per hour, with a cost depending on the VM chosen. Azure Batch pricing is per hour, with a cost depending on the VM chosen; you can also select low-priority VMs for larger discounts.
- Alibaba: Simple Application Server provides a monthly package of resources at a fixed price and currently supports monthly and yearly prepayment. With Batch Compute, you pay for the compute and storage resources consumed by your jobs or clusters.

Public Cloud Compute Services: FaaS Comparison (AWS, GCP, Azure, Alibaba)

Language Runtimes
- AWS: JavaScript (Node.js 12 & 10); Python 3.8, 3.7, 3.6 & 2.7; Go (1.x); Java 8 & 11; PowerShell; C# (.NET Core 3.1 & 2.1); Ruby 2.7 & 2.5
- GCP: JavaScript (Node.js 6, 8 & 10); Python (3.7.6); Go (1.11 & 1.13 beta)
- Azure: JavaScript (Node.js 6, 8, 10 & 12); Python 3.6, 3.7 & 3.8; Java 8; C# & F# (.NET Framework 4.7, .NET Core 2.2 & 3.1)
- Alibaba: JavaScript (Node.js 6 & 8); Python 2.7 & 3.6; Java 8; PHP 7.2

SLA (Monthly Uptime Percentage to Customer)
- AWS: 99.95%
- GCP: 99.5%
- Azure: 99.95%
- Alibaba: 99.95%

Events and Triggers
- AWS: HTTP requests; Amazon S3; Amazon DynamoDB; Amazon Kinesis Data Streams; Amazon Simple Notification Service; Amazon Simple Email Service; Amazon Simple Queue Service; Amazon Cognito; AWS CloudFormation; Amazon CloudWatch Logs; Amazon CloudWatch Events; AWS CodeCommit; scheduled events (powered by Amazon CloudWatch Events); AWS Config; Amazon Alexa; Amazon Lex; Amazon API Gateway; AWS IoT Button; Amazon CloudFront; Amazon Kinesis Data Firehose; other event sources by invoking a Lambda function on demand.
- GCP: HTTP requests; Cloud Storage; Cloud Pub/Sub; Cloud Firestore; Firebase (Realtime Database, Storage, Analytics, Auth); Stackdriver Logging (forward log entries to a Pub/Sub topic by creating a sink, which then triggers the function).
- Azure: HTTP & webhooks; Blob Storage; Cosmos DB; Event Grid; Event Hubs; Microsoft Graph events; Queue storage; Service Bus; Timer.
- Alibaba: HTTP requests; Alibaba Cloud Object Storage Service (OSS); CDN events; Timer; MNS topic; Table Store; Log Service.

Timeout
- AWS: default 3 seconds, up to 15 minutes.
- GCP: default 1 minute, up to 9 minutes.
- Azure: default 5 minutes, up to 10 minutes (with the Premium and App Service plans you can have up to 60 minutes).
- Alibaba: default 3 seconds, up to 10 minutes.

Billing Model
- AWS: Number of requests + execution time + memory allocated + outbound networking data transfer. 1M free requests and 400,000 GB-seconds of compute time per month.
- GCP: Number of requests + compute time + memory allocated + outbound networking data transfer. 2M free requests per month, regardless of duration.
- Azure: Number of requests + execution time + memory allocated + outbound networking data transfer. 1M free requests and 400,000 GB-seconds of compute time per month. Customers can also run Functions within their App Service plan at regular App Service plan rates.
- Alibaba: Number of requests + execution time + memory allocated + public network traffic. 1M free requests and 400,000 GB-seconds of compute time per month.

Conclusion
The cloud computing services offered by AWS, GCP, Azure and Alibaba can be analyzed from three perspectives:
- The services offered in each layer
- The developer who has to create an application in the cloud
- The evolution during 2019-2020

From the point of view of the layers offered

IaaS
Currently the four vendors analyzed offer very similar services, with almost identical pricing and SLA models. The choice of vendor depends on factors such as:
- The presence of the vendor in your country and its level of commitment when offering discounts. In this respect, Azure seems to have more freedom to adapt to a client's needs.
- The need to use other services from the provider. For example, if a Big Data and Analytics service is required, GCP is probably the best option. If a full-stack cloud service platform is required, AWS could be the best option. If your platform is Microsoft, Azure should be the first vendor to evaluate. If your market is mainly China, Alibaba could be an option.
- The presence of the vendor's data centers in your country, which enables low-latency hybrid solutions.
- The level of support for hybrid approaches, clearly the big trend of 2019-2020.
In an IaaS environment, it may be convenient to have two suppliers so you can compare services and prices.

CaaS
In my opinion, CaaS is the future of cloud computing services, and more specifically Kubernetes, which makes your solution portable to other vendors and lets you define hybrid clusters. In this respect GCP has the lead, since it originated Kubernetes. However, AWS, Azure and Alibaba have all moved quickly to make Kubernetes their flagship CaaS solution. This is where the big changes are taking place (as in fact happened in 2019):
- Offering Kubernetes in serverless mode
- Expanding registry services to cover full application lifecycle management
- Including a microservices architecture
- Including batch services (already covered by GKE)
Although GCP has the most experience with Kubernetes, we must keep an eye on the moves of the other vendors, who have understood the relevance of offering CaaS on Kubernetes.

AaaS
AaaS (Application as a Service) was the vendors' attempt to offer simplified web development. However, it creates a vendor lock-in that is currently not acceptable if we want to ensure the future portability of our applications. The trend is for CaaS with Kubernetes to offer serverless models and microservices architectures that do not tie you to the provider. AaaS also includes batch solutions, where the scheduler and job management model is very similar across vendors, but in my opinion a more portable solution based on Kubernetes will be offered soon. In summary, try to avoid the vendors' AaaS offerings (with the exception of batch solutions, which have low lock-in and no alternatives in other layers).

FaaS
As GCP did with Kubernetes, AWS pioneered the serverless functions model and offers the most robust and integrated solution. The rest of the providers quickly added this capability with a similar approach. The good news is that, with a minimal architecture layer, it is possible to develop functions that are easily portable among vendors. The bad news is that functions are not general-purpose: they apply only to use cases that do not require guaranteed latency (although some vendors may guarantee latency at an additional cost) and that do not involve long-running processes.

From the point of view of the developer who has to create an application in the cloud
If I were a developer, my bet would clearly be a CaaS Docker model orchestrated by Kubernetes, complemented with FaaS for stream processing and some simple microservices. The registry services offered by the vendors are a good starting point, but lifecycle management needs to be strengthened with products like:
- Helm (package manager for Kubernetes)
- Spinnaker (continuous delivery platform aligned with Kubernetes)
- Jenkins (continuous integration platform)
In this area, GCP is defining a DevOps model for Kubernetes that could also serve as a reference. Finally, during 2019 GCP released a batch solution on GKE that closes the big gap it previously had.

From the point of view of the evolution during 2019-2020
The main evolutions of the compute services of the four providers during 2019-2020 have focused on:
- Providing hybrid cloud capabilities. Azure was the forerunner here and has expanded its reach, followed by AWS with AWS Outposts and GCP with Anthos GKE on-prem.
- Enabling High Performance Computing (HPC) configurations with parallel clusters and management tools
- Adding serverless capabilities to container solutions
- Implementing microservices alternatives, such as support for Spring Boot-based microservice applications in Azure
- Improving the development lifecycle of CaaS and FaaS solutions
- Covering necessary gaps, such as batch support in GCP

Advantages and disadvantages of moving workloads to a Public Cloud

Advantages of Public Cloud
- No requirement of IT investment
- Pay per Use
- Elasticity/Scalability
- Key services out of the box; increased deployment speed & agility
- Self Management-Automation
- Global Deployment
- Reliability
- Cost
- Security & Compliance
- Ecosystem of additional services

Disadvantages of Public Cloud
- Cost
- Security & Compliance
- SLA responsibility
- Vendor Lock-in
- Latency in hybrid models

Let's look in detail at each characteristic of the Public Cloud and when it is an advantage or a drawback.

No requirement of IT Investment
This is one of the clearest advantages of the Public Cloud, along with pay per use. It is what allows new businesses to take off without heavy investment, with room for extensive trial and error without financial consequences. The savings in IT investment cover not only hardware, software, network, and security, but also the need for a data center and all the expenses it involves.
Pay per Use
Pay per use is one of the main characteristics of the Public Cloud: you pay only for the use of a resource (time or requests) while it is active and stop paying when it is inactive, without any long-term commitment. The concept has many nuances in the Public Cloud, and it is fundamental to understand them. For example, a running VM instance is charged by the hour or minute, and when the instance is deleted the charge stops. You can also stop the instance without deleting it, which also stops the charge (except for the reserved disk). Another example is functions, where pay per use maps to the number of invocations, regardless of the time that passes between them. Pay per use is an advantage for variable workloads; however, as we will see in the Cost section, other options are better for steady workloads.

Elasticity/Scalability
Elasticity/Scalability is one of the best-known features of the Public Cloud, but there is a catch: we are talking mainly about horizontal elasticity/scalability. The vertical scalability provided by the Public Cloud is limited and far from the high-end servers and mainframes of traditional companies. In addition, in most Public Clouds vertical elasticity is also limited, forcing you to restart the VM instance to increase or change the assigned CPU. Therefore, to take advantage of the much-proclaimed elasticity/scalability of the Public Cloud, your applications must be designed to scale horizontally: microservices, NoSQL, stateless…
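To make the pay-per-use model for functions concrete, here is a minimal sketch of the billing formula described earlier for Function Compute: number of requests plus execution duration weighted by allocated memory (GB-seconds), measured in 100 ms increments, minus a monthly free grant of 1 million requests and 400,000 GB-s. The function names and the unit prices in the example are hypothetical placeholders, not any provider's published rates; public network traffic is deliberately left out, and every call is approximated by the average duration.

```python
import math


def billed_duration_ms(duration_ms: float, granularity_ms: int = 100) -> int:
    """Round one execution up to the next billing increment (100 ms by default)."""
    return math.ceil(duration_ms / granularity_ms) * granularity_ms


def monthly_function_cost(
    invocations: int,
    avg_duration_ms: float,
    memory_mb: int,
    price_per_million_requests: float,  # hypothetical unit price
    price_per_gb_second: float,  # hypothetical unit price
    free_requests: int = 1_000_000,
    free_gb_seconds: float = 400_000,
) -> float:
    """Estimate one month of pay-per-use function charges (network traffic excluded)."""
    # Duration is priced by allocated memory, so compute is measured in GB-seconds.
    # Simplification: every call is billed as if it took the average duration.
    memory_gb = memory_mb / 1024
    billed_seconds = billed_duration_ms(avg_duration_ms) / 1000
    gb_seconds = memory_gb * billed_seconds * invocations

    # The monthly free grant is subtracted before anything is charged.
    billable_requests = max(0, invocations - free_requests)
    billable_gb_seconds = max(0.0, gb_seconds - free_gb_seconds)

    request_cost = billable_requests / 1_000_000 * price_per_million_requests
    compute_cost = billable_gb_seconds * price_per_gb_second
    return request_cost + compute_cost


if __name__ == "__main__":
    # 10M calls of 120 ms (billed as 200 ms) at 512 MB: 1,000,000 GB-s before the free grant.
    cost = monthly_function_cost(10_000_000, 120, 512, 0.20, 0.00002)
    print(f"{cost:.2f}")  # 9M billable requests * 0.20/M + 600,000 GB-s * 0.00002 = 13.80
```

Because duration is rounded up per call, a 120 ms function is billed as 200 ms; with this model, shaving execution time below a 100 ms boundary can matter as much as reducing the allocated memory.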
Key services out of the box; increased deployment speed & agility
The Public Cloud offers out of the box all the services needed to deploy business applications: compute, networking, storage and databases, middleware, management & development tools, identity & security, Big Data, Machine Learning, … Along with these services, providers offer specific patterns for each use case and industry, which greatly accelerates application deployment.

Self Management-Automation
By definition, the Public Cloud is based on a software-defined approach, which implies a high level of automation. The whole configuration of a cloud solution is defined in a parameterized way, and providers offer tools to deploy all the resources your application requires in a declarative format (templates that programmatically control what gets deployed). This is a clear advantage over legacy data centers. In addition, all cloud services and resources are integrated with a common monitoring, logging, and error reporting system.

Global Deployment
The Public Cloud allows applications to be deployed globally by replicating solution configurations in the corresponding regions quickly and economically. Providers also offer storage services that replicate automatically between regions, allowing users to access content efficiently. Finally, they offer a global Content Delivery Network (CDN), with edge points distributed around the world, to accelerate content delivery for websites and applications.

Reliability
The Public Cloud allows the deployment of low-cost disaster recovery solutions. Since a complete solution configuration can be defined in software, there is no need to keep a replica data center waiting for a problem. Where fast high availability is required, all Public Clouds offer a global load-balancing system that can redistribute the load between zones and regions. In general, cloud disaster recovery systems can be deployed much more quickly and with better control over your resources.

Cost
The cost of Public Clouds is another topic of debate. In certain use cases the Public Cloud is clearly cheaper, but others require a more detailed analysis. The use cases where moving workloads to the Public Cloud is most cost-effective are:
- Those that require a new investment in infrastructure
- Fluctuating workloads
- Global workloads
For stable 24×7 workloads, further analysis is necessary. In fact, the pay-per-use model is usually not the best option there, and you should switch to a subscription model with a usage commitment, where discounts can reach 70%. In any case, cost management in a Public Cloud is completely different from a traditional environment, so you need dedicated staff to monitor invoices and constantly identify cost improvements by volume or billing model. In addition, unused resources must be tracked exhaustively and removed so that they are no longer billed.

Security & Compliance
Security is an area on which all Public Clouds are focusing. Currently the compliance and security level of the Public Cloud is much higher than that of a traditional data center, but unfortunately some regulatory requirements in each country still need additional, specific approval by the local regulator. In addition, the Public Cloud provider manages the security of the cloud, while security in the cloud is the customer's responsibility. That means you need to install and configure additional layers of security.
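The Cost discussion above (pay per use for fluctuating workloads, a usage commitment for stable 24×7 workloads) can be checked with simple arithmetic. Below is a minimal sketch, assuming a commitment is billed for every hour of the month whether or not the instance runs, and a 730-hour month; the hourly rate is a hypothetical placeholder, and real reserved or committed-use terms (1-3 years, upfront payments) are more complex. Under these assumptions, a commitment with discount d pays off once the instance runs more than a fraction 1 - d of the month.

```python
HOURS_PER_MONTH = 730  # assumption: average month length (8,760 h / 12)


def on_demand_cost(hourly_rate: float, hours_used: float) -> float:
    """Pay per use: charged only for the hours the instance actually runs."""
    return hourly_rate * hours_used


def committed_cost(hourly_rate: float, discount: float,
                   hours_in_month: float = HOURS_PER_MONTH) -> float:
    """Usage commitment: discounted rate, but charged for every hour of the month."""
    return hourly_rate * (1 - discount) * hours_in_month


def break_even_utilization(discount: float) -> float:
    """Fraction of the month above which the commitment becomes cheaper."""
    return 1 - discount


def cheaper_option(hourly_rate: float, discount: float, hours_used: float) -> str:
    """Compare both billing models for one instance over one month."""
    if committed_cost(hourly_rate, discount) < on_demand_cost(hourly_rate, hours_used):
        return "committed"
    return "on-demand"


if __name__ == "__main__":
    rate = 0.10  # hypothetical on-demand price per hour
    for hours in (100, 730):  # a fluctuating workload vs. a 24x7 workload
        print(hours, cheaper_option(rate, 0.70, hours))
```

With the 70% discount mentioned above, `break_even_utilization(0.70)` is 0.3: under these assumptions an instance running more than about 30% of the month (roughly 219 hours) is cheaper with the commitment, which is why steady 24×7 workloads should not stay on pay per use.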
Ecosystem of additional services
Having access to the entire ecosystem of cloud services is one of the main advantages of cloud applications. This ecosystem is also constantly growing and improving, incorporating new trends (IoT, AI, …).

SLA responsibility
SLA responsibility is another controversial issue in the Public Cloud. Although service levels are well defined, the compensation is unclear when there is an impact on the company. In fact, in recent service-level breaches, the Public Cloud providers' lawyers managed to soften any kind of compensation.

Vendor Lock-in
Vendor lock-in is a risk when moving to a specific cloud. However, a good architecture that isolates applications from each cloud platform's dependencies can reduce it. In fact, using containers and functions together with an architecture layer that reduces dependencies should reduce lock-in. You should avoid what the vendors call Apps as a Service; in addition, using a multi-cloud database (like MongoDB) also reduces lock-in at the database layer. Finally, another area where you can reduce lock-in is the deployment tool and language used to provision and configure all the infrastructure resources in your cloud environment. Again, try to use standard solutions (like Chef or Puppet) and build a layer to isolate dependencies.

Latency in Hybrid models
Latency in hybrid models must be considered when migrating an on-premises solution to the Public Cloud. This latency can be mitigated with dedicated communication lines and by choosing autonomous workloads with low dependence on legacy systems. Additionally, if one of the provider's regions happens to be in the same geographical area as the on-premises data center, it would be possible to establish low-latency communications.
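The Vendor Lock-in section, like the FaaS conclusion earlier, argues that a thin architecture layer keeps functions portable between vendors. Here is a minimal Python sketch of that idea under stated assumptions: the business logic (`greet`, a hypothetical example) only sees plain dictionaries, and small adapters translate each platform's calling convention. The `(event, context)` adapter assumes an AWS Lambda function behind an API Gateway proxy integration; the second adapter exposes a standard WSGI (PEP 3333) application, which you would still need to check against each provider's runtime documentation before relying on it. Only the adapters change when you switch vendors.

```python
import io
import json
from typing import Any, Callable, Dict, Iterable, List, Tuple

Core = Callable[[Dict[str, Any]], Dict[str, Any]]


def greet(payload: Dict[str, Any]) -> Dict[str, Any]:
    """Vendor-neutral business logic (hypothetical example): dict in, dict out."""
    name = payload.get("name", "world")
    return {"message": f"Hello, {name}!"}


def lambda_adapter(core: Core) -> Callable[[Dict[str, Any], Any], Dict[str, Any]]:
    """Adapt the core to the Lambda (event, context) signature for API Gateway proxy events."""
    def handler(event: Dict[str, Any], context: Any) -> Dict[str, Any]:
        # The proxy event carries the request body as a JSON string (or None).
        payload = json.loads(event.get("body") or "{}")
        return {
            "statusCode": 200,
            "headers": {"Content-Type": "application/json"},
            "body": json.dumps(core(payload)),
        }
    return handler


def wsgi_adapter(core: Core) -> Callable[[Dict[str, Any], Callable[..., Any]], Iterable[bytes]]:
    """Adapt the core to a standard WSGI application (PEP 3333)."""
    def app(environ: Dict[str, Any], start_response: Callable[..., Any]) -> List[bytes]:
        length = int(environ.get("CONTENT_LENGTH") or 0)
        raw = environ["wsgi.input"].read(length) if length else b"{}"
        body = json.dumps(core(json.loads(raw))).encode("utf-8")
        headers: List[Tuple[str, str]] = [
            ("Content-Type", "application/json"),
            ("Content-Length", str(len(body))),
        ]
        start_response("200 OK", headers)
        return [body]
    return app


# One entry point per platform, all sharing the same core logic.
lambda_handler = lambda_adapter(greet)
wsgi_app = wsgi_adapter(greet)

if __name__ == "__main__":
    print(lambda_handler({"body": json.dumps({"name": "Ana"})}, None)["body"])
    request = b'{"name": "Li"}'
    environ = {"CONTENT_LENGTH": str(len(request)), "wsgi.input": io.BytesIO(request)}
    print(b"".join(wsgi_app(environ, lambda status, headers: None)))
```

The design choice is that the core never imports a vendor SDK, so moving to another provider means writing one more small adapter rather than rewriting the business logic.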