A Resource and Policy Aware VM Scheduler for Medium Scale Clouds
Cloud computing delivers computing resources to users over an Internet connection. In an IaaS (Infrastructure as a Service) cloud, these resources include processing power, memory, storage, and network resources. Cloud computing models, including Infrastructure as a Service, Platform as a Service, and Software as a Service, have changed the traditional “host on own data center” strategy and spare each organization the overhead of maintaining a private data center. Medium-scale IaaS clouds are becoming more popular in small and medium-scale organizations such as universities and enterprises. They are useful when an organization needs to deploy its own private cloud using the compute, storage, and network resources it already has. Several IaaS platforms, such as OpenStack, CloudStack, and Eucalyptus, can be used to deploy such medium-scale clouds. These platforms handle the sharing and management of the compute, storage, and network resources dedicated to the cloud and allocate resources for various requirements.
Many universities and enterprises are now setting up their own small-to-medium scale private clouds. Such private clouds are becoming popular because they make it possible to multiplex existing computing and storage resources within an organization while supporting diverse applications and platforms. They also provide better performance, control, and privacy. A medium-scale cloud such as a university or enterprise cloud serves different types of users, including students, lecturers, internal and external researchers, and developers, who benefit from the cloud in different ways. Their requirements vary in terms of resources, priorities, and allocation periods. These requirements include processing-intensive and memory-intensive applications, such as HPC (High-Performance Computing) and data mining applications, as well as labs that need to be deployed on a specific set of hosts for a particular period of time. Priority schemes and dynamic VM (Virtual Machine) migration schemes are needed to satisfy all these requirements in an organized manner. However, the IaaS cloud platforms currently known to us have no native capability for such dynamic resource allocation and VM preemption. Therefore, it is important to extend existing cloud platforms to provide policy, resource, and deadline-aware VM scheduling.
What is the solution?
We propose a resource scheduling mechanism that can be used as an extension to an existing IaaS cloud platform to support dynamic, resource and policy-aware VM scheduling for medium-scale clouds. The resource scheduling algorithm schedules VMs while taking into account the capabilities of the cloud hosts and their current resource usage, which is monitored continuously by a resource monitor. The resource scheduler allocates resources according to the predefined priority level of the user who issued the resource request. Time-based scheduling (e.g., deploying labs for a particular period of time) is also performed while considering the priority levels and existing resource allocations. To provide these features, we extend an existing IaaS cloud platform with resource and policy-aware VM creation, migration, and preemption.
Apache CloudStack is an open source IaaS platform that is widely used for building medium-scale clouds. CloudStack is easier to set up than alternatives such as OpenStack and Eucalyptus because of its monolithic architecture. The CloudStack management server can easily be installed on a single machine, whereas OpenStack consists of several components that must be installed and configured separately, which requires expertise. CloudStack provides a rich web UI and a simpler RESTful API for integrating third-party tools. It also implements an Amazon EC2 API-compatible interface.
KVM is an open source, kernel-based hypervisor that can be used to create virtual machines on top of a shared physical host. We are interested in using KVM with CloudStack because KVM can take snapshots of a running VM, including its disk and memory, through the libvirt API. Memory snapshots are not provided by the other open source hypervisors supported by CloudStack, such as XCP (Xen Cloud Platform), although XenServer, the commercial version of Xen, does provide VM memory snapshots. Taking VM snapshots is a requirement in our solution because we need to save the state of a VM and restore it later. This enables preemptive scheduling, in which a high-priority request preempts a low-priority request when the cloud lacks resources.
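To make the time-based part more concrete, here is a minimal sketch of how a scheduler could check whether a requested time window clashes with existing reservations on a host. The `Reservation` type, its fields, the host names, and the dates are all hypothetical and chosen only for illustration; this is not the data model SCS actually uses.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass
class Reservation:
    """Hypothetical record of a time-bound allocation on one host."""
    name: str
    host: str
    start: datetime
    end: datetime
    priority: int  # assumption: a larger number means a higher priority


def conflicting_reservations(existing, host, start, end):
    """Return reservations on `host` whose window overlaps [start, end)."""
    return [
        r for r in existing
        if r.host == host and r.start < end and start < r.end
    ]


# Illustrative data only.
existing = [
    Reservation("networks-lab", "host-a",
                datetime(2024, 1, 10, 9), datetime(2024, 1, 10, 12), 2),
    Reservation("hpc-job", "host-a",
                datetime(2024, 1, 10, 14), datetime(2024, 1, 10, 18), 1),
]

conflicts = conflicting_reservations(
    existing, "host-a",
    datetime(2024, 1, 10, 11), datetime(2024, 1, 10, 13),
)
print([r.name for r in conflicts])
```

A scheduler could then compare the priority of each conflicting reservation with the priority of the new request to decide whether the request has to wait, move to another host, or displace an existing allocation.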
Zabbix Resource Monitoring System
Zabbix is a free and open source resource monitoring system for clouds and grids. It is highly scalable and can monitor small to large-scale clouds consisting of up to 100,000 monitored devices. Zabbix also provides an HTTP-based API (JSON-RPC) and a good web UI, both of which can be used to monitor the resource utilization and availability of a set of monitored devices. It provides graphical representations as well as detailed information on monitored devices.
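As a rough illustration of what querying Zabbix for resource data can look like, the sketch below only builds the body of an `item.get` call to the Zabbix JSON-RPC API; it does not send anything. The token, host IDs, and choice of item keys are assumptions made for the example. It also assumes an older Zabbix release that accepts the token in the `auth` field of the request body; newer releases expect it in an HTTP `Authorization` header instead.

```python
import json


def build_item_get_request(auth_token, host_ids, item_keys, request_id=1):
    """Build a JSON-RPC 2.0 body asking Zabbix for the latest item values."""
    return {
        "jsonrpc": "2.0",
        "method": "item.get",
        "params": {
            "output": ["hostid", "key_", "lastvalue"],
            "hostids": host_ids,
            "filter": {"key_": item_keys},
        },
        "auth": auth_token,  # assumption: body-level token (older Zabbix releases)
        "id": request_id,
    }


# Hypothetical token and host IDs, for illustration only.
payload = build_item_get_request(
    auth_token="example-token",
    host_ids=["10105", "10106"],
    item_keys=["vm.memory.size[available]", "system.cpu.util"],
)
print(json.dumps(payload, indent=2))
```

The body would be sent as an HTTP POST with a JSON content type to the `api_jsonrpc.php` endpoint of the Zabbix frontend.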
Design of the solution
Our Smart Cloud Scheduler (SCS) communicates with CloudStack and the Zabbix monitoring system through the APIs they provide. SCS itself provides a REST API and a web frontend through which users can send resource allocation requests to the cloud. A user first composes a resource allocation request in a structured format (discussed later) and sends it to SCS. SCS then queries the Zabbix resource monitor to fetch the latest information on resource utilization and availability in the cloud. Based on resource availability, SCS decides how to allocate the request and on which host to place it. Once these decisions are made, SCS executes them on CloudStack using the CloudStack API.
Overview of Smart Cloud Scheduler
Implementation of Smart Cloud Scheduler
The following diagram represents the high-level architecture of our system. It shows how our resource scheduler sits between the Zabbix resource monitoring system and the CloudStack IaaS framework and coordinates them. A user can submit a request to the scheduler through the web frontend or through the API endpoint provided by the scheduler. Once the request is validated, authenticated, and authorized, the authentication service forwards the authorized and prioritized request to the core scheduler.
High-Level Architecture of the system
Smart Cloud Scheduler is implemented in Node.js and uses a MongoDB database for storage. We use Mongoose as the Node.js object modeling library for MongoDB and the ‘csclient’ Node.js module for CloudStack API calls. The authentication service also handles authorization for Smart Cloud Scheduler and uses MongoDB to store user account information. The core scheduler is the main component in the above diagram and encapsulates the core functionality of the resource scheduler.
Component-based architecture of the system
The following diagram illustrates the internal component-based architecture of the Core Scheduler.
A user can issue a resource allocation request using either the web frontend or the REST API. The request is authenticated, authorized, and tagged according to the user's priority and the priority requested for the job. This authorized and prioritized request is then sent to the Host Filter, which fetches the latest resource information from Zabbix in order to determine which hosts can take the incoming request. If there are hosts on which the request can be allocated directly, the request is passed to the VM Scheduler, and the allocation is carried out on the hosts selected by the Host Filter.
If no host can take the current request, it is sent to the Priority Scheduler, which first forwards it to the Migration Scheduler. The Migration Scheduler tries to make room for the request on any of the hosts in the cloud by reshuffling the VMs already running there. If it can make room for the request, it returns the hosts on which it freed space to the Priority Scheduler. The Priority Scheduler then passes this information to the VM Scheduler, which performs the VM allocation on the selected hosts.
If migration scheduling is not possible, the Priority Scheduler forwards the request to the Preemptive Scheduler. The Preemptive Scheduler checks the priority of the incoming request and then checks whether the cloud holds earlier allocations with a lower priority than the incoming request. If there are such low-priority allocations, it checks whether preempting some of those VMs would release enough resources for the incoming request. If so, the Preemptive Scheduler preempts those VMs and returns the hosts on which the resources were released. These hosts are then used to allocate the new request.
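The host filtering step described above can be sketched as follows. This is a minimal illustration under my own assumptions, not the actual Host Filter code: the `Host` and `Request` types are hypothetical, the free-resource figures stand in for what would be fetched from Zabbix, and ordering the candidates by free memory is just one possible choice.

```python
from dataclasses import dataclass


@dataclass
class Host:
    """Hypothetical view of a host's free resources (as reported by monitoring)."""
    name: str
    free_cpu_cores: int
    free_memory_mb: int


@dataclass
class Request:
    """Hypothetical resource allocation request."""
    cpu_cores: int
    memory_mb: int
    priority: int


def filter_hosts(hosts, request):
    """Return hosts that can take the request directly, most free memory first."""
    candidates = [
        h for h in hosts
        if h.free_cpu_cores >= request.cpu_cores
        and h.free_memory_mb >= request.memory_mb
    ]
    # Assumption: prefer the host with the most free memory.
    return sorted(candidates, key=lambda h: h.free_memory_mb, reverse=True)


hosts = [
    Host("host-a", free_cpu_cores=2, free_memory_mb=2048),
    Host("host-b", free_cpu_cores=8, free_memory_mb=16384),
    Host("host-c", free_cpu_cores=4, free_memory_mb=8192),
]
request = Request(cpu_cores=4, memory_mb=4096, priority=2)

selected = filter_hosts(hosts, request)
print([h.name for h in selected])
```

An empty result from `filter_hosts` corresponds to the case where the request cannot be placed directly and has to be handed to the Priority Scheduler.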
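One simple way to think about the reshuffling step is to look for a single VM move that frees enough room on some host. The sketch below does exactly that and nothing more; it considers memory only, plans at most one migration, and uses hypothetical `VM` and `Host` types. A real migration scheduler would need to consider CPU, storage, and multi-step moves, so treat this as an illustration of the idea rather than the SCS algorithm.

```python
from dataclasses import dataclass, field


@dataclass
class VM:
    name: str
    memory_mb: int


@dataclass
class Host:
    name: str
    capacity_mb: int
    vms: list = field(default_factory=list)

    @property
    def free_mb(self):
        return self.capacity_mb - sum(vm.memory_mb for vm in self.vms)


def plan_single_migration(hosts, needed_mb):
    """Find one VM move that leaves `needed_mb` free on the source host.

    Returns (vm_name, source_host, target_host), or None if no single move works.
    """
    for source in hosts:
        # Try the smallest VMs first so that the cheapest move is preferred.
        for vm in sorted(source.vms, key=lambda v: v.memory_mb):
            if source.free_mb + vm.memory_mb < needed_mb:
                continue  # moving this VM would not free enough memory
            for target in hosts:
                if target is not source and target.free_mb >= vm.memory_mb:
                    return vm.name, source.name, target.name
    return None


hosts = [
    Host("host-a", 8192, [VM("web", 4096), VM("db", 2048)]),  # 2048 MB free
    Host("host-b", 8192, [VM("batch", 5120)]),                # 3072 MB free
]

plan = plan_single_migration(hosts, needed_mb=4096)
print(plan)
```

In this example neither host has 4096 MB free, but moving the 2048 MB VM from `host-a` to `host-b` leaves 4096 MB free on `host-a`, which is the host the function would report back for the new allocation.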
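The preemption check can be sketched in a similar way. The example below picks lower-priority VMs on one host, lowest priority first, until enough memory would be released. The `Allocation` type, the convention that a larger number means a higher priority, and the memory-only accounting are my assumptions for illustration; the actual snapshot and restore of preempted VMs is not shown.

```python
from dataclasses import dataclass


@dataclass
class Allocation:
    """Hypothetical record of a VM currently running on a host."""
    vm_name: str
    host: str
    priority: int  # assumption: a larger number means a higher priority
    memory_mb: int


def select_preemption_victims(allocations, host, free_mb, needed_mb, request_priority):
    """Choose lower-priority VMs on `host` to preempt until `needed_mb` is free.

    Returns the list of VM names to preempt, or None if preemption cannot help.
    """
    candidates = sorted(
        (a for a in allocations
         if a.host == host and a.priority < request_priority),
        key=lambda a: (a.priority, -a.memory_mb),
    )
    victims = []
    available = free_mb
    for alloc in candidates:
        if available >= needed_mb:
            break
        victims.append(alloc.vm_name)
        available += alloc.memory_mb
    return victims if available >= needed_mb else None


allocations = [
    Allocation("lab-1", "host-a", priority=1, memory_mb=2048),
    Allocation("research-1", "host-a", priority=2, memory_mb=4096),
    Allocation("admin-1", "host-a", priority=5, memory_mb=4096),
]

victims = select_preemption_victims(
    allocations, host="host-a", free_mb=1024, needed_mb=6144, request_priority=3
)
print(victims)
```

With a request priority of 3, only `lab-1` and `research-1` are eligible, and preempting both releases enough memory. The same request at priority 2 would get `None`, because only `lab-1` could be preempted and that is not enough.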
In the next post about Smart Cloud Scheduler, I will describe each component of SCS and discuss the current implementation status, limitations, and future improvements.