Sci. Tech. Energ. Transition, Volume 79, 2024
Article Number: 83
Number of page(s): 12
DOI: https://doi.org/10.2516/stet/2024078
Published online: 15 October 2024
Regular Article
An autoscalable approach to optimize energy consumption using smart meters data in serverless computing
Computer Science and Engineering Department, Thapar Institute of Engineering and Technology, Patiala 147004, Punjab, India
* Corresponding author: jkaur_phd20@thapar.edu
Received: 9 April 2024
Accepted: 2 September 2024
Serverless computing has evolved as a prominent paradigm within cloud computing, providing on-demand resource provisioning and capabilities crucial to Science and Technology for Energy Transition (STET) applications. Despite the efficiency of auto-scalable approaches in optimizing performance and cost in distributed systems, their potential remains underutilized in serverless computing due to the lack of comprehensive approaches. Therefore, an auto-scalable approach has been designed using Q-learning, which enables optimal resource scaling decisions. This approach adjusts resources dynamically to maximize resource utilization by automatically scaling resources up or down as needed. Further, the proposed approach has been validated using AWS Lambda with key performance metrics such as probability of cold start, average response time, idle instance count, and energy consumption. The experimental results demonstrate that the proposed approach performs better than the existing approach on these parameters. Finally, the proposed approach has also been validated for optimizing the energy consumption of smart meter data.
Key words: Serverless computing / Autoscaling / Q-learning / Performance / Energy consumption
© The Author(s), published by EDP Sciences, 2024
This is an Open Access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.
1 Introduction
Serverless computing is an emerging paradigm in cloud computing designed to make the cloud easier to use: the cloud service provider handles all management tasks and optimizes resource utilization, resulting in cost savings and energy efficiency. The main characteristic of serverless computing is dynamic scaling; serverless instances thus have faster startup times than VM-based instances but still show low and unpredictable performance [1]. Serverless computing services are not adaptive to their workloads and use the same management policies for all executed functions in parallel and distributed environments. Adapting the platform to varied workloads has the potential to greatly improve infrastructure cost, performance, and energy consumption [2].
By using the proposed approach, serverless providers can create auto-scalable and predictive platforms, improving the quality of service (QoS) and reducing wasted computing resources. Application developers would also benefit from such auto-scalable approaches by achieving the required quality of service that enables them to migrate more workloads into serverless computing platforms.
1.1 Motivation
The research motivation for this paper is outlined as follows:
- The motivation behind this work lies in the need to develop an autoscaling mechanism for serverless applications implemented using the Q-learning technique [1, 3, 4], because serverless computing offerings are not adaptive to the workload and use the same management policies for distributed computing applications [2, 5–7].
- Our motivation for this research is driven by the need to optimize energy usage, particularly focusing on Science and Technology for Energy Transition (STET) applications [8, 9].
- Based on recent studies [10–12], a lack of evaluation of various performance metrics has been identified, indicating a need to compare the results of the proposed approach with the existing approach based on these evaluation metrics. Additionally, the absence of validation of an auto-scalable model on serverless computing platforms further emphasizes the significance of our research.
1.2 Our contribution
The main contribution of this paper is to propose an auto-scalable approach to enhance performance and optimize energy consumption in serverless computing.
- The significant contribution of this paper is an auto-scalable approach using Q-learning to optimize resource allocation in serverless computing environments. The approach addresses the dynamic nature of workloads by allocating instances to incoming requests and, when no instances are available in the pool, intelligently adding new function instances to meet the demand. It also incorporates a mechanism to scale down resources when demand is lower than the available resources, ensuring efficient resource utilization and adaptability to varying workloads.
- One distinguishing aspect of this approach is optimizing electricity consumption in real-world applications, specifically focusing on smart meters for residential buildings. This addresses the critical need to optimize energy usage, leading to improved energy efficiency and cost savings in serverless computing environments.
- To ensure applicability, the proposed approach has been verified on AWS Lambda and assessed using various performance parameters such as probability of cold start, average response time, average number of function instances, energy consumption, and utilization. Additionally, a comparative analysis has been conducted against the base approach to provide a comprehensive outlook on its effectiveness.
The remainder of the paper is structured as follows: Key research studies are highlighted in Section 2. Section 3 discusses the preliminaries employed in the proposed approach. Section 4 outlines the details of the proposed Q-learning approach. Section 5 demonstrates the experimental validation of the proposed approach. Finally, Section 6 concludes the paper and presents future research directions.
2 Related work
Serverless computing has gained a lot of attention from researchers, but no auto-scalable approach has been proposed that enhances performance while capturing the various aspects and challenges of serverless computing. Jawaddi et al. [13], Mahmoudi et al. [10], and Mahmoudi et al. [11] have in recent years applied queuing theory to address autoscaling in serverless computing. To dynamically manage the required number of containers, Suresh et al. [14] utilized M/M/k queueing assumptions alongside the square-root staffing method; scaling decisions are determined by assessing the arrival of incoming function requests and the current container count. Shankar et al. [15] introduced an approach for scaling resources in advance by analyzing the number and size of tasks and periodically scaling the number of workers by a predefined factor to meet the task workload. Mahmoudi et al. [16] investigated Markov chain approaches for modeling queueing systems in serverless computing environments, i.e., Continuous-Time Markov Chain (CTMC) and Discrete-Time Markov Chain (DTMC). Zhao et al. [17] investigated alternative approaches, such as the simple moving average (SMA) and the exponential moving average (EMA).
Wen et al. [18] presented an in-depth investigation into the challenges faced by developers in building serverless-based applications. Pérez et al. [19] introduced an open-source framework and showed its elasticity and resource efficiency under varying workloads. Based on the results presented by Kim et al. [20], the visibility and predictability of network and disk I/O performance should be made mandatory, as is already the case for CPU and memory. Enes et al. [21] introduced an innovative platform for dynamic, real-time scaling of container resources, demonstrated through the evaluation of big data workloads; the platform shows increased CPU utilization with little execution-time overhead, and this scalability is validated on a 32-container cluster, challenging initial perceptions of serverless suitability for big data applications. Jackson et al. [22] examined how the choice of language runtime affects both the performance and cost of serverless function execution; the paper introduces a serverless performance testing framework, evaluates metrics for AWS Lambda and Azure Functions, and finds that Python is the optimal choice on AWS Lambda for performance and cost efficiency. Shafiei et al. [23] and Singh et al. [24] proposed energy-aware scheduling to reduce energy consumption; the main purpose of this type of scheduling is to put the execution environment or inactive containers into a cold-state mode.
Table 1 evaluates related work against performance metrics, i.e., cost, scalability, cold start, energy consumption, resource utilization, and response time in serverless computing. As per our literature review, some authors have considered specific metrics in their studies: Wen et al. [18] evaluated cost, cold start, and resource utilization; Pérez et al. [19] considered scalability and resource utilization; Kim et al. [20] examined cost. However, to our knowledge, no author has comprehensively addressed all performance metrics simultaneously. This gap in the current literature highlights the need for further research evaluating the proposed approach across all relevant metrics.
Table 1. Evaluation of performance metrics in serverless computing.
Table 2. Symbols and their corresponding description.
Table 3. Workload analysis for cost and energy efficiency in serverless computing environments.
Table 4. Comparative analysis: proposed approach's improvement percentage.
3 Preliminaries
To develop a comprehensive auto-scalable approach for serverless computing platforms, it is first necessary to understand the functioning and management of function instances, in which the computations occur. In serverless computing platforms, each request is managed by a function instance, which acts as a tiny server.
3.1 Function instance states
Recent research [10–12] indicated that function instances undergo six distinct states: initializing, cold-start, warm-start, running, idle, and expired as shown in Figure 1.
Fig. 1 State diagram of function instance.
When a request arrives for the first time, it first enters the initializing state. The initializing state signifies the phase during which the infrastructure initializes new instances, including the setup of virtual machines or containers to accommodate increased workload. Instances remain in this state until they become capable of handling incoming requests. The serverless provider does not charge during the initializing state.
Based on recent research studies [38, 39, 47, 48], a request that requires initialization steps because of inadequate provisioned capacity is referred to as a cold start. This process includes deploying a new function, initiating a new virtual machine, or starting a new function instance on an existing virtual machine, which affects the response time experienced by users. Extensive research has been conducted to mitigate cold starts in serverless computing [38, 49]. In a warm start, when a new request arrives and the platform has an idle instance, it reuses the existing instance instead of spinning up a new one [10].
Upon receiving a request, an instance transitions into the running state, in which it processes the request until a response is dispatched to the client. The duration an instance spends in the running state is subject to billing by the serverless provider. After the completion of the request, an instance enters the idle state. During this period, the serverless platform keeps instances warm for a certain duration to address potential future spikes in workload, and developers are not billed for idle instances. If a warm instance in the idle state is not used for some time (the expiration threshold), it is automatically shut down and moves to the expired state.
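To make the lifecycle concrete, the following minimal Python sketch models the states and transitions described above. It is an illustration only: the expiration-threshold value is an assumed placeholder, and cold/warm starts are modeled as transition paths rather than stored states.

from enum import Enum, auto
import time

class InstanceState(Enum):
    INITIALIZING = auto()  # platform spins up a VM/container; not billed
    RUNNING = auto()       # processing a request; billed
    IDLE = auto()          # warm, held for future requests; not billed
    EXPIRED = auto()       # terminated after exceeding the expiration threshold

class FunctionInstance:
    def __init__(self, expiration_threshold_s=300.0):  # assumed placeholder value
        self.state = InstanceState.INITIALIZING
        self.created_at = time.monotonic()
        self.idle_since = None
        self.expiration_threshold_s = expiration_threshold_s

    def handle_request(self):
        # Cold vs. warm start is a property of the request path, not a stored
        # state: a warm start reuses an IDLE instance, while a cold start pays
        # for INITIALIZING first.
        self.state = InstanceState.RUNNING

    def finish_request(self):
        self.state = InstanceState.IDLE
        self.idle_since = time.monotonic()

    def maybe_expire(self):
        # Idle instances are shut down once the expiration threshold elapses.
        if (self.state is InstanceState.IDLE
                and time.monotonic() - self.idle_since > self.expiration_threshold_s):
            self.state = InstanceState.EXPIRED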
After exploring function instance states, it becomes necessary to explore autoscaling patterns as they offer strategies for dynamically adjusting resources based on the lifecycle of function instances.
3.2 Autoscaling patterns
Three autoscaling patterns are generally seen in the most widely used serverless computing platforms: scale-per-request scaling, concurrency-value scaling, and metrics-based scaling.
In scale-per-request autoscaling, no queuing is involved and scaling is synchronous: a new request is served by one of the idle, available instances, which is called a warm start; otherwise, the platform instantiates a new instance for that specific request, a process referred to as a cold start. AWS Lambda, Apache OpenWhisk, Google Cloud Functions, IBM Cloud Functions, and Azure Functions use this pattern [10–12, 50, 51]. The concurrency-value autoscaling pattern has a shared queue and follows asynchronous scaling, in which each function instance can receive multiple requests at the same time. In this scenario, the user defines a maximum limit for the number of requests allowed to enter an instance concurrently; once this threshold is reached, any new incoming request triggers a cold start, leading to the instantiation of a new function instance. Google Cloud Run and Knative use this pattern [4, 52]. In metrics-based autoscaling, the system tries to keep metrics such as memory, CPU usage, latency, or throughput within a predefined range. OpenFaaS, AWS Fargate, Kubeless, Azure Container Instances, and Fission use this pattern [16].
3.3 Maximum concurrency level
After exploring autoscaling patterns, it is essential to consider the concept of the maximum concurrency level, which defines the maximum number of function instances that can run concurrently. Upon reaching the maximum concurrency level [11], a new request will result in an error status indicating that the server cannot fulfill the request at that moment. This concept underscores the significance of an efficient request routing mechanism, explained in the next subsection.
3.4 Request routing
Requests are first sent to the most recently created instances to facilitate scaling in. Only if recently created instances are busy will a request be routed to older containers [7].
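Putting the scale-per-request pattern, the concurrency limit, and the newest-first routing rule together, a dispatcher could look like the following sketch, which reuses the FunctionInstance and InstanceState sketch above; the function name and the error handling are illustrative assumptions, not any platform's actual API.

def dispatch(request, warm_pool, max_concurrency):
    # Scale-per-request dispatch with newest-first routing (illustrative sketch).
    idle = [i for i in warm_pool if i.state is InstanceState.IDLE]
    if idle:
        instance = max(idle, key=lambda i: i.created_at)  # newest idle instance first
        instance.handle_request()                         # warm start
        return instance
    if len(warm_pool) >= max_concurrency:
        # At the maximum concurrency level the platform returns an error status.
        raise RuntimeError("max concurrency level reached; request rejected")
    instance = FunctionInstance()                         # cold start (initialization elided)
    warm_pool.append(instance)
    instance.handle_request()
    return instance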
4 The proposed auto-scalable approach
An auto-scalable approach utilizing Q-learning has been introduced in this research, aimed at dynamically adjusting resource allocation in response to real-time demand. This adaptive approach significantly enhances performance and optimizes energy consumption in serverless computing environments. Q-learning, a widely recognized reinforcement learning algorithm in machine learning and artificial intelligence [1], has been employed to empower agents to make decisions within an environment, aiming to maximize cumulative rewards over time [3]. This learning algorithm is particularly effective in situations where the agent doesn’t have prior knowledge of the environment and must learn from trial and error [4].
The proposed approach has been demonstrated in Section 4.1. The proposed algorithm is explained in Section 4.2, followed by the detailed calculation of different parameters within the proposed approach in Section 4.3.
4.1 Proposed approach
In the proposed approach illustrated in Figure 2, the scale-per-request autoscaling strategy has been adopted to efficiently manage the allocation of instances based on incoming requests. This dynamic approach ensures that the system scales up or down in response to demand fluctuations, optimizing resource utilization.
Fig. 2 An overview of the proposed auto-scalable approach using Q-Learning.
The approach verifies the availability of instances when a request arrives and searches for an available instance. If an instance is available, it is allocated to fulfill the request. However, if a new request arrives and there are no available instances, a cold start is triggered; to address this, a new function instance is added to the warm pool, effectively scaling up the instances to meet the increased demand. On the other hand, if there has been no request for a specific duration, the approach applies a scale-in mechanism: instances that have been idle for a predetermined amount of time are considered expired and terminated, reducing the number of active instances. The allocation of instances to requests is governed by minimum cost and average response time, ensuring a balanced approach that considers both cost efficiency and timely responsiveness. Recognizing the potential limitation of minimum-cost allocation in achieving actual cost savings, the proposed approach incorporates a Q-learning algorithm. Initially, the Q-learning algorithm identifies the current state (S) of the serverless application, including factors like response time, resource utilization, and request rate. A Q-table is designed to store Q-values for state-action pairs, and the Q-learning parameters, such as the learning rate (α) and discount factor (γ), are specified. Exploration strategies are then implemented to balance exploration (trying a random action) and exploitation (choosing the action with the highest Q-value). Next, an action (A) is chosen based on scaling decisions, i.e., adding or removing function instances, and a reward (R) is measured that may be based on maintaining performance and cost savings. During each episode, the agent selects an action, receives a reward, and updates the Q-value using the Bellman equation given in equation (1):

$Q(s,a) \leftarrow Q(s,a) + \alpha \left[ R(s,a) + \gamma \max_{a'} Q(s',a') - Q(s,a) \right]$ (1)

where $Q(s,a)$ is the Q-value for the (state, action) pair, $\alpha$ is the learning rate, $R(s,a)$ is the reward for the current allocation, $\gamma$ is the discount factor, and $\max_{a'} Q(s',a')$ is the maximum Q-value for the next state $s'$.
By guiding the allocation based on q-values associated with maximum reward values, the approach aims to enhance its overall performance, achieving cost savings and improved efficiency in resource allocation.
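A minimal Python sketch of this loop, implementing ε-greedy action selection and the update of equation (1), is given below; the action set ("scale_up", "scale_down", "no_op") and the hyperparameter values are illustrative assumptions, not the paper's exact configuration.

import random
from collections import defaultdict

ALPHA, GAMMA, EPSILON = 0.1, 0.9, 0.1          # assumed hyperparameter values
ACTIONS = ("scale_up", "scale_down", "no_op")  # assumed scaling action set
Q = defaultdict(float)                         # Q[(state, action)] -> Q-value

def choose_action(state, actions=ACTIONS):
    # Exploration: random action; exploitation: action with the highest Q-value.
    if random.random() < EPSILON:
        return random.choice(actions)
    return max(actions, key=lambda a: Q[(state, a)])

def update(state, action, reward, next_state, actions=ACTIONS):
    # Bellman update from equation (1).
    best_next = max(Q[(next_state, a)] for a in actions)
    Q[(state, action)] += ALPHA * (reward + GAMMA * best_next - Q[(state, action)])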
4.2 Proposed algorithm
In this section, algorithms are proposed to address resource scheduling and evaluate performance parameters.
Algorithm 1 initializes the data structures necessary for the subsequent algorithms. It sets up the Q matrix to store (σ, ν) pairs and creates lists of σ and ν with their respective attributes. It then adjusts the demands δc, δr, and δb based on a given parameter θ, taking a list of σ with attributes as input and producing the adjusted demands for each δi in each σi.
1: Initialize Q for (σ,ν) pairs.
2: Create lists of σ and ν with attributes.
3: for each σi in σ do
4: for each δi in σi do
5: Adjust δc, δr, δb demands based on θ.
6: end for
7: end for
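A minimal Python reading of this adjustment step is sketched below; the multiplicative use of θ and the dictionary layout are assumptions, since Algorithm 1 does not specify how θ is applied.

def adjust_demands(workloads, theta):
    # Algorithm 1 sketch: scale each task's CPU, RAM, and bandwidth demand by theta.
    for workload in workloads:                 # each σi in σ
        for task in workload["tasks"]:         # each δi in σi
            for key in ("cpu", "ram", "bw"):   # δc, δr, δb
                task[key] *= theta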
Algorithm 2 assigns resources to tasks based on performance parameters. It iterates over each σi in the list of σ with attributes, creates a new task δn, and then either explores or exploits the available resources ν for that task. Finally, it assigns the chosen resources to the task if they meet certain conditions and calculates the reward for the task.
1: Input: List of σ with attributes.
2: Output: Assigned νc to δn, collected data based on performance parameters.
3: for each σi in σ do
4: Create δn for σi.
5: if rand() < ϵ then ▷ Exploration
6: Select a random ν for δn.
7: if ν satisfies the demands of δn then
8: Mark ν as candidate νc.
9: end if
10: else ▷ Exploitation
11: Select the ν with the highest Q-value for δn.
12: if ν satisfies the demands of δn then
13: Mark ν as candidate νc.
14: end if
15: end if
16: if a candidate νc exists and the demands δc, δr, δb are met then
17: Assign νc to δn.
18: Calculate the reward for δn.
19: Call equation (1).
20: end if
21: end for
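Under this reading, Algorithm 2's assignment loop might be rendered as the following toy sketch, reusing Q, random, and update() from the sketch in Section 4.1; the state encoding, the demand check, and the reward signal are assumptions for illustration.

def assign_resources(tasks, instances, epsilon=EPSILON):
    # Toy rendering of Algorithm 2: epsilon-greedy instance selection.
    # tasks: demand dicts, e.g. {"cpu": 1, "ram": 2}; instances: capacity dicts
    # with an "id" key. All names here are illustrative assumptions.
    ids = [v["id"] for v in instances]
    for task in tasks:
        state = tuple(sorted(task.items()))        # toy state encoding
        if random.random() < epsilon:              # exploration
            chosen = random.choice(instances)
        else:                                      # exploitation
            chosen = max(instances, key=lambda v: Q[(state, v["id"])])
        fits = all(chosen.get(k, 0) >= need for k, need in task.items())
        reward = 1.0 if fits else -1.0             # assumed reward signal
        update(state, chosen["id"], reward, state, actions=ids)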
Algorithm 3 collects data based on performance parameters. It takes performance parameters as input, iterates over each workload in the workload set, initializes the Q matrix and the instance pool, executes the load-adjustment (Algorithm 1) and job-allocation (Algorithm 2) steps, and collects results for each iteration in totalIterations.
1: Input: Performance parameters.
2: Output: Collected data based on performance parameters.
3: for each workload in the workload set do
4: Initialize the Q matrix and the instance pool.
5: Execute Algorithm 1.
6: for each iteration in totalIterations do
7: Execute Algorithm 2.
8: Perform Result Collection.
9: end for
10: end for
The output of the above algorithms is the data collected based on the performance parameters, which are evaluated in the next section.
4.3 Evaluation metrics
In the following subsections, the calculation of the different metrics used in the proposed approach is presented; the symbols are defined in Table 2.
4.3.1 Probability of cold start ($P_c$)
As shown in equation (2), the probability of a cold start is calculated by dividing the number of requests causing a cold start ($N_{cold}$) by the total number of requests made during the experiment ($N_{req}$):

$P_c = \frac{N_{cold}}{N_{req}}$ (2)
4.3.2 Average response time ($RT_{avg}$)
Equation (3) computes this metric as the average response time over the completed jobs across multiple users. The parameter takes a list of users as input, each with a list of jobs; the response time $rt_i$ is accumulated for each completed job and the average over the $n$ completed jobs is computed:

$RT_{avg} = \frac{1}{n}\sum_{i=1}^{n} rt_i$ (3)
4.3.3 Mean number of warm pool instances ($\bar{N}_{warm}$)
The average number of instances in the warm pool can be calculated, as shown in equation (4), from the number of warm instances ($N_{warm}$) and the total number of instances ($N_{total}$):

$\bar{N}_{warm} = \frac{N_{warm}}{N_{total}}$ (4)
4.3.4 Mean number of idle instances ($\bar{N}_{idle}$)
As given in equation (5), this is measured as the ratio of the number of instances in the idle (cold) state to the total number of instances:

$\bar{N}_{idle} = \frac{N_{idle}}{N_{total}}$ (5)
4.3.5 Mean number of running instances ($\bar{N}_{run}$)
As shown in equation (6), the mean number of running instances can be calculated by subtracting the number of idle instances from the total number of instances:

$\bar{N}_{run} = N_{total} - N_{idle}$ (6)
4.3.6 Utilization ($U$)
Utilization is defined in equation (7) as the ratio of instances in the running state to all instances:

$U = \frac{\bar{N}_{run}}{N_{total}}$ (7)
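Taken together, equations (2)-(7) reduce to simple counter arithmetic; a small sketch follows (the counter names track the reconstructed symbols above and are illustrative):

def evaluation_metrics(n_cold, n_req, response_times, n_warm, n_idle, n_total):
    # Compute the Section 4.3 metrics from raw experiment counters.
    return {
        "p_cold_start": n_cold / n_req,                                  # eq. (2)
        "avg_response_time": sum(response_times) / len(response_times),  # eq. (3)
        "mean_warm_pool": n_warm / n_total,                              # eq. (4)
        "mean_idle": n_idle / n_total,                                   # eq. (5)
        "mean_running": n_total - n_idle,                                # eq. (6)
        "utilization": (n_total - n_idle) / n_total,                     # eq. (7)
    }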
5 Experimental results
To show the effectiveness, reliability, and adaptability of the proposed approach, it has been evaluated on AWS Lambda and the results are outlined in this section. The approach follows the scale-per-request pattern, i.e., no queuing, on AWS Lambda. Serverless computing offerings are not adaptive to the workload being executed on them, but optimizing the expiration threshold, after which an idle instance expires and is terminated, is one way a serverless computing platform can adapt to the executed workload. Figures 3–9 depict the effect of the expiration threshold on different performance parameters for the different workloads listed in Table 3. It can be seen that increasing the expiration threshold improves performance; however, while the average response time is the main performance parameter, the approach also tries to decrease cost and energy consumption as much as possible.
- Case 1: Probability of Cold Start
Figure 3 shows the probability of a cold start against the expiration threshold. When determining quality of service, users mostly look at this measure; reducing the probability of a cold start is critical for many applications, as a larger probability of a cold start could degrade the user experience. The probability of cold start for workloads L1, L2, L3, and L4 is 3.11%, 3.00%, 2.77%, and 2.66%, respectively.
- Case 2: Average Response Time
Fig. 3 Probability of cold start against the expiration threshold.
As can be seen in Figure 4, the proposed approach obtains an average response time of 30.25 ms for workload L1, 64.24 ms for L2, 37.72 ms for L3, and 25.42 ms for L4. Different workloads behave differently as the expiration threshold changes.
- Case 3: The Mean Number of Idle Instances
Fig. 4 Average response time against the expiration threshold.
Figure 5 depicts the variation in the number of idle instances with the expiration threshold. As the expiration threshold increases, a noticeable decrease in the number of idle instances is observed, suggesting a potential optimization in resource utilization. The graph further illustrates that, under varying workloads indicated by the contrasting lines, the Q-learning approach consistently performs well. These findings underscore the importance of considering expiration thresholds in optimizing system performance and resource allocation, with implications for overall efficiency and cost-effectiveness. The mean number of idle instances during different workloads L1, L2, L3, and L4 are 28, 27, 25, and 24 respectively.
- Case 4: Job Completion Rate
Fig. 5 The number of idle instances against the expiration threshold.
Figure 6 shows the correlation between the expiration threshold and the job completion rate. As the expiration threshold increases, there is an observable trend of improvement in job completion rates, suggesting a positive impact on overall system efficiency. The rate of job completion for the proposed approach is 57.78%, 55.59%, 55.27%, and 53.28% for L1, L2, L3, and L4 respectively.
- Case 5: Energy Consumption
Fig. 6 Job completion rate against the expiration threshold.
The proposed approach achieves reduced energy consumption because scaling down the resources effectively shuts down idle system components when they are not needed, thereby minimizing energy wastage. As shown in Figure 7, as the expiration threshold increases, energy consumption also increases. This trend holds consistently across varying workloads, suggesting that optimizing the expiration threshold can contribute to improved sustainability in resource usage. The total energy consumption estimated for workloads L1, L2, L3, and L4 is 0.0084 mJ, 0.0085 mJ, 0.0092 mJ, and 0.0087 mJ, respectively.
- Case 6: Utilization
Fig. 7 Energy consumption against the expiration threshold.
The utilization of the warm pool instances for various expiration threshold values is shown in Figure 8. Utilization here represents the average ratio of running instances over all instances in the warm pool. Reduced utilization leads to additional instances being created and maintained, which increases cost; serverless platforms would therefore maximize average utilization to reduce cost. The values for L1, L2, L3, and L4 are 3.33%, 3.33%, 2.85%, and 3.03%, respectively.
- Case 7: Cost
Fig. 8 Utilization against the expiration threshold.
Figure 9 presents a normalized estimate of the cost as perceived by the user. The expiration threshold can be changed to examine variations in the behavior of the workload; for example, increasing the expiration threshold from 0 to 140 causes an increase in user cost. This trend holds consistently across varying workloads, suggesting that tuning the expiration threshold can reduce user cost. This shows the potential savings achievable with the approach presented and evaluated in this paper.
Fig. 9 Normalized user cost against the expiration threshold.
5.1 Comparison of proposed approach with existing approach
The proposed approach is compared with the base approach [10] using various evaluation parameters: probability of cold start, average response time, idle instances, job completion rate, energy consumption, and utilization. As can be seen in Figure 10, the proposed approach performs better than the base approach.
Fig. 10 Comparative analysis of parameters between proposed and base approaches across various workloads. (a) Probability of cold start; (b) Average response time; (c) Idle instances; (d) Job completion rate; (e) Energy consumption; (f) Utilization.
Table 4 shows the improvement in different parameters obtained with the proposed approach over the base approach. It is clear from Table 4 that the proposed approach improves the probability of cold start (Pc) for the different workloads by 38.79%, 26.11%, 35.17%, and 53.91%, respectively, compared with the base approach. The proposed approach also outperforms the base approach by improving the average response time by 31.73%, 26.29%, 39.23%, and 46.24% for L1, L2, L3, and L4, respectively. It improves the mean number of idle instances by 3.20%, 3.23%, 3.81%, and 3.23%, respectively, for L1, L2, L3, and L4, and improves energy consumption for workload L1 by 51.33%, L2 by 46.97%, L3 by 40.95%, and L4 by 47.35% relative to the base approach.
5.2 Verification and validation: smart meters for residential buildings
The proposed approach has been verified and validated for optimizing energy consumption using smart meter data for residential buildings [8, 9]. To optimize energy consumption, the smart meters dataset for residential buildings serves as input to the Q-learning algorithm, as shown in Figure 11. Each record in the dataset represents a state, encapsulating relevant information such as energy consumption patterns, weather conditions, and appliance usage. The agent utilizes this state information to make decisions for optimizing energy consumption; by measuring the rewards associated with different actions, the agent learns to select actions that lead to the greatest rewards over time. This approach aims to minimize energy waste, reduce costs for residents, and enhance overall energy efficiency in residential buildings.
Fig. 11 Smart meters for residential buildings using Q-learning.
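As a hedged illustration of this mapping (the feature names, discretization buckets, and reward shape are assumptions, not the paper's exact design), each smart-meter record could be encoded as a state, with reductions in consumption rewarded:

def meter_state(record):
    # Discretize a smart-meter record into a Q-learning state (illustrative buckets).
    # record example: {"kwh": 1.8, "temp_c": 21, "hour": 18}
    return (int(record["kwh"]),       # coarse consumption bucket
            record["temp_c"] // 5,    # coarse weather bucket
            record["hour"] // 6)      # time-of-day bucket

def meter_reward(prev_kwh, new_kwh):
    # Positive reward for reduced consumption, negative for increases.
    return prev_kwh - new_kwh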
As shown in Figure 12, the results illustrate that the proposed approach, utilizing real-time smart meter data, achieves lower energy consumption than actual usage through dynamic resource allocation and optimization. The case study involved monitoring energy consumption at different time intervals. The actual energy consumption values were 100 kWh, 120 kWh, 90 kWh, 110 kWh, 130 kWh, and 95 kWh, respectively. After implementing the proposed approach for resource scaling in serverless computing environments, the energy consumption improved to 95 kWh, 98 kWh, 84 kWh, 102 kWh, 115 kWh, and 87 kWh, respectively. Additionally, the implementation led to improved energy efficiency, cost savings, and an average improvement of approximately 9.54% in energy consumption, showcasing the scalability and applicability of the approach.
Fig. 12 Actual and improved electricity consumption of residential buildings using proposed approach.
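The reported 9.54% average follows directly from the per-interval improvements; a quick check:

actual   = [100, 120, 90, 110, 130, 95]  # kWh, from the case study
improved = [95, 98, 84, 102, 115, 87]    # kWh, with the proposed approach

per_interval = [(a - b) / a * 100 for a, b in zip(actual, improved)]
# -> [5.0, 18.33, 6.67, 7.27, 11.54, 8.42] (percent, rounded)
print(round(sum(per_interval) / len(per_interval), 2))  # prints 9.54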
6 Conclusion and future scope
In this paper, an auto-scalable approach has been proposed and experimentally validated for analyzing performance in terms of scalability, cost, and energy consumption, with inspiration from Science and Technology for Energy Transition (STET) applications. The proposed approach outperforms the existing approach: it improves the average response time by 35.62% and the mean number of idle instances by 3.37%, and reduces the probability of cold start and energy consumption by 38.5% and 46.15%, respectively. The case study on optimizing energy consumption using the proposed approach further illustrates its practical applicability and effectiveness in real-world scenarios. The proposed approach uses the scale-per-request autoscaling pattern owing to its importance in serverless computing platforms. The presented approach can be used to improve quality of service by improving platforms' management policies and making their operations predictive, and it enables developers to handle enormous changes in their workload by providing scalable computing.
The proposed approach could be enhanced using the following future work:
- In the future, the developed approach can be improved using other reinforcement learning algorithms, such as Deep Q-Networks (DQNs), to handle more complex state spaces and improve scaling decisions.
- To create effective auto-scaling solutions, additional factors such as real-time monitoring, predictive modeling, and feedback control loops combined with Q-learning need to be considered in the future.
- The use of machine learning or deep learning algorithms in the proposed approach could further improve performance and energy consumption.
References
- Zafeiropoulos A., Fotopoulou E., Filinis N., Papavassiliou S. (2022) Reinforcement learning-assisted autoscaling mechanisms for serverless computing platforms, Simul. Modell. Pract. Theory 116, 102461.
- Shahrad M., Fonseca R., Goiri I., Chaudhry G., Batum P., Cooke J., Laureano E., Tresness C., Russinovich M., Bianchini R. (2020) Serverless in the wild: characterizing and optimizing the serverless workload at a large cloud provider, in: 2020 USENIX annual technical conference (USENIX ATC 20), pp. 205–218.
- Agarwal S., Rodriguez M.A., Buyya R. (2021) A reinforcement learning approach to reduce serverless function cold start frequency, in: 2021 IEEE/ACM 21st international symposium on cluster, cloud and internet computing (CCGrid), IEEE, pp. 797–803.
- Schuler L., Jamil S., Kühl N. (2021) AI-based resource allocation: reinforcement learning for adaptive auto-scaling in serverless environments, in: 2021 IEEE/ACM 21st international symposium on cluster, cloud and internet computing (CCGrid), IEEE, pp. 804–811.
- Van Eyk E., Iosup A., Abad C.L., Grohmann J., Eismann S. (2018) A SPEC RG cloud group's vision on the performance challenges of FaaS cloud architectures, in: Companion of the 2018 ACM/SPEC international conference on performance engineering, pp. 21–24.
- Wang L., Li M., Zhang Y., Ristenpart T., Swift M. (2018) Peeking behind the curtains of serverless platforms, in: 2018 USENIX annual technical conference (USENIX ATC 18), pp. 133–146.
- McGrath G., Brenner P.R. (2017) Serverless computing: design, implementation, and performance, in: 2017 IEEE 37th international conference on distributed computing systems workshops (ICDCSW), IEEE, pp. 405–410.
- Kaur S., Bala A., Parashar A. (2024) A multi-step electricity prediction model for residential buildings based on ensemble empirical mode decomposition technique, Sci. Technol. Energy Trans. 79, 7.
- Aljicevic Z., Kasapovic S., Hivziefendic J., Kevric J., Mujkic S. (2023) Resource allocation model for cloud-fog-based smart grid, Sci. Technol. Energy Trans. 78, 28.
- Mahmoudi N., Khazaei H. (2020) Performance modeling of serverless computing platforms, IEEE Trans. Cloud Comput. 10, 4, 2834–2847.
- Mahmoudi N., Khazaei H. (2020) Temporal performance modelling of serverless computing platforms, in: Proceedings of the 2020 sixth international workshop on serverless computing, pp. 1–6.
- Mahmoudi N., Khazaei H. (2021) SimFaaS: a performance simulator for serverless computing platforms, arXiv preprint arXiv:2102.08904.
- Jawaddi S.N.A., Ismail A. (2023) Autoscaling in serverless computing: taxonomy and open challenges.
- Suresh A., Somashekar G., Varadarajan A., Kakarla V.R., Upadhyay H., Gandhi A. (2020) ENSURE: efficient scheduling and autonomous resource management in serverless environments, in: 2020 IEEE international conference on autonomic computing and self-organizing systems (ACSOS), IEEE, pp. 1–10.
- Shankar V., Krauth K., Vodrahalli K., Pu Q., Recht B., Stoica I., Ragan-Kelley J., Jonas E., Venkataraman S. (2020) Serverless linear algebra, in: Proceedings of the 11th ACM symposium on cloud computing, pp. 281–295.
- Mahmoudi N., Khazaei H. (2022) Performance modeling of metric-based serverless computing platforms, IEEE Trans. Cloud Comput. 11, 2, 1899–1910.
- Zhao Y., Uta A. (2022) Tiny autoscalers for tiny workloads: dynamic CPU allocation for serverless functions, in: 2022 22nd IEEE international symposium on cluster, cloud and internet computing (CCGrid), IEEE, pp. 170–179.
- Wen J., Chen Z., Liu Y., Lou Y., Ma Y., Huang G., Jin X., Liu X. (2021) An empirical study on challenges of application development in serverless computing, in: Proceedings of the 29th ACM joint meeting on European software engineering conference and symposium on the foundations of software engineering, pp. 416–428.
- Pérez A., Risco S., Naranjo D.M., Caballer M., Moltó G. (2019) On-premises serverless computing for event-driven data processing applications, in: 2019 IEEE 12th international conference on cloud computing (CLOUD), IEEE, pp. 414–421.
- Kim J., Lee K. (2020) I/O resource isolation of public cloud serverless function runtimes for data-intensive applications, Cluster Comput. 23, 2249–2259.
- Enes J., Expósito R.R., Touriño J. (2020) Real-time resource scaling platform for big data workloads on serverless environments, Future Gen. Comput. Syst. 105, 361–379.
- Jackson D., Clynch G. (2018) An investigation of the impact of language runtime on the performance and cost of serverless functions, in: 2018 IEEE/ACM international conference on utility and cloud computing companion (UCC companion), IEEE, pp. 154–160.
- Shafiei H., Khonsari A., Mousavi P. (2022) Serverless computing: a survey of opportunities, challenges, and applications, ACM Comput. Surv. 54, 11s, 1–32.
- Singh P., Kaur A., Gill S.S. (2022) Machine learning for cloud, fog, edge and serverless computing environments: comparisons, performance evaluation benchmark and future directions, Int. J. Grid Util. Comput. 13, 4, 447–457.
- Golec M., Ozturac R., Pooranian Z., Gill S.S., Buyya R. (2021) iFaaSBus: a security- and privacy-based lightweight framework for serverless computing using IoT and machine learning, IEEE Trans. Industr. Inform. 18, 5, 3522–3529.
- Grafberger A., Chadha M., Jindal A., Gu J., Gerndt M. (2021) FedLess: secure and scalable federated learning using serverless computing, in: 2021 IEEE international conference on big data (Big Data), IEEE, pp. 164–173.
- Li Z., Guo L., Cheng J., Chen Q., He B., Guo M. (2022) The serverless computing survey: a technical primer for design architecture, ACM Comput. Surv. (CSUR) 54, 10s, 1–34.
- Bebortta S., Das S.K., Kandpal M., Barik R.K., Dubey H. (2020) Geospatial serverless computing: architectures, tools and future directions, ISPRS Int. J. Geo-Inform. 9, 5, 311.
- Gill S.S. (2024) Quantum and blockchain based serverless edge computing: a vision, model, new trends and future directions, Internet Technol. Lett. 7, 1, e275.
- Mateus-Coelho N., Cruz-Cunha M. (2022) Serverless service architectures and security minimals, in: 2022 10th international symposium on digital forensics and security (ISDFS), IEEE, pp. 1–6.
- Marin E., Perino D., Di Pietro R. (2022) Serverless computing: a security perspective, J. Cloud Comput. 11, 1, 1–12.
- Yussupov V., Breitenbücher U., Leymann F., Wurster M. (2019) A systematic mapping study on engineering function-as-a-service platforms and tools, in: Proceedings of the 12th IEEE/ACM international conference on utility and cloud computing, pp. 229–240.
- Van Eyk E., Toader L., Talluri S., Versluis L., Uță A., Iosup A. (2018) Serverless is more: from PaaS to present cloud computing, IEEE Internet Comput. 22, 5, 8–17.
- Cordingly R., Shu W., Lloyd W.J. (2020) Predicting performance and cost of serverless computing functions with SAAF, in: 2020 IEEE Intl Conf on Dependable, Autonomic and Secure Computing, Intl Conf on Pervasive Intelligence and Computing, Intl Conf on Cloud and Big Data Computing, Intl Conf on Cyber Science and Technology Congress (DASC/PiCom/CBDCom/CyberSciTech), IEEE, pp. 640–649.
- Bardsley D., Ryan L., Howard J. (2018) Serverless performance and optimization strategies, in: 2018 IEEE international conference on smart cloud (SmartCloud), IEEE, pp. 19–26.
- Rajan R.A.P. (2018) Serverless architecture: a revolution in cloud computing, in: 2018 tenth international conference on advanced computing (ICoAC), IEEE, pp. 88–93.
- Grogan J., Mulready C., McDermott J., Urbanavicius M., Yilmaz M., Abgaz Y., McCarren A., MacMahon S.T., Garousi V., Elger P., et al. (2020) A multivocal literature review of function-as-a-service (FaaS) infrastructures and implications for software developers, in: Systems, software and services process improvement: 27th European conference, EuroSPI 2020, Düsseldorf, Germany, September 9–11, 2020, Proceedings 27, Springer, pp. 58–75.
- Vahidinia P., Farahani B., Aliee F.S. (2022) Mitigating cold start problem in serverless computing: a reinforcement learning approach, IEEE Internet Things J. 10, 5, 3917–3927.
- Liu X., Wen J., Chen Z., Li D., Chen J., Liu Y., Wang H., Jin X. (2023) FaaSLight: general application-level cold-start latency optimization for function-as-a-service in serverless computing, ACM Trans. Software Eng. Methodol. 32, 5, 1–29.
- Fuerst A., Sharma P. (2021) FaasCache: keeping serverless computing alive with greedy-dual caching, in: Proceedings of the 26th ACM international conference on architectural support for programming languages and operating systems, pp. 386–400.
- Mampage A., Karunasekera S., Buyya R. (2021) Deadline-aware dynamic resource management in serverless computing environments, in: 2021 IEEE/ACM 21st international symposium on cluster, cloud and internet computing (CCGrid), IEEE, pp. 483–492.
- Kaur G., Bala A., Chana I. (2019) An intelligent regressive ensemble approach for predicting resource usage in cloud computing, J. Parallel Distrib. Comput. 123, 1–12.
- Datta S., Addya S.K., Ghosh S.K. (2024) ESMA: towards elevating system happiness in a decentralized serverless edge computing framework, J. Parallel Distrib. Comput. 183, 104762.
- Naranjo D.M., Risco S., de Alfonso C., Pérez A., Blanquer I., Moltó G. (2020) Accelerated serverless computing based on GPU virtualization, J. Parallel Distrib. Comput. 139, 32–42.
- Sarroca P.G., Sánchez-Artigas M. (2024) MLLess: achieving cost efficiency in serverless machine learning training, J. Parallel Distrib. Comput. 183, 104764.
- Zuk P., Rzadca K. (2022) Reducing response latency of composite functions-as-a-service through scheduling, J. Parallel Distrib. Comput. 167, 18–30.
- Solaiman K., Adnan M.A. (2020) WLEC: a not so cold architecture to mitigate cold start problem in serverless computing, in: 2020 IEEE international conference on cloud engineering (IC2E), IEEE, pp. 144–153.
- Suo K., Shi Y., Xu X., Cheng D., Chen W. (2020) Tackling cold start in serverless computing with container runtime reusing, in: Proceedings of the workshop on network application integration/codesign, pp. 54–55.
- Bermbach D., Karakaya A.-S., Buchholz S. (2020) Using application knowledge to reduce cold starts in FaaS services, in: Proceedings of the 35th annual ACM symposium on applied computing, pp. 134–143.
- Jia Z., Witchel E. (2021) Nightcore: efficient and scalable serverless computing for latency-sensitive, interactive microservices, in: Proceedings of the 26th ACM international conference on architectural support for programming languages and operating systems, pp. 152–166.
- Mittal V., Qi S., Bhattacharya R., Lyu X., Li J., Kulkarni S.G., Li D., Hwang J., Ramakrishnan K., Wood T. (2021) Mu: an efficient, fair and responsive serverless framework for resource-constrained edge clouds, in: Proceedings of the ACM symposium on cloud computing, pp. 168–181.
- Lee H., Satyam K., Fox G. (2018) Evaluation of production serverless computing environments, in: 2018 IEEE 11th international conference on cloud computing (CLOUD), IEEE, pp. 442–450.
- Vojta R. (2016) AWS journey: API Gateway & Lambda & VPC performance, Zrzka's adventures.
- Akkus I.E., Chen R., Rimac I., Stein M., Satzke K., Beck A., Aditya P., Hilt V. (2018) SAND: towards high-performance serverless computing, in: 2018 USENIX annual technical conference (USENIX ATC 18), pp. 923–935.
- Manner J., Endreß M., Heckel T., Wirtz G. (2018) Cold start influencing factors in function as a service, in: 2018 IEEE/ACM international conference on utility and cloud computing companion (UCC companion), IEEE, pp. 181–188.