Run-time load adjustments

This page documents PGL features for dynamically adjusting test behavior based on runtime measurements. This functionality is supported starting with Polygraph version 2.8.0.

Table of Contents

1. For the impatient
2. Introduction
3. The Idea: Watchdogs
    3.1 Current State
    3.2 Sampling goal
    3.3 Actions
    3.4 Every statement: putting all things together
4. Example

1. For the impatient

Goal sample = { duration = 4sec; };

Phase phase = {
    script = {
        every sample do {
            StatSample sample = currentSample();
            time t = sample.real.miss.rptm.mean;
            if (t > 100msec) then {
                print("miss response time too large at ", t);
                changeLoadFactorBy(-30%);
            }
        }
        ...
    };
    ...
};
...

2. Introduction

Many performance workloads are designed to verify that the device under test (DUT) can sustain a given level of load. Such a design simplifies workload creation and test result analysis, but requires a priori knowledge of the device's sustained peak performance level. One way to find that level is to repeat tests, increasing the load with each test. This approach is very simple but requires a lot of time. A binary search can help, but may still require days of testing.

Ideally, we want the benchmark to find the peak and then test that the peak can be sustained. The search should be done without a priori knowledge of the device's abilities. In practice, it is possible to approach this ideal using the Polygraph features described here.

Testing scenarios other than finding sustained peak performance can also benefit from the features below. For example, it is often useful to see how the device behaves under overload conditions and [D]DoS attacks. Normally, a single test simulates a single attack, and many tests are needed to build a comprehensive picture. A better approach, available to Polygraph users, is to create a workload that simulates several attacks, configuring Polygraph to "back off" and give the device under test a break when it is clear that the device has become unusable.

This page describes a set of features related to DUT Watchdogs supported by Polygraph. Using watchdogs, one can implement a peak finder, a series of DoS attacks, and many other useful workloads.

3. The Idea: Watchdogs

Our first attempt to implement a peak finder feature was based on the Rptmstat (response time [thermo]stat) approach: the user would configure a desirable response time range and let Polygraph increase the load when response time is below the range and decrease the load when response time gets too high. Rptmstat worked in some environments but not in others. It turned out that having just a couple of knobs is insufficient to accommodate the behavior of the many real-world devices Polygraph has to test. For example, if the device under test does not slow down gradually under load, Rptmstat could not detect overload conditions fast enough or could not decrease the load fast enough. Thermostats work well in rooms with gradual changes in temperature, but are not appropriate for environments that may require rapid, varying, and complex actions (e.g., a nuclear reactor core).

The Watchdog approach described here is an attempt to allow the user (i.e., workload writer) to specify when and how the offered load (or other run-time factors) should change. This is very different from our initial rptmstat approach where a rigid algorithm was hard-coded into Polygraph, exposing just a couple of control knobs. We want the user to be able to say something along these non-PGL lines:

run the following script every few transactions:
    - if device under test is happy, then increase the load
    - if something went wrong, then pause the test for 5 minutes

The Watchdog feature allows Polygraph to constantly monitor current conditions and act when those conditions meet pre-defined criteria. The sections below describe how conditions can be monitored and what actions can be taken.

3.1 Current State

In all use cases discussed so far, Polygraph should act based on various performance measurements. Virtually all performance measurements reported by Polygraph (run-time and post-mortem) are accessible. Note that most measurements can only be defined for a sample of test transactions. A sample can be explicitly defined (see the sampling technique described in the next section), or the statistics collected so far during the current phase can be used. Both methods yield a StatSample object.

Measurements are accessible via StatSample object fields, described elsewhere. Here is a simple example:

StatSample sample = currentSample();
if (sample.real.miss.rptm.mean > 100msec) then {
    ... // do something
} else {
    ... // do something else
}

Note that StatSample fields are measurements (facts), not knobs (variables). One cannot change their values.

3.2 Sampling goal

All measurement categories mentioned above require some sort of aggregation of information. For example, one has to observe several transactions to make accurate estimations/measurements of the response rate. Similarly, error count only means something if the observation interval has been specified.

To specify the sampling duration, an object of type Goal is used:

Goal smallSample = { duration = 3sec; };
Goal bigSample   = { xactions = 10000; };
Goal halfwaySample   = { duration = somePhase.goal.duration/2; };

Usually, the longer it takes Polygraph to satisfy the goal, the more accurate the collected measurements are. On the other hand, long sampling intervals prevent Polygraph from reacting to sudden changes in behavior. Fortunately, different watchdogs may have different sampling goals (see below).

As usual, goal settings are ORed together. For example, the following code describes a watchdog goal that will be satisfied when either 30 seconds of data is collected or 100 successful transactions finish their execution:

Goal goal = {
    duration = 30sec;
    xactions = 100;
};
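The ORed semantics can be modeled outside of PGL. The following Python sketch is only an illustration: the Goal class and its reached() method are hypothetical stand-ins, not Polygraph code, and only the two goal settings used on this page are modeled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Goal:
    """Hypothetical stand-in for a PGL Goal; None means 'not set'."""
    duration_sec: Optional[float] = None
    xactions: Optional[int] = None

    def reached(self, elapsed_sec: float, xactions_done: int) -> bool:
        # Settings are ORed: any one satisfied setting satisfies the goal.
        by_time = self.duration_sec is not None and elapsed_sec >= self.duration_sec
        by_count = self.xactions is not None and xactions_done >= self.xactions
        return by_time or by_count


goal = Goal(duration_sec=30, xactions=100)
print(goal.reached(elapsed_sec=5, xactions_done=100))   # True: count reached first
print(goal.reached(elapsed_sec=30, xactions_done=12))   # True: time reached first
print(goal.reached(elapsed_sec=5, xactions_done=12))    # False: neither reached yet
```

A setting that is left unset never satisfies the goal on its own, which is why each check tests for None first.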

3.3 Actions

A watchdog specification includes arbitrary PGL code. Polygraph does not interpret that code until the sampling goal is reached at run time. Moreover, Polygraph interprets that code every time the sampling goal is reached.

Certain PGL calls are especially useful in a watchdog context, such as the print() and changeLoadFactorBy() calls used in the examples on this page.

3.4 Every statement: putting all things together

A watchdog object is built from the above components using an "every" PGL statement:

every Goal do Code;

Watchdogs are specified on a per-phase basis, using Phase's "script" field. Here is an example of two simple watchdog objects attached to a phase configuration:

Goal smallSample = { duration = 3sec; };
Goal bigSample   = { xactions = 10000; };

Phase phase = {
    name = "peak_finder";
    goal.duration = 1hour;

    script = {
        every smallSample do {
            time t = currentSample().real.miss.rptm.mean;
            if (t > 100msec) then {
                print("miss response time too large at ", t);
                changeLoadFactorBy(-30%);
            }
        }

        every bigSample do {
            time t = currentSample().real.hit.rptm.mean;
            if (t < 50msec) then {
                print("hit response time too small at ", t);
                changeLoadFactorBy(+10%);
            } else {
                print("hit response time is OK at ", t);
            }
        }
    };
    ...
};

It is very important to understand how watchdogs are interpreted. For a single watchdog, Polygraph waits until the Goal guard of the corresponding every-statement is satisfied. Meanwhile, Polygraph accumulates the statistics necessary to implement the currentSample() and currentPhase() PGL function calls. Once the guard goal is satisfied, the do-code of the every-statement is interpreted (i.e., executed). After the execution, the sample statistics are reset, and the cycle starts from scratch. In other words, sample statistics are collected and the do-code is interpreted once per Goal-controlled, non-overlapping interval. No sliding windows are used for sample statistics collection. Phase statistics are not reset during a phase and are constantly updated during the lifetime of the phase.
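The cycle described above can be illustrated with a toy model. The Python sketch below is not Polygraph code: the Watchdog class, its note_xaction() method, and the response time values are all made up for illustration, and only a transaction-count goal is modeled.

```python
class Watchdog:
    """Toy model of one 'every Goal do Code' statement (transaction-count goal only)."""

    def __init__(self, xactions_goal, action):
        self.xactions_goal = xactions_goal
        self.action = action
        self.sample = []          # per-watchdog sample, reset after each firing

    def note_xaction(self, rptm_msec):
        self.sample.append(rptm_msec)
        if len(self.sample) >= self.xactions_goal:
            self.action(list(self.sample))   # run the do-code on the finished sample
            self.sample.clear()              # start the next, non-overlapping interval


phase_stats = []      # phase statistics: never reset during the phase
fired_means = []      # mean response time seen by each firing

dog = Watchdog(3, lambda s: fired_means.append(sum(s) / len(s)))
for rptm in [10, 20, 30, 100, 200, 300, 40]:
    phase_stats.append(rptm)
    dog.note_xaction(rptm)

print(fired_means)        # [20.0, 200.0]
print(len(dog.sample))    # 1
print(len(phase_stats))   # 7
```

Each firing sees only the transactions of its own interval: the slow second batch does not leak into the first mean, and the one leftover transaction waits for the next interval, while the phase statistics keep all seven.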

Each watchdog is treated in isolation from other watchdogs. It is possible for two watchdogs to reach their guard goals and fire their code at about the same time. To resolve conflicts, if needed, make sure that all if-statements guarding test-changing calls describe disjoint, non-overlapping conditions. Often, however, concurrent execution is not a problem.

4. Example

A simple workload using watchdog features is available elsewhere. Below is the interesting part of the console output of the corresponding Polygraph test.

000.10| i-finder   2557 511.40      1   0.00   0    1
000.18| i-finder   5132 514.99      2   0.00   0    3
000.23| script output: increasing load factor by 10%
000.23| fyi: changing load factor level from 100.00% to 110.00%
000.27| i-finder   7724 518.40      2   0.00   0    0
000.35| i-finder  10556 566.39      2   0.00   0    0
000.43| i-finder  13473 583.40      2   0.00   0    0
000.45| script output: increasing load factor by 10%
000.45| fyi: changing load factor level from 110.00% to 121.00%
000.52| i-finder  16527 610.80      4   0.00   0    0
000.60| i-finder  19570 608.57      4   0.00   0    6
000.67| script output: increasing load factor by 10%
000.67| fyi: changing load factor level from 121.00% to 133.10%
000.68| i-finder  22708 627.47      4   0.00   0    1
000.77| i-finder  26025 663.38      4   0.00   0    0
000.85| i-finder  29406 676.09      3   0.00   0    4
000.88| script output: increasing load factor by 10%
000.88| fyi: changing load factor level from 133.10% to 146.41%
000.93| i-finder  32944 707.25      5   0.00   0    4
001.02| i-finder  36629 736.96      4   0.00   0    4
001.10| script output: increasing load factor by 10%
001.10| fyi: changing load factor level from 146.41% to 161.05%
001.10| i-finder  40113 696.77      3   0.00   0    2
001.18| i-finder  44112 799.80      4   0.00   0    0
001.27| i-finder  48083 794.05      5   0.00   0    1
001.32| script output: increasing load factor by 10%
001.32| fyi: changing load factor level from 161.05% to 177.16%
001.35| i-finder  52242 831.20      5   0.00   0   11
001.43| i-finder  56667 884.39      7   0.00   0    5
001.52| i-finder  61129 891.84      8   0.00   0   10
001.53| script output: increasing load factor by 10%
001.53| fyi: changing load factor level from 177.16% to 194.87%
001.60| i-finder  65955 965.07     15   0.00   0    3
001.68| i-finder  70861 981.18     13   0.00   0    3
001.75| script output: increasing load factor by 10%
001.75| fyi: changing load factor level from 194.87% to 214.36%
001.77| i-finder  75726 972.75     14   0.00   0   14
001.85| i-finder  80972 1049.20     23   0.00   0   31
001.93| i-finder  86389 1083.35     44   0.00   0   55
001.97| script output: increasing load factor by 10%
001.97| fyi: changing load factor level from 214.36% to 235.79%
002.02| script output: decreasing load factor by 30%
002.02| fyi: changing load factor level from 235.79% to 165.06%
002.02| i-finder  91471 1015.15    158   0.00   0  691
002.08| script output: decreasing load factor by 30%
002.08| fyi: changing load factor level from 165.06% to 115.54%
002.10| i-finder  96108 927.40     85   0.00   0    0
002.18| i-finder  99028 584.00      3   0.00   0    0
002.27| i-finder 101909 576.16      2   0.00   0    1
002.35| i-finder 104902 598.60      2   0.00   0    0
002.40| script output: increasing load factor by 10%
002.40| fyi: changing load factor level from 115.54% to 127.09%
002.43| i-finder 107908 601.20      2   0.00   0    0
002.52| i-finder 111152 648.72      3   0.00   0    2
002.60| i-finder 114317 632.97      2   0.00   0    1
002.62| script output: increasing load factor by 10%
002.62| fyi: changing load factor level from 127.09% to 139.80%
002.68| i-finder 117684 673.37      3   0.00   0    1
002.77| i-finder 121169 696.90      3   0.00   0    2
002.83| script output: increasing load factor by 10%
002.83| fyi: changing load factor level from 139.80% to 153.78%
002.85| i-finder 124737 713.52      4   0.00   0    1
002.93| i-finder 128629 778.13      4   0.00   0    2
003.02| i-finder 132363 746.80      5   0.00   0    0
003.05| script output: increasing load factor by 10%
003.05| fyi: changing load factor level from 153.78% to 169.16%
003.10| i-finder 136512 829.72      6   0.00   0    2
003.18| i-finder 140702 837.50      5   0.00   0    9
003.27| script output: increasing load factor by 10%
003.27| fyi: changing load factor level from 169.16% to 186.08%
003.27| i-finder 144884 836.37      8   0.00   0   22
003.35| i-finder 149497 922.55     11   0.00   0   12
003.43| i-finder 154202 940.98      9   0.00   0   19
003.48| script output: increasing load factor by 10%
003.48| fyi: changing load factor level from 186.08% to 204.69%
003.52| i-finder 159051 968.61     14   0.00   0   11
003.60| i-finder 164144 1018.49     22   0.00   0    2
003.68| i-finder 169202 1011.38     31   0.00   0   11
003.70| script output: increasing load factor by 10%
003.70| fyi: changing load factor level from 204.69% to 225.15%
003.75| script output: decreasing load factor by 30%
003.75| fyi: changing load factor level from 225.15% to 157.61%
003.77| i-finder 174405 1040.60    102   0.00   0    2
003.82| script output: decreasing load factor by 30%
003.82| fyi: changing load factor level from 157.61% to 110.33%
003.85| i-finder 177877 694.40      4   0.00   0    0
003.93| i-finder 180561 536.71      2   0.00   0    9
004.02| i-finder 183285 544.80      2   0.00   0    0
004.10| i-finder 186043 551.49      2   0.00   0    4
004.13| script output: increasing load factor by 10%
004.13| fyi: changing load factor level from 110.33% to 121.36%
004.18| i-finder 189094 610.16      2   0.00   0    3
004.27| i-finder 192114 603.98      2   0.00   0    5
004.35| script output: increasing load factor by 10%

As you can see, Polygraph starts at a rate of about 500 requests per second, and the load factor is increased in several steps because the response time is below dutGood.rptm_max (50 milliseconds). About 2 minutes into the test, the response time climbs above that threshold and the second watchdog decreases the load factor by 30%. Four seconds later, the load factor is decreased by 30% again. After that, the response time comes back to normal levels and Polygraph starts increasing the load factor. This pattern repeats throughout the test.
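The load factor levels printed in the log are consistent with each change being applied relative to the current level, so the changes compound. The short Python sketch below reproduces the first eleven levels under that assumption. The apply_changes() helper is hypothetical and written only for this check.

```python
def apply_changes(start_percent, changes_percent):
    """Return the load factor level after each relative change, in percent."""
    levels = []
    level = start_percent
    for change in changes_percent:
        level *= 1 + change / 100.0   # a relative change scales the current level
        levels.append(f"{level:.2f}")
    return levels


# Nine increases followed by two back-offs, as in the first two minutes of the log.
levels = apply_changes(100.0, [+10] * 9 + [-30] * 2)
print(levels[:3])    # ['110.00', '121.00', '133.10']
print(levels[-3:])   # ['235.79', '165.06', '115.54']
```

Note that two 30% back-offs undo roughly seven and a half 10% increases, since 0.7 * 0.7 = 0.49 and 1.1 to the power of 7.5 is about 2.04.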

Request rate analysis shows that a safe peak rate is somewhere around 950-1050 requests per second. The absolute values are not important, though. We are just illustrating the technique with a very simple no-proxy test.
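One rough way to start such an analysis is to look at the rate reported just before each back-off. The Python sketch below does that for a few lines copied from the console output above. It assumes that the second numeric column of a statistics line is the request rate, which matches the rates quoted in the text but is not stated by the log itself. The rates_before_backoff() helper is hypothetical.

```python
import re

# A few lines copied from the console output above.
LOG = """\
001.85| i-finder  80972 1049.20     23   0.00   0   31
001.93| i-finder  86389 1083.35     44   0.00   0   55
001.97| script output: increasing load factor by 10%
001.97| fyi: changing load factor level from 214.36% to 235.79%
002.02| script output: decreasing load factor by 30%
002.02| fyi: changing load factor level from 235.79% to 165.06%
002.02| i-finder  91471 1015.15    158   0.00   0  691
003.68| i-finder 169202 1011.38     31   0.00   0   11
003.70| script output: increasing load factor by 10%
003.70| fyi: changing load factor level from 204.69% to 225.15%
003.75| script output: decreasing load factor by 30%
"""

# Statistics line: time| label count rate ... (rate column is an assumption).
STATS = re.compile(r"^\d+\.\d+\|\s+\S+\s+\d+\s+(\d+\.\d+)\s")


def rates_before_backoff(log):
    """Return the last reported rate preceding each 'decreasing' message."""
    last_rate = None
    result = []
    for line in log.splitlines():
        m = STATS.match(line)
        if m:
            last_rate = float(m.group(1))
        elif "decreasing load factor" in line and last_rate is not None:
            result.append(last_rate)
    return result


print(rates_before_backoff(LOG))   # [1083.35, 1011.38]
```

These are the rates at which the device was already in trouble, so a safe peak has to be sought below them.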

This example is closer to a repeated DoS simulation than to a Sustained Peak Finder workload. A good Peak Finder workload should be adjusted to try to sustain a high request rate just below the breaking point. It is quite possible that the device under test can survive short peaks of 900 req/sec load, but cannot sustain 700 req/sec load for more than 5 minutes. Such a workload improvement is possible by writing more complex watchdogs.