Object life cycle

This page explains the object life cycle model supported by Web Polygraph.

Table of Contents

1. Overview
2. Object Creation Time
3. Object Modification Times
4. Object Expiration Times
5. Handling of If-Modified-Since requests
6. Examples
    6.1 Monthly E-Zine Server
    6.2 News Server
    6.3 Static Objects

1. Overview

The object life cycle model simulates object modifications, expirations, and similar events in a Web object's ``life''. The model affects the outcome of If-Modified-Since (IMS) requests as well as various prefetching or validation algorithms that depend on object freshness. The model is configured using the ObjLifeCycle type.

Model components are described below.

2. Object Creation Time

Every Web object is assumed to have been created some time in the past. In recent Polygraph distributions, the birthday is selected at random within the [time "zero", object life cycle length) interval. Time zero is usually the beginning of year 1970. Thus, all objects are born far in the past unless the cycle length is set to many years. The birthday corresponds to the beginning of the first life cycle. See the ``Object Modification Times'' section for a discussion of object life cycles.

In old Polygraph versions, the birthday was determined using the user-specified birthday distribution and corresponded to the middle of the very first life cycle.
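The birthday selection used by recent distributions can be illustrated with a minimal Python sketch. It is not Polygraph code: the function name pick_birthday is hypothetical, and a uniform distribution is assumed because the text above only says that the birthday is selected ``at random''.

```python
import random

DAY = 24 * 60 * 60  # seconds in a day


def pick_birthday(cycle_length_sec: float, rng: random.Random) -> float:
    """Return a birthday in seconds since time zero, within [0, cycle_length_sec).

    Assumption: the birthday is drawn uniformly from the interval.
    """
    if cycle_length_sec <= 0:
        raise ValueError("cycle length must be positive")
    # rng.random() is in [0.0, 1.0), so the result starts at time zero
    # and stays below one cycle length.
    return rng.random() * cycle_length_sec


rng = random.Random(1)
birthday = pick_birthday(30 * DAY, rng)  # object with a 30-day life cycle
```

With a 30-day cycle and time zero at the beginning of 1970, every birthday produced this way falls in January 1970, which is why objects appear to be born far in the past.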

3. Object Modification Times

Polygraph assumes that Web objects or entities have a cyclic life style. That is, modifications happen with a certain periodicity. For example, a daily news page may be modified every 24 hours, a personal home page may be stable for a month or so, and a page with old rock group lyrics might remain constant for years. We define a cycle as a time period that contains exactly one modification of an object. The cycle period is then the average cycle length.

We further observe that the period of a cycle is object specific. The modification pattern of a given object is usually stable and often independent of other objects.

Clearly, for many objects, modifications do not happen at constant intervals. The variance field allows you to model variability of object modification times while keeping the cycle period constant. The variability is expressed as a percentage of the cycle period. Zero percent means no variability; all modifications happen exactly in the middle of a cycle. One hundred percent variability means that, for a given cycle, an object may be modified at any time (from the beginning until the end of the cycle). Variability higher than 100% indicates a problem at the simulated server; modification events for an object may appear in the wrong order or in the future (from the client's point of view).

The picture below illustrates the object modification model. Note that we show several degrees of modification time variability, but the simulated variability is, of course, constant for a given object.

[Figure: Object Life Cycle]
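The modification model described above can also be sketched in a few lines of Python. This is an illustration under stated assumptions, not Polygraph's implementation: the function name modification_time is hypothetical, the variance is passed as a fraction (1.0 for 100%), and the offset from the middle of the cycle is assumed to be uniformly distributed.

```python
import random


def modification_time(birthday, period, variance, cycle_index, rng):
    """Return the modification time within the given cycle (0-based index).

    birthday: start of the first life cycle, in seconds
    period:   cycle period, in seconds
    variance: variability as a fraction of the period (0.0 = 0%, 1.0 = 100%)
    """
    cycle_start = birthday + cycle_index * period
    # Assumed uniform offset around the cycle middle,
    # spanning variance * period in total.
    jitter = (rng.random() - 0.5) * variance * period
    return cycle_start + period / 2 + jitter


rng = random.Random(7)
exact = modification_time(0, 100, 0.0, 3, rng)     # always the cycle middle
anywhere = modification_time(0, 100, 1.0, 3, rng)  # anywhere within the cycle
```

With zero variance the result is always the middle of the cycle (350 for the fourth cycle of a 100-second period). With a variance of 1.0 the result can land anywhere between 300 and 400. A variance above 1.0 lets the result leave its own cycle, which is how out-of-order or future modification times can arise.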

Polygraph knows the last modification time of every object. However, real Web servers often do not include the Last-Modified: entity-header field in HTTP responses. The with_lmt field determines the portion of objects that announce their modification times. For a given object, Polygraph either always includes or always excludes the Last-Modified: field, similar to what a real Web server would do.
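One way to obtain such a stable per-object decision is to derive it from the object identity rather than from a fresh random draw on every response. The Python sketch below shows the idea only; the function name announces_lmt and the use of a SHA-256 hash are assumptions made for illustration and say nothing about how Polygraph makes the choice internally.

```python
import hashlib


def announces_lmt(object_id: str, with_lmt: float) -> bool:
    """Decide whether an object always includes Last-Modified: in responses.

    with_lmt is the configured portion as a fraction (0.9 for 90%).
    The decision depends only on object_id, so it never changes between
    responses for the same object.
    """
    digest = hashlib.sha256(object_id.encode("utf-8")).digest()
    # Interpret the first 8 bytes as an integer in [0, 2**64) and compare
    # it against the configured share of that range.
    point = int.from_bytes(digest[:8], "big")
    return point < with_lmt * 2**64


first = announces_lmt("obj-42", 0.9)
second = announces_lmt("obj-42", 0.9)  # same object, same answer
```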

To summarize, Polygraph allows you to specify

  1. cycle length distribution (length)
  2. cycle variability (variance)
  3. portion of objects with visible last-modified time (with_lmt)

4. Object Expiration Times

Object expiration time is reported via the Expires: entity-header field. Since Polygraph knows future modification times of objects, it would be very easy to report precise expiration times, reducing the guesswork for Web intermediaries. However, having this nice algorithm hard-coded into Polygraph would lead to unrealistic simulations.

Indeed, real Web servers cannot predict future modification times. Hence, in most cases, servers lie about the expiration time of objects. A server generates Expires: fields based on several configuration parameters. Usually, there is a way to tell a server to compute the Expires: value according to one of the following two formulas.

    Expires = last modification time + delta
    Expires = current time + delta

See the Apache documentation (the mod_expires module) for an example. Note that some servers use ``last access time'' terminology instead of ``current time'', but both refer to the same time.

Using the formulas above, one can request that an object ``expires'' delta seconds after it was last accessed or modified. The first formula expires all cached copies of a given object at the same absolute time. The second formula expires cached copies when they reach a given ``age'' (after the last revalidation).
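The difference between the two approaches is easy to see in a small Python sketch. The function names and the sample timestamps are hypothetical; times are plain seconds.

```python
HOUR = 3600


def expires_after_modification(last_modified, delta):
    """First formula: expire delta seconds after the last modification."""
    return last_modified + delta


def expires_after_access(now, delta):
    """Second formula: expire delta seconds after the response is served."""
    return now + delta


last_modified = 1_000_000
fetch_a = last_modified + 10 * 60  # cache A fetches 10 minutes after the change
fetch_b = last_modified + 40 * 60  # cache B fetches 40 minutes after the change

# First formula: the fetch time does not matter, both copies expire together.
absolute_a = expires_after_modification(last_modified, HOUR)
absolute_b = expires_after_modification(last_modified, HOUR)

# Second formula: each copy gets the same one-hour lifetime from its own fetch.
aged_a = expires_after_access(fetch_a, HOUR)
aged_b = expires_after_access(fetch_b, HOUR)
```

Here absolute_a and absolute_b are the same absolute time, while aged_a and aged_b differ by the 30 minutes that separate the two fetches.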

A Polygraph server implements both formulas using the expires array and various time qualifiers (lmt, now, nmt, etc.). The portion of objects with unknown expiration time is the portion of objects not covered by the formulas in the expires array.
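The coverage arithmetic can be sketched in Python as follows. The helper names are hypothetical, the rule labels are plain strings rather than parsed PGL, and the way a rule is matched to an object (a single point in [0, 100)) is an assumption for illustration. The sample shares are taken from the e-zine example in section 6.1.

```python
def pick_expires_rule(point, rules):
    """Return the label of the rule covering point, or None if uncovered.

    rules: list of (label, percent) pairs, in array order
    point: a per-object number in [0, 100)
    """
    upper = 0
    for label, percent in rules:
        upper += percent
        if point < upper:
            return label
    return None  # not covered by any formula: expiration time is unknown


def unknown_share(rules):
    """Percentage of objects not covered by any rule in the array."""
    return 100 - sum(percent for _, percent in rules)


zine_rules = [("nmt", 75), ("now + norm(1hour, 20min)", 10)]
leftover = unknown_share(zine_rules)  # 100 - 75 - 10
```

An object whose point falls below 75 uses the nmt rule, one between 75 and 85 uses the now-based rule, and the remaining objects are served without an Expires: field.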

5. Handling of If-Modified-Since requests

Object modification times are honestly used by Polygraph servers when handling If-Modified-Since (IMS) requests. Since all objects have last modification times, Polygraph can generate an appropriate ``200 OK'' or ``304 Not Modified'' response for any IMS request.

For a given object, the presence of the Last-Modified: field in past replies is irrelevant for the 304 versus 200 reply choice. Furthermore, the presence and value of the Expires: field in past replies are also irrelevant. This behavior mimics real-world conditions. See the ``Object Expiration Times'' section for details.

Note that the above assumes that generation of object modification times is enabled using the length field of ObjLifeCycle. Otherwise, Polygraph will reply with a ``200 OK'' response to any IMS request because the object's last modification time would be unknown.
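The reply choice described in this section amounts to the following decision, shown here as a minimal Python sketch. The function name ims_status is hypothetical, times are plain seconds, and None stands for an unknown last modification time (life cycle generation disabled).

```python
from typing import Optional


def ims_status(last_modified: Optional[int], if_modified_since: int) -> int:
    """Return the HTTP status code for an If-Modified-Since request."""
    if last_modified is None:
        # Modification times are not generated: always send the full object.
        return 200
    if last_modified > if_modified_since:
        # The object changed after the time given by the client.
        return 200
    # The client's copy is still current.
    return 304


changed = ims_status(last_modified=2000, if_modified_since=1500)
unchanged = ims_status(last_modified=1000, if_modified_since=1500)
unknown = ims_status(last_modified=None, if_modified_since=1500)
```

Note that the decision uses only the modification time known to the server, not whatever Last-Modified: or Expires: values earlier replies carried.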

6. Examples

Here we give several typical applications of the Object Life Cycle model.

6.1 Monthly E-Zine Server

E-Zine content is updated every month with low (2%) variability. Most expiration times are easy and safe to predict. Most (75%) content expires after one cycle, and some objects (say ads, 10%) can be cached for about 1 hour. The rest of the objects (100-75-10=15%) have unknown expiration times.

ObjLifeCycle olcZine = {
    length = const(30day);
    variance = 2%;
    with_lmt = 90%;
    expires = [ 
        nmt : 75%,                      // next modification time
        now + norm(1hour, 20min) : 10%  // now plus about 1 hour
    ];
};

6.2 News Server

A CNN-like server posts hot news and generates revenue by displaying advertisements. Content is updated sporadically (60% variability) with a 2 hour average life cycle. Life cycles differ a lot from object to object (an exponential distribution with a 2 hour mean is used). Expiration times are mostly unknown (80%) or very conservative.

ObjLifeCycle olcNews = {
    length = exp(2hour);
    variance = 60%;
    with_lmt = 33%;
    expires = [
        lmt + exp(1hour) : 5%,    // half a cycle, on average
        now + const(15min) : 15%  // conservative estimate
    ];
};

6.3 Static Objects

The PolyMix-1 workload used during the first cache-off had a very ``static'' object life cycle configuration. For cacheable objects, the time of last modification was set to about one year before the bake-off date. The expiration time was set to about one year after the bake-off date. Here is how that model can be configured using PGL.

ObjLifeCycle olcStatic = {
    length = const(2year);          // two year cycle
    variance = 0%;                  // no variance
    with_lmt = 100%;                // all responses have LMT header
    expires = [nmt + const(0sec)];  // everything expires when modified
};