Web Polygraph

    Reference Manual: Types

PGL supports many generic and domain-specific types.

AddrMap, AddrScheme, Agent, Bench, BenchSide, Cache, ClientBehavior, Content, DnsResolver, DutState, Goal, Mime, ObjLifeCycle, Phase, PolyMix3As, PolyMix4As, PopDistr, PopModel, Proxy, Robot, Range, SingleRange, MultiRange, Rptmstat, Server, Session, SpreadAs, SrvLb4As, SslWrap, StatSample, TmSzStatSample, HrStatSample, KerberosWrap, AggrStatSample, LevelStatSample, StatsSample, WebAxe4As, addr, DynamicName, array, bool, bwidth, distr, float, int, list, rate, selector, size, Socket, string, time, uniq_id

Detailed descriptions for supported types are given below. Most types are "structures" containing several fields. PGL has no facility to declare new types.

 addr [] addresses
 addr [] names

AddrMap objects provide mapping of network addresses (domain names or IPs) to IP addresses. The former are usually the addresses that origin servers are visible as (e.g., a VIP address of a L4 switch doing origin server load balancing). The latter are usually IP addresses of simulated server agents.

AddrMap map1 = {
    zone = "hosting.com";
    addresses = '';
    names = 'host[1-10].hosting.com:8080';
};

The zone field is not used by Polygraph run-time code, but can be used by external programs such as dns_cfg to build zone files based on a PGL configuration file.

Many non-overlapping maps can be use()d in one experiment. The names field may contain IP addresses. The addresses field must contain IP addresses only.

AddrMap vip1 = {
    addresses = '';
    names = '';
};
AddrMap vip2 = {
    addresses = '';
    names = '';
};
use(vip1, vip2);

Currently, only 1:1 and 1:N mappings are supported. An unmapped name maps to itself by default (it has to be an IP address in that case, of course).

Needless to say, your DNS server should be able to resolve the names used in your PGL file.

More information about using domain names and configuring your DNS server is available elsewhere.


AddrScheme is a base type for various algorithms that are able to compute agent addresses based on the workload type and bench configuration. There is at least one *As addressing scheme type per each workload that supports automatic address calculation (e.g., PolyMix4As type for PolyMix-4 workload).

The kind field is used as a label to distinguish addressing schemes when the exact scheme type is unknown.

The following addressing schemes are supported: SpreadAs, PolyMix4As, WebAxe4As, SrvLb4As, PolyMix3As.
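Since AddrScheme itself defines only the kind label, a minimal declaration of a concrete scheme might look like the following sketch (the label value is hypothetical):

PolyMix4As scheme = {
    kind = "pmix4-addr-scheme"; // informational label only
};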

 string [] http_versions
 addr [] addresses

Agent is a base type for PGL robots, servers, and proxies. In other words, agents have properties common to those three types. Usually, you will not use the agent type directly, but knowing its properties helps in robot and server manipulation.

The kind field is a label used for information purposes only.

The xact_think field determines "transaction think time". Servers "think" after accepting a connection and before reading request headers. Client-side "think time" is not supported in favor of request rate or request interarrival time settings.

The http_versions selector determines the agent's HTTP version. Two versions are supported: "1.0" and "1.1". The latter is the default. The selection is sticky for the lifetime of an agent. The version affects the protocol version in the request-lines of HTTP requests generated by Polygraph robots and in the status-lines of HTTP responses generated by Polygraph servers. This knob has no effect on other defaults. For example, you still need to explicitly enable persistent connections, even if you are using HTTP/1.1 agents. This knob is available starting with Polygraph version 2.8.0.

An HTTP connection will never carry more than pconn_use_lmt requests. Persistent connections are disabled by default. To explicitly disable persistent connections, set the use limit to const(1). To place virtually no limit on the number of requests per connection, set the use limit to const(2147483647). Note that a connection may be closed for reasons other than reaching pconn_use_lmt.

The idle_pconn_tout field specifies the delay after which an idle persistent connection (i.e., a connection with no pending messages) will be closed.

The abort_prob field specifies the probability that an HTTP transaction will be aborted. To abort a transaction, an agent closes the corresponding HTTP connection. At the time of writing, aborts are supported when handling HTTP message bodies only. If the transaction is to be aborted, the agent selects an abort offset using a uniform distribution. The transaction is then aborted when at least offset bytes of the message body are read and/or written (from the application point of view). Aborts are not considered errors on the aborting side but are likely to look like errors to the agent on the other side of the transaction. Aborted client connections get into a TIME_WAIT state and may exhaust TCP source ports and other resources on untuned client drones.

The addresses field tells Polygraph what IP addresses the agent should bind itself to. Essentially, the agent will duplicate itself to have one self-sustained clone per IP address. An address may be repeated to start several agents (agent clones) bound to the same address.

Pop_model affects various URL selection algorithms. For example, Polygraph robots use this model to select an old URL that should be repeated (to produce a hit). Servers use the model to select old URLs to put in the Location: field of redirection responses (e.g., "302 Found").

The socket field specifies TCP/IP socket options for TCP sockets used by the agent.

The world identifier is used to mark agent-specific URLs or content. Manually setting this field may help to reproduce the exact conditions of past experiments, but there are better ways to do that.

Cookie_sender probability determines the chances that a given Polygraph agent sends cookies. The selection of a cookie-sending status is done at agent start time and is sticky (does not change). HTTP servers send cookies using the Set-Cookie header. HTTP clients (Polygraph robots) send cookies using the Cookie header. Both servers and robots have parameters that further affect cookie handling, but the cookie-sending status is always checked first.

If the cookie sending probability is zero, no agents within the given configuration group will send cookies. If the cookie sending probability is 50%, then roughly half of the agents will be sending cookies (agent-specific parameters permitting).

Cookie sending functionality was added in Polygraph version 3.0. The default value for Polygraph versions older than 4.0.7 is zero. For Polygraph version 4.0.7 and newer, the default value of the cookie_sender parameter depends on the agent type. For robots, it is 100%. For servers, the default depends on the cookie_set_prob parameter. If cookie_set_prob is set and is positive, the default for cookie_sender is 100%. Otherwise it is zero.

Proxy agents currently ignore all but the addresses field of their parent type.
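To illustrate several of the Agent knobs described above, a derived Server might be configured as follows (a sketch with made-up values; the norm() and zipf() distributions are assumed to be available):

Server S = {
    kind = "S101";                     // informational label
    xact_think = norm(0.3sec, 0.1sec); // think after accept, before reading headers
    pconn_use_lmt = zipf(64);          // limit requests per persistent connection
    idle_pconn_tout = 15sec;           // close idle persistent connections
    abort_prob = 0.1%;                 // rarely abort transactions
    http_versions = [ "1.1" ];         // the default
    cookie_sender = 25%;               // a quarter of the clones send cookies
};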


The Bench maintains information about the benchmarking environment (e.g., the number of physical hosts available for the test) and test parameters such as peak request rate. Information is maintained on a per-side basis.

As with any other PGL object, an object of type Bench must appear (directly or indirectly) as an argument of a PGL function or procedure call to have any effect.

 addr [] addr_space
 addr [] addresses

BenchSide maintains configuration information about client, server, or proxy side of the bench.

The max_host_load field specifies the maximum load (requests/responses per unit of time) that a physical host should generate/sustain. Given peak_req_rate of the bench, this field determines the number of hosts required for the simulation (on one "side" of the bench).

The max_agent_load field specifies the maximum load (requests/responses per unit of time) that a simulated agent should generate/sustain. Given max_host_load, this field determines the maximum number of agents per host on one "side" of the bench. The actual number of agents depends on the peak_req_rate.

Addr_space defines an array of addresses for various address allocation schemes to pick agent addresses from. For example, a PolyMix-4 addressing scheme may pick the first 500 addresses from the provided space to assign to agents on the first test box. The space addresses often include interface names and subnet information to assist Polygraph in creation of the corresponding IP aliases.

Addr_mask is used by various old address allocation schemes to generate agent addresses. Only the first two octets (aka "network number") of the mask are honored. Use addr_space instead if possible.

The addresses field defines a list of IP aliases that Polygraph should create. These aliases should have the interface name and subnet information. In most cases, this field is not needed as Polygraph can get the same information by concatenating the agent addresses fields. See Run-time address creation section on the Addresses page for more information.
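Putting the Bench and BenchSide fields together, a configuration might look like the following sketch (the client_side and server_side field names follow common Polygraph workloads; all values and address ranges are illustrative):

Bench theBench = {
    peak_req_rate = 1000/sec;
    client_side = {
        max_host_load = 400/sec;  // load one client host should generate
        max_agent_load = 0.4/sec; // load one robot should generate
        addr_space = [ 'lo0::10.10.1-250.1-250/22' ]; // hypothetical
    };
    server_side = {
        max_host_load = 400/sec;
        addr_space = [ 'lo0::10.11.1-250.1-250/22' ]; // hypothetical
    };
};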


The Cache type is used to configure a proxy cache.

The capacity field specifies the maximum size of the cache. When the sum of content lengths of all cached objects exceeds the configured capacity, some objects may be purged to free space for the incoming traffic. Setting capacity to zero effectively disables the cache.

When set, icp_port instructs the cache object to listen for ICP queries on the specified port and reply to those queries according to the cache contents. At the time of writing, misses are replied with the miss-no-fetch ICP opcode.

The cache admission policy admits every cachable object that is at most capacity in size. The replacement policy is LRU.

Polygraph allocates about 80 bytes of housekeeping information per cache entry and assumes that average object size is 10KB. It is a good idea to make sure that your benchmarking environment has more than enough memory for the configured cache capacity.

The Polygraph cache does not store object content, of course. If needed, "cached" content can be generated from scratch, using the corresponding origin server configuration. This content regeneration is the responsibility of the proxy's server side. If you are using the cache, make sure that the origin servers in the PGL proxy configuration file are exactly the same as the origin servers used in the experiment!
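A minimal Cache configuration using the fields above might look like this (values are illustrative):

Cache proxyCache = {
    capacity = 8GB;  // zero would disable the cache
    icp_port = 3130; // listen for and answer ICP queries
};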

 string [] req_types
 string [] req_methods
 Range [] ranges

The ClientBehavior type is used by the client_behavior field of Content objects to configure content-driven Robot behavior. Workloads using content-driven Robots are discussed elsewhere.

ClientBehavior fields are a subset of Robot fields. Please refer to the Robot PGL type reference for their documentation.

In the future, more Robot fields may be added to ClientBehavior. Please submit patches or let developers know if you are interested in particular ClientBehavior properties.

This PGL type is available starting with Polygraph v4.3.0. Support for content-driven recurrence is available since Polygraph v4.4.0.

 Content [] may_contain
 string [] encodings

The Content type accumulates details about such Web object properties as MIME type, size, cachability, etc.

The checksum field specifies the probability that an entity will have an MD5 checksum computed and attached to the response using the HTTP Content-MD5 header field. For all HTTP responses with Content-MD5 headers, Robots calculate an MD5 checksum from scratch and compare it with the value in the header. Mismatches are reported as errors. Since MD5 computation is CPU-intensive, setting the checksum field to high values may slow down server and client processes. Please note that the standard MD5 algorithm (no secret salt) is used and that Robots trust the received Content-MD5 headers. Thus, an intermediary can attach its own header to trigger verification on the client side, or can alter both the content and the header to avoid checksum mismatch errors. Using checksum may be useful when a proxy is suspected of accidentally (unknowingly) altering the content.

The recurrence field is ignored. Use bhr_discrimination setting of the popularity model instead.

The may_contain field specifies embedded types that the content type may contain. For example, HTML objects may contain various images and audio files.

The embedded_obj_cnt distribution is used to determine the number of embedded objects in the container of the corresponding content type.

Several content options deal with simulating realistic content using Polygraph's CSM model. The content_db field specifies the filename of the content database (a file produced with the cdb tool). Inject_db holds the name of the file where the strings to be injected into the generated content are stored. Individual injections appear approximately inject_gap apart if possible. Infect_prob specifies probability that a generated object will be infected (i.e., will contain at least one injection).

The encodings strings specify supported content codings and are used for enabling content compression features.

The client_behavior object specifies expected Robot behavior when this content is selected. This field is available starting with Polygraph v4.3.0.
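For instance, a Content object might combine several of the fields above (a sketch: the kind label, the cntImage reference, and all values are assumptions; cntImage would have to be a Content object defined earlier):

Content cntHTML = {
    kind = "HTML";                      // hypothetical label
    checksum = 1%;                      // attach Content-MD5 to 1% of replies
    may_contain = [ cntImage ];         // assumes cntImage is defined elsewhere
    embedded_obj_cnt = zipf(13);        // embedded objects per container
    encodings = [ "identity", "gzip" ]; // enable gzip content coding
};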

 addr [] servers

When a Polygraph agent has to resolve a domain name, it contacts DNS servers based on the DnsResolver information.

The servers field contains DNS servers to contact.

The timeout field specifies the maximum delay after which a still unacknowledged DNS query is considered failed.
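A typical DnsResolver configuration is short (the server address below is, of course, a made-up placeholder for your environment):

DnsResolver dnsRes = {
    servers = [ '10.0.80.1:53' ]; // hypothetical DNS server address
    timeout = 5sec;               // give up on unacknowledged queries
};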


The DutState objects are used as a part of conditional calls in the Watchdog feature. The latter is described elsewhere.

The rptm_min and rptm_max fields contain minimum and maximum levels for measured mean response time.

Fill_size_min and fill_size_max fields contain minimum and maximum levels for cumulative fill size (volume).

Xactions_min and xactions_max fields contain minimum and maximum levels for cumulative transaction counts.

Rep_rate_min and rep_rate_max fields contain minimum and maximum levels for averaged measured response rate.

Errors_min and errors_max fields contain minimum and maximum levels for cumulative number of errors.

Error_ratio_min and error_ratio_max fields contain minimum and maximum levels for average error ratio.

Dhr_min and dhr_max fields contain minimum and maximum levels for average document hit ratio.
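As a sketch (values are illustrative), a DutState object describing a "healthy" device under test might bound mean response time and error ratio:

DutState dutHealthy = {
    rptm_max = 2.5sec;    // mean response time must stay below this level
    error_ratio_max = 1%; // average error ratio must stay below this level
};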


Goal specifies one or more simulation goals for a given phase. Individual sub-goals are ORed together. That is, reaching one sub-goal is enough to reach the entire goal.

All sub-goals except errors are called "positive" sub-goals. Specifying errors, a "negative" sub-goal, is somewhat tricky. If the errors value is less than 1.0, then it is treated as an error ratio. Otherwise, it is treated as an error count. For example, a value of 0.03 would mean that getting at least 3% of errors is enough to reach the goal, while a value of 3 would mean that at least 3 errors are enough.
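For example, a goal may combine a duration sub-goal with a negative errors sub-goal (a sketch; the duration field name and the values are assumptions):

Goal g = {
    duration = 60min; // reached after an hour...
    errors = 0.03;    // ...or once 3% of transactions fail
};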

 string [] prefixes
 string [] extensions

Mime type groups together Web object properties related to MIME standard. Properties related to URL path generation are also encapsulated in the Mime type, but that is likely to change.

The type field specifies the string to be used for the Content-Type: HTTP header.

Strings from the prefixes array are appended (with a specified probability) to the address part of the URL, before the start of Polygraph-specific URL path. The prefix string is always prepended with a slash character. However, no special delimiter is used between the prefix and URL path; a delimiter (if any) must be a part of the prefix string (e.g., "images/").

Strings from the extensions array are appended (with a specified probability) to the Polygraph-specific URL path. No special delimiter is used to append an extension; a delimiter (if any) must be a part of the extension string (e.g., ".html").
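Putting these fields together (values are illustrative), a Mime object might look like this:

Mime htmlMime = {
    type = "text/html";                     // Content-Type: header value
    extensions = [ ".html" : 60%, ".htm" ]; // appended to the URL path
    prefixes = [ "docs/" ];                 // hypothetical path prefix
};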

 time_distr [] expires

ObjLifeCycle specifies the parameters for the Object Life Cycle model. Here is a sample configuration.

ObjLifeCycle olc = {
    length = logn(7day, 1day);      // heavy tail, weekly updates
    variance = 33%;                 // highly unpredictable updates
    with_lmt = 100%;                // all responses have LMT
    expires = [nmt + const(0sec)];  // everything expires when modified
};

See the distribution type for a list of supported qualifiers for time distributions (lmt, now, nmt, etc.).

The birthday field is ignored in recent Polygraph versions.

 StatsSample [] stats_samples

Most Polygraph measurements are collected on a per-phase basis. Phases also allow varying the overall load and other "global" characteristics to model complex workload patterns.

The phase name is used for informational purposes only. Do not use the name "All", which is an lx macro that stands for "all phases". Also, if you are going to make graphs based on console output (rather than binary logs), avoid phase names with whitespace. The latter will effectively change the number of columns in console stats lines and confuse plotting tools.

Phase goal specifies the duration of the phase and/or other phase termination conditions.

Populus factors affect the number of robots alive. Population size can be varied from 0% to 100%, relative to the total number of individual robots configured for the test. The latter is determined as the total number of addresses of all use()d robots. Note that a live robot can be idle or busy, depending on its session configuration and state. Polygraph can vary population size starting with version 2.7.0.

Load factors affect the load generated by Polygraph robots. Load level can be varied from 0% to 100% and beyond, relative to the load generated by an individual robot. In other words, load factor tells each robot to adjust its activity accordingly. Varying robot population size is preferred to varying robot load levels as it produces more realistic workloads.

Other factors behave in a similar fashion. Recur_factor is applied to the recurrence_ratio of a Robot. Special_req_factor is applied to the portion of "special requests" such as "IMS" or "Reload". The latter can be specified using the "req_type" field of a robot.

If factor_beg is not equal to factor_end, then the current factor is adjusted linearly during the phase. That is, the factor is increased (or decreased) from factor_beg to factor_end. Such adjustments require a positive phase goal.

There are a couple of simple "factor preservation" rules that make load factors easy to specify. All these rules apply only when a factor is not explicitly defined.

  • For undefined factor_beg, use factor_end of the previous phase.
  • For undefined factor_end, use factor_beg of the current phase.
  • If a factor is still undefined, it is set to 100%.

These rules eliminate repetitions of factor entries for consecutive phases. Only changes in load levels have to be specified.

The log_stats flag tells Polygraph whether statistics collected during the phase should be recorded in a log file. This flag defaults to true.

The primary flag tells the Polygraph reporter tool whether the phase should be used for the executive summary and the baseline report. If any of the scheduled phases have this flag set, then those phases, and only those phases, are used for the executive summary and the baseline report. This flag defaults to false. It can be overridden by the reporter's --phases command-line option. The primary flag has been supported since Polygraph version 4.0.6.
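The factor preservation rules above can be illustrated with a pair of consecutive phases (a sketch; field names such as load_factor_beg follow common Polygraph workloads, and all values are made up):

Phase phRamp = {
    name = "ramp";          // no whitespace in names
    goal.duration = 30min;
    load_factor_beg = 10%;  // grow load linearly...
    load_factor_end = 100%; // ...up to the peak level
};
Phase phTop = {
    name = "top";           // load_factor_beg defaults to 100%, the
    goal.duration = 4hour;  // factor_end of the previous phase
    log_stats = true;       // the default
    primary = true;         // use this phase for the executive summary
};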


PolyMix3As type represents addressing scheme for PolyMix-3 workload.


PolyMix4As type represents addressing scheme for the PolyMix-4 workload.

The number of PolyMix-4 hosts and robots is determined by the peak request rate. The total number of robots (servers) is adjusted so that every client- (server-) side host has the same number of agents. Other minor adjustments are also made.

To allocate IP addresses for robots, Polygraph iterates through the client-side addr_space array and gives the next robot the next IP address, until enough IP addresses are allocated for a host. Polygraph then skips remaining IP addresses that belong to the same subnet (if any), and starts allocation for the next host (if any).

The above scheme ensures that individual IPs do not "migrate" from one host to another when the request rate changes. Instead, only the number of IPs "enabled" on each host changes.

Server-side IP allocation algorithm is very similar to the client-side algorithm described above. The only significant difference is that the total number of server agents is computed as 500 + 0.1*R, where R is the total number of robots.


The PopDistr type is similar to the distribution type. Popularity distribution specifies how to select the next object to be requested from a group of objects that were requested before. In other words, it specifies which objects are more popular than others (i.e., requested more often) within a certain group of objects.

PopModel R;
R.pop_distr = popZipf(0.6);

The following popularity distributions are supported.

  • popUnif() -- Uniform: all objects have equal chance of being selected
  • popZipf(skew_factor) -- Zipf: zipf-like power law with the specified skew

Popularity model specifies how to select the next object to be requested among all objects that were requested before. In other words, it specifies which objects are more popular than others (i.e., requested more often).

The selection of the object to be requested is done in three stages. First, Polygraph determines whether the object should come from a "hot set". That decision is positive with a probability specified by the hot_set_prob field.

During the second step, the popularity distribution specified by the pop_distr field is used to select a particular object. If the object is selected among "hot" objects, the selection is limited by the hot set size. Otherwise, the entire working set is used. The hot set size is a fraction of the current working set size specified by the hot_set_frac field.

Finally, a byte hit ratio (BHR) discrimination algorithm is applied with bhr_discrimination probability. The algorithm selects the object with the smallest size among at most nine objects centered around the selection made at the second stage. Uncachable objects are ignored during the selection. Moreover, the algorithm does nothing when the second stage selects an uncachable object. Thus, configured content type cachability ratio is not affected, and uncachable objects should have the same recurrence ratio regardless of their size. Without the discrimination algorithm, offered BHR would be about the same as offered document hit ratio (DHR) while real BHR is usually some 30%-40% lower than DHR. The BHR discrimination algorithm was introduced in version 2.7.2 of Polygraph.

PopModel popModel = {
    pop_distr = popUnif();
    hot_set_frac =  1%;  // hot set is 1/100th of the working set size
    hot_set_prob = 10%;  // every 10th object is requested from the hot set

    bhr_discrimination = 90%; // revisit smaller files more often
};

Robot R;
R.pop_model = popModel;

Proxy agent simulates a proxy cache. The client side (i.e., the side that sends requests to and receives replies from the servers) is configured using a Robot agent. Similarly, the server side (i.e., the side that receives requests from and sends replies to clients) is configured using a Server agent. A proxy may also have a cache to store some of the proxied traffic.

The client side attempts to cache every cachable object it fetches. The server side attempts to resolve every request from the cache. See the Cache type description for important caveats of using the cache.

There is no direct connection between ICP ports of the client side and the cache (Robot and Cache types for the descriptions of those fields). However, in most cases, these two ports should be set to the same value because a real proxy usually sends and receives ICP queries using the same UDP port.

Note that the addresses field of the proxy agent overrides the addresses fields of the client and server configurations. Other fields inherited from the Agent type are currently ignored. The latter is a bug.

Proxies are activated by the polypxy program.

 addr [] origins
 addr [] http_proxies
 addr [] ftp_proxies
 addr [] proxies
 addr [] socks_proxies
 string [] interests
 string [] req_types
 string [] req_methods
 string [] user_names
 string [] accept_content_encodings
 Range [] ranges

Derived from the Agent type, robot (a.k.a. "user" or "client") is the main logical thread of execution in polyclt. Robots submit requests and receive replies. The frequency and nature of the submissions depend on the workload.

The origins field lists names or addresses of origin servers to be contacted.

The http_proxies and ftp_proxies fields list addresses of proxies to send the requests through. Both domain names and IP addresses are acceptable. A port number must be specified for each proxy. These fields are available since Polygraph v4.0.4. Earlier versions use the deprecated proxies field.

Polygraph supports HTTP proxies only. If a proxy is used for a request, then robot-proxy communication uses the HTTP protocol. When proxied, requests for FTP servers use an ftp:// Request-URI scheme and are sent through ftp_proxies (if configured) or http_proxies (by default). Requests for HTTP servers use the http:// scheme and are always proxied through http_proxies.

The proxy selection algorithm mimics typical browser configuration and behavior:

  • Requests for HTTP origin servers:
    1. If http_proxies is set and is not empty (or proxy command line option is given), then the requests go through the specified proxies.
    2. Otherwise, the requests go direct.
  • Requests for FTP origin servers:
    1. If ftp_proxies is set and is not empty, the requests go through the specified proxies.
    2. If ftp_proxies is set but is empty, the requests go direct (using FTP protocol, of course).
    3. If ftp_proxies is not set, then the above rules for HTTP origin server requests apply.

For each Request-URI scheme (i.e., each origin server protocol), a robot selects a random proxy at the configuration time and uses that single proxy for the entire duration of the test (sticky proxy assignment). Within a scheme, proxy addresses are evenly distributed among all robots in the test (if possible). Individual groups of robots (e.g., all robots on one host) may not get an even distribution.

The http_proxies field is mutually exclusive with the proxy command-line option. Both have the same semantics except the command-line option cannot specify more than one proxy address.

The proxies field is deprecated in favor of (and has the same semantics as) the http_proxies field (since Polygraph v4.0.4).

The socks_proxies field lists addresses of SOCKS proxies to send the requests through, similar to http_proxies and ftp_proxies. Only SOCKS5 proxies are supported. SOCKS proxies are supported for HTTP and passive FTP requests. Active FTP data connections do not go through a SOCKS proxy at the moment. SOCKS support is available since v4.1.0.

The socks_prob parameter specifies the probability that a Robot will use a SOCKS proxy. The decision to use SOCKS and the SOCKS proxy selection are sticky. The parameter defaults to 1.0 for Robots with a non-empty socks_proxies array.

The socks_chaining_prob parameter specifies probability of a SOCKS-using Robot also using an HTTP or FTP proxy. In this case, Robot requests go through the selected SOCKS proxy first and then through the selected HTTP or FTP proxy. As with other proxy-related parameters the proxy chaining decision is sticky. By default, proxies are not chained.

When req_rate is specified, a robot will emit a Poisson request stream with the specified mean rate, subject to phase load levels. The req_inter_arrival field can be used to specify a request arrival stream other than Poisson. Naturally, the two fields are mutually exclusive.

If neither req_rate nor req_inter_arrival is set, a Robot will use the "best effort" approach, submitting the next request immediately after the reply to the previous request has been received.
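For example (a sketch; the proxy address is made up, and origins here reuses the host names from the AddrMap example earlier on this page):

Robot R = {
    origins = [ 'host[1-10].hosting.com:8080' ]; // origin servers to contact
    http_proxies = [ '10.0.1.1:3128' ];          // hypothetical proxy address
    req_rate = 0.4/sec;                          // Poisson stream, mean rate
};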

Recurrence ratio is simply how often a robot should re-visit a URL. In other words, how often a robot should request an object that was accessed before (possibly by other robots). Note that recurrence ratio is usually higher than hit ratio because many objects are uncachable and repetitive requests to uncachable objects do not result in a hit.

The embed_recur field specifies the probability of requesting an embedded object when the reference to the latter is found in the response.

The public_interest ratio specifies how often a robot would request a URL that is "known" to (and can be requested by) other robots. Robots are usually independent from each other in their actions. However, they may access the same objects on the same servers. If public_interest is zero, a robot would request only "private" objects from all origin servers, resulting in no overlap of URL sets requested by individual robots. Note that both public and private objects can be requested more than once and hence produce a hit. This field has been removed starting with Polygraph version 2.8.0 in favor of the more general interests field documented below.

The interests selector configures Robot interest in URL worlds. Three kinds of worlds are supported: private, public, and foreign. These kinds can be mixed freely, but non-foreign interest is required for phase synchronization to work. Public interest specifies how often a robot would request a Polygraph-generated URL that is "known" to (and can be requested by) other robots. Robots are usually independent from each other in their actions. However, they may access the same objects on the same servers. If private interest is 100% (which is the default), a robot would request only "private" objects from all origin servers, resulting in no overlap of generated URL sets requested by individual robots. Finally, foreign interest specifies the portion of URLs that should come from the Robot's foreign_trace. Note that public, private, and foreign objects can be requested more than once and hence produce a hit. This field replaced the less general public_interest field starting with Polygraph version 2.8.0.

Robot R = {
    // public_interest = 75%;
    interests = [ "foreign": 1%, "public": 74%, "private" ];
    foreign_trace = "/usr/local/traces/special_sites.urls";
};

The req_types array specifies what kind of requests the robot should emit and with what probability. Several request types are supported: "Basic" (a common GET request), "IMS" (a request with an If-Modified-Since header field), "Reload" (a request with a Pragma: no-cache and Cache-Control: no-cache header fields), "Range" (a request with a Range header field), and "Upload" (an HTTP PUT or an FTP STOR request).

The req_methods array specifies HTTP request methods the robot should use and with what probability. Several methods are supported: "GET" (default), "HEAD", "POST", and "PUT". Request methods are somewhat orthogonal to request types. For example, an IMS request may be issued using the HEAD request method. Polygraph may not support all combinations though. Please see the "Request properties" section in the "Traffic generation" reference page for more information.
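For instance (percentages are illustrative), both arrays use the usual PGL selector syntax, with the last item receiving the remaining probability:

Robot R = {
    req_types   = [ "Basic": 90%, "IMS": 5%, "Reload" ];
    req_methods = [ "GET": 95%, "POST" ];
};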

The private_cache_cap field specifies the size of the robot cache. Robots do not cache object content, but remember URLs and other object characteristics. For example, when an IMS request is generated, the IMS timestamp is taken from the robot cache if possible.

Pop_model specifies which "popularity model" to use when requesting an object that has already been requested before. You must specify a popularity model if you specify a positive recurrence ratio.

When the unique_urls flag is set, each request submitted by Polygraph is for a different URL. Note that this option is applied last and changes a URL without affecting the object id part. Object IDs are responsible for generating various object properties. Thus, for cache-filling experiments, it may be a good idea to use this option (in conjunction with other options such as recurrence and public_interest) to generate objects similar to those in production tests, but with a zero hit ratio.

The pipeline_depth distribution determines the maximum number of concurrent outstanding requests on a persistent connection. A request is considered outstanding until the corresponding response is completely received. By default, requests are not pipelined, as if a const(1) value were specified for the pipeline depth. The pipeline depth knob has no effect on connection persistence, and the actual depth depends on factors such as connection persistence and the presence of embedded objects. See the traffic model for more details about request pipelining. Pipelining is supported in Polygraph starting with version 3.0.
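For example, the depth limit could be sampled from a distribution; the shape and parameter below are illustrative only:

Robot R = {
    pipeline_depth = zipf(10); // at most 10 outstanding requests per connection
};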

Open_conn_lmt is the maximum number of open connections (in any state, to any server) a robot may have at any given time. A robot will postpone (queue) new transactions if the limit is reached. This limit simulates typical behavior of browsers like Netscape Communicator that have a hard limit on the total number of open connections. See Pei Cao's experimental study for more information.

Wait_xact_lmt is only useful when open_conn_lmt is specified. If the robot reaches its open connections limit, it will queue the extra transactions. When the queue length grows beyond wait_xact_lmt, new transactions will be simply ignored (with an appropriate error message).
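The two limits might be combined as follows; the values below are illustrative only:

Robot R = {
    open_conn_lmt = 4;    // at most 4 open connections at a time
    wait_xact_lmt = 100;  // queue at most 100 waiting transactions
};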

Minimize_new_conn is the probability that a robot will treat connections to substitute addresses as connections to the same agent (and, hence, reuse them if needed). This is useful for running various no-proxy or no-VIP tests while keeping the number of persistent connections similar to a "proxied" environment.

The session field is useful for simulation of the login/out behavior of many Web clients, including browsing humans. See Session type for more information.

User_names do not affect robot behavior but may be useful for testing external accounting and authentication services. Each name is just a string. A robot picks a new name at the start of the session. Within one robot configuration, no two sessions share a name, provided all configured names are unique and there are enough names (i.e., the number of user names is at least the number of robot addresses). Names are selected in random order, with equal probability.

The peer_icp address enables the ICP module of the robot; the robot will send ICP queries for all to-be-requested objects from the icp_port to that address. The peer_http address specifies where to send HTTP requests after an ICP peer returns a hit.

Note that if only peer_icp address is set, the robot will send ICP queries to the specified address, but will not fetch objects from a peer. Setting peer_http only may not be supported, use the "--proxy" option instead. At most one ICP and at most one HTTP peer can be configured. Using completely different addresses for the two peers is allowed, but usually does not make sense.
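A typical configuration points both peers at the same cache; the addresses below are illustrative only:

Robot R = {
    peer_icp  = ''; // query this ICP peer first
    peer_http = '';  // fetch from here on an ICP hit
};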

The dns_resolver field specifies the DNS resolver for a robot to use.

The foreign_trace field specifies the name of a file that contains absolute HTTP URLs to request when foreign interest is selected according to the interests field. The trace file must have one URL per line. HTML anchors (or #-comments) are stripped. Whitespace at the beginning and at the end of a line is stripped. Empty lines are ignored. All URLs are pre-loaded at the start of a test; thus, larger traces require more RAM. Misses are generated in trace order. Once all URLs in a trace are requested, the iteration starts again from the top of the trace. The trace order has no influence on hit generation. However, Polygraph assumes, but does not check for, URL uniqueness, and duplicate trace entries may cause unexpected (for Polygraph) hits.

The cookies_keep_lmt distribution determines the maximum number of origin and foreign server cookies that a robot can remember and keep. When the number of incoming cookies exceeds the specified limit, the Robot removes old cookies in a FIFO order. By default, 4 cookies will be kept for each server. A robot will send back all cookies it remembers, if any (provided the robot is a cookie-sending agent, of course).
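The limit can be specified as a distribution; the shape and values below are illustrative only:

Robot R = {
    cookies_keep_lmt = unif(2, 8); // remember 2 to 8 cookies per server
};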

Prior to Polygraph version 4.0.7, all robots within a polyclt process shared the cookie storage and the FIFO queue. Since version 4.0.7, each robot has a separate cookie storage and queue. Cookie sending functionality was added in Polygraph version 3.0.

The accept_content_encodings strings specify content codings to be listed in an HTTP Accept-Encoding request header. This knob is used to trigger content compression at the server.
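For example, a robot might advertise gzip support; whether responses are actually compressed depends on the server-side content configuration:

Robot R = {
    accept_content_encodings = [ "gzip" ];
};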

The spnego_auth_ratio controls the choice of the algorithm for NTLM or Negotiate authentication. If unset or set to zero, the NTLMSSP algorithm is used. Otherwise, the corresponding portion of authentications is done using the SPNEGO (a.k.a. GSSAPI) algorithm.

The kerberos_wrap field provides parameters for Kerberos authentication. Please see the User Manual for details.

The ranges selector specifies what ranges the robot should use when generating a "Range" request and with what probability. Please see the Ranges manual for more information.

The req_body_pause_prob parameter specifies the probability of a paused request. A paused request is the request with an "Expect: 100-continue" header. After sending a paused request, the robot waits for an HTTP 100 "Continue" control message from the server or the final HTTP 417 "Expectation Failed" response. The default is not to pause requests. This option is mutually exclusive with the req_body_pause_start option described below. Please see the Request Bodies manual page for more information.

The req_body_pause_start parameter specifies the minimum size of a paused request (see req_body_pause_prob above for terminology and implications). Requests with bodies smaller than the specified size are not paused. The default is not to pause any requests. This option is mutually exclusive with the req_body_pause_prob option described above. Please see the Request Bodies manual page for more information.
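For example, one might pause only large uploads; the threshold below is illustrative only:

Robot R = {
    req_body_pause_start = 100KB; // pause requests with bodies of 100KB or more
};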

The passive_ftp parameter specifies the probability of the Robot using passive FTP mode. In passive mode, a robot sends the FTP PASV command and receives a server address; the robot connects to the server address, establishing a data channel. In active mode, a robot sends the FTP PORT command with the robot address for the data channel; the server connects to the robot address. The FTP mode selection is sticky for the lifetime of the robot. All robots use passive FTP by default. Passive FTP support is available in Polygraph since v4.0.0; active since v4.0.7.


Range is a base type for PGL SingleRange and MultiRange types. You should not use the Range type directly.


The SingleRange type is used to configure a single range request. For more information, please see the ranges manual.

The first_byte_pos_absolute and first_byte_pos_relative fields are the absolute (in bytes) and relative (in percent of the whole entity size) positions of the first range byte.

The last_byte_pos_absolute and last_byte_pos_relative fields are the absolute (in bytes) and relative (in percent of the whole entity size) positions of the last range byte.

The suffix_length_absolute and suffix_length_relative fields are absolute (in bytes) and relative (in percentage of whole entity size) sizes of the requested range suffix.

The *_absolute fields are mutually exclusive with the *_relative fields. The byte-fields are mutually exclusive with the suffix-fields, just like in the RFC 2616 BNF.
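A minimal sketch using an absolute suffix length; the value and the distribution shape are illustrative only:

SingleRange sfx = {
    suffix_length_absolute = const(1KB); // request the last kilobyte of each entity
};

Robot R = {
    req_types = [ "Basic": 90%, "Range" ];
    ranges = [ sfx ];
};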


The MultiRange type is used to configure a request with a multi-spec Range header. For more information, please see the ranges manual.

The first_range_start_absolute and first_range_start_relative fields are distributions of absolute (in bytes) and relative (in percentage of whole entity size) positions of the first byte of the first range spec. These fields are optional.

The range_length_absolute and range_length_relative fields are distributions of absolute (in bytes) and relative (in percentage of whole entity) sizes of an individual range spec.

The range_count distribution is used to determine the number of individual range specs.

The *_absolute fields are mutually exclusive with the *_relative fields.
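A hypothetical MultiRange configuration; the shapes and values below are illustrative only:

MultiRange mr = {
    range_count = unif(2, 4);         // 2 to 4 specs per Range header
    range_length_absolute = exp(1KB); // mean spec length of 1KB
};

Robot R = {
    req_types = [ "Basic": 90%, "Range" ];
    ranges = [ mr ];
};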


Rptmstat is to response time what a thermostat is to room temperature. Rptmstat specifies an "acceptable" response time range (from rptm_min to rptm_max) and the factor change percentage that should be applied to the current load factor if the mean response time in a sample is outside of the given range.

For "flat" phases (i.e., phases with load_factor_beg equal to load_factor_end), the current load factor will be increased or decreased by the load_delta percentage, depending on whether the response time is lower or higher than acceptable.

For phases with a variable configured load factor, the slope of the factor curve will be increased or decreased by load_delta. However, the current load factor will never drop below load_factor_beg or exceed load_factor_end!

The sample_dur field sets the sample duration or "size". Samples follow each other without overlaps.
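A hypothetical Rptmstat configuration; all values below are illustrative only:

Rptmstat rstat = {
    sample_dur = 30sec; // measure mean response time over 30-second samples
    rptm_min = 1sec;    // acceptable range: lower bound
    rptm_max = 3sec;    // acceptable range: upper bound
    load_delta = 5%;    // adjust load factor by 5% when out of range
};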

 Content [] contents
 Content [] direct_access
 string [] rep_types

Derived from the Agent type, Server is the main logical thread of execution in polysrv that models an HTTP origin server. Servers receive requests and send replies. The speed and nature of the replies depend on the workload.

Accept_lmt specifies the limit on consecutive attempts to accept(2) an incoming connection. The attempts are terminated with the first unsuccessful accept(2) system call or when the limit is reached. By default, there is no limit.

Contents field is a content selector. It specifies the distribution (or relative popularity) of content types for the server. Each content type must be "accessible". That is, each type must be in the closure of the direct_access selector described below.

Direct_access array specifies what content types can be accessed directly (i.e., not as an embedded object) by a robot. The configuration below describes a simplified relationship among the three most popular content types.

#include "contents.pg"
Server S = {
    contents      = [ cntImage : 70%, cntHTML : 10%, cntOther ];
    direct_access = [ cntHTML : 95%, cntOther ];
};

The rep_types array specifies what kind of replies the server should emit and with what probability. Two reply types can be specified: "Basic" and "302 Found". "Basic" corresponds to "200 OK" or "304 Not Modified", as appropriate depending on the actual request.

The cookie_set_prob probability determines the portion of HTTP responses for which the server will attempt to generate cookies (provided the server is a cookie-sending agent, of course). If cookies need to be generated, the cookie_set_count distribution is used to determine the number of cookies in the response, and the cookie_value_size distribution is used to determine the sizes of individual cookie values. Each cookie gets its own Set-Cookie header field. Cookie values are random quoted strings with sessN cookie names. Cookies do not expire and do not have explicit paths. Polygraph robots may return cookies depending on client-side cookie-related options. Cookie sending functionality was added in Polygraph version 3.0.
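The three cookie knobs might be combined as follows; the values below are illustrative only:

Server S = {
    cookie_set_prob = 25%;          // send cookies with a quarter of responses
    cookie_set_count = unif(1, 3);  // 1 to 3 Set-Cookie fields per response
    cookie_value_size = exp(1KB);   // mean cookie value size
};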

The req_body_allowed parameter specifies the probability that the server "allows" a "paused" request by responding with an HTTP 100 "Continue" control message to a request with an Expect: 100-continue header. The default is 100% (i.e., allow all paused requests). Please see the Request Bodies manual page and the Robot req_body_pause_prob field for more information.


Session objects are used to configure robot behavior. A single session consists of two periods: busy and idle. During the busy period, a robot behaves normally, as if no sessions were configured. At the start of an idle period, a robot clears all request queues. A robot does not emit new requests during the idle period, but may finish some outstanding transactions.

Robot R = {
    session.busy_period.duration = 7sec;
    session.idle_period_duration = exp(3sec);
    session.heartbeat_notif_rate = 1/2sec;
};

In the example above, the durations of busy and idle periods are set to 7 seconds (constant) and 3 seconds (exponentially distributed; a new value is selected when a session starts). Thus, the total session duration would be 10 seconds, on average.

Busy_period is of type Goal so that you can specify busy period duration based on, say, the number of transactions and not just time. Idle period duration is of type "time distribution". One cannot use distributions with Goal members, but let us know if you need this feature.

A non-idle session can be configured to emit "heartbeat" notification events at a specified rate. The above example will emit one heartbeat event every 2 seconds. These events have no effect on robot behavior, but are useful for forwarding session events to external remote programs via Polygraph Doorman feature.

Heartbeat_notif_rate field was named heartbit_notif_rate in Polygraph version 2.7.0.


The SpreadAs type represents an addressing scheme called Spread. It is possibly the simplest addressing scheme that distributes the load evenly across all bench hosts.

Spread takes H, the number of configured hosts for the bench side, and divides the entire address space into H partitions of equal size. Iterating over partitions, Spread takes one remaining agent IP address from the current partition per iteration, until the total accumulated request rate produced by the selected agents reaches the configured total request rate for the bench (Bench::peak_req_rate).

For example, the following configuration will result in all three client-side hosts utilized, each with 50 alias IP addresses and 100 robots, producing 300/sec total load:

Bench B = {
    peak_req_rate = 300/sec;
    client_side = {
        max_agent_load = 1/sec; // estimated load produced by one Robot
        addr_space = [ 'lo::10.0.1-6.1-250/32' ]; // 1500 IPs to partition
        hosts = [ '' ]; // three client-side hosts or partitions
    server_side = { ... };

SpreadAs asSpread = { agents_per_addr = 2; };

Robot R = {
    addresses = robotAddrs(asSpread, B); // calculates Robot IP aliases
};

In the above example, the first host ( gets the IP aliases, even though its address space partition contains 500 IPs (10.0.1-2.1-250). The second host gets, and the third gets There are only 50 IP aliases per host because the asSpread scheme places two robots per IP address, and 50 addresses per host supply the 100 robots needed to support a 100/sec rate per host.

Keeping the number of hosts and the address space constant allows you to set up stable routes for each host while varying the request rate from nearly zero to the maximum level supported by the bench. A higher request rate means more IPs selected from each address space partition, but the partitions themselves remain constant.

Spread distributes Server addresses the same way as Robot addresses, except that agents_per_addr is always assumed to be equal to 1 for the servers.


The SrvLb4As type represents the addressing scheme for the SrvLB-L7-4 and SrvLB-L4-4 workloads.

The robot and server address allocation algorithm is the same as for the WebAxe4As scheme.

 string [] protocols
 string [] ciphers
 size [] rsa_key_sizes

SslWrap objects describe SSL connection properties. SslWraps are used in Agent configurations to indicate and fine-tune support for SSL. Please see the SSL layer manual for more information.

The protocols field specifies supported SSL protocol names. An agent selects the protocol or protocol set at startup time, and that selection is sticky. Default is "any" which stands for "SSLv23" in OpenSSL terminology.

The root_certificate field specifies location of the root (CA) certificate file. That file is needed for the servers to generate their certificates and for the robots to verify server certificates. If not defined, the servers will generate self-signed certificates and the robots will not check server certificates. Appending public certificates to the root_certificate file allows robots to trust those certificates (and/or certificates signed by them) as well; any public certificates present do not affect certificate generation by the server.

The ciphers field selects ciphers the agent will use. The selection is sticky. By default all ciphers are selected.

The rsa_key_sizes array specifies supported key lengths to use when auto-generating a private server key. Server's selection is sticky. Defaults to 1024 bits.

The session_resumption parameter enables and controls the session caching and resumption algorithms. It should be used together with session_cache. By default session_resumption is 0%.

The session_cache parameter controls the size of the session cache for the session caching and resumption feature. By default, session_cache is 0.

The sharing_group enables and configures recycling or sharing of SSL certificates. Certificates within the same group will be shared (i.e., will only be generated once) across SslWraps and agents if their OpenSSL generation commands are exactly the same. Sharing hurts realism but provides significant speedup in Polygraph start times when hundreds of servers require certificate generation.

The ssl_config_file parameter sets the OpenSSL configuration file. Relative file names are rooted in the directory from where the Polygraph program was started. If no parameter is given, the 'myssl.conf' file name is used, with a warning that such usage is deprecated. The ssl_config_file parameter is supported since v3.6.0.

The verify_peer_certificate parameter controls whether Robots do peer certificate verification. By default, Robots verify certificates if and only if root_certificate is specified. Note that servers do not currently verify client certificates. This knob is available since Polygraph v4.0.10.
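Putting several of the above knobs together, a hypothetical SslWrap might look like this; all values below are illustrative only:

SslWrap wrap = {
    ssl_config_file = "openssl.conf";     // rooted in the startup directory
    protocols = [ "any" ];                // "SSLv23" in OpenSSL terminology
    ciphers = [ "ALL:HIGH": 100% ];       // OpenSSL cipher list syntax
    rsa_key_sizes = [ 1024bit, 2048bit ]; // sticky per-server selection
    session_resumption = 40%;
    session_cache = 100;
};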


StatSample objects are useful in the context of Polygraph Watchdog feature. Each object provides read-only access to performance measurements collected during a watchdog sampling period or a phase. Dozens of measurements are available.

Most StatSample structure members are structures themselves. See their corresponding types linked above for details on individual members. The paragraphs below define top-level member meanings only.

Req.rate is the offered request rate.

Rep.rate is the measured response rate.

Rep is statistics collected for all kinds of HTTP transactions.

Basic is statistics collected for basic HTTP transactions. A basic HTTP transaction is a transaction for which the definition or meaning of a hit is relatively obvious. This excludes transactions with the following characteristics: non-GET request methods, If-Modified-Since request headers, response status codes other than 200 or 304, reloads, and aborted I/Os.

Offered is hit/miss statistics for offered hits and misses. An HTTP request "offers" a hit if an ideal cache would most likely return a cached copy in response. Only basic transactions are used for these statistics.

Real is hit/miss statistics for real (i.e., actual or measured) hits and misses. These stats are based on a client-side guess that the proxy did not contact a server to produce a response. A guess may be inaccurate when the proxy contacts the server but uses the old response headers instead of forwarding the new ones. Only basic transactions are used for these statistics.

Cachable is cachability statistics for basic transactions.

Fill is statistics for cachable real misses.

Redired_req is statistics for HTTP transactions involving redirected responses such as 302 (Found). Such transactions are not basic transactions.

Rep_to_redir is statistics for transactions caused by earlier redirected responses.

Ims is statistics for transactions involving an HTTP request with an If-Modified-Since request header. Such transactions are not basic transactions.

Reload is statistics for transactions involving client "reload" requests (HTTP requests with a Cache-Control: no-cache directive). Such transactions are not basic transactions.

Head is statistics for transactions involving a HEAD request. Such transactions are not basic transactions.

Post is statistics for transactions involving a POST request. Such transactions are not basic transactions.

Put is statistics for transactions involving a PUT request. Such transactions are not basic transactions.

Abort is statistics for HTTP transactions where either request or response was intentionally aborted prematurely, due to positive abort_prob setting of an Agent. Such transactions are not basic transactions.

Xact is concurrency level statistics for all HTTP transactions.

Populus is concurrency level statistics for robots.

Wait is concurrency level statistics for HTTP requests waiting (for available connection slot) to be submitted. See open_conn_lmt setting of a Robot.

Conn.open is concurrency level for open HTTP/TCP connections. A connection is considered "open" from right after the corresponding connect(2) or accept(2) system call and until the close(2) system call.

Conn.estb is concurrency level for established HTTP/TCP connections. A connection is considered "established" if it is open and was marked as "ready for I/O" by an operating system. This usually means that the TCP handshake has succeeded for the connection.

Conn.ttl is time-to-live statistics for open connections. That is, it is the measure of how long connections stay open.

Conn.use counts the number of HTTP transactions per connection. If persistent connections are disabled, all connections will have just one "use" count.

Ok_xact.count is the number of successful transactions.

Err_xact.ratio is the ratio of failed to successful transactions.

Err_xact.count is the number of failed transactions.

Retr_xact.count is the number of retried transactions. Transactions are retried if the request is aborted due to a race conflict with persistent HTTP connections.

Duration is the time it took to collect the sample, from the first collected datapoint to the last.

Warning: Do not confuse StatSample with StatsSample. The latter is likely to be removed from PGL.


TmSzStatSample objects encapsulate response time (rptm)- and size-based statistics for a given measurement. They can only be used as a part of a StatSample object.


HrStatSample objects encapsulate "hit" ratio statistics for a given measurement. They can only be used as a part of a StatSample object. Note that "hit" and "miss" terms may be changed to names of some other disjoint classes, depending on the measurement. For example, "yes" and "no" is used for cachability statistics.

Ratio.obj is a count-based ratio for a given transaction or content class. For example, the actual document hit ratio (DHR) is real.ratio.obj.

Ratio.byte is a volume-based ratio for a given transaction or content class. For example, the actual byte hit ratio (BHR) is real.ratio.byte.

Hit is statistics for transactions that were classified as those matching the HrStatSample criteria. For example, hit transactions for the real hit ratio statistics.

Miss is statistics for transactions that were classified as those not matching the HrStatSample criteria. For example, miss transactions for the real hit ratio statistics.

 addr [] servers
 addr [] servers_udp
 addr [] servers_tcp

KerberosWrap configures Kerberos authentication. Please see the User Manual for details.

The realm string specifies the Kerberos realm part of the service principal (i.e., "HTTP/<proxy-address>@realm"). It is also used for the client principal if robot credentials do not specify a realm. This field is required.

The servers field specifies KDC server addresses. At least one address is required. If Polygraph fails to communicate with a KDC server, it tries the next server address. Use this field if (and only if) your robots should try using UDP first, and both UDP and TCP listening addresses are the same across all KDC servers. Otherwise, use servers_tcp and/or servers_udp instead. Please see the User Manual for more information.

The servers_tcp field specifies TCP-specific KDC addresses. It is mutually exclusive with the servers field but has similar semantics.

The servers_udp field specifies UDP-specific KDC addresses. It is mutually exclusive with the servers field but has similar semantics.

The timeout parameter limits the time spent waiting for a KDC reply. After a timeout, the robot will usually try another KDC server (if any). There is no wait limit by default.
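A hypothetical configuration; the realm and addresses below are illustrative only:

KerberosWrap krb = {
    realm = "EXAMPLE.COM";
    servers = [ '' ]; // KDC address; UDP is tried first
    timeout = 10sec;            // give up on a KDC after 10 seconds
};

Robot R = {
    kerberos_wrap = krb;
};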


AggrStatSample objects contain aggregate statistics for a given measurement. They can only be used as a part of a StatSample object.

Count is the number of measurements taken.

Mean is the arithmetic mean of all measurements taken (i.e., sum/count).

Min is the value of the smallest measurement taken.

Max is the value of the largest measurement taken.

Std_dev is the standard deviation of all measurements taken.

Rel_dev is the relative deviation of all measurements taken (i.e., std_dev/mean).

Sum is the sum of all measurements taken.


LevelStatSample objects contain level statistics for a given set of concurrent events. They can only be used as a part of a StatSample object.

Started is the number of started events (including those that ended).

Finished is the number of finished events.

Level.mean is the mean level of started but not finished (pending) events during the measurement period. This statistic is not very reliable, probably due to problems with the way the level is computed. Polygraph essentially computes an integral of the measurement function over the measurement period and then divides the computed area by the duration of the period. This algorithm is either incorrect or its implementation is buggy, leading to surprising results in some tests.

Level.last is the number of not yet finished events at the end of the measurement period. For short periods, this statistic should be used instead of level.mean until the latter is fixed.


Warning: Do not confuse StatsSample with StatSample. The former is likely to be removed from PGL.

Use StatsSample objects to instruct Polygraph to collect detailed samples of transactions.

The name field is just a label to identify a sample.

The start field specifies the delay since the beginning of the test after which Polygraph will start collecting a sample.

Capacity determines the number of transactions in the sample.

If samples overlap, the earlier sample(s) are forced to "close", and the sample started last will get all the transactions.

At the time of writing, there are no tools to extract collected samples from binary logs.
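A hypothetical StatsSample configuration; the name, delay, and capacity below are illustrative only:

StatsSample smpl = {
    name = "rush_hour"; // label identifying this sample
    start = 5min;       // begin collecting 5 minutes into the test
    capacity = 1000;    // record 1000 transactions
};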


The WebAxe4As type represents the addressing scheme for the WebAxe-4 workload.

The robot and proxy address allocation algorithm is the same as for the PolyMix4As scheme.

Server-side IP addresses are set to the real addresses of the server-side PCs.


Network addresses are represented using the addr type. An addr can store IPv4, IPv6, or FQDN information along with an optional network interface name, port number, and subnet. Address constants are usually specified using 'single quoted strings' as shown below.

addr them = '';           // no port number
addr theirServer = '';
theirServer.host = '';      // change host name only
theirServer.host = them;                // error: type mismatch!

addr mask1 = '';            // with a subnet
addr mask2 = 'fxp0::'; // more optional details

IPv6 addresses present a slight problem because common usage (e.g., in URLs) is to put a colon (":") between an address and a port number. However, colons are used as delimiters in IPv6 addresses, the same way that dots (".") are used for IPv4. So that PGL can tell the difference between an IPv6 digit and a port number, you must place IPv6 addresses inside square brackets, like this:

addr foo     = '[1234::5:1:2]';
addr server  = '[1234::5:1:2]:80';
addr masked  = '[1234::5:1:2]/120';
addr theworks = 'lo0::[1234::5:1:2]:80/120';

Arrays of addresses can be formed using regular array operations. To create an array with many "similar" addresses, a handy address range notation can be used. The a-b.c-d.e-f.g-h notation instructs PGL to produce an array of IP addresses that belong to a range specification. At least two ranges (or points) must be specified.

addr[] srv_ips = [ '10.100.1-2.1-250:8080' ]; // 500 unique IP addresses
addr[] rbt_ips = [
    '', ''    // 500 IP addresses

Or, for IPv6:

addr[] range   = [ '[1234::5:1-10:1-250]' ];
addr[] space   = [ '[1234::5:1-10:1-250]/120' ];
addr[] servers = [ 'lo0::[1234::5:1-10:1-250]:80/120' ];

Similar rules for forming address ranges apply to FQDNs. Use square brackets to help Polygraph to identify which part of the address must be "iterated".

R1.origins = [ 'www.1-15.company.com' ];  // 15 unique FQDNs
R2.origins = [ 'www.company[1-15].com' ]; // 15 unique FQDNs

While it is unlikely that you would want to do that, you can mix IPv4 addresses, IPv6 addresses, and DNS names in a single array of addresses because all these addresses are of the same addr type. For example:

addr[] foo = [ '', '[1234::1]', 'www.example.com' ];

A dynamic name is an address mask or pattern that generates new static names as the test progresses. Dynamic names are represented by the DynamicName PGL type. DynamicName objects are usually created using the dynamicName function:

DynamicName DN = dynamicName('*.example.com:9090', 10%);

PGL allows DynamicName to be used anywhere the addr type can be used, but the address mask makes sense in Robot origins and AddrMap names contexts only. More information about dynamic domain names is available elsewhere.


Array is simply a list of items of the same type. Polygraph extends arrays dynamically to accommodate all items so no array size specifications are supported. One cannot extract an element from an array (such a capability seems unnecessary because PGL does not support loops).

int[] numbers; // a declaration of an array of integers
time[] alarms = []; // an empty array of time values
addr[] ips = [ '', '' ]; // an array of two addresses

Arrays do automatic interpolation of sub-arrays. That is, when an array A is evaluated, an item I of array type is interpolated into A just as if each individual element of I were a member of A. Thus, arrays lose their identity in an array environment. (This feature and its explanation were borrowed from the Perl language.)

// the following two arrays are equivalent
int[] A1 = [ 1, 2, 3, 4 ];
int[] A2 = [ 1, [2, [3]], 4];
// A1 becomes a concatenation of A1 and A2:
A1 = [ A1, A2 ];

Arrays that specify probabilities for their members are sometimes called "selectors". Selectors are discussed elsewhere.

Array is not really a stand-alone type, just a notation.


Boolean type can take the following values, with obvious interpretation: true, false, yes, no, on, and off. Simply use whatever value is appropriate for a given situation.

RampPhase.log_stats = yes;

The bwidth type is simply a size/time fraction.

bwidth bw  = 100Mb/sec;      // 100BaseTX (100 Mbit per second)
size sz    = 500Kb;
time tm    = 10sec;
bwidth bw2 = sz/tm;          // 50Kbps, naturally
bwidth bw3 = 13/sec;         // Error: type mismatch

The distr type allows you to specify a random distribution of a well-known shape. In PGL, distributions are "typed". That is, you must specify the type of the values along with the shape of the distribution. Polygraph is usually able to guess the value type by examining the parameters of the distribution function.

size_distr repSize = exp(13KB); // exponential distribution of sizes
int_distr connLen = zipf(64);   // Zipf-distributed connection lengths

The following distribution shapes are recognized.

  • Constant: const(mean)
  • Uniform: unif(min, max)
  • Exponential: exp(mean)
  • Normal: norm(mean, std_dev)
  • Lognormal: logn(mean, std_dev)
  • Zipf(1): zipf(world_size)
  • Sequential: seq(max)
  • Arbitrary/tabular: table(filename, type)
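As an illustration, several of these shapes might be declared as follows (the parameter values and the table file name here are made up):

```
// illustrative declarations only; parameter values are made up
size_distr objSize   = logn(13KB, 100KB);          // lognormal sizes
time_distr thinkTime = norm(5sec, 1sec);           // normal think times
int_distr  connLen   = unif(1, 100);               // uniform connection lengths
size_distr fileSize  = table("sizes.distr", size); // tabular; hypothetical file
```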

When a time distribution is used to specify Object Life Cycle parameters, it can be augmented by special qualifiers. The following qualifiers are supported.

  • now -- current time
  • lmt -- last modification time
  • nmt -- next modification time

The value of the nmt qualifier is what lmt would read after the object is modified once. That is, it is the "next last modified time". This qualifier is handy for specifying truthful Expires header fields.

// object life cycle for "HTML" content
ObjLifeCycle olcHTML = {
    length = logn(7day, 1day);      // heavy tail, weekly updates
    variance = 33%;
    with_lmt = 100%;                // all responses have LMT
    expires = [nmt + const(0sec)];  // everything expires when modified
};

Arbitrary distributions can be specified using external value:probability tables described elsewhere.


Floating point values are represented using float type. Common arithmetic operations are supported. Integer values are implicitly converted to floating point in a float context. There is no implicit or default conversion from floating point values to integers. Use the int() function call for an explicit cast.

float f = 5/10;   // f is equal to 0.0
float f = 5.0/10; // f is equal to 0.5
int i = f;        // Error: no default conversion from float to int

Internally, Polygraph stores floating point values using "double precision" (usually 8 bytes per variable).


Integer values are represented using int type. Common arithmetic operations are supported for integers. The important thing to remember about integer arithmetic is that all calculations are done with integer precision. For example, 3/2 yields 1 and 3*(2/3) yields zero.

There is no implicit or default conversion from floating point values to integers. Use the int() function call for an explicit cast.

An integer value of zero can be implicitly converted to many other types, resulting in a "none" or "nil" value. Note that the latter is not the same as an "undefined" value. Polygraph may replace undefined values with appropriate defaults, but a zero value cannot be silently replaced or ignored.

int i = 5/10;             // OK; i is equal to 0
int i = 5.0/10;           // Error; no default conversion from float
int i = int(10*(5.0/10)); // OK; i is equal to 5
time_distr xactThinkTime = 0; // no delays

List is a comma-separated enumeration of items. List items can be of different types. Lists are used in function and procedure calls, but you should not attempt to declare a list variable.
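For example, the use() procedure call takes a list of items (here, two hypothetical AddrMap variables):

```
// a list of two items passed in a procedure call
use(vip1, vip2);
```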


Rate type is nothing else but a float/time fraction.

rate req_rate  = 10.1/sec;     // about 10 requests per second
rate xact_rate = 3/5min;       // 3 xactions in 5 min interval
rate rep_rate  = 0;            // no replies at all
float dummy = xact_rate * sec; // that many xactions each second
rate r = 13/5;                 // Error: type mismatch

Selector is an array with probabilities associated with every item. By default, all probabilities are unknown. When actual probabilities are needed, the items with unknown probabilities evenly share whatever portion of 100% is left.

addr[] servers = [
    '' : 30%, // this server will be used in 30% of cases
    '' : 50%, // this server will be used in 50% of cases
    ''        // 100-30-50 = 20% is everything that is left
              // for the last server
];

If probabilities add up to less than 100%, they are adjusted proportionally to their absolute values.

// the following two selectors are equivalent:
Phase[] scheduleA = [ ph1 : 20%, ph2 : 60% ];
Phase[] scheduleB = [ ph1 : 25%, ph2 : 75% ];

Note that Polygraph does not complain if you specify probabilities in an array where none are expected. Such probabilities are silently ignored.
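For instance, in this hypothetical snippet, the array context does not call for a selection, so the probabilities are dropped without a warning:

```
// hypothetical: probabilities here are silently ignored
int[] numbers = [ 1 : 30%, 2 : 70%, 3 ];
```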

Selector is not really a stand-alone type, just a notation.


For size constants, Polygraph understands the following scales:

  • bit -- bit
  • Bytes -- byte (8 bits)
  • Kbit, KBytes -- kilobit, kilobyte
  • Mbit, MBytes -- megabit, megabyte
  • Gbit, GBytes -- gigabit, gigabyte

Scale suffixes can be shortened to the first two letters (e.g., 5KB), except for the Bytes suffix, which cannot be shortened.

A scale suffix can be applied to integer and floating point numbers. In the case of floating point numbers, the final number of bytes is rounded down to the closest integer.

size s0 = 3KB + 1Mb;
size s1 = 2.5Bytes; // OK; truncated to 2 bytes
size s2 = 10 * s1;  // s2 holds 20 bytes
size s3 = s0/s1;    // Error: type mismatch

PGL can handle sizes up to 4611686016279904256 bytes on machines with 4 byte integers, which is approximately 4 exabytes. However, Polygraph objects cannot handle sizes larger than 2GB unless noted otherwise.


Socket objects can be used to specify socket(2) options for HTTP connections. Polygraph defaults should do just fine though.


String constants are specified using "double quoted" strings. At the time of writing, no interesting operations on strings were supported.
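For example (the variable name and value here are made up):

```
// a double quoted string constant
string zone_name = "hosting.com";
```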


For time constants, Polygraph understands the following scales:

  • msec, ms -- millisecond (1/1000 second)
  • sec -- second
  • min -- minute
  • hour, hr -- 60 minutes
  • day -- 24 hours
  • year -- 365 days

A scale suffix can be applied to integer and floating point numbers. In the case of floating point numbers, the closest approximation in whole seconds and milliseconds is used.

time t0 = 5min + 1sec;
time t1 = 0.5sec;  // OK; 500 milliseconds
time t2 = t0/t1;   // Error: type mismatch

PGL also allows for "absolute time" constants. Absolute constants are specified using single quoted strings and come in one of two formats: 'YYYY/MM/DD' or 'YYYY/MM/DD HH:MM:SS'.

time today = '1999/08/23 13:10:30'; // absolute date

Absolute times are assumed to represent Coordinated Universal Time (UTC).


Identifiers are used by Polygraph agents to distinguish the URLs and content they generate.

Polygraph generates unique identifiers internally. At the time of writing, one cannot specify an arbitrary unique identifier; the only way to get an object of the uniq_id type is to call the uniqId() function.
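A minimal sketch (the variable name is our own):

```
// uniqId() is the only way to obtain a uniq_id value
uniq_id id = uniqId();
```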
