Types
PGL supports many generic and domain-specific types.
AddrMap, AddrScheme, Agent, Bench, BenchSide, Cache, Content, DnsResolver, DutState, Goal, Mime, ObjLifeCycle, Phase, PolyMix3As, PolyMix4As, PopDistr, PopModel, Proxy, Robot, Range, SingleRange, MultiRange, Rptmstat, Server, Session, StatSample, TmSzStatSample, HrStatSample, AggrStatSample, LevelStatSample, StatsSample, WebAxe4As, addr, array, bool, bwidth, distr, float, int, list, rate, selector, size, Socket, string, time, uniq_id
Detailed descriptions for supported types are given below. Most types are
"structures" containing several fields. PGL has no facility to declare new
types.
AddrMap objects provide mapping of network addresses
(domain names or IPs) to IP addresses. The former are usually the
addresses that origin servers are visible as (e.g., a VIP address of a
L4 switch doing origin server load balancing). The latter are usually
IP addresses of simulated server agents.
AddrMap map1 = {
zone = "hosting.com";
addresses = '192.168.0.1-10:8080';
names = 'host[1-10].hosting.com:8080';
};
...
use(map1);
The zone field is not used by Polygraph run-time code, but
can be used by external programs such as dns_cfg to build
zone files based on a PGL configuration file.
Many non-overlapping maps can be use()d in one experiment.
The names field may contain IP addresses. The
addresses field must contain IP addresses only.
AddrMap vip1 = {
addresses = '192.168.1.1-10:8080';
names = '10.0.0.1:80';
};
AddrMap vip2 = {
addresses = '192.168.2.1-10:8080';
names = '10.0.0.2:80';
};
...
use(vip1, vip2);
Currently, only 1:1 and 1:N mappings are supported. An unmapped
name maps to itself by default (it has to be an IP address in that
case, of course).
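The lookup rules above (1:1 and 1:N mappings, with an unmapped name resolving to itself) can be illustrated with a small Python sketch. The resolve helper and the dictionary layout are assumptions made for illustration; they are not Polygraph internals.

```python
import ipaddress

# Hypothetical in-memory form of use()d AddrMap objects: each visible
# name maps to one (1:1) or several (1:N) agent addresses.
ADDR_MAPS = {
    "10.0.0.1:80": ["192.168.1.1:8080", "192.168.1.2:8080"],  # 1:N
    "10.0.0.2:80": ["192.168.2.1:8080"],                      # 1:1
}

def resolve(name, maps=ADDR_MAPS):
    """Return the agent addresses that a visible name maps to."""
    if name in maps:
        return maps[name]
    # An unmapped name maps to itself, so it must already be an IP address.
    host = name.rsplit(":", 1)[0]
    ipaddress.ip_address(host)  # raises ValueError for a domain name
    return [name]
```

For example, resolve("10.0.0.9:80") returns the name itself, while an unmapped domain name raises ValueError in this sketch.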
Needless to say, your DNS server should be able to resolve the
names used in your PGL file.
AddrScheme is a base type for various algorithms that are
able to compute agent addresses based on the workload type and bench
configuration. There is at least one *As addressing scheme
type for each workload that supports automatic address calculation
(e.g., the PolyMix4As type for the PolyMix-4 workload).
The kind field is used as a label to distinguish
addressing schemes when the exact scheme type is unknown.
Agent is a base type for PGL robots, servers, and proxies.
In other words, agents have properties common to those three
types. Usually, you will not use the agent type directly, but
knowing its properties helps in robot and server
manipulation.
The kind field is a label used for information
purposes only.
The xact_think field determines "transaction think
time". Servers "think" after accepting a connection and before
reading request headers. Client-side "think time" is not supported
in favor of request rate or request interarrival time settings.
The http_versions selector determines
the agent's HTTP version. Two versions are supported: "1.0" and "1.1".
The latter is the default. The selection is sticky for the lifetime
of an agent. The version affects protocol version in request-lines of
HTTP requests generated by Polygraph robots and status-lines of HTTP
responses generated by Polygraph servers. This knob has no effect on
other defaults. For example, you still need to explicitly enable
persistent connections, even if you are using HTTP/1.1 agents.
This knob is available starting with Polygraph version 2.8.0.
An HTTP connection will never have more than pconn_use_lmt
requests. Persistent connections are disabled by default. To
explicitly disable persistent connections, set the use limit to
const(1). To have virtually no limit on the number of
requests per connection, set the use limit to const(2147483647).
Note that a connection may be closed for reasons other than
pconn_use_lmt.
The idle_pconn_tout field specifies the delay
after which an idle persistent connection (i.e., a
connection with no pending messages) will be closed.
The abort_prob field specifies the probability that an
HTTP transaction will be aborted. To abort a transaction, an agent
closes the corresponding HTTP connection. At the time of writing,
aborts are supported when handling HTTP message bodies only. If the
transaction is to be aborted, the agent selects abort offset using a
uniform distribution. The transaction is then aborted when at least
offset bytes of the message body are read and/or written (from the
application point of view). Aborts are not considered errors on
the aborting side but are likely to look like errors to the agent on the
other side of the transaction.
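The abort decision described above can be sketched in Python as follows. The plan_abort name is hypothetical, and the exact offset range (0 to the body size, inclusive) is an assumption; the text only says that the offset is uniformly distributed.

```python
import random

def plan_abort(body_size, abort_prob, rng=random):
    """Return an abort offset in bytes, or None if the transaction completes."""
    if rng.random() >= abort_prob:
        return None  # not selected for abort
    # Uniformly distributed offset within the message body.
    # The inclusive [0, body_size] range is an assumption of this sketch.
    return rng.randint(0, body_size)
```

The agent would then close the connection once at least that many body bytes have been read or written.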
The addresses field tells Polygraph what IP
addresses the agent should bind itself to. Essentially, the
agent will duplicate itself to have one self-sustained clone
per IP address. An address may be repeated to start several
agents (agent clones) bound to the same address.
Pop_model affects various URL selection algorithms. For
example, Polygraph robots use this model to select an old URL that
should be repeated (to produce a hit). Servers use the model to select
old URLs to put in the Location: field of redirection
responses (e.g., "302 Found").
The socket field specifies TCP/IP socket options
for TCP sockets used by the agent.
The world identifier is used to mark agent-specific
URLs or content. Manually setting this field may help to
reproduce the exact conditions of past experiments, but there are
better ways to do that.
Cookie_sender probability determines the chances that a
given Polygraph agent sends cookies. The selection of a cookie-sending
status is done at agent start time and is sticky (does not change).
HTTP servers send cookies using the Set-Cookie header. HTTP
clients (Polygraph robots) send cookies using the Cookie
header. Both servers and robots have parameters that further affect
cookie handling, but the cookie-sending status is always checked
first. If the cookie-sending probability is zero (default), no agents
within the given configuration group will send cookies. If the
cookie-sending probability is 50%, then roughly half of the agents
will be sending cookies (agent-specific parameters permitting). Cookie
sending functionality was added in Polygraph version 3.0.
Proxy agents currently ignore all but the addresses
field of their parent type.
The Bench maintains information about the benchmarking
environment (e.g., the number of physical hosts available for the test)
and test parameters such as peak request rate. Information is maintained
on a per-side basis.
Like any other PGL object, an object of type Bench must appear
(directly or indirectly) as an argument of a PGL function or procedure
call to have any effect.
BenchSide maintains configuration information about
client, server, or proxy side of the bench.
The max_host_load field specifies the maximum load
(requests/responses per unit of time) that a physical host should
generate/sustain. Given peak_req_rate of the bench, this field
determines the number of hosts required for the simulation (on one
"side" of the bench).
The max_agent_load field specifies the maximum load
(requests/responses per unit of time) that a simulated agent should
generate/sustain. Given max_host_load, this field determines the
maximum number of agents per host on one "side" of the bench. The actual
number of agents depends on the peak_req_rate.
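The relationship between peak_req_rate, max_host_load, and max_agent_load can be sketched as simple arithmetic. This is an illustration under the assumption that hosts are rounded up and agents per host are rounded down; the function names are hypothetical, and the actual rounding used by a given addressing scheme may differ.

```python
import math

def hosts_required(peak_req_rate, max_host_load):
    """Hosts needed on one side of the bench to sustain the peak rate."""
    return math.ceil(peak_req_rate / max_host_load)

def max_agents_per_host(max_host_load, max_agent_load):
    """Upper bound on the number of agents one host can carry."""
    return math.floor(max_host_load / max_agent_load)
```

For example, a peak rate of 1000 requests per second with a max_host_load of 400 requests per second gives hosts_required(1000, 400) == 3.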
Addr_space defines an array of addresses for various address
allocation schemes to pick agent addresses from. For example, a PolyMix-4
addressing scheme may pick the first 500 addresses from the provided space
to assign to agents on the first test box. The space addresses often
include interface names and subnet information to assist Polygraph in
creation of the corresponding IP aliases.
Addr_mask is used by various old address allocation schemes to
generate agent addresses. Only the first two octets (aka "network
number") of the mask are honored. Use addr_space instead if
possible.
The Cache type is used to configure a proxy cache.
The capacity field specifies the maximum size of the
cache. When the sum of content lengths of all cached objects exceeds
the configured capacity, some objects may be purged to free space for
the incoming traffic. Setting capacity to zero effectively disables
the cache.
When set, icp_port instructs the cache object to listen
for ICP queries on the specified port and reply to those queries
according to the cache contents. At the time of writing, misses are
replied with the miss-no-fetch ICP opcode.
The cache admission policy admits every cachable object whose size does
not exceed capacity. The replacement policy is LRU.
Polygraph allocates about 80 bytes of housekeeping
information per cache entry and assumes that average object size is
10KB. It is a good idea to make sure that your benchmarking
environment has more than enough memory for the configured cache
capacity.
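A rough memory estimate can be derived from the two figures above. The sketch below assumes that the number of entries is the capacity divided by the 10KB average object size and that 10KB means 10 * 1024 bytes; both are assumptions of this example, not documented Polygraph behavior.

```python
ENTRY_OVERHEAD_BYTES = 80      # housekeeping per cache entry (from the text)
AVG_OBJECT_BYTES = 10 * 1024   # assumed meaning of "10KB"

def estimated_housekeeping_bytes(capacity_bytes):
    """Approximate housekeeping memory for a cache of the given capacity."""
    entries = capacity_bytes // AVG_OBJECT_BYTES
    return entries * ENTRY_OVERHEAD_BYTES
```

Under these assumptions, a 100GB cache (100 * 1024**3 bytes) needs about 800MB of housekeeping memory.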
The Polygraph cache does not store object content, of course. If
needed, "cached" content can be generated from scratch, using the
corresponding origin server configuration. This content regeneration
is the responsibility of the proxy's server side. If you are using the
cache, make sure that the origin servers in the PGL proxy
configuration file are exactly the same as the origin servers used in
the experiment!
The Content type accumulates details about such Web object
properties as MIME type, size, cachability, etc.
The checksum field specifies the probability that an entity
will have an MD5 checksum computed and attached to the response using
the HTTP Content-MD5 header field. For all HTTP responses with Content-MD5
headers, Robots calculate an MD5 checksum from scratch and compare it
with the value in the header. Mismatches are reported as errors.
Since MD5 computation is CPU-intensive, setting the checksum
field to high values may slow down server and client processes.
Please note that the standard MD5 algorithm (no secret salt) is used and
that Robots trust the received Content-MD5 headers. Thus, an
intermediary can attach its own header to cause verification on the
client side or can alter the content and the header to avoid checksum
mismatch errors. Using checksum may be useful when a proxy
is suspected of accidentally (unknowingly) altering the content.
The recurrence field is ignored. Use
bhr_discrimination setting of the popularity model
instead.
The may_contain field specifies embedded types that the
content type may contain. For example, HTML objects may contain
various images and audio files.
The embedded_obj_cnt distribution is used to determine the
number of embedded objects in the container of the corresponding
content type.
Several content options deal with simulating realistic content
using Polygraph's CSM model. The content_db field specifies
the filename of the content database (a file produced with the
cdb tool). Inject_db holds the name of the file
where the strings to be injected into the generated content are
stored. Individual injections appear approximately inject_gap
apart if possible. Infect_prob specifies the probability that a
generated object will be infected (i.e., will contain at least one
injection).
The encodings strings specify supported content codings
and are used for enabling content compression features.
When a Polygraph agent has to resolve a domain name, it contacts DNS
servers based on the DnsResolver information.
The servers field contains DNS servers to contact.
The timeout field specifies the maximum delay after which a
still unacknowledged DNS query is considered failed.
The DutState objects are used as a part of conditional calls
in the Watchdog feature. The latter is described elsewhere.
The rptm_min and rptm_max fields contain minimum
and maximum levels for measured mean response time.
Fill_size_min and fill_size_max fields contain
minimum and maximum levels for cumulative fill size (volume).
Xactions_min and xactions_max fields contain minimum
and maximum levels for cumulative transaction counts.
Rep_rate_min and rep_rate_max fields contain minimum
and maximum levels for averaged measured response rate.
Errors_min and errors_max fields contain minimum
and maximum levels for cumulative number of errors.
Error_ratio_min and error_ratio_max fields contain
minimum and maximum levels for average error ratio.
Dhr_min and dhr_max fields contain minimum
and maximum levels for average document hit ratio.
Goal specifies one or more simulation goals for a given
phase. Individual sub-goals are ORed together. That is,
reaching one sub-goal is enough to reach the entire goal.
All sub-goals except errors are called "positive"
sub-goals. Specifying errors, the "negative" sub-goal, is
somewhat tricky. If the errors value is less than 1.0, then
it is treated as an error ratio. Otherwise, it is treated as an
error count. For example, a value of 0.03 would mean
that getting at least 3% of errors is enough to reach the goal,
while the value of 3 would mean that at least 3 errors
are enough.
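The two interpretations of the errors sub-goal can be sketched as follows. The function name is hypothetical, and computing the ratio against the total transaction count is an assumption of this example.

```python
def errors_goal_reached(errors_goal, error_count, xact_count):
    """Check the "negative" sub-goal under the ratio-or-count rule."""
    if errors_goal < 1.0:
        # Values below 1.0 are treated as an error ratio.
        if xact_count == 0:
            return False
        return error_count / xact_count >= errors_goal
    # Values of 1.0 and above are treated as an error count.
    return error_count >= errors_goal
```

With errors_goal set to 0.03, 4 errors in 100 transactions reach the goal; with errors_goal set to 3, any 3 errors do, regardless of the transaction count.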
Mime type groups together Web object properties related to
MIME standard. Properties related to URL path generation are also
encapsulated in the Mime type, but that is likely to change.
The type field specifies the string to be used for the
Content-Type: HTTP header.
Strings from the prefixes array are appended (with a specified
probability) to the address part of the URL, before the start of
Polygraph-specific URL path. The prefix string is always prepended with a
slash character. However, no special delimiter is used between the prefix
and URL path; a delimiter (if any) must be a part of the prefix string
(e.g., "images/").
Strings from the extensions array are appended (with a
specified probability) to the Polygraph-specific URL path. No special
delimiter is used to append an extension; a delimiter (if any) must be a
part of the extension string (e.g., ".html").
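The way prefixes and extensions are attached can be shown with a short sketch. The build_url helper, the "http://" scheme, and the sample path are illustrative assumptions; the probabilistic choice of whether to use a prefix or extension is left to the caller.

```python
def build_url(address, path, prefix=None, extension=None):
    """Assemble a URL from the address, optional prefix, path, and extension.

    `path` stands for the Polygraph-specific URL path and is assumed
    not to start with a slash.
    """
    url = "http://" + address + "/"  # the slash always precedes the prefix
    if prefix is not None:
        url += prefix                # no delimiter is added after the prefix
    url += path
    if extension is not None:
        url += extension             # no delimiter is added before the extension
    return url
```

Here build_url("10.0.0.1:80", "w1/t01/_0001", "images/", ".html") yields "http://10.0.0.1:80/images/w1/t01/_0001.html", which shows why the delimiters must be part of the configured strings.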
ObjLifeCycle specifies the parameters for the Object Life Cycle model.
Here is a sample configuration.
ObjLifeCycle olc = {
length = logn(7day, 1day); // heavy tail, weekly updates
variance = 33%; // highly unpredictable updates
with_lmt = 100%; // all responses have LMT
expires = [nmt + const(0sec)]; // everything expires when modified
};
See the distribution type for a list of supported qualifiers for time
distributions (lmt, now, nmt, etc.).
The birthday field is ignored in recent Polygraph
versions.
Most Polygraph measurements are collected on a phase
basis. Phases also allow you to vary the overall load and other "global"
characteristics to model complex workload patterns.
The phase name is used for informational purposes only. Do not
use the name "All", which is an lx macro that stands for "all
phases". Also, if you are going to make graphs based on console
output (rather than binary logs), avoid phase names that contain
whitespace. Whitespace effectively changes the number of columns
in console stats lines and confuses plotting tools.
Phase goal specifies the duration of the phase and/or
other phase termination conditions.
Populus factors affect the number of robots alive. Population size
can be varied from 0% to 100%, relative to the total
number of individual robots configured for the test. The latter is
determined as the total number of addresses of all use()d robots.
Note that a live robot can be idle or busy, depending on its session
configuration and state. Polygraph can vary population size starting
with version 2.7.0.
Load factors affect the load generated by Polygraph robots. Load
level can be varied from 0% to 100% and beyond, relative
to the load generated by an individual robot. In other words, load
factor tells each robot to adjust its activity accordingly. Varying
robot population size is preferred to varying robot load levels as it
produces more realistic workloads.
Other factors behave in a similar fashion. Recur_factor is
applied to the recurrence_ratio of a Robot.
Special_req_factor is applied to the portion of "special
requests" such as "IMS" or "Reload". The latter can be specified
using the req_types field of a robot.
If factor_beg is not equal to factor_end, then
the current factor is adjusted linearly during the phase. That is, the
factor is increased (or decreased) from factor_beg to
factor_end. Such adjustments require a positive phase
goal.
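The linear adjustment amounts to interpolating between the two factor values as the phase progresses toward its goal. In the sketch below, progress is a hypothetical value between 0.0 (phase start) and 1.0 (goal reached); how Polygraph measures progress internally is not specified here.

```python
def current_factor(factor_beg, factor_end, progress):
    """Linearly interpolate the factor for a phase.

    `progress` is the assumed fraction of the phase goal reached so far.
    """
    progress = min(max(progress, 0.0), 1.0)  # clamp to [0, 1]
    return factor_beg + (factor_end - factor_beg) * progress
```

For instance, halfway through a phase that ramps from 25% to 75%, current_factor(0.25, 0.75, 0.5) gives 0.5.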
There are a couple of simple "factor preservation" rules that
make load factors easy to specify. All these rules apply only when a
factor is not explicitly defined.
- For undefined factor_beg, use factor_end of
the previous phase.
- For undefined factor_end, use factor_beg of
the current phase.
- If a factor is still undefined, it is set to 100%.
These rules eliminate repetitions of factor entries for consecutive
phases. Only changes in load levels have to be specified.
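The three factor preservation rules can be expressed as a short Python sketch. The list-of-dictionaries input and the resolve_factors name are assumptions of this example; factors are written as fractions (1.0 for 100%), and the rules are applied in the order listed above.

```python
def resolve_factors(phases):
    """Fill in undefined factors for consecutive phases.

    `phases` is a list of dicts with optional "beg" and "end" keys;
    a missing key means the factor is undefined.
    """
    resolved = []
    prev_end = None
    for phase in phases:
        beg = phase.get("beg")
        end = phase.get("end")
        if beg is None:
            beg = prev_end  # rule 1: factor_end of the previous phase
        if end is None:
            end = beg       # rule 2: factor_beg of the current phase
        if beg is None:
            beg = 1.0       # rule 3: still undefined, so 100%
        if end is None:
            end = 1.0
        resolved.append((beg, end))
        prev_end = end
    return resolved
```

For example, a ramp-up phase followed by two phases without explicit factors stays at the level where the ramp ended.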
Finally, the log_stats flag tells Polygraph if statistics
collected during the phase should be recorded in a log file. This flag
defaults to true.
PolyMix3As type represents addressing scheme for PolyMix-3
workload.
PolyMix4As type represents addressing scheme for PolyMix-4
workload.
The PopDistr type is similar to the distribution type. Popularity
distribution specifies how to select the next object to be requested
from a group of objects that were requested before. In other words, it
specifies which objects are more popular than others (i.e., requested
more often) within a certain group of objects.
PopModel R;
R.pop_distr = popZipf(0.6);
The following popularity distributions are supported.
- popUnif() -- Uniform: all objects have equal chance of being selected
- popZipf(skew_factor) -- Zipf: zipf-like power law with the specified skew
Popularity model specifies how to select the next object to be
requested among all objects that were requested before. In other
words, it specifies which objects are more popular than others (i.e.,
requested more often).
The selection of the object to be requested is done in three
stages. First, Polygraph determines whether the object should come
from a "hot set". That decision is positive with a probability
specified by the hot_set_prob field.
During the second step, the popularity distribution specified by
the pop_distr field is used to select a particular object. If
the object is selected among "hot" objects, the selection is limited
by the hot set size. Otherwise, the entire working set is used. The
hot set size is a fraction of the current working set size specified
by the hot_set_frac field.
Finally, a byte hit ratio (BHR) discrimination algorithm is applied
with bhr_discrimination probability. The algorithm selects
the object with the smallest size among at most nine objects centered
around the selection made at the second stage. Uncachable objects are
ignored during the selection. Moreover, the algorithm does nothing
when the second stage selects an uncachable object. Thus, configured
content type cachability ratio is not affected, and uncachable objects
should have the same recurrence ratio regardless of their size.
Without the discrimination algorithm, offered BHR would be about the
same as offered document hit ratio (DHR) while real BHR is usually
some 30%-40% lower than DHR. The BHR discrimination algorithm was
introduced in version 2.7.2 of Polygraph.
PopModel popModel = {
pop_distr = popUnif();
hot_set_frac = 1%; // hot set is 1/100th of the working set size
hot_set_prob = 10%; // every 10th object is requested from the hot set
bhr_discrimination = 90%; // revisit smaller files more often
};
Robot R;
R.pop_model = popModel;
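The three selection stages of the popularity model can be sketched in Python as follows. This is an illustration only: the placement of the hot set at the start of the working set, the window bounds, and all names are assumptions, and objects are represented simply by their index, size, and cachability.

```python
import random

def pop_unif(n, rng):
    """Uniform popularity distribution over n objects."""
    return rng.randrange(n)

def select_object(sizes, cachable, pop_distr, hot_set_prob,
                  hot_set_frac, bhr_discrimination, rng=random):
    """Return the index of a previously requested object to request again."""
    wss = len(sizes)  # current working set size
    # Stage 1: decide whether the object comes from the hot set.
    if rng.random() < hot_set_prob:
        limit = max(1, int(wss * hot_set_frac))  # hot set assumed at the front
    else:
        limit = wss
    # Stage 2: apply the popularity distribution within that limit.
    pick = pop_distr(limit, rng)
    # Stage 3: BHR discrimination; does nothing for an uncachable pick.
    if cachable[pick] and rng.random() < bhr_discrimination:
        lo, hi = max(0, pick - 4), min(wss, pick + 5)  # at most nine objects
        candidates = [i for i in range(lo, hi) if cachable[i]]
        pick = min(candidates, key=lambda i: sizes[i])
    return pick
```

Because uncachable picks are returned unchanged and uncachable neighbors are skipped, the sketch preserves the property that cachability ratios are not affected by the discrimination step.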
Proxy agent simulates a proxy cache. The client side (i.e., the
side that sends requests to and receives replies from the servers) is
configured using a Robot agent. Similarly, the server side (i.e., the
side that receives requests from and sends replies to clients) is
configured using a Server agent. A proxy may also have a cache to
store some of the proxied traffic.
The client side attempts to cache every cachable object it fetches.
The server side attempts to resolve every request from the cache. See
the Cache type description for important caveats of using the
cache.
There is no direct connection between ICP ports of the client side
and the cache (see the Robot and Cache types for the descriptions of those
fields). However, in most cases, these two ports should be set to the
same value because a real proxy usually sends and receives ICP queries
using the same UDP port.
Note that the addresses field of the proxy agent overwrites
the addresses fields of client and server configurations. Other
fields inherited from the Agent type are currently ignored. The latter
is a bug.
Proxies are activated by the polypxy program.
Derived from the Agent type, robot (a.k.a. "user" or
"client") is the main logical thread of execution in
polyclt. Robots submit requests and receive replies. The
frequency and nature of the submissions depend on the workload.
The origins field lists names or addresses of origin
servers to be contacted.
The proxies field lists names or addresses of HTTP proxies
to send the requests through. A robot selects a random proxy at the
configuration time and uses that proxy for the entire duration of the
test (sticky proxy assignment). Proxy addresses are distributed evenly
(if possible) among all robots in the test. Individual groups of
robots (e.g., all robots on one host) may not get an even
distribution. The proxies field is mutually exclusive with
the --proxy command line option.
When req_rate is specified, a robot will emit a Poisson
request stream with the specified mean rate, subject to phase load
levels. The req_inter_arrival field can be used to specify
request arrival stream different from Poisson. Naturally, the two
fields are exclusive.
If neither req_rate nor req_inter_arrival is
set, a Robot will use the "best effort" approach, submitting the next
request immediately after a reply to the previous request has been
received.
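A Poisson request stream with a given mean rate has exponentially distributed gaps between requests. The sketch below generates submission times for one robot under that model; the function name is hypothetical, and phase load factors are not taken into account.

```python
import random

def poisson_request_times(req_rate, duration, rng=random):
    """Return request submission times (in seconds) within `duration`.

    Interarrival gaps are exponential with mean 1/req_rate, which
    produces a Poisson stream with the specified mean rate.
    """
    times = []
    now = rng.expovariate(req_rate)
    while now < duration:
        times.append(now)
        now += rng.expovariate(req_rate)
    return times
```

Replacing the expovariate() calls with samples from another distribution corresponds to what req_inter_arrival is for.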
Recurrence ratio is simply how often a robot should
re-visit a URL. In other words, how often a robot should request an
object that was accessed before (possibly by other robots). Note that
recurrence ratio is usually higher than hit ratio because
many objects are uncachable and repetitive requests to uncachable
objects do not result in a hit.
The embed_recur field specifies the probability of
requesting an embedded object when the reference to the latter is
found in the response.
Public_interest ratio specifies how often a robot would
request a URL that is "known" to (and can be requested by) other
robots. Robots are usually independent from each other in their
actions. However, they may access the same objects on the same servers. If
public_interest is zero, a robot would request only
"private" objects from all origin servers, resulting in no overlap
of URL sets requested by individual robots. Note that both public and
private objects can be requested more than once and hence produce a
hit. This field has been removed starting with Polygraph version 2.8.0
in favor of a more general interests field documented
below.
Interests selector configures
Robot interest in URL worlds. Three kinds of worlds are supported:
private, public, and foreign. These kinds can be mixed freely, but
non-foreign interest is required for phase synchronization to
work. Public worlds interest specifies how often a robot would request
a Polygraph-generated URL that is "known" to (and can be requested
by) other robots. Robots are usually independent from each other in
their actions. However, they may access the same objects on the same
servers. If private interest is 100% (which is the default), a robot
would request only "private" objects from all origin servers,
resulting in no overlap of generated URL sets requested by individual
robots. Finally, foreign interest specifies the portion of URLs that
should come from Robot's foreign_trace. Note that public,
private, and foreign objects can be requested more than once and hence
produce a hit. This field replaced less general
public_interest field starting with Polygraph version
2.8.0.
Robot R = {
...
// public_interest = 75%;
interests = [ "foreign": 1%, "public": 74%, "private" ];
foreign_trace = "/usr/local/traces/special_sites.urls";
};
The req_types array specifies what kind of requests the
robot should emit and with what probability. Several request types are
supported: "Basic" (a common GET request), "IMS" (a
request with an If-Modified-Since header field),
"Reload" (a request with a Pragma: no-cache and
Cache-Control: no-cache header fields), and
"Range" (a request with a Range header field).
The req_methods array specifies HTTP request methods the
robot should use and with what probability. Several methods are
supported: "GET" (default), "HEAD", "POST", and "PUT".
Request methods are somewhat orthogonal to request types. For example,
an IMS request may be issued using HEAD request method. Polygraph
may not support all combinations though.
The private_cache_cap field specifies the size of the
robot cache. Robots do not cache object content, but remember URLs and
other object characteristics. For example, when an IMS request is
generated, the IMS timestamp is taken from the robot cache if
possible.
Pop_model specifies which "popularity model" to use when
requesting an object that has already been requested before. You must
specify popularity model if you specify positive recurrence
ratio.
When the unique_urls flag is set, each request submitted by
Polygraph will be for a different URL. Note that this option is
applied last and changes a URL without affecting the object id part.
Object IDs are responsible for generating various object properties.
Thus, for filling-the-cache experiments, it may be a good idea to use
this option (in conjunction with other options like
recurrence and public_interest) to generate objects
similar to production tests (but with zero hit ratio).
The pipeline_depth distribution determines the maximum
number of concurrent outstanding requests on a persistent connection.
A request is considered outstanding until the corresponding response is
completely received. By default, requests are not pipelined, as if a
const(1) value was specified for the pipeline depth. The pipeline
depth knob has no effect on connection persistency, and the actual depth
depends on factors such as connection persistency and the presence of
embedded objects. See traffic model for more details
about request pipelining. Pipelining is supported in Polygraph
starting with version 3.0.
Open_conn_lmt is the maximum number of open connections
(in any state, to any server) a robot may have at any given time. A
robot will postpone (queue) new transactions if the limit is reached.
This limit simulates typical behavior of browsers like Netscape
Communicator that have a hard limit on the total number of open
connections. See Pei Cao's experimental study
for more information.
Wait_xact_lmt is only useful when open_conn_lmt
is specified. If the robot reaches its open connections limit, it will
queue the extra transactions. When the queue length grows beyond
wait_xact_lmt, new transactions will be simply ignored (with
an appropriate error message).
Minimize_new_conn is the probability that a robot will
treat connections to substitute addresses as connections to the
same agent (and, hence, reuse them if needed). This is useful for
running various no-proxy or no-VIP tests while keeping the number of
persistent connections similar to a "proxied" environment.
The session field is useful for simulating the
login/logout behavior of many Web clients, including browsing humans. See
the Session type for more information.
User_names do not affect robot behavior but may be useful
for testing external accounting and authentication services. Each name
is just a string. A robot picks a new name at the start of the
session. Within one robot configuration, no two sessions share a name,
provided all configured names are unique and there are enough names
(i.e., the number of user names is at least the number of robot
addresses). Names are selected in random order, with equal
probability.
The peer_icp address enables ICP module of the robot; the
robot will send ICP queries for all to-be-requested objects from the
icp_port to that address. The peer_http address
specifies where to send HTTP queries after an ICP peer returns a
hit.
Note that if only the peer_icp address is set, the robot will
send ICP queries to the specified address, but will not fetch objects
from a peer. Setting only peer_http may not be supported;
use the "--proxy" option instead. At most one ICP and at most one
HTTP peer can be configured. Using completely different addresses for
the two peers is allowed, but usually does not make sense.
The dns_resolver field specifies the DNS resolver for a
robot to use.
The foreign_trace specifies the name of a file that
contains absolute HTTP URLs to request when foreign interest is
selected according to the interests field. The trace file
must have one URL per line. HTML anchors (or #-comments) are stripped.
Whitespace at the beginning and at the end of a line is stripped.
Empty lines are ignored. All URLs are pre-loaded at the start of a
test. Thus, larger traces will require more RAM. Misses are generated
in trace order. Once all URLs in a trace are requested, the iteration
restarts from the top of the trace. The trace order has no influence on hit
generation. However, Polygraph assumes and does not check for URL
uniqueness, and duplicate trace entries may cause unexpected (for
Polygraph) hits.
Cookies_keep_lmt distribution determines the maximum
number of origin server cookies that a robot can remember and keep.
When the number of incoming cookies exceeds the specified limit, the
Robot removes old cookies in a FIFO order. By default, 4 cookies will
be kept for each server. A robot will send back all cookies it
remembers, if any (provided the robot is a cookie-sending agent, of course). To
save RAM, all robots within a polyclt process share cookie
storage and the FIFO queue. This may change if servers generate more
sophisticated cookies. Cookie sending functionality was added in
Polygraph version 3.0.
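The per-server FIFO behavior can be modeled with a bounded queue. This sketch assumes a constant limit (the real cookies_keep_lmt is a distribution) and uses a hypothetical CookieJar class; it does not model the storage sharing among robots.

```python
from collections import defaultdict, deque

class CookieJar:
    """Keep at most `keep_lmt` cookies per origin server, FIFO eviction."""

    def __init__(self, keep_lmt=4):  # 4 is the documented default
        self._cookies = defaultdict(lambda: deque(maxlen=keep_lmt))

    def remember(self, server, cookie):
        # A full deque drops its oldest entry when a new one is appended.
        self._cookies[server].append(cookie)

    def cookies_for(self, server):
        """Return all remembered cookies for the server, oldest first."""
        if server not in self._cookies:
            return []
        return list(self._cookies[server])
```

After six cookies from one server, only the last four remain, and all four would be sent back with the next request to that server.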
The accept_content_encodings strings specify content
codings to be listed in an HTTP Accept-Encoding request
header. This knob is used to trigger content compression at the
server.
The spnego_auth_ratio controls the choice of the algorithm
for NTLM or Negotiate authentication. If unset or set to
zero, the NTLMSSP algorithm will be used. Otherwise, the corresponding
portion of authentications will be done using the SPNEGO (a.k.a. GSSAPI)
algorithm.
The ranges selector specifies what ranges the robot should
use when generating a "Range" request and with what probability.
Please see the Ranges manual
for more information.
The req_body_pause_prob parameter specifies
the probability of a paused request. A paused request is
a request with an "Expect: 100-continue" header. After
sending a paused request, the robot waits for an HTTP
100 "Continue" control message from the server or the
final HTTP 417 "Expectation Failed" response. The
default is not to pause requests. This option is
mutually exclusive with the
req_body_pause_start option described below.
Please see the Request Bodies manual page for more
information.
The req_body_pause_start parameter specifies
the minimum size of a paused request (see
req_body_pause_prob above for terminology and
implications). Requests with bodies smaller than the
specified size are not paused. The default is not to
pause any requests. This option is mutually exclusive
with the req_body_pause_prob option described
above. Please see the Request Bodies manual page for more
information.
Range is a base type for the PGL SingleRange and MultiRange
types. You should not use the Range type directly.
SingleRange fields:
| Type  | Field                   |
| size  | first_byte_pos_absolute |
| float | first_byte_pos_relative |
| size  | last_byte_pos_absolute  |
| float | last_byte_pos_relative  |
| size  | suffix_length_absolute  |
| float | suffix_length_relative  |
The SingleRange type is used to configure a single range
request. For more information, please see the ranges manual.
The first_byte_pos_absolute and
first_byte_pos_relative fields are absolute (in bytes) and
relative (in percentage of whole entity size) positions of the first
range byte.
The last_byte_pos_absolute and last_byte_pos_relative
fields are absolute (in bytes) and relative (in percentage of whole
entity size) positions of the last range byte.
The suffix_length_absolute and
suffix_length_relative fields are absolute (in bytes) and
relative (in percentage of whole entity size) sizes of the requested
range suffix.
The *_absolute fields are mutually exclusive with the
*_relative fields. The byte-fields are mutually exclusive with
the suffix-fields, just like in the RFC 2616 BNF.
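The exclusivity rules above can be illustrated with a small Python sketch that turns SingleRange-like settings into a Range header value. This is not Polygraph code: the function and parameter names are hypothetical, relative values are assumed to be fractions of the entity size (0.25 for 25%), and the rounding is an assumption.

```python
def _resolve(absolute, relative, entity_size):
    """Turn one absolute/relative pair into a byte count (None if both unset)."""
    if absolute is not None and relative is not None:
        raise ValueError("*_absolute and *_relative fields are mutually exclusive")
    if relative is not None:
        # Assumption: relative values are fractions of the whole entity size.
        return int(relative * entity_size)
    return absolute


def single_range_header(entity_size,
                        first_abs=None, first_rel=None,
                        last_abs=None, last_rel=None,
                        suffix_abs=None, suffix_rel=None):
    """Build a Range header value following the RFC 2616 byte-range BNF."""
    first = _resolve(first_abs, first_rel, entity_size)
    last = _resolve(last_abs, last_rel, entity_size)
    suffix = _resolve(suffix_abs, suffix_rel, entity_size)
    if suffix is not None:
        if first is not None or last is not None:
            raise ValueError("byte fields and suffix fields are mutually exclusive")
        return f"bytes=-{suffix}"          # suffix-byte-range-spec
    if first is None:
        raise ValueError("a byte range needs a first byte position")
    last_text = "" if last is None else str(last)
    return f"bytes={first}-{last_text}"    # byte-range-spec
```

For example, single_range_header(1000, first_abs=0, last_abs=499) yields "bytes=0-499", and single_range_header(1000, suffix_rel=0.25) yields "bytes=-250".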
The MultiRange type is used to configure a request with a
multi-spec Range header. For more information, please see the ranges
manual.
The first_range_start_absolute and
first_range_start_relative fields are distributions of
absolute (in bytes) and relative (in percentage of whole entity
size) positions of the first byte of the first range spec. These
fields are optional.
The range_length_absolute and
range_length_relative fields are distributions of absolute
(in bytes) and relative (in percentage of whole entity) sizes of an
individual range spec.
The range_count distribution is used to determine the
number of individual range specs.
The *_absolute fields are mutually exclusive with the
*_relative fields.
Rptm-stat is to response time what thermo-stat is to temperature in the
room. Rptmstat specifies an "acceptable" response time range
(from rptm_min to rptm_max) and the factor change
percentage that should be applied to the current load factor if mean
response time in a sample is outside of the given range.
For "flat" phases (i.e., phases with load_factor_beg equal
to load_factor_end), the current load factor will be increased or
decreased by load_delta percentage depending on whether response
time is lower or higher than acceptable.
For phases with variable configured load factor, the slope of the
factor curve will be increased or decreased by load_delta.
However, current load factor will never become lower than
load_factor_beg or exceed load_factor_end!
The sample_dur field sets the sample duration or "size".
Samples follow each other without overlaps.
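The flat-phase rule can be sketched in Python as follows. This is only an illustration, not the Polygraph implementation: the function name is hypothetical, load_delta is assumed to be a fraction (0.10 for 10%), and the change is assumed to be applied multiplicatively to the current load factor.

```python
def adjust_flat_load_factor(load_factor, mean_rptm, rptm_min, rptm_max, load_delta):
    """Return the load factor to use after one sample of a flat phase.

    mean_rptm, rptm_min, and rptm_max must share one time unit.
    load_delta is a fraction (assumption: 0.10 means 10%).
    """
    if mean_rptm < rptm_min:
        # Responses are faster than required: offer more load.
        return load_factor * (1.0 + load_delta)
    if mean_rptm > rptm_max:
        # Responses are slower than acceptable: offer less load.
        return load_factor * (1.0 - load_delta)
    # Mean response time is inside the acceptable range: no change.
    return load_factor
```

With an acceptable range of 1 to 2 seconds and a 10% delta, a sample with a 0.5 second mean response time raises a load factor of 1.0 to about 1.1, while a 3 second mean lowers it to about 0.9.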
Derived from the Agent type, Server is the main logical
thread of execution in polysrv that models an HTTP origin
server. Servers receive requests and send replies. The speed and
nature of the replies depend on the workload.
Accept_lmt specifies the limit for consecutive attempts to
accept(2) an incoming connection. The attempts are terminated
with the first unsuccessful accept(2) system call or when the limit
is reached. By default, there is no limit.
Contents field is a content selector. It specifies the
distribution (or relative popularity) of content types for the server.
Each content type must be "accessible". That is, each type must be
in the closure of the direct_access selector described
below.
Direct_access array specifies what content types can be
accessed directly (i.e., not as an embedded object) by a robot. The
configuration below describes a simplified relationship among the
three most popular content types.
#include "contents.pg"
Server S = {
contents = [ cntImage : 70%, cntHTML : 10%, cntOther ];
direct_access = [ cntHTML : 95%, cntOther ];
};
The rep_types array specifies what kind of replies the
server should emit and with what probability. Two reply types can be
specified: "Basic" and "302 Found". "Basic" corresponds to "200 OK" or
"304 Not Modified", as appropriate depending on the actual request.
The cookie_set_prob probability determines the portion of
HTTP responses for which the server will attempt to generate cookies
(provided the server is a cookie-sending agent, of course). If
cookies need to be generated, the cookie_set_count
distribution is used to determine the number of cookies in the
response, and the cookie_value_size distribution is used to
determine the sizes of individual cookie values. Each cookie gets its
own Set-Cookie header field. Cookie values are random quoted
strings with sessN cookie names. Cookies do not expire and do
not have explicit paths. Polygraph robots may return
cookies depending on client-side cookie-related options. Cookie
sending functionality was added in Polygraph version 3.0.
The req_body_allowed parameter specifies the
probability that the server "allows" a "paused" request
by responding with an HTTP 100 "Continue" control
message to a request with an Expect: 100-continue
header. The default is 100% (i.e., allow all paused
requests). Please see the Request Bodies manual page and the Robot
req_body_pause_prob field for more
information.
Session objects are used to configure robot behavior. A
single session consists of two periods: busy and idle. During the busy
period, a robot behaves normally, as if no sessions were configured.
At the start of an idle period, a robot clears all request queues.
A robot does not emit new requests during the idle period, but may
finish some outstanding transactions.
Robot R = {
...
session.busy_period.duration = 7sec;
session.idle_period_duration = exp(3sec);
session.heartbeat_notif_rate = 1/2sec;
};
In the example above, the durations of busy and idle periods are
set to 7 seconds (constant) and 3 seconds on average (exponentially
distributed; a new value is selected when a session starts). Thus, the total session
duration would be 10 seconds, on average.
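The 10-second average can be checked with a quick Python simulation. The sketch below is not part of Polygraph; the function name, the sample count, and the seed are arbitrary assumptions. It only adds a constant busy duration to an exponentially distributed idle duration and averages the result.

```python
import random


def mean_session_duration(busy=7.0, idle_mean=3.0, sessions=100_000, seed=1):
    """Estimate the mean session duration in seconds.

    busy is the constant busy period; idle periods are drawn from an
    exponential distribution with the given mean.
    """
    rng = random.Random(seed)
    # expovariate() takes a rate, which is the inverse of the mean.
    total = sum(busy + rng.expovariate(1.0 / idle_mean) for _ in range(sessions))
    return total / sessions
```

Calling mean_session_duration() returns a value close to 10, while individual sessions vary because each idle period is drawn independently.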
Busy_period is of type Goal so that you can specify busy
period duration based on, say, the number of transactions and not just
time. Idle period duration is of type "time distribution". One cannot
use distributions with Goal members, but let us know if you need this
feature.
A non-idle session can be configured to emit "heartbeat"
notification events at a specified rate. The above example will emit
one heartbeat event every 2 seconds. These events have no effect on
robot behavior, but are useful for forwarding session events to
external remote programs via the Polygraph Doorman feature.
Heartbeat_notif_rate field was named
heartbit_notif_rate in Polygraph version 2.7.0.
StatSample objects are useful in the context of the Polygraph Watchdog feature. Each object
provides read-only access to performance measurements collected during a
watchdog sampling period or a phase. Dozens of measurements are
available.
Most StatSample structure members are structures themselves.
See their corresponding types linked above for details on individual
members. Paragraphs below define top-level member meaning only.
Req.rate is the offered request rate.
Rep.rate is the measured response rate.
Rep is statistics collected for all kinds of HTTP
transactions.
Basic is statistics collected for basic HTTP transactions. A
basic HTTP transaction is a transaction for which the definition or
meaning of a hit is relatively obvious. This excludes
transactions with the following characteristics: non-GET request methods,
If-Modified-Since request headers, response status codes other than 200 or
304, reloads, and aborted I/Os.
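The exclusion list above amounts to a simple predicate. The Python sketch below restates it; the function and parameter names are hypothetical and do not correspond to Polygraph internals.

```python
def is_basic(method, status, has_ims=False, is_reload=False, aborted=False):
    """Return True if a transaction counts as "basic" under the rules above."""
    return (
        method == "GET"              # non-GET request methods are excluded
        and status in (200, 304)     # other response status codes are excluded
        and not has_ims              # If-Modified-Since requests are excluded
        and not is_reload            # client reloads are excluded
        and not aborted              # aborted I/Os are excluded
    )
```

For example, is_basic("GET", 200) is true, while is_basic("POST", 200) and is_basic("GET", 302) are both false.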
Offered is hit/miss statistics for offered hits and misses. An HTTP
request "offers" a hit if an ideal cache would most likely return a cached
copy in response. Only basic transactions are used for these
statistics.
Real is hit/miss statistics for real (i.e., actual or
measured) hits and misses. These stats are based on a client-side guess
when a proxy did not contact a server to produce a response. A guess may
be inaccurate when the proxy contacts the server but uses the old response
headers instead of forwarding the new ones. Only basic transactions are
used for these statistics.
Cachable is cachability statistics for basic transactions.
Fill is statistics for cachable real misses.
Redired_req is statistics for HTTP transactions involving
redirected responses such as 302 (Found). Such transactions are not basic
transactions.
Rep_to_redir is statistics for transactions caused by
earlier redirected responses.
Ims is statistics for transactions involving an HTTP request
with an If-Modified-Since request header. Such transactions are not basic
transactions.
Reload is statistics for transactions involving client "reload"
requests (HTTP requests with Cache-control: no-cache directive). Such
transactions are not basic transactions.
Head is statistics for transactions involving a HEAD request.
Such transactions are not basic transactions.
Post is statistics for transactions involving a POST request.
Such transactions are not basic transactions.
Put is statistics for transactions involving a PUT request.
Such transactions are not basic transactions.
Abort is statistics for HTTP transactions where either request
or response was intentionally aborted prematurely, due to positive
abort_prob setting of an Agent. Such transactions
are not basic transactions.
Xact is concurrency level statistics for all HTTP transactions.
Populus is concurrency level statistics for robots.
Wait is concurrency level statistics for HTTP requests waiting
(for available connection slot) to be submitted. See
open_conn_lmt setting of a Robot.
Conn.open is concurrency level for open HTTP/TCP connections.
A connection is considered "open" from right after the corresponding
connect(2) or accept(2) system call and until the close(2) system
call.
Conn.estb is concurrency level for established HTTP/TCP
connections. A connection is considered "established" if it is open and
was marked as "ready for I/O" by an operating system. This usually means
that the TCP handshake has succeeded for the connection.
Conn.ttl is time-to-live statistics for open connections. That
is, it is the measure of how long connections stay open.
Conn.use statistics counts the number of HTTP transactions per
connection. If persistent connections are disabled, all connections will
have just one "use" count.
Ok_xact.count is the number of successful transactions.
Err_xact.ratio is the ratio of failed to successful transactions.
Err_xact.count is the number of failed transactions.
Retr_xact.count is the number of retried transactions.
Transactions are retried if the request is aborted due to a race conflict
with persistent HTTP connections.
Duration is the time it took to collect the sample, from the
first collected datapoint to the last.
Warning: Do not confuse StatSample with StatsSample. The
latter is likely to be removed from PGL.
TmSzStatSample objects encapsulate response time (rptm)- and
size-based statistics for a given measurement. They can only be used as a
part of a StatSample object.
HrStatSample objects encapsulate "hit" ratio statistics for a
given measurement. They can only be used as a part of a StatSample object. Note that "hit" and "miss" terms
may be changed to names of some other disjoint classes, depending on
the measurement. For example, "yes" and "no" is used for cachability
statistics.
Ratio.obj is a count-based ratio for a given transaction or
content class. For example, actual document hit ratio (DHR) is
real.ratio.obj
Ratio.byte is a volume-based ratio for a given transaction or
content class. For example, actual byte hit ratio (BHR) is
real.ratio.byte
Hit is statistics for transactions that were classified as
those matching the HrStatSample criteria. For example, hit transactions
for the real hit ratio statistics.
Miss is statistics for transactions that were classified as
those not matching the HrStatSample criteria. For
example, miss transactions for the real hit ratio statistics.
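The difference between count-based and volume-based ratios can be sketched in Python. The code below assumes that a ratio is the hit share of all classified transactions, hit / (hit + miss); the function name and the input format (lists of response sizes in bytes) are hypothetical.

```python
def hit_ratios(hit_sizes, miss_sizes):
    """Return (ratio_obj, ratio_byte) for lists of response sizes in bytes.

    A ratio is None when there is nothing to divide by.
    """
    total_count = len(hit_sizes) + len(miss_sizes)
    total_bytes = sum(hit_sizes) + sum(miss_sizes)
    # Count-based ratio, e.g. document hit ratio (DHR).
    ratio_obj = len(hit_sizes) / total_count if total_count else None
    # Volume-based ratio, e.g. byte hit ratio (BHR).
    ratio_byte = sum(hit_sizes) / total_bytes if total_bytes else None
    return ratio_obj, ratio_byte
```

With two hits of 100 and 300 bytes and one miss of 600 bytes, the count-based ratio is 2/3 while the volume-based ratio is only 0.4, because the single miss carries most of the bytes.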
| AggrStatSample |
| | int | count |
| | sometype | mean |
| | sometype | min |
| | sometype | max |
| | float | std_dev |
| | float | rel_dev |
| | sometype | sum |
AggrStatSample objects contain aggregate statistics for a
given measurement. They can only be used as a part of a StatSample object.
Count is the number of measurements taken.
Mean is the arithmetic mean of all measurements taken
(i.e., sum/count).
Min is the value of the smallest measurement taken.
Max is the value of the largest measurement taken.
Std_dev is the standard deviation of all measurements taken.
Rel_dev is the relative deviation of all measurements taken (i.e.,
std_dev/mean).
Sum is the sum of all measurements taken.
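The relationships among these members can be illustrated with a short Python sketch. It is not the Polygraph implementation: the function name is hypothetical, and the population form of the standard deviation (dividing by count) is an assumption, since the text above does not say which form is used.

```python
import math


def aggregate(values):
    """Compute AggrStatSample-like members for a non-empty list of numbers."""
    if not values:
        raise ValueError("at least one measurement is required")
    count = len(values)
    total = sum(values)
    mean = total / count                        # sum/count
    # Assumption: population standard deviation (divide by count).
    variance = sum((v - mean) ** 2 for v in values) / count
    std_dev = math.sqrt(variance)
    rel_dev = std_dev / mean if mean else None  # std_dev/mean
    return {
        "count": count, "mean": mean, "min": min(values), "max": max(values),
        "std_dev": std_dev, "rel_dev": rel_dev, "sum": total,
    }
```

For the measurements 2, 4, 4, 4, 5, 5, 7, 9 this gives a count of 8, a sum of 40, a mean of 5, a standard deviation of 2, and a relative deviation of 0.4.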
LevelStatSample objects contain level statistics for a
given set of concurrent events. They can only be used as a part of a StatSample object.
Started is the number of started events (including those that
ended).
Finished is the number of finished events.
Level.mean is the mean level of started but not finished
(pending) events during the measurement period. This statistic is not
very reliable, probably due to problems with the way the level is
computed. Polygraph essentially computes an integral of the measurement
function over the measurement period and then divides the computed area
by the duration of the period. Either this algorithm is incorrect or its
implementation is buggy, leading to surprising results in some tests.
Level.last is the number of not yet finished events at the end
of the measurement period. For short periods, this statistic should be
used instead of level.mean until the latter is fixed.
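The intended meaning of the mean and last levels can be shown with a small Python sketch. This is an independent illustration of a time-weighted mean level, not the Polygraph algorithm; the function name and the event format, (start, end) pairs with end set to None for pending events, are assumptions.

```python
def level_stats(events, period_start, period_end):
    """Return (mean_level, last_level) for events within a measurement period.

    events is a list of (start, end) times; end is None if still pending.
    """
    duration = period_end - period_start
    if duration <= 0:
        raise ValueError("the measurement period must have a positive duration")
    pending_time = 0.0   # integral of the level over the period
    last_level = 0       # events still pending when the period ends
    for start, end in events:
        clipped_start = max(start, period_start)
        clipped_end = period_end if end is None else min(end, period_end)
        if clipped_end > clipped_start:
            pending_time += clipped_end - clipped_start
        if start <= period_end and (end is None or end > period_end):
            last_level += 1
    return pending_time / duration, last_level
```

For a period from 0 to 10 with events (0, 4), (2, None), and (5, 10), the pending times add up to 4 + 8 + 5 = 17, so the mean level is 1.7, and only the second event is still pending at the end of the period.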
Warning: Do not confuse StatsSample with StatSample. The
former is likely to be removed from PGL.
Use StatsSample objects to instruct Polygraph to collect
detailed samples of transactions.
The name field is just a label to identify a sample.
The start field specifies the delay since the beginning of the
test after which Polygraph will start collecting a sample.
Capacity determines the number of transactions in the
sample.
If samples overlap, the earlier sample(s) are forced to "close", and
the sample started last will get all the transactions.
At the time of writing, there are
no tools to extract collected samples from binary logs.
The WebAxe4As type represents the addressing scheme for the WebAxe-4
workload.
Network addresses are represented using the addr type. The
addresses can store IPv4, IPv6, or FQDN information along with an
optional network interface name, port number, and subnet. Address
constants are usually specified using 'single quoted strings' as shown
below.
addr them = '204.71.200.245'; // no port number
addr theirServer = '204.71.200.245:80';
theirServer.host = '209.162.76.5'; // change host name only
theirServer.host = them; // error: type mismatch!
addr mask1 = '10.1.0.0/22'; // with a subnet
addr mask2 = 'fxp0::10.1.0.0:8080/22'; // more optional details
IPv6 addresses present a slight problem because common usage (e.g.,
in URLs) is to put a colon (":") between an address and a port number.
However, colons are also used as delimiters in IPv6 addresses, the same
way that dots (".") are used for IPv4. So that PGL can tell the
difference between a part of an IPv6 address and a port number, you must place
IPv6 addresses inside square brackets, like this:
addr foo = '[1234::5:1:2]';
addr server = '[1234::5:1:2]:80';
addr masked = '[1234::5:1:2]/120';
addr theworks = 'lo0::[1234::5:1:2]:80/120';
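To show how the brackets remove the ambiguity, here is a Python sketch that splits an address constant into its parts. It is not the PGL parser: the grammar is inferred only from the examples above, and the function name and the keys of the returned dictionary are hypothetical.

```python
import re

# Inferred layout: [ifname::]host[:port][/subnet], where host is either
# an IPv4 address, a domain name, or an IPv6 address in square brackets.
ADDR_RE = re.compile(r"""
    ^(?:(?P<ifname>[A-Za-z][\w.]*)::)?   # optional interface name
    (?:\[(?P<ipv6>[0-9A-Fa-f:.]+)\]      # bracketed IPv6 host
      |(?P<host>[^\[\]:/]+))             # or IPv4 address / domain name
    (?::(?P<port>\d+))?                  # optional port number
    (?:/(?P<subnet>\d+))?$               # optional subnet
""", re.VERBOSE)


def parse_addr(text):
    """Split an address constant into interface, host, port, and subnet."""
    match = ADDR_RE.match(text)
    if match is None:
        raise ValueError(f"unrecognized address: {text!r}")
    port, subnet = match.group("port"), match.group("subnet")
    return {
        "ifname": match.group("ifname"),
        "host": match.group("ipv6") or match.group("host"),
        "is_ipv6": match.group("ipv6") is not None,
        "port": int(port) if port is not None else None,
        "subnet": int(subnet) if subnet is not None else None,
    }
```

With this sketch, parse_addr('lo0::[1234::5:1:2]:80/120') reports the interface lo0, the host 1234::5:1:2, port 80, and subnet 120. Without the brackets, the trailing ":80" could not be told apart from the last group of the IPv6 address.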
Arrays of addresses can be formed us |