Command options

Command line options of various Polygraph tools are documented here.

Value syntax for boolean, time, size, and distribution options matches PGL syntax for the corresponding type. When an option accepts a list of values, the values must be separated by commas.

aka --aliases <IPs>

The --aliases option specifies the alias or aliases to set on the given interface. Aka recognizes IP addresses in PGL address format, including IP ranges. Aka will try to guess the subnet on its own, or you can use an explicit subnet specification.

The number of aliases you can set depends on your OS. Moreover, some OSes may support a large number of aliases (more than 1000) but with a significant performance penalty. In our FreeBSD environment, 500 aliases seems to be the limit after which noticeable network performance degradation occurs.

Note that you can just put alias specs after all other options and the interface name (see aka's usage line).

Aka will delete all old aliases before setting new ones. If you do not specify the new aliases, the old ones will still be deleted (handy for cleaning up after yourself).

polyclt --cfg_dirs <directory_names>
polysrv --cfg_dirs <directory_names>
polypxy --cfg_dirs <directory_names>

The --cfg_dirs option specifies the list of directories that are searched for the root configuration file as well as for PGL #include files.

polyclt --config <filename>
polysrv --config <filename>
polypxy --config <filename>

The --config option specifies the configuration file for Polygraph clients and/or servers.

polyclt --console <filename>
polysrv --console <filename>
polypxy --console <filename>

The --console option redirects console output to the specified file.

distr_test --count <int>

The --count option specifies how many samples Polygraph should take before producing the final histogram. Highly skewed or heavy-tailed distributions usually require more samples to get a nice histogram.

distr_test --distr <numDistr>

The --distr option specifies the distribution to sample. The syntax is identical to any other option that requires a distribution value. You must use plain numbers (i.e., no time or size scale) to specify distribution parameters (if any).

Note that drawn random values are truncated to integers before being counted in a histogram. This approach mimics the usual Polygraph run-time behavior. However, special care should be taken to specify large enough numbers as the parameters for the distribution; otherwise, most samples may truncate to the same few integers.
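
The effect of truncation can be illustrated with a small sketch. It is written in Python and does not use distr_test itself; the helper name and the uniform distribution are assumptions chosen only for illustration.

```python
import random
from collections import Counter

def truncated_histogram(draw, count):
    """Draw `count` samples, truncate each to an integer, and count them."""
    return Counter(int(draw()) for _ in range(count))

rng = random.Random(1)

# Parameters that are too small: every sample in [0, 1) truncates to 0.
small = truncated_histogram(lambda: rng.uniform(0, 1), 1000)

# Larger parameters spread the samples over many integer values.
large = truncated_histogram(lambda: rng.uniform(0, 1000), 1000)

print(len(small), "distinct value(s) with small parameters")
print(len(large), "distinct values with large parameters")
```

With the small parameters the whole histogram collapses into a single value, which is why the distribution parameters should be scaled up before sampling.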

polyclt --dump <strings>
polysrv --dump <strings>
polypxy --dump <strings>

The --dump option controls what messages and what parts of messages should be printed on the console. Possible message types are: req[uest], rep[ly], and err[or]. Possible message parts are hdr (for ``header'') and body. Dumping of message bodies is not supported at the time of writing.

Here are some examples.

Dump requirement                                      Dump option
all request headers                                   --dump req-hdr
all requests and the headers of erroneous messages    --dump req-hdrs,err-hdrs
everything about errors                               --dump errs
all headers                                           --dump hdrs

Note that sometimes Polygraph does not have the required data at the time of dump. Polygraph will try to at least provide some meta information about the message then.

If a message part matches both negative (error) and positive (reply or request) type masks, the part may be printed twice.

polyclt --dump_size <size>
polysrv --dump_size <size>
polypxy --dump_size <size>

The --dump_size option limits the size of an individual message dump. It is particularly useful when dumping message bodies.

polyclt --fake_hosts <IPs>
polysrv --fake_hosts <IPs>
polypxy --fake_hosts <IPs>

The --fake_hosts option instructs Polygraph to use the given addresses instead of looking up real network interfaces for available local addresses.

A Polygraph configuration file binds robots and servers to specific IP addresses. Normally, Polygraph scans the list of network interfaces available on the host to determine which robots and servers to start. However, the default scan relies on semi-portable system calls and may not work correctly (or at all) on some platforms.

To disable the network interface scan, use the --fake_hosts option. The specified list of IP addresses will be used instead of the one obtained from the operating system.

polyclt --fd_limit <int>
polysrv --fd_limit <int>
polypxy --fd_limit <int>

The --fd_limit option decreases the default file descriptor limit.

Polygraph determines the maximum number of available file descriptors using the getrlimit(RLIMIT_NOFILE) system call. It then attempts to raise the current limit to that maximum using the setrlimit(RLIMIT_NOFILE) system call. The resulting limit (actually about 97% of it) is then used as Polygraph's internal FD limit.

Polygraph will not attempt to create more TCP sockets than its internal limit. However, some OSes are known to be unhappy when a process is close to the limit. In a non-production benchmarking environment, there may also be competition for file descriptors with other processes. The --fd_limit option can be used to lower the internal limit even further.

Polygraph should stop opening new connections if the internal FD limit is reached.

One cannot raise the FD limit using the --fd_limit option! The original limit is reported by the operating system and must be changed first. Different OSes require different techniques for raising the file descriptor limits. Some well-known hacks can be found elsewhere.

Remember to reconfigure and recompile Polygraph from scratch if you change OS limits. At configuration time, Polygraph will try to open as many files as possible to find out actual OS limitations.
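
The limit computation described above can be sketched as follows. This is a Python illustration, not Polygraph code: the function names are hypothetical, and the order in which the 97% factor and the --fd_limit value are applied is an assumption.

```python
import resource

def os_fd_limit():
    """Raise the soft RLIMIT_NOFILE to the hard limit and return the result."""
    soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    if hard != resource.RLIM_INFINITY and soft != hard:
        try:
            resource.setrlimit(resource.RLIMIT_NOFILE, (hard, hard))
        except (ValueError, OSError):
            pass  # keep the current soft limit if the OS refuses
    soft, _ = resource.getrlimit(resource.RLIMIT_NOFILE)
    return soft

def internal_fd_limit(os_limit, fd_limit=None):
    """Keep about 97% of the OS limit; an explicit fd_limit can only lower it."""
    limit = os_limit * 97 // 100
    if fd_limit is not None:
        limit = min(limit, fd_limit)
    return limit

print(internal_fd_limit(1024))        # no option given
print(internal_fd_limit(1024, 500))   # lowered by the option
print(internal_fd_limit(1024, 5000))  # an attempt to raise has no effect
```

Calling internal_fd_limit(os_fd_limit()) would apply the same arithmetic to the limit reported by the running system.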

polyclt --file_scan <string>
polysrv --file_scan <string>
polypxy --file_scan <string>

The --file_scan option selects the system call to use for scanning ready files. The two valid values are select and poll. Poll is used by default, if available.

Most Unix operating systems have at least two system calls to detect ``ready'' file descriptors: poll(2) and select(2). See manual pages for your OS for details about these system calls.

The file scanning method may affect the performance of Polygraph under heavy loads or when working with a large number of file descriptors. The effect is probably limited to how fast Polygraph can scan all ready files. The ``best'' system call to use depends on the OS and the load on Polygraph. We suspect that performance differences are marginal in many general cases. Experiment if you want to double check your environment.
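
For readers unfamiliar with the two system calls, here is a minimal Python sketch that scans the same sockets with either poll(2) or select(2), preferring poll when it is available. The function name is hypothetical and the code only illustrates the two interfaces; it is not how Polygraph implements its scan.

```python
import select
import socket

def ready_to_read(socks, method="poll", timeout=1.0):
    """Return the sockets that are ready for reading, using poll or select."""
    if method == "poll" and hasattr(select, "poll"):
        poller = select.poll()
        by_fd = {s.fileno(): s for s in socks}
        for fd in by_fd:
            poller.register(fd, select.POLLIN)
        # poll() takes its timeout in milliseconds
        return [by_fd[fd] for fd, _ in poller.poll(int(timeout * 1000))]
    readable, _, _ = select.select(socks, [], [], timeout)
    return readable

a, b = socket.socketpair()
a.send(b"x")  # make b readable
for method in ("poll", "select"):
    print(method, ready_to_read([a, b], method) == [b])
a.close()
b.close()
```

Both calls report the same ready descriptor here; any difference between them shows up only in how the cost grows with the number of descriptors.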

polyclt --help
polysrv --help
polypxy --help
lr --help
lx --help
ltrace --help
distr_test --help

The --help option displays a command line usage summary.

distr_test --hist_step <%>

The --hist_step option tells distr_test the size of a histogram bin (as a percentage of the total contribution). For example, a value of 1% would lead to 100 lines per histogram while a 5% step results in 20 lines.
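
The relation between the step and the number of histogram lines can be sketched in Python. The sketch assumes that each bin holds an equal share of the samples, which is only one possible reading of ``total contribution''; the function name and the sample data are made up.

```python
def percentile_bins(samples, step_percent):
    """Split sorted samples into bins that each hold step_percent of them."""
    ordered = sorted(samples)
    lines = []
    percent = step_percent
    while percent <= 100:
        # index of the last sample that falls within this cumulative share
        last = max(0, len(ordered) * percent // 100 - 1)
        lines.append((percent, ordered[last]))
        percent += step_percent
    return lines

samples = list(range(1, 201))  # 200 made-up sample values
print(len(percentile_bins(samples, 1)))  # 100 lines
print(len(percentile_bins(samples, 5)))  # 20 lines
```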

polyclt --host_type
polysrv --host_type
polypxy --host_type

The --host_type option displays build environment information of the executable.

polyclt --icp_tout <time>

The --icp_tout option specifies how long to wait for an ICP_HIT reply before declaring an ICP miss condition.

polyclt --idle_tout <time>
polysrv --idle_tout <time>
polypxy --idle_tout <time>

The --idle_tout option specifies a finite time a Polygraph process should wait for some network activity. If no network activity happens within the specified time, Polygraph will stop the simulation with an ``inactivity timeout'' message.

Polygraph processes will usually stop simulation when all phases reach their goals. Sometimes a phase has a goal that cannot be reached. Sometimes network or other external conditions stall all pending transactions. In these and similar situations it is often desirable to stop the simulation even if not all phases are completed.

Specifying idle timeout on the client side is usually not a good idea because robots create their own traffic, never allowing the timeout to happen, regardless of the network conditions.

Starting with version 2.6, polysrv uses an idle timeout of 5 minutes by default.
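
The idea of an idle timeout can be sketched with a small event loop. This is a Python illustration under stated assumptions: the function name is hypothetical, and a plain recv() call stands in for real transaction processing.

```python
import select
import socket

def run_until_idle(socks, idle_tout):
    """Process readable sockets until idle_tout seconds pass without activity."""
    handled = 0
    socks = list(socks)
    while socks:
        readable, _, _ = select.select(socks, [], [], idle_tout)
        if not readable:
            return handled, "inactivity timeout"
        for s in readable:
            if s.recv(4096):
                handled += 1      # stand-in for real transaction processing
            else:
                socks.remove(s)   # peer closed the connection
    return handled, "all connections closed"

a, b = socket.socketpair()
a.send(b"ping")
print(run_until_idle([b], idle_tout=0.2))
a.close()
b.close()
```

A side that generates its own traffic would keep resetting the wait, which matches the remark above about client-side timeouts.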

aka --if <str>

The --if option specifies the name of the network interface (e.g., fxp0 or eth1). On many operating systems, you can get a list of all available interfaces by running the ifconfig -a command.

An interface name must be specified for aka to work.

polyclt --ign_bad_cont_tags <bool>

The --ign_bad_cont_tags option tells robots to ignore bad content tags that they may find inside response bodies. Polygraph uses semi-custom markup tags to identify embedded objects (similar to <img> tags in HTML). When the content contains tags that confuse Polygraph (e.g., when realistic content simulation is enabled on the server side), you might want to use this option.

polyclt --ign_false_hits <bool>
polypxy --ign_false_hits <bool>

The --ign_false_hits option instructs robots to ignore false hits.

Polygraph knows what objects it has requested during the current test. If a robot detects a hit on an object that was requested only once (the current request), it can complain about a ``false hit'' and register a transaction error.

However, there are many situations when false hits are not really ``false''. For example, two requests for a previously unseen object may be submitted very close to each other. If a proxy or server reorders the replies, Polygraph will think that the first reply is a false hit.

By default, Polygraph will not complain about false hits.
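
The reordering scenario can be sketched as follows. This Python example is an illustration only: the class and its bookkeeping are hypothetical and much simpler than whatever Polygraph does internally.

```python
class HitChecker:
    """Remember which objects were requested and flag suspicious hits."""

    def __init__(self):
        self.seen = set()

    def send_request(self, oid):
        """Record a request; return True if it is the first one for oid."""
        first = oid not in self.seen
        self.seen.add(oid)
        return first

    @staticmethod
    def is_false_hit(first_request, was_hit):
        # A hit on the very first request for an object looks impossible.
        return first_request and was_hit

checker = HitChecker()
req1_first = checker.send_request("obj1")  # True: object never seen before
req2_first = checker.send_request("obj1")  # False: submitted right after

# The proxy reorders the two requests: the second one is fetched from the
# origin (a miss) and the first one is then served from the cache (a hit).
print(checker.is_false_hit(req2_first, was_hit=False))
print(checker.is_false_hit(req1_first, was_hit=True))
```

The second check reports a false hit even though the proxy behaved correctly, which is why ignoring false hits is a reasonable default.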

polysrv --ign_urls <bool>

The --ign_urls option instructs polysrv to generate content regardless of the URLs in the requests polysrv receives. This mode is useful when URLs are not generated by Polygraph robots or otherwise inappropriate for the server to interpret.

polyclt --label <string>
polysrv --label <string>
polypxy --label <string>

The --label option allows you to assign a string label to a run. The label gets logged (as any other option value) and is also included in notification messages. The latter is useful if you are running several experiments and want polymon to distinguish them by a short ``name''.

Note that notification messages may truncate labels.

polyclt --log <filename>
polysrv --log <filename>
polypxy --log <filename>

The --log option tells Polygraph to preserve detailed measurements and various messages in a binary log file. The file can then be analyzed by tools such as lx and ltrace.

The log file size depends primarily on the duration of the test and on the number of simulated agents in the test. Long, large scale tests may easily produce logs of 5 - 10MB in size (per Polygraph process).

polyclt --log_buf_size <size>
polysrv --log_buf_size <size>
polypxy --log_buf_size <size>

The --log_buf_size option specifies the buffer size for the binary log. Polygraph periodically flushes logged data to disk and can resize logging buffers on demand, so large buffer sizes are not needed.

The only known case when this option might be useful is when Polygraph runs out of memory when trying to log phase statistics at the end of the test (phase stats objects are large and may require buffer resizing that may lead to insufficient memory).

polyclt --notify <addr>
polysrv --notify <addr>
polypxy --notify <addr>

The --notify option instructs Polygraph to send status messages to a daemon on the specified host. The messages are small (about 128 bytes) and are sent over UDP, which keeps the overhead negligible. The messages are emitted every stats cycle.

The Polygraph distribution comes with a listening daemon (udp2tcpd) and an interactive monitoring program (polymon).

Monitoring capabilities are very handy if you want to watch your experiments closely, but do not want to create extra load on the Polygraph machines. Polymon is also helpful in monitoring several independent concurrent experiments.

lx --objects <strings>

The --objects option specifies the names of objects to extract from the binary log file. Binary logs store a lot of information. Lx calls a self-contained piece of info an object. Objects may be as simple as a single integer number and as complex as a distribution histogram.

To extract all known objects, omit the --objects option or use the magic object name ``All''.

lr --out <filename>
lx --out <filename>
ltrace --out <filename>

The --out option specifies a file where the results of the program execution should be sent.

lx --phases <list>

The --phases option selects which portion of the log (corresponding to a phase in the PGL schedule) will be analyzed by lx.

polyclt --ports <port_range>

The --ports option instructs polyclt to explicitly bind(2) a socket to a specific port before making a connect(2) request. The actual port is selected from the specified range, using an LRU approach. If the bind(2) call fails, the port is marked as ``used'' and is never tried again unless there are no ``unused'' ports left. Polygraph also keeps a map of the ports it is currently using to avoid binding to the same port twice.

The default port range used by OSes for ephemeral ports is often rather small. Thus, an application is likely to run out of available ports if the request rate is high and explicit binding is not used.

To reduce the number of run-time conflicts, Polygraph pre-scans the given port range to find invalid ports. The scan may add a couple of seconds to polyclt start time.
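
The port selection logic described above can be sketched in Python. The class, its method names, and the port numbers are hypothetical; the sketch follows the description in this section and is not taken from the Polygraph sources.

```python
import socket
from collections import deque

class PortPool:
    """Hand out ports from a range, least recently used first."""

    def __init__(self, first, last):
        self.unused = deque(range(first, last + 1))
        self.failed = deque()  # ports marked as "used" after a bind failure
        self.busy = set()      # ports currently bound by this process

    def acquire(self, try_bind):
        """Return a port for which try_bind(port) succeeds, or None."""
        # Ports whose bind failed are retried only when no unused ports remain.
        for queue in (self.unused, self.failed):
            for _ in range(len(queue)):
                port = queue.popleft()
                if try_bind(port):
                    self.busy.add(port)
                    return port
                self.failed.append(port)
        return None

    def release(self, port):
        self.busy.discard(port)
        self.unused.append(port)  # goes to the back of the LRU queue

def bind_local(sock, port, host="127.0.0.1"):
    """Try an explicit bind(2) before connect(2); report success."""
    try:
        sock.bind((host, port))
        return True
    except OSError:
        return False

pool = PortPool(5000, 5002)
fake_bind = lambda port: port != 5000  # pretend port 5000 is taken
print(pool.acquire(fake_bind), pool.acquire(fake_bind), pool.acquire(fake_bind))
```

The demonstration uses a fake bind function so that its result does not depend on the ports that happen to be free on the local machine; bind_local shows what a real attempt could look like.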

polyclt --priority_sched <int>
polysrv --priority_sched <int>
polypxy --priority_sched <int>

The --priority_sched option specifies the priority level for urgent socket operations. Higher levels allow Polygraph to scan just the active sockets more often (at the expense of potentially delaying processing for sockets that used to be inactive but changed their status).

The default should be acceptable for most environments.

polyclt --prn_false_misses <bool>
polysrv --prn_false_misses <bool>
polypxy --prn_false_misses <bool>

The --prn_false_misses option dumps reply headers of false misses. Polygraph knows what objects it has requested during the current test. If a robot detects a miss on an object that was requested before, it marks the transaction as a ``false miss''. False misses are not errors, but they are a possible indication that a proxy did not cache an object when it had a chance to (or purged a cached object).

False miss information is often helpful for debugging a proxy or workload. However, because many false misses are a part of the normal HTTP operation in a distributed environment, it may take some time to find real proxy mistakes in a large trace.

polyclt --proxy <addr>

The --proxy option instructs all robots of the polyclt process to use the proxy address as the next-hop address of all requests and to use Robot::origins names in request URLs.

When the --proxy option is not given, robots use Robot::origins addresses as the next-hop addresses and use only the path component in request URLs.

Note that the origin address is always copied to the Host: HTTP header.

The presence of the --proxy option is often the only Polygraph-side configuration difference between ``forward proxying'' and ``transparent redirection'' bench setups. The option is also useful for running no-proxy tests to verify bench setup.
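
The difference between the two modes can be sketched in Python. The function name, the addresses, and the HTTP version in the request line are assumptions made for illustration.

```python
def build_request(origin, path, proxy=None):
    """Return (next_hop, request_text) for one GET request."""
    if proxy:
        next_hop = proxy
        url = "http://" + origin + path  # origin name goes into the URL
    else:
        next_hop = origin
        url = path  # path component only
    # The origin address is copied to the Host header in both modes.
    request = "GET " + url + " HTTP/1.0\r\nHost: " + origin + "\r\n\r\n"
    return next_hop, request

origin = "10.0.2.1:80"   # hypothetical origin server address
proxy = "10.0.1.1:3128"  # hypothetical proxy address

print(build_request(origin, "/w1/obj.html", proxy))
print(build_request(origin, "/w1/obj.html"))
```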

polyclt --rng_seed <int>
polysrv --rng_seed <int>
polypxy --rng_seed <int>

After version 2.6.2 was released, the --rng_seed option was removed in favor of the --glb_rng_seed and --lcl_rng_seed options.

The --rng_seed option initializes the general purpose r.n.g. with a specified seed. By varying the seed, one can test how susceptible to random noise the simulation is.

polyclt --glb_rng_seed <int>
polysrv --glb_rng_seed <int>
polypxy --glb_rng_seed <int>

The --glb_rng_seed option initializes the ``global'' r.n.g. with a specified seed. The global r.n.g. affects objects with global scope (i.e., objects that have to be the same regardless of which Polygraph process is generating them). For example, a URL extension, while ``random'', must be the same for the same object ID regardless of the process that generates the URL. Thus, all processes within a test must have the same global r.n.g. seed.

The default seed value is 1. By varying the seed, one can test how susceptible to random noise the simulation is.

polyclt --lcl_rng_seed <int>
polysrv --lcl_rng_seed <int>
polypxy --lcl_rng_seed <int>

The --lcl_rng_seed option initializes the ``local'' r.n.g. with a specified seed. The local r.n.g. affects events with local scope (i.e., events that should differ from one process to another). For example, a ``think time'' delay after receiving the 100th request should be different on different servers. Using equal seeds may lead to lock-step behavior among cooperating processes. Thus, all processes within a test should have different local r.n.g. seeds.

The default seed value is 1. By varying the seed, one can test how susceptible to random noise the simulation is.
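
The distinction between global and local seeds can be sketched in Python. The function names, the list of extensions, and the way the seed is combined with the object ID are all assumptions; Polygraph's own generators are not shown here.

```python
import random

def url_extension(glb_seed, object_id, choices=(".html", ".gif", ".jpg")):
    """Pick an extension that depends only on the global seed and object ID."""
    rng = random.Random(glb_seed * 1_000_003 + object_id)
    return rng.choice(choices)

def think_times(lcl_seed, n, mean=1.0):
    """Generate n think-time delays from a process-local generator."""
    rng = random.Random(lcl_seed)
    return [rng.expovariate(1.0 / mean) for _ in range(n)]

# Two processes with the same global seed agree on every object.
print(url_extension(1, 42) == url_extension(1, 42))

# Equal local seeds produce identical delays (lock-step behavior),
# while different local seeds do not.
print(think_times(1, 3) == think_times(1, 3))
print(think_times(1, 3) == think_times(2, 3))
```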

polyclt --sample_log <filename>
polysrv --sample_log <filename>
polypxy --sample_log <filename>

The --sample_log option specifies the name for a stand-alone binary log file that captures PGL-configured stat samples.

polyclt --sample_log_buf_size <size>
polysrv --sample_log_buf_size <size>
polypxy --sample_log_buf_size <size>

The --sample_log_buf_size option specifies the buffer size for the sample log. See the --log_buf_size description for related caveats.

lx --side <string>
ltrace --side <string>

The --side option specifies the name of the ``side'' to extract. Valid values are ``clt'', ``srv'', and ``all''. This option is only useful for polypxy logs because other logs have just one ``side'', and log extracting tools can guess what that side is.

ltrace --smooth_slide <bool>

If set, the --smooth_slide option instructs ltrace to use a sliding window (with a sliding step of one log entry) for averaging log entries as opposed to jumping from one group of entries to the next. This option is useful for building smooth, albeit less precise, graphs.
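
The two averaging modes can be compared with a short Python sketch. As a simplification, the window here is measured in log entries rather than in time, and both function names are hypothetical.

```python
def jumping_averages(values, window):
    """Average consecutive, non-overlapping groups of entries."""
    return [sum(values[i:i + window]) / window
            for i in range(0, len(values) - window + 1, window)]

def sliding_averages(values, window):
    """Average overlapping groups, moving the window one entry at a time."""
    return [sum(values[i:i + window]) / window
            for i in range(len(values) - window + 1)]

entries = [1, 2, 3, 4]  # made-up log measurements
print(jumping_averages(entries, 2))
print(sliding_averages(entries, 2))
```

The sliding variant produces more points for the same input, which is what makes the resulting graph look smoother.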

polyclt --stats_cycle <time>
polysrv --stats_cycle <time>
polypxy --stats_cycle <time>

The --stats_cycle option specifies the duration of a statistical interval cycle (5sec by default). Shorter cycles give more precise statistics but result in larger binary logs.

polyclt --sync_phases <bool>
polysrv --sync_phases <bool>
polypxy --sync_phases <bool>

The --sync_phases option instructs Polygraph to synchronize phase schedules among remote processes. For synchronization to make sense, all processes must use the same PGL phase schedules and, ideally, the same configuration files. Synchronization is implemented on top of HTTP transactions; it works fine when all processes are running and when transactions have reasonable response times. Polygraph is likely to get stuck in a phase if one of the processes quits or experiences severe performance problems.

Phase synchronization is on by default.

ltrace --sync_times <bool>

The --sync_times option tells ltrace to adjust local log time as if all logs started at once. The adjustment happens as the logs are read and before stats are reported; no log modification is performed.

This option is useful for processing logs from machines with de-synchronized clocks.
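
One way to picture the adjustment is to shift every log so that its first entry falls at time zero. The Python sketch below does this in memory; the function name, the log names, and the entry layout are assumptions, and ltrace may compute the adjustment differently.

```python
def sync_times(logs):
    """Shift each log so that its first entry is at time zero."""
    synced = {}
    for name, entries in logs.items():
        start = entries[0][0] if entries else 0
        synced[name] = [(t - start, v) for t, v in entries]
    return synced

# Two hypothetical logs from machines whose clocks differ by 300 seconds.
logs = {
    "clt.log": [(1000, 10), (1005, 12)],
    "srv.log": [(1300, 9), (1305, 11)],
}
for name, entries in sync_times(logs).items():
    print(name, entries)
```

The input dictionary is left untouched, mirroring the statement that the log files themselves are not modified.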

ltrace --time_unit <time>

The --time_unit option has two effects: ltrace reports timestamps as time elapsed since the start of the test, and that relative time is expressed in the specified unit or scale. By default, absolute time (seconds since the Unix epoch) is reported.

For example, to get ltrace to display relative timestamps at one minute scale, use --time_unit 1min.
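
The conversion amounts to simple arithmetic, sketched here in Python with made-up timestamps and a hypothetical function name.

```python
def relative_timestamps(timestamps, start, unit_sec):
    """Convert absolute times to time since `start`, scaled by unit_sec."""
    return [(t - start) / unit_sec for t in timestamps]

start = 1_000_000_000  # made-up test start, in seconds since the Unix epoch
absolute = [start, start + 60, start + 150]

# With a one minute unit, timestamps become minutes since the test start.
print(relative_timestamps(absolute, start, unit_sec=60))
```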

polyclt --unique_world <bool>
polysrv --unique_world <bool>
polypxy --unique_world <bool>

The --unique_world option instructs Polygraph to generate URLs that are very unlikely to be used by other Polygraph invocations. This option is on by default. The only known practical reason to disable unique worlds is when two tests must produce the same set of URLs. In the latter case, the random number generator seed essentially identifies the set.

polyclt --verb_lvl <int>
polysrv --verb_lvl <int>
polypxy --verb_lvl <int>

The --verb_lvl option specifies how much info will be printed to the console during a test. Normally, level zero will have only a couple of lines per run. Level one will allow for interval and phase stats lines to be printed. A negative level will disable any output.

Most errors are reported with level zero.

Most tests can be run with level 5 or lower verbosity, but it is a good idea to raise the verbosity level to 10 if you are having problems.

Regardless of the --verb_lvl setting, all console messages are duplicated in the binary log.

polyclt --version
polysrv --version
polypxy --version

The --version option displays the Polygraph distribution version.

ltrace --win_len <time>

The --win_len option specifies the length of an averaging window (in terms of time) that ltrace is using with the --smooth_slide option.