Trace replay

This page describes how to replay URL traces with Polygraph. Trace replay functionality has been in Polygraph for a while, but DNS-related replay features are available starting with Polygraph version 3.0.

Table of Contents

1. For the impatient
2. Introduction
3. Trace format
4. Client side
5. Server side
6. DNS
7. Example

1. For the impatient

Server S = {
    addresses = ...;
};

Robot R = {
    interests = [ "foreign" ];
    foreign_trace = "/tmp/trace.urls";
    ...
    origins = S.addresses;
    dns_resolver = ...;
};

AddrMap M = {
    zone = ".";
    addresses = S.addresses;
    names = tracedHosts(R.foreign_trace);
};

2. Introduction

Polygraph supports replaying of URL traces: Polygraph robots load the entire trace into RAM and use traced URLs for some or all of the generated requests. Trace replay can be useful for testing URL and content filters as well as for introducing real origin servers into the mix. In general, both real and Polygraph servers can be used to replay a trace.

The sections below document the trace format and explain how to configure the client, server, and DNS sides of a test for trace replay.

3. Trace format

The simplest trace is a plain text file, with one HTTP URL per line. Polygraph can also accept many proxy "access logs" as traces.

http://www.example.com/
http://www.example.com:8080/
http://www.example.com/path/index.html
http://172.16.0.1:80/path/index.htm

When parsing a trace, Polygraph ignores comments and empty lines. A comment starts with a "#" character and continues to the end of the line. To find a URL on a line, Polygraph looks for the first sequence of non-space characters starting with "http://". Once the first URL is found, Polygraph moves on to the next line, so at most one URL is taken from each line. Since most access logs contain the request URI as the first URL in a log entry, Polygraph can handle access logs without knowing their exact format.

Polygraph ignores non-HTTP URIs because it cannot fetch them.

# one can use comments to describe traces:
# this trace came from http://www.example.com/
# the above URL will not be considered part of a trace

# URL below will be used
http://www.example.com/

# URL below will be skipped because it does not use the http scheme
ftp://www.example.com/path/index.html

# the anchor part below will be ignored as a comment
http://www.example.com/index.html#anchorToIgnore

# IP addresses and port numbers are fine
http://172.16.0.1:8080/path/index.htm

# there is no URL on the next line, from Polygraph's point of view
www.example.com/index.html

# only the example.org URL will be noticed and used:
12 example.com http://www.example.org/ 34513 http://example.net/
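
The parsing rules above can be approximated with a short Python sketch. This is not Polygraph's own parser: the helper names first_http_url and parse_trace are hypothetical, and the sketch assumes that a URL always begins a whitespace-delimited token.

```python
def first_http_url(line):
    """Return the first http:// URL on a trace line, or None if there is none."""
    # A comment starts at "#" and runs to the end of the line.
    content = line.split("#", 1)[0]
    # Assumption: the URL is a whole whitespace-delimited token.
    for token in content.split():
        if token.startswith("http://"):
            return token
    return None


def parse_trace(lines):
    """Collect at most one URL per line, skipping lines without a usable URL."""
    urls = []
    for line in lines:
        url = first_http_url(line)
        if url is not None:
            urls.append(url)
    return urls


sample = [
    "# this trace came from http://www.example.com/",
    "http://www.example.com/",
    "ftp://www.example.com/path/index.html",
    "http://www.example.com/index.html#anchorToIgnore",
    "www.example.com/index.html",
    "12 example.com http://www.example.org/ 34513 http://example.net/",
]
print(parse_trace(sample))
```

Running a trace through such a filter before a test is a cheap way to see which lines would contribute a URL and which would be skipped.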

4. Client side

A Polygraph robot can be configured to request URLs from a trace using the combination of interests and foreign_trace options:

Robot R = {
    interests = [ "foreign" ]; // use traces
    foreign_trace = "/tmp/test.urls";
    ...
};

In most cases, it is a good idea to preserve some Polygraph-generated traffic in the test. Without such traffic, Polygraph may not be able to synchronize phases and may not even leave the first phase. You can preserve Polygraph-generated URLs by adding another Robot with public and/or private interest or by combining multiple interests:

Robot R = {
    interests = [ "private": 1%, "foreign" ]; // 99% from trace
    foreign_trace = "/tmp/test.urls";
    ...
};

To generate the i-th miss, the robot requests the i-th URL from the trace, modulo trace length. Thus, once all URLs have been visited, no true misses can be generated and the actual recurrence ratio will not match the configured one. If an accurate recurrence ratio is important, the test must stop before all URLs are used up. URLs are revisited to generate hits if the recurrence ratio is positive. Also note that when traced URLs are not regenerated for each test, a cache may already store matching responses from previous tests. Flush the cache before each test if you want to avoid this "memory effect".
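
The wrap-around behavior can be illustrated with a minimal Python sketch. It assumes 0-based miss numbering and a trace without duplicate URLs; the helper names url_for_miss and is_true_miss are hypothetical and are not part of Polygraph.

```python
def url_for_miss(trace, i):
    """Pick the traced URL for the i-th miss (0-based), modulo trace length."""
    return trace[i % len(trace)]


def is_true_miss(trace, i):
    """Under the assumptions above, a miss is 'true' only on the first pass."""
    return i < len(trace)


trace = ["http://a.example/", "http://b.example/", "http://c.example/"]
for i in range(5):
    # From i == 3 on, the selection wraps and previously requested URLs repeat.
    print(i, url_for_miss(trace, i), is_true_miss(trace, i))
```

With a three-URL trace, the fourth intended miss lands on the first URL again, which is why a test that needs an accurate recurrence ratio should end before the trace is exhausted.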

A robot does not check that a traced URL belongs to one of the known origin servers (i.e., has its host listed in the origins field). This implies that Polygraph can be used to request traced URLs from both Polygraph origin servers and real (or, more precisely, "foreign" or "not listed in the test configuration") origin servers.

All traced URLs are "foreign" URLs. Polygraph robots will report the number of foreign URLs requested and the number of corresponding responses. For example, the console output below shows that 747 requests using foreign URLs were sent and all of them were responded to. Note that the total number of responses (754) is slightly higher, indicating that seven Polygraph-specific URLs were used as well.

000.54| i-dflt    754  28.40      1   0.00   0    1
000.54| foreign URLs: requested: 747 served: 747
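
When scanning long console logs, the foreign URL counters can be pulled out with a small Python sketch like the one below. The regular expression is derived only from the single sample line above, so the exact layout in other Polygraph versions is an assumption; the helper name foreign_counts is hypothetical.

```python
import re

# Pattern based solely on the sample console line shown above.
FOREIGN_RE = re.compile(r"foreign URLs: requested: (\d+) served: (\d+)")


def foreign_counts(line):
    """Return (requested, served) from a console line, or None if absent."""
    match = FOREIGN_RE.search(line)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


requested, served = foreign_counts("000.54| foreign URLs: requested: 747 served: 747")
print("unanswered foreign requests:", requested - served)
```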

Another source of foreign URLs is URLs embedded in responses that robots are configured to parse.

5. Server side

If the trace contains Polygraph server addresses, Polygraph servers will receive traced URLs. The --accept_foreign_msgs yes command-line option must then be used or the servers will refuse to serve any content and close the connection. If the option is set, the servers will respond, using the first content type configuration to generate the response.

% polysrv --accept_foreign_msgs yes --config ...

If the trace does not contain Polygraph server addresses, then no special server-side configuration is needed as far as trace replay is concerned. However, it is usually a good idea to still have at least some Polygraph-specific traffic reaching Polygraph servers (see the client-side discussion above for details).

6. DNS

When the trace contains domain names (and not just host IP addresses), Polygraph robots and/or the proxy need to resolve those names. When the trace contains many real domain names, and the use of real resolvers is not desirable, one has to configure a root name server to resolve all host names in a trace. This can be done using the dns_cfg tool that comes with Web Polygraph.

The dns_cfg tool can convert a PGL configuration file that uses address maps into forward and reverse zone configuration files (and a BIND configuration file). It is easy to get unique host names from the trace into an address map using the tracedHosts() PGL function:

AddrMap Map = {
    zone = "."; // root zone
    addresses = ...; // usually origin server addresses
    names = tracedHosts("/tmp/test.urls");
};

If the trace contains a mixture of host names from different TLDs, you should use the root zone in the PGL address map, as illustrated above. More information about dns_cfg is available elsewhere.
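
To preview which names would need resolving, the unique hosts in a trace can be listed with a short Python sketch. This only approximates the idea behind tracedHosts(); the helper name traced_hosts is hypothetical, and whether the real PGL function keeps IP addresses or preserves order is not specified here, so both behaviors below are assumptions.

```python
from urllib.parse import urlsplit


def traced_hosts(urls):
    """Return unique host names from a list of URLs, in first-seen order."""
    seen = set()
    hosts = []
    for url in urls:
        # urlsplit().hostname drops the port and lowercases the host.
        host = urlsplit(url).hostname
        if host and host not in seen:
            seen.add(host)
            hosts.append(host)
    return hosts


urls = [
    "http://www.example.com/",
    "http://www.example.com:8080/",
    "http://www.example.com/path/index.html",
    "http://172.16.0.1:80/path/index.htm",
]
print(traced_hosts(urls))
```

In this sketch, IP-address hosts are kept in the list; they need no DNS entry and could be filtered out before building a zone.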

7. Example

Test_trace.pg, a very simple but complete and functioning workload that can be used for replaying a trace, is available in Polygraph distributions starting with version 3.0. Just bring your own trace.

