Using DNS

Most real HTTP requests use Domain Name System (DNS) addresses like www.google.com rather than Internet Protocol (IP) addresses like 216.239.37.100. This section explains how to use DNS addresses in Polygraph tests. DNS support is available starting with Polygraph version 2.6.0.

Table of Contents

1. Prerequisites
2. PGL configuration
    2.1 Address map
    2.2 Robot Resolvers
    2.3 Dynamic Domain Names
3. DNS server configuration
4. Step-by-step plan
5. How it works
6. Warning

1. Prerequisites

When you tell a Polygraph robot to contact an origin server at, say, www.host1.tst, the robot resolves the name www.host1.tst into an IP address. Polygraph does not map names to IPs on its own. Just like most Web agents, Polygraph relies on external DNS servers to map a domain name into an IP address. Thus, in order to use domain names in a test, you must have a working DNS server that Polygraph robots can query.

Your DNS server must be configured to resolve the names that are used during the tests. DNS server configuration and operation are external to Polygraph; any server will do as long as it is capable of resolving the test-specific names. You can use any server compliant with DNS RFCs 1033, 1034, and 1035. The BIND server from the Internet Software Consortium (ISC) is a popular implementation that is also partially supported by the Polygraph dns_cfg tool described below.

Note that when Polygraph robots send their requests directly to a proxy, they do not need to resolve domain names. Nevertheless, you still need a working DNS server so that the proxy can do the name resolution. In the case of an interception (a.k.a. transparent) proxy, both the robot and the proxy may need to resolve domain names.

2. PGL configuration

2.1 Address map

Polygraph needs to know which Polygraph servers correspond to which domain names. Polygraph servers are identified by their IP addresses (and port numbers), so there has to be some mapping between domain names and IP addresses. That mapping cannot rely on run-time DNS queries because the map may be needed before any queries can be made and because of the overhead that querying can cause.

The mapping is provided by the AddrMap objects. For example, to map 100 domain names to 100 IP addresses, one may use the following configuration:

// a 1:1 mapping of 100 addresses
AddrMap map = {
    addresses = [ '10.13.1-2.1-50:80' ];
    names = [ 'www.host[1-99].hosting.tst', 'www.otherhost.tst' ];
};
...
use(map);

In many environments, one domain name is resolved to many addresses. Here is an example of how such a mapping can be done in Polygraph.

// a 1:N map
AddrMap map = {
    addresses = [ '10.13.1-2.1-50:80' ];
    names = [ 'www.hosting.tst' ];
};
...
use(map);

Note that the mapping has nothing to do with how robots resolve domain names! Robots do not use PGL address maps; they send real DNS queries instead. The map simply tells Polygraph that an origin server 10.13.1.25:80 is also known as www.host25.hosting.tst or www.hosting.tst, depending on the map in use. However, DNS server responses should match the PGL mapping or you may get "foreign request" and other Polygraph errors.
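To make the pairing concrete, here is a minimal Python sketch (not Polygraph code) that expands the range patterns from the 1:1 example and pairs them in order. The helper names expand_ips and expand_names are hypothetical, and the expansion order (last octet varying fastest) is an assumption chosen to agree with the 10.13.1.25 and www.host25.hosting.tst pairing mentioned above.

```python
import itertools
import re


def expand_ips(pattern):
    """Expand an address range such as '10.13.1-2.1-50:80' into a list."""
    host, _, port = pattern.partition(':')
    octet_choices = []
    for part in host.split('.'):
        lo, _, hi = part.partition('-')
        hi = hi or lo  # a plain octet is a range of one value
        octet_choices.append(range(int(lo), int(hi) + 1))
    suffix = ':' + port if port else ''
    # itertools.product varies the last octet fastest (assumed ordering)
    return ['.'.join(map(str, octets)) + suffix
            for octets in itertools.product(*octet_choices)]


def expand_names(pattern):
    """Expand a single '[lo-hi]' range in a name pattern, if present."""
    m = re.search(r'\[(\d+)-(\d+)\]', pattern)
    if not m:
        return [pattern]
    lo, hi = int(m.group(1)), int(m.group(2))
    return [pattern[:m.start()] + str(i) + pattern[m.end():]
            for i in range(lo, hi + 1)]


addresses = expand_ips('10.13.1-2.1-50:80')
names = (expand_names('www.host[1-99].hosting.tst')
         + expand_names('www.otherhost.tst'))

# 1:1 map: the i-th address is paired with the i-th name
one_to_one = dict(zip(addresses, names))
```

Both lists contain 100 entries, so every address gets exactly one name. A DNS server configured for the test would need to return answers consistent with this pairing.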

See the addressing scheme manual to learn more about address-related dependencies in Polygraph.

2.2 Robot Resolvers

To resolve domain names, a robot needs to contact a DNS server, so you need to tell the robot where your DNS servers are. The Robot dns_resolver field (which is of type DnsResolver) is the right place to do that. Here is an example.

DnsResolver dr = {
    servers = [ '10.13.0.254:53' ];
    timeout = 5sec;
};
Robot R = {
    ...
    dns_resolver = dr;
};

2.3 Dynamic Domain Names

To create a stream of new (i.e., never seen before during the test) domain names, create a DynamicName object using the dynamicName() function and use that object as a regular origin server name in an address map or in the Robot origins parameter. To map between dynamic domain names and the actual origin servers you need to use AddrMap.

Here is an example:

Server S = {
    ...
    addresses = [ '127.0.0.1:80' ]; // where to start the servers
};

// when a request for a new object is made, there is a 10%
// probability that a new domain name will be generated
DynamicName DN = dynamicName('*.example.com:80', 10%);

// a 1:1 map
AddrMap M = {
    names = [ DN ];
    addresses = S.addresses;
};

Robot R = {
    ...
    origins = M.names; // names to use to reach the origin servers
};

use(M);
use(S, R);

Note that address mappings (documented in the Address map section above) work the same for dynamic domain names. You can mix DynamicName objects with regular addresses. Each DynamicName object is treated as a single domain name. Here is an example:

AddrMap M = {
    // 1 to 1 mapping
    // Dynamic domain name *.foo.com is mapped to 192.168.0.1
    // Domain name bar.com is mapped to 192.168.0.2
    names = [ dynamicName('*.foo.com:9090', 10%), 'bar.com' ];
    addresses = [ '192.168.0.1:9090', '192.168.0.2:9090' ];
};

To create many servers with unique dynamic domain names, use the dynamize() PGL function. It works similarly to dynamicName(), but converts an entire array of domain names:

// convert 100 static domain names into 100 dynamic domain names
addr[] dnames = dynamize('foo[1-100].com:9090', 10%);

In real workloads, the number of addresses and names often varies depending on the peak load or some other configuration parameter. Address schemes can be used to compute the right number of server IP addresses, but how do you get the same number of dynamic names (to preserve the 1:1 IP-to-domain mapping) when you cannot hard-code the static names as shown in the above example? Use the ipsToNames() PGL function to go from IP addresses to static names and then finish with a dynamize() call:

AddrMap M = {
    // compute server IPs using an address scheme and a bench config
    addresses = serverAddrs(asSpread, TheBench);

    // convert the above IP addresses into static domain names,
    // using a name pattern with an IP macro to get unique names
    addr[] domains = ipsToNames(addresses, "h${dashed_ip}.tst");

    // convert static domain names into dynamic names
    names = dynamize(domains, 10%);
};
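The following Python sketch illustrates the idea behind the IP-to-name conversion. It is not Polygraph code: ips_to_names is a hypothetical stand-in, and it assumes that the ${dashed_ip} macro expands to the IP address with dots replaced by dashes and that the input addresses carry no port numbers.

```python
from string import Template


def ips_to_names(ips, pattern):
    """Hypothetical stand-in for ipsToNames(); not the Polygraph implementation."""
    names = []
    for ip in ips:
        # assumed macro expansion: '10.13.1.25' becomes '10-13-1-25'
        dashed = ip.replace('.', '-')
        names.append(Template(pattern).substitute(dashed_ip=dashed))
    return names


domains = ips_to_names(['10.13.1.25', '10.13.2.50'], 'h${dashed_ip}.tst')
```

Because each IP address is unique, embedding it in the name yields one unique name per address, which is what preserves the 1:1 mapping regardless of how many addresses the address scheme produces.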

Dynamic domain names are shared among distributed Polygraph clients. Dynamic names are a part of the generated URL space and, hence, are affected by the URL generation and working set maintenance algorithms.

The number of dynamic domain names in use by Polygraph is limited if and only if the working set size is frozen. The working set size determines the number of dynamic names in use. You can estimate that number by multiplying the working set size by the new domain name generation probability (i.e., the second parameter of the dynamicName and dynamize function calls).
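As a quick worked example of that estimate, the Python sketch below multiplies a working set size by the name generation probability. The function name and the input values (a working set of one million objects and a 10% probability) are made up for illustration.

```python
def estimate_dynamic_names(working_set_size, new_name_probability):
    """Estimate dynamic names in use: working set size times probability."""
    if not 0.0 <= new_name_probability <= 1.0:
        raise ValueError('probability must be between 0 and 1')
    return round(working_set_size * new_name_probability)


# hypothetical inputs: 1,000,000 objects, 10% new-name probability
estimated = estimate_dynamic_names(1_000_000, 0.10)
```

With these inputs the estimate is about 100,000 dynamic names, all of which the DNS server must be able to resolve.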

DNS configuration for dynamic domain names can be generated with the dns_cfg tool discussed below. Note that the reverse zone maps an IP address to a single domain name. For the '*.example.com' dynamic domain name, that would be 'w000000.example.com'.

Support for dynamic name generation is available starting with Polygraph version 3.4.

3. DNS server configuration

DNS servers do not have a common configuration format. You are on your own when configuring your DNS servers. However, the Polygraph distribution includes a tool, called dns_cfg, that you may find useful to bootstrap your configuration activities.

Given a zone name, DNS names, and IP addresses, dns_cfg builds configuration suitable for use with BIND and, perhaps, other DNS servers. The dns_cfg output consists of three parts. The first part is the text to cut-and-paste into BIND's named.conf file. This part is sent to the standard output. The other two parts are the direct and reverse zone files. Those may be quite large and are written to disk into appropriately named files.

> dns_cfg --zone hosting.tst \
    --addresses 10.13.1-2.1-50 \
    --names www.host[1-100].hosting.tst

# zone descriptions for named.conf: 
zone "168.192.IN-ADDR.ARPA" {
        type master;
        file "master/hosting.tst.rev";
};

zone "hosting.tst" {
        type master;
        file "master/hosting.tst";
};

> ls # get listing of created files
hosting.tst
hosting.tst.rev

A better way to generate files is to use the PGL configuration itself as the source of information for dns_cfg. The tool can parse your PGL workload and extract IP-to-names mapping information from use()d address maps:

> dns_cfg --config workload.pg \
    --cfg_dirs /usr/local/polygraph/workloads/include \
    > named.conf

Please note that, at the time of writing, dns_cfg is neither very configurable nor very smart. For example, you can see that the reverse zone label in the output above is wrong: for addresses in 10.13.x.x it should be 13.10.IN-ADDR.ARPA, because reverse zone labels list the network octets in reverse order. Nevertheless, dns_cfg saves a considerable amount of configuration work, especially if you are using BIND. Simply adjust the dns_cfg output to suit your needs.
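When adjusting the generated configuration by hand, it may help to compute the expected reverse zone label and PTR record name yourself. The Python sketch below does that with the standard ipaddress module. The reverse_zone helper is hypothetical, and it assumes the zone is delegated on an octet boundary (two octets for the 10.13.x.x addresses used here).

```python
import ipaddress


def reverse_zone(ip, network_octets):
    """Build the in-addr.arpa zone label for the leading network_octets of ip."""
    octets = str(ipaddress.IPv4Address(ip)).split('.')
    # reverse zones list the network octets in reverse order
    return '.'.join(reversed(octets[:network_octets])) + '.in-addr.arpa'


zone_label = reverse_zone('10.13.1.25', 2)

# full owner name of the PTR record for one server address
ptr_name = ipaddress.ip_address('10.13.1.25').reverse_pointer
```

Here zone_label is the name to use in the zone statement of named.conf, while ptr_name is the record inside that zone that maps the address back to a single domain name. DNS names are case-insensitive, so the lowercase form is equivalent to the uppercase one.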

Remember to regenerate the DNS server configuration and reconfigure your DNS server whenever anything affecting domain names changes. For example, if domain names are generated using a PGL ipsToNames() function call, then any change to the ips or zone parameters of that call should prompt name server reconfiguration. Since even minor changes to the workload may affect domain names, a good rule of thumb is to regenerate the configuration and reconfigure the name server every time you start a test, using that test's PGL file and the dns_cfg tool.

4. Step-by-step plan

Here is a plan you can follow to start using DNS names in your workload. The discussion in the above sections provides more information about each step.

  1. Decide where the external DNS server should run. Decide which DNS server to use. Install the server.

  2. Add an address map (AddrMap) to your PGL workload, describing the mapping between domain names and IP addresses. Do not forget to use() your map.

  3. List domain names (from your AddrMap) in the Robot origins field.

  4. Continue to use IP addresses in the addresses fields of robots and servers, just as you normally would.

  5. Configure the device under test to use your DNS server.

  6. If testing a transparent proxy or a similar implicit device, use a DnsResolver object to specify the address of your DNS server and the query timeout for the Polygraph resolver to use. Specify the DNS resolver in the Robot dns_resolver field.

  7. For every test execution, [re]configure your DNS server to resolve test domain names according to the PGL workload used. Use the dns_cfg tool to create server configuration files if possible. Check your DNS server configuration with dig, host, nslookup, or a similar external tool.

  8. Proceed with testing as usual. Robots and/or the proxy should now resolve DNS names by sending UDP queries to the DNS server.

5. How it works

To resolve a domain name, a Polygraph robot sends a DNS query to the configured DNS server over UDP. The resolution is asynchronous. That is, a single robot may have multiple outstanding queries, and several robots send queries and receive responses independently of each other.
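For readers unfamiliar with what such a query looks like on the wire, here is a minimal Python sketch that builds a standard DNS A-record query (per RFC 1035) and sends it over a non-blocking UDP socket. It is an illustration only, not how Polygraph is implemented. The function names are hypothetical, and the default server address is borrowed from the DnsResolver example above.

```python
import socket
import struct


def build_a_query(name, query_id):
    """Build a DNS query packet asking for the A record of name."""
    # header: ID, flags (recursion desired), QDCOUNT=1, ANCOUNT, NSCOUNT, ARCOUNT
    header = struct.pack('!HHHHHH', query_id, 0x0100, 1, 0, 0, 0)
    # QNAME: each label is prefixed with its length; a zero byte ends the name
    qname = b''.join(
        bytes([len(label)]) + label.encode('ascii')
        for label in name.rstrip('.').split('.')
    ) + b'\x00'
    # QTYPE=1 (A record), QCLASS=1 (Internet)
    return header + qname + struct.pack('!HH', 1, 1)


def send_query(packet, server=('10.13.0.254', 53)):
    """Send the packet without blocking; the caller polls the socket later."""
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    sock.setblocking(False)
    sock.sendto(packet, server)
    return sock


packet = build_a_query('www.host1.tst', 0x1234)
```

The query ID is what lets a client match responses to outstanding queries, which is why many queries can be in flight at once. A caller would typically wait for the response with select() or a similar mechanism and apply its own timeout.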

Polygraph does not use the gethostbyname(3) library call because that call is blocking. Polygraph does not check /etc/hosts or a similar file for "local" name bindings because a simulated robot environment may differ from the host environment and from other robot environments. As a side effect, the fact that a name is resolvable using nslookup or a similar command on the client machine does not guarantee that the same name can be resolved by Polygraph. You must make sure that the DNS server can resolve all names used for the test without any "external" help. The dig tool can be used for that purpose as it can query a given name server directly.

Robots send DNS queries just before opening a connection to a server. At the time of writing, robots do not cache DNS responses and use the first IP from the response only.

6. Warning

Like any powerful tool, Polygraph may be harmful if misused. Using DNS names in workloads increases the chances that you can overload or otherwise hurt server(s) and/or network(s) not (willingly) participating in the test. Here are some simple safety rules.