HTTP compression support

This page describes PGL knobs that control generation of compressed content on servers and acceptance of such content by robots. These knobs are available starting with Polygraph version 3.0.

Table of Contents

1. For the impatient
2. Introduction
3. Server side
4. Client side
5. Performance

1. For the impatient

Content ZippableContent = {
    encodings = [ "gzip" ];
};

Server S = {
    contents = [ ZippableContent, ... ];
};

Robot R = {
    accept_content_encodings = [ "gzip", "identity" ];
};

2. Introduction

Polygraph supports HTTP responses with identity and gzip content codings. Identity coding is the default. Polygraph servers can be configured to compress content on the fly and respond with gzipped content. Polygraph robots can declare support for gzip content coding and can accept gzipped content.

The client and server sides of a test are relatively independent as far as compression is concerned. A proxy might trick a Polygraph server into generating compressed content even if the robot does not declare gzip acceptance. Conversely, Polygraph robots can solicit and accept content compressed by the proxy even if the server does not compress.

Compression support depends on the open source zlib library. While zlib is often already installed on your system, you will need a relatively recent version with native support for in-memory gzip compression. We have tested compression features with zlib version 1.2.1; version 1.1.4 did not work. Polygraph uses compression level 6, which is zlib's default level.
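For readers wondering what "in-memory gzip compression at level 6" looks like, here is a minimal sketch using Python's zlib binding. It is an illustration only, not Polygraph's own code. Passing wbits=31 (16 + 15) asks zlib to wrap the deflate stream in a gzip header and trailer instead of the zlib wrapper. The helper name gzip_in_memory is hypothetical.

```python
import zlib

def gzip_in_memory(raw: bytes, level: int = 6) -> bytes:
    """Compress raw bytes into a complete gzip member, entirely in memory.

    wbits=31 (16 + MAX_WBITS) makes zlib emit a gzip header and a
    trailer carrying the CRC-32 and length, rather than a zlib wrapper.
    """
    compressor = zlib.compressobj(level, zlib.DEFLATED, 31)
    return compressor.compress(raw) + compressor.flush()


if __name__ == "__main__":
    raw = b"Polygraph test content " * 400
    gz = gzip_in_memory(raw)
    # Every gzip member starts with the magic bytes 0x1f 0x8b.
    print(gz[:2] == b"\x1f\x8b", len(raw), len(gz))
    # zlib can decode the gzip format too when given wbits=31.
    assert zlib.decompress(gz, 31) == raw
```

No temporary files or external gzip program are involved; the whole response body stays in a byte buffer.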

3. Server side

To enable compression on the server side, add gzip encoding to the list of supported content encodings for a given Content object:

Content ZippableContent = {
    encodings = [ "gzip" ]; // use gzip encoding exclusively
};

While only one content coding can be applied to a given response, a single Content object may support multiple content codings. For example, the following declaration allows FlexibleContent to be returned "as is" or compressed.

Content FlexibleContent = {
    encodings = [ "identity", "gzip" ]; // use gzip or identity coding
};

The request URI determines which Content object in the Server.contents selector the server uses to generate the response. Once the Content object is determined, the server selects a content coding that the object supports and that is acceptable to the client (as declared via an Accept-Encoding request header). If no such coding exists, the server responds with a 406 "Not Acceptable" error:

error: 2/2 (c81) no content coding acceptable to requester is supported

Polygraph servers ignore qvalue parameters in Accept-Encoding request headers and assume that identity coding is always acceptable to the client. Strictly speaking, HTTP allows a client to declare identity coding unacceptable by giving it a qvalue of zero (for example, "identity;q=0").
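The checks described above can be modeled in a few lines. The following Python sketch is a hypothetical illustration of the documented behavior, not Polygraph's source: it extracts coding names from Accept-Encoding, discards parameters such as q=0 (qvalues are ignored), always counts identity as acceptable, and treats a Content object without encodings as identity-only. It returns every candidate coding rather than a single choice, because this page does not say how a server picks among several acceptable codings. The function name acceptable_codings is an assumption.

```python
from typing import List, Optional, Sequence

def acceptable_codings(content_encodings: Sequence[str],
                       accept_encoding: Optional[str]) -> List[str]:
    """Hypothetical model of the documented server-side checks.

    Returns the codings that the Content object supports and the client
    accepts, in the Content object's order. An empty result corresponds
    to the case where a Polygraph server answers 406 "Not Acceptable".
    """
    # A Content object without encodings is treated as identity-only.
    supported = list(content_encodings) or ["identity"]

    # Identity is always assumed acceptable, whatever the header says.
    accepted = {"identity"}
    if accept_encoding:
        for item in accept_encoding.split(","):
            # Drop parameters such as ";q=0": qvalues are ignored.
            name = item.split(";", 1)[0].strip().lower()
            if name:
                accepted.add(name)

    return [c for c in supported if c.lower() in accepted]


print(acceptable_codings(["gzip"], "gzip, identity"))        # ['gzip']
print(acceptable_codings(["gzip"], None))                    # [] -> 406
print(acceptable_codings(["identity", "gzip"], "gzip;q=0"))  # ['identity', 'gzip']
```

The last call shows the effect of ignoring qvalues: "gzip;q=0" would normally mean "gzip not acceptable", but the sketch, like the servers described here, still counts gzip. The sketch does not treat the "*" wildcard specially, since this page does not say how Polygraph handles it.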

At the time of writing, only gzip and identity codings are supported. Here are their definitions extracted from RFC 2616.

identity: The default (identity) encoding; the use of no transformation whatsoever. This content-coding is used only in the Accept-Encoding header, and SHOULD NOT be used in the Content-Encoding header.
gzip: An encoding format produced by the file compression program "gzip" (GNU zip) as described in RFC 1952. This format is a Lempel-Ziv coding (LZ77) with a 32 bit CRC.

If no encodings are specified for the Content object, identity coding is assumed.

Content compression, if any, happens after raw content is generated or extracted from a content database. Thus, all Content PGL parameters except encodings apply to raw (unencoded) content. For example, the size distribution specifies the size of unencoded responses; responses sent with gzip coding will usually be smaller.
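To make the ordering concrete, the Python sketch below first builds a raw body whose length plays the role of a value drawn from the size distribution (a fixed 10 KB here) and only then applies the coding. The repetitive filler text is an assumption chosen because it compresses well; how much smaller a real response becomes depends on how compressible its content is, and incompressible data can even grow slightly under gzip. The function make_response_body is hypothetical.

```python
import gzip
from typing import Tuple

def make_response_body(raw_size: int, coding: str) -> Tuple[bytes, bytes]:
    """Return (raw, encoded) for a hypothetical response.

    raw_size stands in for a sample from the Content size distribution:
    it fixes the length of the *unencoded* body only.
    """
    filler = b"lorem ipsum dolor sit amet "  # assumed, highly compressible
    raw = (filler * (raw_size // len(filler) + 1))[:raw_size]
    # Coding is applied last, after the raw body already has its final size.
    if coding == "gzip":
        encoded = gzip.compress(raw, compresslevel=6)
    elif coding == "identity":
        encoded = raw
    else:
        raise ValueError(f"unsupported coding: {coding}")
    return raw, encoded


raw, body = make_response_body(10 * 1024, "gzip")
print(len(raw), len(body))  # raw is exactly 10240 bytes; body is much shorter
```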

Note that, following HTTP rules, the MD5 checksums that servers send in Content-MD5 headers are computed after content coding is applied.
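The difference matters for anyone verifying checksums independently. In this Python sketch (an illustration, not Polygraph code), the Content-MD5 value is the base64 encoding of the 128-bit MD5 digest, as RFC 1864 defines, taken over the gzip-encoded bytes rather than the raw body. The sample body and the helper name content_md5 are assumptions; mtime=0 only keeps the gzip header deterministic for the example.

```python
import base64
import gzip
import hashlib

def content_md5(entity_body: bytes) -> str:
    """Content-MD5 header value: base64 of the 16-byte MD5 digest."""
    return base64.b64encode(hashlib.md5(entity_body).digest()).decode("ascii")


raw = b"<html>example body</html>\n" * 100
# mtime=0 keeps the gzip header (and thus the digest) reproducible.
encoded = gzip.compress(raw, compresslevel=6, mtime=0)

header_value = content_md5(encoded)  # checksum of the bytes actually sent
assert header_value != content_md5(raw)
print("Content-MD5:", header_value)
```

A client that decompresses first and then checks the digest against the raw body would report a mismatch even though the response is correct.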

4. Client side

Polygraph robots can be configured to declare support for content codings via an accept_content_encodings parameter. For example:

Robot R = {
    accept_content_encodings = [ "gzip", "identity", "x-DRM32;q=0" ];
};

The contents of the accept_content_encodings list have no effect beyond generating the Accept-Encoding request header, but that may change in the future if robots need to interpret encoded content.

By default, identity coding is assumed and no Accept-Encoding header is sent.
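Because robots only copy the list into a header, the client side amounts to simple serialization. The Python sketch below is a hypothetical model, not Polygraph code: entries pass through verbatim (including parameters such as q=0), and no header is produced when the parameter is unset. The ", " separator and the treatment of an empty list the same as an unset one are assumptions.

```python
from typing import Optional, Sequence

def accept_encoding_header(
        accept_content_encodings: Optional[Sequence[str]]) -> Optional[str]:
    """Build an Accept-Encoding header line from a robot's list (hypothetical).

    Returns None when the list is unset or empty, modelling the default
    of sending no Accept-Encoding header (identity coding implied).
    """
    if not accept_content_encodings:
        return None
    # Entries are passed through unchanged; robots do not interpret them.
    return "Accept-Encoding: " + ", ".join(accept_content_encodings)


print(accept_encoding_header(["gzip", "identity", "x-DRM32;q=0"]))
# Accept-Encoding: gzip, identity, x-DRM32;q=0
print(accept_encoding_header(None))
# None
```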

5. Performance

To support gzip content coding, Polygraph servers compress content on the fly. This makes it easy to configure a test with generated or pre-loaded uncompressed content. However, runtime compression is expensive. We do not have good data points yet, but we expect significant server-side response time increases (hundreds of milliseconds) for typical object sizes of around 10KB.
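To get a rough feel for the CPU cost of level 6 compression on your own hardware, a micro-benchmark like the Python sketch below may help. It times only zlib, called through Python's binding, on a synthetic 10 KB body (half random bytes, half repeated text, an assumed mix). Its numbers are not a substitute for measuring response times in a real Polygraph test. The helper name mean_gzip_seconds is hypothetical.

```python
import os
import time
import zlib

def mean_gzip_seconds(body: bytes, level: int = 6, repeats: int = 200) -> float:
    """Average wall-clock seconds to gzip `body` once, in memory."""
    start = time.perf_counter()
    for _ in range(repeats):
        c = zlib.compressobj(level, zlib.DEFLATED, 31)  # 31 => gzip wrapper
        c.compress(body)
        c.flush()
    return (time.perf_counter() - start) / repeats


# Synthetic 10 KB body: 5 KB random (incompressible) + 5 KB repetitive.
body = os.urandom(5 * 1024) + b"abcdefgh" * 640
print(f"{len(body)} bytes: {mean_gzip_seconds(body) * 1e3:.3f} ms per compression")
```

Comparing several levels (for example 1 versus 6) with the same helper shows how much of the cost is tied to the compression level.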

It is possible to avoid or speed up runtime compression. Let us know if the current speed is not acceptable to you.