Below is a list of the most common messages, followed by their descriptions. If
a message you are interested in is not documented, please let us know.
error(errForeignReq): ``foreign request''
A "foreign request" error indicates that a Polygraph server has
received a request that does not look like anything a Polygraph
client (or a proxy on behalf of a client) would produce. To detect
a Polygraph client request, servers look for Poly-specific URL
format and Poly-specific HTTP extension header fields.
You should not be getting "foreign request" errors on no-proxy
runs. If you get this error with a proxy in the loop, you may want
to investigate what requests the proxy is sending that the server
does not recognize. Enabling the --dump errs option
in polysrv may help.
error(errForeignRep): ``foreign reply''
A "foreign reply" error indicates that a Polygraph client has
received a reply that does not look like anything a Polygraph
server (or a proxy on behalf of a server) would produce. To detect
a Polygraph server response, clients look for Poly-specific HTTP
extension header fields.
You should not be getting "foreign reply" errors on no-proxy
runs. If you get this error with a proxy in the loop, you should
investigate what responses the proxy is sending that the client
does not recognize. Enabling the --dump errs option
in polyclt may help.
The most common source of "foreign reply" errors is a proxy
generating a proxy-specific error page. For example, a proxy may
report server side connectivity errors or an overload condition.
We have also seen products that send a company ad as a part of the
response for the first request from a given IP address.
error(errIcpForeignReq): ``foreign ICP request''
An ICP request contains a valid URL, but that URL was not in
Polygraph format. Specifically, Polygraph failed to extract an object
identifier from the URL. Perhaps a non-Polygraph-aware client is
submitting ICP requests to a Polygraph agent?
error(errIcpForeignRep): ``foreign ICP reply''
An ICP reply contains a valid URL, but that URL was not in Polygraph
format. Specifically, Polygraph failed to extract an object
identifier from the URL. This should not happen because Polygraph
requests URLs of valid format only.
error(errHttpRLine): ``malformed HTTP request or response line''
This error indicates that a Polygraph client received HTTP
response headers but could not extract the protocol version or the
response code from the headers. For example, an ``HTTP/1.1 200
OK'' response line indicates that the protocol version is ``1.1''
and the response code is ``200''.
You should not be getting "malformed HTTP request" errors
during no-proxy runs. If you get this error with a proxy in the
loop, you should investigate what responses the proxy is sending
that the client cannot parse. Enabling the
--dump errs option in polyclt may help.
At this time, Polygraph servers do not emit this error, but
that may change.
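The response-line check described above can be sketched in Python as follows. This is only an illustration of extracting the protocol version and response code from a line such as ``HTTP/1.1 200 OK''; the function name parse_status_line and the regular expression are assumptions made for this example and are not taken from Polygraph sources.

```python
import re

# Status line shape: "HTTP/<major>.<minor> <3-digit code> [reason phrase]".
# The pattern is an assumption for this sketch, not Polygraph's parser.
_STATUS_LINE = re.compile(r"^HTTP/(\d+\.\d+) (\d{3})(?: (.*))?$")


def parse_status_line(line):
    """Return (version, status_code); raise ValueError if the line is malformed."""
    match = _STATUS_LINE.match(line.rstrip("\r\n"))
    if match is None:
        raise ValueError("malformed HTTP request or response line: %r" % line)
    return match.group(1), int(match.group(2))


print(parse_status_line("HTTP/1.1 200 OK\r\n"))  # ('1.1', 200)
```

A proxy error page that starts with markup instead of a status line would raise ValueError here, which roughly corresponds to the situation in which the client reports errHttpRLine.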
error(errMisdirRequest): ``misdirected request''
A Polygraph origin server received a request with a
Host: header field that does not match the server's
address(es).
error(errForeignHostName): ``foreign host name''
The request is for an object at an address (i.e., the
host:port pair) that the receiving Polygraph process does not
manage.
error(errBadHostName): ``failed to parse host name''
Polygraph failed to parse the host component of the URL in an
ICP message. At the time of writing, only IP addresses are
recognized and FQDNs are not supported.
error(errPrematureEof): ``premature end of msg body''
Transmission of an HTTP message body terminated before the entire
message was received. This usually means that the TCP connection was
closed before polyclt read the response.
error(errPrematureEoh): ``premature end of msg header''
Transmission of an HTTP message header terminated after receiving
some portion of the message header but before the entire message
header was received.
error(errExtraRepData): ``extra reply data''
This message will be documented on-demand.
error(errNoHdrClose): ``connection closed before sending headers''
The TCP connection was terminated when Polygraph tried to read
the beginning of the next message header on a persistent
connection. The most likely reason is a race
condition allowed by HTTP: a server (or proxy) may close an idle
persistent connection after the client (or proxy) has sent the request
but before the request reaches the other end. Since this kind of
error is normal for HTTP operation, you may ignore a small number of
them.
error(errNoCLen): ``missing Content-Length header''
The client received a response with no Content-Length HTTP
header field. All ``200 OK'' Polygraph responses have a
Content-Length header; ``304 Not Modified'' Polygraph
responses do not. However, at the time of writing, the client
should not be receiving 304 replies because the client does not
send If-Modified-Since requests.
The usual cause of this error is an error page generated
by a proxy (see also the errForeignRep message). Note
that polyclt checks for the Content-Length header before it
checks whether the response is ``foreign''.
error(errHugeHdr): ``HTTP header is too big''
Polygraph ran out of I/O buffer space (16KB) before an
HTTP header terminated.
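A minimal Python sketch of how a fixed-size I/O buffer leads to this error. The 16KB limit comes from the text above; the function read_header, the chunked input model, and the exception types are assumptions made for illustration and do not reflect Polygraph's actual I/O code.

```python
MAX_HEADER_BYTES = 16 * 1024   # buffer size quoted in the text
END_OF_HEADER = b"\r\n\r\n"    # blank line that terminates an HTTP header


def read_header(chunks, limit=MAX_HEADER_BYTES):
    """Accumulate network reads until the header terminator is seen.

    Returns (header_bytes, leftover_body_bytes). Raises OverflowError if the
    buffer fills before the header ends, and EOFError if input ends first.
    """
    buf = bytearray()
    for chunk in chunks:
        buf += chunk
        end = buf.find(END_OF_HEADER)
        # The terminator must fit entirely inside the buffer.
        if 0 <= end <= limit - len(END_OF_HEADER):
            split = end + len(END_OF_HEADER)
            return bytes(buf[:split]), bytes(buf[split:])
        if len(buf) >= limit:
            raise OverflowError("HTTP header is too big")
    raise EOFError("premature end of msg header")
```

In this model, a response whose header never ends within the first 16KB raises OverflowError, while a connection that closes in the middle of a header raises EOFError (compare the errPrematureEoh entry).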
error(errUnchbHit): ``hit on uncachable object''
A Polygraph request for an uncachable object was satisfied with a
cached response.
Polysrv marks uncachable objects with the following HTTP
headers:
Cache-Control: private,no-cache
Pragma: no-cache
At least one proxy is known to ignore these headers and
sometimes return cached objects anyway.
error(errReloadHit): ``hit on reload request''
A Polygraph attempt to ``reload'' an object was satisfied with a
cached response.
Polyclt marks ``reload'' requests with the following HTTP
headers:
Pragma: no-cache
Cache-Control: no-cache
Many proxies are known to ignore some ``reload'' requests,
especially under peak loads.
error(errFalseHit): ``false hit''
Polyclt's first request for a cachable object was satisfied with
a cached response.
A typical cause is two concurrent requests for the same
cachable object being satisfied in the reverse order, resulting
(from polyclt's perspective) in a false hit and a false miss.
This is not a real error in most environments, and its
detection is not enabled by default.
error(errServerGone): ``server had to terminate''
This message will be documented on-demand.
error(errLogBufFull): ``log buffer is full''
This message will be documented on-demand.
error(errNidMapLimit): ``a server-advertised oid has not been requested for a while''
Polygraph robots ``reserve'' object identifiers (oids) on
servers. In other words, a server pre-allocates oids to be
requested later by a robot. If a robot does not request the reserved
object for a long time, the server complains.
Polygraph attempts to increase internal oid buffers to keep oid
reservations longer, adapting to current run conditions. If you
get just a few of these errors at the beginning of a run, ignore
them. If you continue to get the errors despite a stable request
rate, something is broken. Check that the reply rate is close to
the request rate. That is, check that there is no backlog of
unsatisfied requests.
Occasional errors of this kind are also unavoidable if you have
other transaction errors. If a request is ``lost'' before reaching
the server, polyclt will think it has requested the oid and might
never request it again.
error(errSrvChangedWid): ``client discovered server world id change''
A server world id is a unique identifier attached to each server.
The identifier is unique across simulations. It is reported back
to robots using HTTP extension header fields.
If you restart polysrv while polyclt is
running, the latter will notice the change and will complain.
Polyclt should be able to recover on its own, but do not restart
servers during production runs.
error(errForeignTag): ``foreign content <tag>''
Polysrv inserts Polygraph-specific tags into the body of some
responses. If those tags cannot be recognized by a robot, this
error is reported.
Unless it is a Polygraph bug, the error means that the proxy is
modifying content (i.e., response bodies) on-the-fly. This should
not be happening.
error(errMalformedTag): ``malformed content <tag>''
Same as the errForeignTag
message, except that the robot failed even to parse the tag (wrong tag
syntax).
error(errOpenTag): ``open content <tag''
This is a particular instance of the errMalformedTag message. A
tag is missing its closing bracket, ``>''.
error(errContentLeftovers): ``content syntax error at end of message body''
Polygraph got the expected number of response bytes, but failed
to parse the content. For example, a markup tag may have remained
opened until the end of the response body. These errors should be
accompanied by more details from the parser responsible for
handling the given response body encoding or content type.
error(errChunkHugeToken): ``huge token in chunked encoded message''
Polygraph complains that the connection I/O buffer is full but
the chunked encoding parser is unable to make progress. This can
happen, for example, when a chunked encoding is missing a
mandatory delimiter, and the parser keeps waiting for it until
the buffer gets full. This error may indicate chunked encoding
parser bugs or a corrupted message body encoding.
error(errUnreachContType): ``unreachable content type''
A robot found an embedded URL in the container object received
from the server. The URL pointed to an object of a certain content
type. The robot checked whether the server can produce an object
of that type and found out that the server cannot. The robot will
try to request some other object from the same server instead.
In the context of this description, the server means the "visible"
server (i.e., whatever names you list in the origins
field of the robot configuration). Using AddrMap, a
visible server may be mapped into several real servers, of
course.
Check that the servers behind the visible name have the content
type that the robot is asked to retrieve. Start by looking at the
contents field of the corresponding server
configurations.
error(errTooManyWaitXact): ``too many postponed xactions''
The per-robot queue that keeps transactions waiting for
resources has reached the wait_xact_lmt limit (a field of the PGL
robot configuration) and hence cannot grow any more. You need to
decrease the request rate, increase the number of connections
available to a robot, or do something else to resolve the
bottleneck, unless you have deliberately used a low
wait_xact_lmt value.
Transactions that exceed the limit are ignored (never
executed).
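The behavior described above can be pictured as a bounded queue. The Python sketch below is a toy model only: the class WaitQueue, its method names, and the FIFO ordering are assumptions made for illustration, and only the wait_xact_lmt name comes from the text.

```python
from collections import deque


class WaitQueue:
    """Toy model of a per-robot queue of postponed transactions."""

    def __init__(self, wait_xact_lmt):
        self.limit = wait_xact_lmt
        self.waiting = deque()
        self.dropped = 0  # transactions ignored because the queue was full

    def postpone(self, xact):
        """Queue xact, or ignore it and return False once the limit is reached."""
        if len(self.waiting) >= self.limit:
            self.dropped += 1  # one "too many postponed xactions" error
            return False
        self.waiting.append(xact)
        return True

    def next_ready(self):
        """Return the oldest postponed transaction, or None if the queue is empty."""
        return self.waiting.popleft() if self.waiting else None
```

In this model, a steadily growing dropped counter means that transactions arrive faster than resources free up, which is the bottleneck the error message points at.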
error(errTimingDrift): ``internal timers may be getting behind''
Polygraph maintains many alarms and timers for internal
scheduling purposes. If those timers start falling behind (i.e.,
events are executed late), you get this
error. You are probably overloading the Polygraph process or the
machine that the process runs on.
error(errSiblingViolation): ``violation of a sibling relationship''
A Polygraph proxy was asked to serve an object that could not
be served from the proxy's cache. At the time of writing, such
requests are refused.
error(errStaleHit): ``stale object''
Polygraph robots complain about stale hits if the value of the
Date: header in the HTTP response is less than the last
modification time (LMT) of the corresponding object and
the corresponding request was issued after the object was
modified. Note that Polygraph robots compute the LMT based on the object
ID and completely ignore the Expires: and
Last-Modified: HTTP headers. Real Expires:
headers usually lie, and a Last-Modified: header is
necessarily stale if the response is stale, so neither can be
trusted.
The above staleness condition excludes cases where the object
became stale (i.e., was modified) in-transit. This is by
design as we do not want to test HTTP robustness here.
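The staleness condition above can be written as a small predicate. This Python sketch assumes all three values are available as Unix timestamps; the function and parameter names are hypothetical, and the strict comparisons are an assumption about how the boundary cases are treated.

```python
def is_stale_hit(response_date, object_lmt, request_time):
    """Apply the staleness condition from the text.

    All arguments are Unix timestamps in seconds:
      response_date -- value of the Date: header in the response
      object_lmt    -- last modification time computed from the object ID
      request_time  -- when the robot issued the request
    """
    served_old_copy = response_date < object_lmt
    requested_after_change = request_time > object_lmt
    return served_old_copy and requested_after_change


# Object modified at t=1000; cached copy dated t=900.
print(is_stale_hit(900, 1000, 1100))  # True: requested after the change
print(is_stale_hit(900, 1000, 950))   # False: modified in transit
```

The second call shows the excluded in-transit case: the request was issued before the object changed, so the old copy is not counted as stale.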
"Stale object" errors do not affect traffic on the wire.
Polygraph 2.7.3 probably has bugs related to the logic above
and reports stale objects when it should not. We are working on a
fix.
error(errHttpStatusCode): ``unsupported HTTP status code''
HTTP status code is a property of an HTTP response that
determines how the response should be interpreted. HTTP defines
many status codes. Polygraph robots support (and Polygraph servers
emit) several status codes (e.g., ``200 OK'' and ``304 Not
Modified''). Under normal operating conditions it is unlikely that
other status codes will reach the client side of the bench.
The ``unsupported HTTP status code'' errors usually occur when
the device under test attempts to report an unusual condition
(such as a network misconfiguration or connectivity error) back to
the Polygraph robot. To determine what that unusual condition is,
use the --dump errs option on the client side.
error(errIcpRepCode): ``unsupported ICP opcode''
An ICP reply contains an unsupported opcode. At the time of writing,
only three opcodes are supported: hit, miss, and
miss-no-fetch.
error(errIcpVersion): ``unsupported ICP version''
An ICP message has an unsupported ICP version number. Only version
2 of the protocol is supported at the time of writing.
error(errIcpMsgSize): ``bad ICP message size''
An ICP message has an invalid size. The size is either smaller than
2 bytes or does not match the message length
header field.
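The opcode, version, and size checks from the three entries above can be sketched in Python. The 20-byte header layout and the numeric opcode values follow RFC 2186 (ICP version 2); the function name check_icp_reply, the order of the checks, and the use of the 20-byte header as the minimum size are assumptions made for this example and need not match what Polygraph does.

```python
import struct

# RFC 2186 header: opcode, version, message length, request number,
# options, option data, sender host address (20 bytes, network byte order).
ICP_HEADER = struct.Struct("!BBHIII4s")
SUPPORTED_VERSION = 2
SUPPORTED_REPLY_OPCODES = {2: "hit", 3: "miss", 21: "miss-no-fetch"}


def check_icp_reply(datagram):
    """Return the reply kind, or raise ValueError naming the failed check."""
    if len(datagram) < ICP_HEADER.size:
        raise ValueError("bad ICP message size")
    opcode, version, length, _reqnum, _opts, _optdata, _sender = (
        ICP_HEADER.unpack_from(datagram)
    )
    if length != len(datagram):  # length field must match the datagram size
        raise ValueError("bad ICP message size")
    if version != SUPPORTED_VERSION:
        raise ValueError("unsupported ICP version")
    if opcode not in SUPPORTED_REPLY_OPCODES:
        raise ValueError("unsupported ICP opcode")
    return SUPPORTED_REPLY_OPCODES[opcode]
```

For example, a datagram packed with opcode 2, version 2, and a length field equal to its actual size is classified as a hit; changing the version byte to 3 raises the ``unsupported ICP version'' error.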
error(errIcpBadReqNum): ``bad ICP reqnum''
The reqnum field of an ICP reply is negative or is out
of the range used by the Polygraph ICP client.
error(errIcpRepOverlap): ``ICP client may have too many outstanding requests''
Reqnum conflicts in ICP client metadata may indicate
that Polygraph cannot keep information about all pending ICP
transactions. The same error may also indicate that an ICP reply
was delivered very late, when the slot for the corresponding request
was already occupied by another request.
error(errIcpUnexpMsg): ``unexpected message to an ICP agent''
An ICP server received an ICP reply or an ICP client received
an ICP request. Check your ICP ports configuration.
error(errSyncDate): ``clocks out of sync''
A robot has detected a suspicious difference between the client
and server side clocks. Specifically, a ``first hand'' response
from a server had a Date: header more than a minute behind or
ahead of the local time (the difference is displayed after the
error message).
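The comparison described above can be reproduced by hand when investigating a dumped response. The Python sketch below parses a Date: header value and reports how far it is from a reference time; the function names and the fixed 60-second threshold are assumptions based on the ``more than a minute'' wording, not Polygraph's code.

```python
from datetime import datetime, timezone
from email.utils import parsedate_to_datetime

MAX_SKEW_SECONDS = 60  # "more than a minute" in the text


def clock_skew(date_header, now=None):
    """Seconds by which the Date: header is ahead (+) or behind (-) of now."""
    if now is None:
        now = datetime.now(timezone.utc)
    return (parsedate_to_datetime(date_header) - now).total_seconds()


def out_of_sync(date_header, now=None):
    """True if the header differs from now by more than MAX_SKEW_SECONDS."""
    return abs(clock_skew(date_header, now)) > MAX_SKEW_SECONDS


local = datetime(1994, 11, 15, 8, 12, 31, tzinfo=timezone.utc)
print(clock_skew("Tue, 15 Nov 1994 08:14:00 GMT", local))  # 89.0
```

The example assumes a Date: value that carries an explicit GMT zone, as HTTP requires; a value without a zone would parse to a naive datetime and could not be subtracted from the timezone-aware reference time.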
There are at least two possible reasons for this error
message.
Client and server side clocks are indeed out of sync.
Run the date command on all hosts to check if they are in
sync.
It took the robot more than a minute to receive
response headers after the response was issued on the server
side. If a one-minute response time (for small misses) is not
normal for your workload, you need to find the source of the
delay and remove or fix it. You might be overloading the proxy or
Polygraph; does the error occur when the load is significantly
lighter?
If a proxy changes the value of the server's Date:
header on misses, replace the word ``server'' with ``proxy'' in
the narration above.
It is a good idea to ensure that all machines running Polygraph
processes and the device under test have synchronized
clocks. A simple way to do that is to synchronize their clocks
just before running a test using the Unix ntpdate command or
similar. However, this method does not work well for tests lasting
longer than an hour because clocks may drift apart fast.
A better way to ensure clock synchronization is to run an NTP
daemon such as Unix ntpd. You can configure
all machines to synchronize with one designated "master" host. The
machine from which you start and monitor the test is a good
candidate to become the master. Synchronizing the master host with
some external host that has an accurate clock is optional. Doing so
will not affect individual tests, but will make their timestamps
"correct" relative to true Earth time. Instructions on how to
use ntpd with Polygraph on FreeBSD are available
elsewhere.
error(errOther): ``unclassified error''
Self-explanatory. It is usually emitted with some extra
information to help you diagnose the problem. Often fatal.
warning(warnPortBind): ``PortMgr failed to bind to X:Y''
The bind(2) system call failed.
If you are using the ephemeral port range (default), consider
either increasing that range using OS-specific tools or using an
explicit port range (see the --ports option).
If you are using an explicit port range, consider increasing that
range.
Occasional warnings of this kind are normal; those are probably
due to some kernel race conditions and cannot be completely
eliminated with any port mapping scheme.
warning(warnBufPoolGrew): ``buffer pool grew to N x B = S''
Polygraph warns you that it had to allocate yet another chunk
of memory to be used for I/O buffers.
Polygraph will need more memory if there are more connections
``stalled'' in non-idle state. Check that you have enough memory
to support desired request rates. Make sure that the Polygraph
process does not page.
Not all allocated memory will be used immediately. Process
``resident size'' may grow slower than the reported buffer pool
level.
fyi(fyiAgentStart): ``agent[N] Kind starting on Host''
Just an indication that an agent (robot or server) of the
specified ``kind'' is ready to work on the specified host.
The message does not mean that a robot will start submitting
requests immediately. Launch windows and other reasons may delay
the first request.
fyi(fyiSrvScanProgr): ``server scan is probably X% completed with N out of S servers (Y%) ready to be hit''
At startup, active robots are trying to contact all servers to
make sure that any robot can re-visit a page on any server after
the scan is completed. During this time, the servers allocate new
object identifiers and report them to clients. The process is not
deterministic due to possible delays and errors. That is why
Polygraph cannot give you an exact progress indication or ETA. The
scan will continue until N is equal to S.
Note that polyclt locks the current (usually the first) phase until
the scan is completed. Unfortunately, at the time of writing,
polysrv has no clue that the initial scan is going on and does not
lock the phase.
fyi(fyiSrvScanCompl): ``server scan completed with all R local robots ready to hit all S servers''
The initial server scan is complete. Polyclt will start its
normal mode of operation and unlock the current phase.
fyi(fyiMinWss): ``min `direct' objects in working set: global public: G local private: L''
Polyclt reports its current knowledge of the Working
Set Size (WSS). The counters are kept for direct objects
only. That is, embedded objects are not accounted for in this
message. (Note, however, that the size estimation uses
the average object size from the ``fill'' statistics, which do include
embedded objects.)
Two classes of objects are distinguished: public and
private. Public URL space is shared among all Robots.
All polyclt processes should have similar values for
public WSS at any given time (subject to synchronization delays
among distributed polyclts).
Private objects are specific or local to every (Robot, Server)
pair. Polyclt reports the sum of all local private
working set sizes (i.e., the sum across all robots within the
corresponding polyclt process).
How does one get the total WSS based on the FYI message above?
Here is an imprecise formula that you may use:
Objects_Per_Direct_Object = 1.6
Working_Set_Count = Objects_Per_Direct_Object * (G + N*L)
Working_Set_Size = Mean_fill_object_size * Working_Set_Count
Where Objects_Per_Direct_Object is taken from
PolyMix-2,3 workloads and may differ for other workloads.
N is the number of identically configured polyclts.
Mean_fill_object_size is usually about 11KB, but
also depends on the workload; check your stats.
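As a worked example of the formula above, the following Python sketch plugs in made-up counter values. The 1.6 factor is the figure quoted in the text; G = 500000, L = 20000, and N = 4 are hypothetical numbers chosen only to show the arithmetic, and treating 11KB as 11 * 1024 bytes is likewise an assumption.

```python
OBJECTS_PER_DIRECT_OBJECT = 1.6  # PolyMix-2,3 figure quoted in the text


def working_set(g_public, l_private, n_polyclts, mean_fill_object_size):
    """Return (Working_Set_Count, Working_Set_Size) per the imprecise formula."""
    count = OBJECTS_PER_DIRECT_OBJECT * (g_public + n_polyclts * l_private)
    return count, mean_fill_object_size * count


# Hypothetical counters: G = 500000, L = 20000, N = 4, mean fill size 11KB.
count, size = working_set(500_000, 20_000, 4, 11 * 1024)
print(round(count), "objects,", round(size / 2**30, 2), "GiB")
```

Because the formula is imprecise and the inputs depend on the workload, the result should be read as a rough estimate rather than a measured WSS.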
At the time of writing, Report Generator does not report actual
WSS, but we are working on it.
Caching just a WSS worth of data is not sufficient to achieve
perfect hit ratios because the working set is not updated in an
LRU, FIFO, etc. fashion.