Re: workload questions

From: Alex Rousskov (rousskov@measurement-factory.com)
Date: Thu Feb 14 2002 - 22:44:21 MST


On Thu, 14 Feb 2002, mukesh agrawal wrote:

> 1. The function of the fill phases (framp, fill, fexit) and the
> top phases is straightforward. The website mentions that the most
> interesting measurements are taken from the top2 phase.
> Presumably, then, the other phases (inc, dec, and idle) are
> included for the side-effects they cause. Are they designed to
> induce particular side effects, or are they included to maintain
> fidelity with real workloads?

Both. We want to have ramp phases so that the load does not change
too quickly. PolyMix is tough on caches, but it is not a
denial-of-service or kill-your-box-in-one-minute workload. It is also
often educational to look at how the cache responds to changes in
load.

"Dec" phases are mostly to preserve symmetry of the workload. They can
also be used to compare proxy performance at the same non-peak levels
at different test times.

The primary purpose of the idle phase is to compare proxy performance
under peak load with performance under light load: Is poor response
time a sign of overload or just an inherent feature of the product
under test?

The overall workload shape around the top1 and top2 phases also mimics
a typical daily load pattern (shortened to ~5 hours). Here is more
rant on the subject:
        http://www.web-polygraph.org/docs/workloads/polymix-2/
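
To make the shape concrete, here is a rough sketch of a phase schedule
with linear ramps. It is not PGL and not the actual PolyMix
configuration: the numbered phase names (inc1, dec1, inc2, dec2), the
durations, and the load factors are made-up illustration values, and
load_at() is a hypothetical helper.

```python
# (phase name, duration in minutes, load factor at start, load factor at end)
# All names and numbers below are illustration-only assumptions.
PHASES = [
    ("inc1", 30, 0.1, 1.0),   # ramp up so the load does not change too quickly
    ("top1", 60, 1.0, 1.0),   # first peak
    ("dec1", 30, 1.0, 0.1),   # symmetric ramp down
    ("idle", 30, 0.1, 0.1),   # light load, for comparison with peak load
    ("inc2", 30, 0.1, 1.0),
    ("top2", 60, 1.0, 1.0),   # second peak
    ("dec2", 30, 1.0, 0.1),
]


def load_at(minute):
    """Return (phase name, load factor) at the given minute of the test."""
    if minute < 0:
        raise ValueError("minute must be non-negative")
    start = 0
    for name, duration, beg, end in PHASES:
        if minute < start + duration:
            # Linear interpolation between the phase's start and end factors.
            fraction = (minute - start) / duration
            return name, beg + (end - beg) * fraction
        start += duration
    raise ValueError("minute is past the end of the schedule")
```

With these numbers, load_at(15) (middle of inc1) and load_at(105)
(middle of dec1) give the same load factor, which is the kind of
same-level, different-time comparison that the "dec" phases allow.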
 
> 2. The discussion of realism mentions that zipf popularity is not
> used in the standard workloads. The reason is that the combination
> of zipf popularity with other workload characteristics yields
> unrealistic memory hit ratios. Is there a more detailed
> explanation of what the interactions are, and why this happens?

When we tried to use Zipf for the knob you are talking about, we got
violent complaints from vendors who observed memory hit ratios that
were too high and were concerned that competitors with inferior
products would "win" because of that deficiency of the workload. We
fixed PolyMix settings to prevent artificially high memory hit ratios.

The exact nature/mechanisms of all the dependencies in this area are
not well understood. Somebody can make a good MS thesis or PhD paper
on the topic. We can only speculate about the reasons, and the reasons
will change or even become invalid as Polygraph changes.

We have only 4-10 hours to build a working set. Real caches accumulate
days of traffic, which allows for the presence of many semi-cold
objects that are periodically accessed and that decrease the memory
hit ratio. If we apply true Zipf to the part of the simulation that
models recurrence (revisits), then we will be selecting from hot and
warm objects, and get hit ratios that are too high.
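
A toy model can illustrate the speculation above. This is not how
Polygraph selects objects. It only assumes Zipf-distributed requests
over a working set and an LRU "memory" cache of fixed size, and it
compares a small working set (a short test) with a much larger one
(days of accumulated traffic). The function name, sizes, and the LRU
policy are all assumptions made for illustration.

```python
import random
from collections import OrderedDict
from itertools import accumulate


def zipf_lru_hit_ratio(num_objects, cache_size, num_requests, alpha=1.0, seed=0):
    """Toy model: Zipf-distributed requests against an LRU memory cache."""
    rng = random.Random(seed)
    # Unnormalized Zipf weights: the object of rank r gets weight 1 / r**alpha.
    cum_weights = list(
        accumulate(1.0 / (rank ** alpha) for rank in range(1, num_objects + 1))
    )
    requests = rng.choices(range(num_objects), cum_weights=cum_weights, k=num_requests)

    cache = OrderedDict()  # keys ordered from least to most recently used
    hits = 0
    for obj in requests:
        if obj in cache:
            hits += 1
            cache.move_to_end(obj)
        else:
            cache[obj] = True
            if len(cache) > cache_size:
                cache.popitem(last=False)  # evict the least recently used object
    return hits / num_requests


# Same cache, same request count; only the working set size differs.
short_test = zipf_lru_hit_ratio(num_objects=1_000, cache_size=100, num_requests=20_000)
long_lived = zipf_lru_hit_ratio(num_objects=100_000, cache_size=100, num_requests=20_000)
```

In this toy setup the small working set yields a noticeably higher hit
ratio than the large one, because the long tail of rarely requested
objects that dilutes memory hits is missing.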

One thing that many people miss is that the knob in question is
responsible for just one part of the object selection algorithm in recent
workloads. We have other knobs that control hot subsets and embedded
object behavior, for example.

As you know, more rant about the subject can be found at
    http://www.web-polygraph.org/docs/reference/models/realism.html
 
> 3. The PGL documentation describes a Session object. Is this used
> in any of the standard workloads (grepping through the files
> suggests it is not)?

User session simulation is a relatively new feature that allows us to
simulate ON/OFF traffic sources (i.e., users becoming active or idle,
"logging" into the system or "leaving"). It is used in some Polygraph
labs, but it is not a part of PolyMix-4. However, PolyMix-4 phases use
populus factors that have a somewhat similar effect (robots are
activated or killed during ramp phases).

> If not, are there plans to incorporate it into standard workloads?

Yes, we should do that.

Alex.


