polygraph workload

From: mukesh agrawal (mukesh@cs.cmu.edu)
Date: Fri May 24 2002 - 13:44:59 MDT


Hi,

I have a question about a characteristic of the polymix-4 workload. The
fraction of the bytes requested that are due to large files seems low
compared to previously published measurements. I placed a graph comparing
the Polygraph-generated workload to the earlier measurements at

http://www-2.cs.cmu.edu/~mukesh/loadfrac.ps.gz

The data sets in the graph are:

Calgary
        One year's requests for the CS department web server at the University of Calgary
        From Arlitt and Williamson -- Sigmetrics 96
Clarknet
        Two weeks' requests for a Washington, DC ISP's web server
        From Arlitt and Williamson -- Sigmetrics 96
WorldCup (busy)
        A couple of hours' worth of requests from the busiest day of the
        WorldCup '98 website
WorldCup (last day)
        The entire day's requests for the last day of WorldCup '98.
Berkeley HomeIP
        The four hour trace from Berkeley's HomeIP study.
Polygraph
        The workload from a Polygraph run.

The only trace that has a smaller fraction of its bytes due to files >100K
is the busy trace from the WorldCup site. Even the HomeIP trace, which I
would expect to be skewed towards small files (as the users are connected
via modems), has a larger fraction of load from large files (files >100K
comprise ~23% of the load in HomeIP, versus 10% for the Polygraph run).
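
To be concrete about the metric, here is a minimal sketch of how the
fraction of load due to large files can be computed from a list of response
sizes. The function name and the sample sizes are hypothetical, chosen only
for illustration, and treating "100K" as 100 * 1024 bytes is an assumption;
the traces above may have been processed with a different cutoff convention.

```python
def large_file_load_fraction(sizes, threshold=100 * 1024):
    """Fraction of total bytes contributed by responses larger than threshold.

    sizes: iterable of response sizes in bytes (one entry per request).
    threshold: size cutoff in bytes; 100 * 1024 is an assumed reading of "100K".
    """
    sizes = list(sizes)
    total = sum(sizes)
    if total == 0:
        # No bytes transferred, so no load can be attributed to large files.
        return 0.0
    large = sum(s for s in sizes if s > threshold)
    return large / total


# Hypothetical request sizes in bytes, not taken from any of the traces.
sample_sizes = [2_000, 5_000, 10_000, 150_000, 33_000]
print(f"{large_file_load_fraction(sample_sizes):.2f}")  # prints 0.75
```

Note that this weights each request by its size, so it measures the share of
bytes requested rather than the share of requests or of distinct files.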

So my question is: is the stock polymix-4 workload intended to accurately
model the fraction of the load due to large objects? If so, is the distribution
I'm seeing consistent with what is intended?

Thanks,
mukesh

-- 
public key: finger mukesh@cs.cmu.edu
fingerprint: BDAB AB7A ADFB 9229 1BD8  45FD BE21 850C E36C D4AA


