ClientSim
The BOINC client emulator (BCE) simulates a single BOINC client interacting with one or more projects. BCE uses the same source code as the client for the CPU scheduling and work-fetch policies, so it models the BOINC client accurately.
The intended uses of BCE include:
- Identifying scenarios (combinations of host and project characteristics) where the current scheduling policies don't behave well.
- Studying experimental policies.
However, BCE is not perfect: in some cases its results may differ significantly from what the actual client would do, or its inputs may be inadequate to describe a real-life scenario. If you find such cases, please send email to David Anderson.
You can use BCE in either of two ways:
- Through a web interface. This lets you do one simulation at a time, and shows you results graphically.
- From the command line, after compiling it yourself. This provides a more flexible interface.
The input consists of the following files:
The client state file describes a set of attached projects. The format is an extension of the state file generated by the client, so you can use the state file of a running client as an input to the simulator.
The fields used by the simulator are as follows (fields marked with * are not generated by the client).
host_info
    p_ncpus
    p_fpops
    m_nbytes
    coprocs
These describe the host's processing hardware. The simulator doesn't model disk usage.
time_stats
    on_frac
    connected_frac
    active_frac
    gpu_active_frac
    *on_lambda
    *connected_lambda
    *active_lambda
    *gpu_active_lambda
These describe the host's availability:
- on_frac: the fraction of total time this host runs the client.
- connected_frac: of the time this host runs the client, the fraction it is connected to the Internet.
- active_frac: of the time this host runs the client, the fraction it is enabled to use the CPU.
- gpu_active_frac: of the time this host runs the client, the fraction it is enabled to use the GPU (always <= active_frac).
For each of these, the periods of activity and inactivity are exponentially distributed. The mean length of the activity periods can be specified with on_lambda etc.; the default is 1 hour.
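Because connected_frac, active_frac, and gpu_active_frac are all relative to the time the client is running, multiplying each by on_frac gives the corresponding fraction of total wall-clock time. The following is a minimal sketch of that arithmetic, not BCE code; the function name `absolute_fractions` and the sample values are hypothetical.

```python
def absolute_fractions(on_frac, connected_frac, active_frac, gpu_active_frac):
    """Convert the conditional time_stats fractions into fractions of total time."""
    if gpu_active_frac > active_frac:
        raise ValueError("gpu_active_frac must be <= active_frac")
    return {
        "on": on_frac,
        "connected": on_frac * connected_frac,
        "cpu_active": on_frac * active_frac,
        "gpu_active": on_frac * gpu_active_frac,
    }


# Illustrative values only (not taken from a real host).
fractions = absolute_fractions(
    on_frac=0.5, connected_frac=0.9, active_frac=0.8, gpu_active_frac=0.4
)
for name, frac in fractions.items():
    print(f"{name}: {frac:.2f} of total time, about {24 * frac:.1f} hours/day")
```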
project
    project_name
    resource_share
    *available
        frac
        lambda
app
    name
    *latency_bound
    *fpops_est
    *fpops_actual
        mean
        stddev
    *weight
    *max_concurrent
app_version
    app_name
    avg_ncpus
    flops
    plan_class
    coproc
        type
        count
    gpu_ram
    *working_set
workunit
    app_name
    rsc_fpops_est
    rsc_fpops_bound
result
    name
    report_deadline
    received_time
active_task
    result_name
    working_set_size
Notes: Each application has a fixed latency bound. It can be specified in app.latency_bound. If not, and there is a result for that app, it is computed as report_deadline - received_time for one such result. If there is no result, it is 1 week.
An application has a fixed FLOP count estimate. It can be specified as app.fpops_est. If not, and there is a WU for that app, it is wu.rsc_fpops_est. Otherwise it is 3600*1e9 (i.e., one hour of computing at 1 GFLOPS).
An application has a normal distribution of actual FLOP count. It can be specified as app.fpops_actual. Otherwise the mean is app.fpops_est and the standard deviation is 0.
An application has an associated weight that determines the fraction of its jobs dispatched by that project. This defaults to 1.
An application version has a fixed working set size. This can be specified as app_version.working_set. If not, and there is an active task for that app version, active_task.working_set_size is used. Otherwise it defaults to 0.
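The fallback rules in these notes can be sketched as a handful of small functions. This is an illustration of the rules as stated, not the simulator's code: the dictionary layout and function names are hypothetical, the results and active tasks are assumed to be already filtered to the app or app version in question, and the mean of the actual FLOP count is assumed to fall back to the resolved estimate.

```python
import random

WEEK_SECONDS = 7 * 24 * 3600
DEFAULT_FPOPS_EST = 3600 * 1e9  # one hour of computing at 1 GFLOPS


def latency_bound(app, results_for_app):
    """app.latency_bound, else deadline minus received time of a result, else 1 week."""
    if app.get("latency_bound") is not None:
        return app["latency_bound"]
    for r in results_for_app:
        return r["report_deadline"] - r["received_time"]
    return WEEK_SECONDS


def fpops_est(app, workunits):
    """app.fpops_est, else rsc_fpops_est of a workunit for this app, else the default."""
    if app.get("fpops_est") is not None:
        return app["fpops_est"]
    for wu in workunits:
        if wu["app_name"] == app["name"]:
            return wu["rsc_fpops_est"]
    return DEFAULT_FPOPS_EST


def sample_fpops_actual(app, est, rng):
    """Draw an actual FLOP count; without app.fpops_actual use mean=est, stddev=0."""
    actual = app.get("fpops_actual")
    mean, stddev = (actual["mean"], actual["stddev"]) if actual else (est, 0.0)
    return rng.gauss(mean, stddev)


def working_set(app_version, active_tasks_for_version):
    """app_version.working_set, else an active task's working_set_size, else 0."""
    if app_version.get("working_set") is not None:
        return app_version["working_set"]
    for task in active_tasks_for_version:
        return task["working_set_size"]
    return 0


# Hypothetical app with no starred fields set, one workunit, and no results.
app = {"name": "example_app"}
est = fpops_est(app, [{"app_name": "example_app", "rsc_fpops_est": 2e12}])
print(latency_bound(app, []), est, sample_fpops_actual(app, est, random.Random(0)))
```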
The availability of a project (i.e., the periods when scheduler RPCs succeed) is modeled with two parameters: the durations of available periods are exponentially distributed with the given mean (lambda), and the durations of unavailable periods are exponentially distributed with a mean chosen to achieve the given available fraction (frac). The availability of a project can be specified as project.available; otherwise the project is always available.
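One way to realize this two-parameter model is to alternate exponentially distributed available and unavailable periods, choosing the mean unavailable duration so that the long-run available fraction matches the target. The sketch below is an assumption about how such a model can be built, not the simulator's code; the function name, the choice to start in the available state, and the sample values are hypothetical.

```python
import random


def availability_periods(frac, mean_on, duration, rng):
    """Return a list of (start, end, available) tuples covering [0, duration].

    Available periods have mean length mean_on. The mean unavailable length is
    mean_on * (1 - frac) / frac, so that mean_on / (mean_on + mean_off) == frac.
    """
    if not 0.0 < frac <= 1.0:
        raise ValueError("frac must be in (0, 1]")
    if frac == 1.0:
        return [(0.0, duration, True)]
    mean_off = mean_on * (1.0 - frac) / frac
    periods, t, available = [], 0.0, True  # assumption: start out available
    while t < duration:
        mean = mean_on if available else mean_off
        length = rng.expovariate(1.0 / mean)  # expovariate takes the rate, 1/mean
        end = min(t + length, duration)
        periods.append((t, end, available))
        t = end
        available = not available
    return periods


# Hypothetical project: available 75% of the time, mean available period 1 hour.
duration = 3600.0 * 24 * 365 * 10
periods = availability_periods(0.75, 3600.0, duration, random.Random(1))
observed = sum(end - start for start, end, up in periods if up) / duration
print(f"observed available fraction: {observed:.3f}")
```

Over a long simulated duration the observed fraction should approach the requested one; over short runs it can differ noticeably because the periods are random.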
The algorithm for simulating a scheduler RPC to project P is:
while need more work
    X = list of P's apps with versions for requested resources
    if X is empty
        break
    choose an app A from X, randomly based on weights
    V = version that uses requested resources and has highest FLOPS
    J = generate job
    if J is feasible
        update request
    else
        infeasible_count++
        if infeasible_count == 10
            break
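The loop above can be made concrete as follows. This is a sketch under several assumptions that the pseudocode leaves open: the request is modeled as seconds of work wanted per resource, a job's runtime is the app's FLOP estimate divided by the version's FLOPS, and a job counts as feasible if that runtime fits within the app's latency bound. The data layout and the function name are hypothetical.

```python
import random


def simulate_scheduler_rpc(apps, request, rng, max_infeasible=10):
    """Return the jobs sent for one RPC. Modifies request (seconds per resource)."""
    jobs = []
    infeasible_count = 0
    while any(secs > 0 for secs in request.values()):  # need more work
        needed = {res for res, secs in request.items() if secs > 0}
        # X: apps with at least one version for a requested resource
        candidates = [
            a for a in apps if any(v["resource"] in needed for v in a["versions"])
        ]
        if not candidates:
            break
        # Choose an app randomly, in proportion to its weight.
        app = rng.choices(candidates, weights=[a["weight"] for a in candidates])[0]
        # V: the fastest version that uses a requested resource
        version = max(
            (v for v in app["versions"] if v["resource"] in needed),
            key=lambda v: v["flops"],
        )
        runtime = app["fpops_est"] / version["flops"]  # assumed job model
        if runtime <= app["latency_bound"]:  # assumed feasibility test
            jobs.append({"app": app["name"], "resource": version["resource"],
                         "runtime": runtime})
            res = version["resource"]
            request[res] = max(0.0, request[res] - runtime)  # update request
        else:
            infeasible_count += 1
            if infeasible_count == max_infeasible:
                break
    return jobs


# Hypothetical project with one CPU app; the client asks for 2 hours of CPU work.
apps = [{"name": "example_app", "weight": 1.0, "fpops_est": 3600 * 1e9,
         "latency_bound": 7 * 86400.0,
         "versions": [{"resource": "cpu", "flops": 1e9}]}]
request = {"cpu": 7200.0}
jobs = simulate_scheduler_rpc(apps, request, random.Random(0))
print(len(jobs), "jobs sent; remaining request:", request)
```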
The available periods (i.e., when BOINC is running) and the idle periods (i.e. when there is no user input) are modeled as above.
format described here.
format described here.
The simulator can be built with 'makefile_sim' on Unix or the 'sim' project on Windows. The usage is:
sim [--duration X] [--delta X] [--server_uses_workload] [--dirs d1 ...]
--duration X: simulate this much time.
--delta X: time step of simulation.
--server_uses_workload: servers take existing workload into account when deciding whether to send jobs.
Duration correction factor (DCF) is one.
Use formula for DCF based on completion time mean/stdev.
--dirs d1 ...: change into each of the given directories and run a simulation based on the input files there. Prints a summary for each one separately, and a total summary.
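When running many scenarios from a script, it can help to assemble the command line programmatically. The helper below is a hypothetical convenience wrapper that only uses the options shown in the usage line; the directory names and numeric values are placeholders, and the units of the values are not stated here, so check them before relying on them.

```python
import shlex


def build_sim_command(dirs=(), duration=None, delta=None,
                      server_uses_workload=False, executable="sim"):
    """Build an argument list for the simulator from the documented options."""
    cmd = [executable]
    if duration is not None:
        cmd += ["--duration", str(duration)]
    if delta is not None:
        cmd += ["--delta", str(delta)]
    if server_uses_workload:
        cmd.append("--server_uses_workload")
    if dirs:
        cmd += ["--dirs", *dirs]
    return cmd


# Placeholder scenario directories and values.
cmd = build_sim_command(dirs=["scenario_a", "scenario_b"], duration=86400,
                        delta=60, server_uses_workload=True)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the simulator has been built and is on the path.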
The simulator creates several output files:
index.html: an index of other files
log.txt: This is the message log (same as would be generated by the client). Its contents are controlled by cc_config.xml.
time_line.html: a 'time line' showing what's running when, for viewing in a web browser.
summary.xml: Contains four performance metrics:
- Of the total CPU time, the fraction spent computing results that missed their deadline.
- Of the total CPU time, the fraction spent not computing.
- A measure (0 to 1) of how badly resource shares were violated.
- A measure (0 to 1) of how long a single project used all CPUs (so that the user would see only that project on their screensaver, and get bored).
In addition, information is printed about the per-project CPU time and waste.
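The first two metrics in summary.xml are simple ratios of CPU time. The sketch below shows one way to compute them from a list of finished jobs; it assumes that total CPU time means the number of CPUs times the simulated duration, and the record layout and function name are hypothetical rather than taken from the simulator's output.

```python
def cpu_time_fractions(jobs, ncpus, duration):
    """Fractions of total CPU time spent on late results and spent not computing."""
    total = ncpus * duration  # assumption: total CPU time available
    computed = sum(j["cpu_time"] for j in jobs)
    missed = sum(
        j["cpu_time"] for j in jobs if j["completed_time"] > j["report_deadline"]
    )
    return {
        "missed_deadline": missed / total,
        "not_computing": (total - computed) / total,
    }


# Hypothetical run: 2 CPUs for 10000 seconds, two jobs, one of them late.
jobs = [
    {"cpu_time": 6000.0, "completed_time": 7000.0, "report_deadline": 8000.0},
    {"cpu_time": 4000.0, "completed_time": 9500.0, "report_deadline": 9000.0},
]
metrics = cpu_time_fractions(jobs, ncpus=2, duration=10000.0)
print(metrics)
```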