
Sporadic Applications

Delta edited this page Oct 18, 2023 · 1 revision

BOINC was originally designed as a batch processing system: you submit jobs, they run (independently of each other) and eventually finish. But some potential uses of volunteer computing don't fit this model. They may require that apps run concurrently on different computers, and perhaps that they communicate directly with each other. Examples include MPI-type parallel computing and distributed machine learning. BOINC's 'sporadic application' mechanism is designed to support these types of systems, and to allow them to coexist with batch processing.

In this scheme there is a distributed system (let's call it a 'guest system') that exists outside of BOINC. The guest system typically has its own server that handles requests and dispatches them to 'worker nodes' running the BOINC client. The worker part of the guest system runs as a sporadic app on these nodes. Instances of the app may communicate directly with each other (peer-to-peer or via a relay) as well as with the server.

The guest system uses BOINC to

  • Securely distribute and run its worker code (the sporadic app).
  • Enforce volunteer computing preferences (when to compute, how many CPUs to use, etc.)
  • Divide computing power among projects, for volunteers who already use BOINC.

The guest system doesn't use BOINC's batch processing features.

The jobs of a sporadic app run (i.e. are present in memory) all the time, like non-CPU-intensive jobs, but they compute only some of the time. Like regular apps, a sporadic app can have multiple app versions. Each of these has a plan class, which determines the processor usage (CPUs and GPUs) of its jobs. A project's BOINC scheduler can send multiple jobs for a given sporadic app, using the same or different app versions.

A BOINC project can provide any combination of regular, sporadic, and non-CPU-intensive apps. A client can be connected to multiple projects with sporadic apps.

Like regular jobs, a sporadic job can compute only when BOINC allows it to, i.e.:

  • computing (and GPU computing if relevant) is enabled by user preferences
  • there are sufficient processing resources and RAM

In addition, a sporadic job computes only when the guest system asks it to. Thus, a sporadic job converses with both the BOINC client and the guest system server; it computes only when the server asks it to and the client says it's OK to.
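The combined condition can be written as a small predicate. This is a sketch only: the `ClientConditions` struct, its fields, and the two functions are hypothetical names for the checks listed above, not BOINC data structures. In a real app the client makes the "allowed" decision itself and reports it through the API described below.

```cpp
#include <cassert>

// Hypothetical summary of the checks the BOINC client makes before
// letting a job compute. These are NOT BOINC data structures.
struct ClientConditions {
    bool computing_enabled;      // user preferences allow computing
    bool gpu_computing_enabled;  // user preferences allow GPU computing
    bool job_uses_gpu;           // the job's app version uses a GPU
    bool resources_available;    // enough free CPUs/GPUs for the job
    bool ram_available;          // enough free RAM for the job
};

// True if the client would say it's OK to compute.
bool client_allows(const ClientConditions& c) {
    if (!c.computing_enabled) return false;
    if (c.job_uses_gpu && !c.gpu_computing_enabled) return false;
    return c.resources_available && c.ram_available;
}

// A sporadic job computes only if the client allows it AND the guest
// system server has asked it to.
bool sporadic_job_computes(const ClientConditions& c, bool server_requested) {
    return client_allows(c) && server_requested;
}
```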

The API for sporadic apps

The protocol between the BOINC client and a sporadic app uses the following messages:

Client to app:

DONT_COMPUTE: the app can't compute now (e.g. because resources are not available)

COULD_COMPUTE: the app could potentially compute

COMPUTING: the app is computing as far as the client is concerned

App to client:

DONT_WANT_COMPUTE: the app doesn't want to compute now

WANT_COMPUTE: the app wants to compute

The protocol between the app and the guest server isn't specified. It could be based on polling from the app, or bidirectional requests.

A typical scenario is as follows:

```mermaid
sequenceDiagram
    participant C as BOINC client
    participant A as sporadic job
    participant S as guest system server
    A->>C: DONT_WANT_COMPUTE
    A->>S: I cannot compute
    C->>A: DONT_COMPUTE
    C->>A: COULD_COMPUTE
    A->>S: I can compute
    S->>A: here is a request
    A->>C: WANT_COMPUTE
    C->>A: COMPUTING
    A->>S: request confirmed, computing
    loop Compute
        A->>A: check for DONT_COMPUTE from client
    end
    A->>S: I am done computing
    A->>C: DONT_WANT_COMPUTE
    C->>A: COULD_COMPUTE
```

The steps are:

  • Initially the client tells the app that it can't compute, perhaps because the user has suspended computation.
  • The app relays this to the server; this tells the server not to send any requests. The server can keep track of which worker nodes are available for computing at a given point.
  • Eventually the user enables computing; the client relays this as a COULD_COMPUTE message to the app, and the app relays it to the server, indicating that it can now accept requests.
  • The server sends a request to the app, asking it to do some computing (and possibly some network communication with other workers).
  • The app sends WANT_COMPUTE to the client.
  • The client reserves the needed computing resources and sends COMPUTING to the app.
  • The app computes. When it's done, it sends DONT_WANT_COMPUTE to the client.
  • The client (assuming computing is not suspended) sends COULD_COMPUTE.
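The steps above can be modeled as a small app-side state machine. In this sketch the state names, the `on_event` function, and the message strings are hypothetical; the messages to the server are the informal ones from the diagram, since the app-server protocol isn't specified. Handling DONT_COMPUTE during a request by giving up immediately and also sending DONT_WANT_COMPUTE is an assumption of the sketch, not documented behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical app-side states for the scenario.
enum class AppState { Unavailable, Available, Waiting, Computing };

// Inputs the app can receive.
enum class Event {
    ClientDontCompute,   // DONT_COMPUTE from the client
    ClientCouldCompute,  // COULD_COMPUTE from the client
    ClientComputing,     // COMPUTING from the client
    ServerRequest,       // the guest server sends a request
    WorkDone             // the app finished the request
};

struct Step {
    AppState next;
    std::vector<std::string> sent;  // messages the app sends in response
};

Step on_event(AppState s, Event e) {
    switch (e) {
    case Event::ClientDontCompute:
        if (s == AppState::Waiting || s == AppState::Computing)
            // Assumption: give up at once and withdraw the compute request.
            return {AppState::Unavailable,
                    {"server: I cannot finish the request", "client: DONT_WANT_COMPUTE"}};
        if (s == AppState::Available)
            return {AppState::Unavailable, {"server: I cannot compute"}};
        return {s, {}};  // already unavailable; nothing to report
    case Event::ClientCouldCompute:
        if (s == AppState::Unavailable)
            return {AppState::Available, {"server: I can compute"}};
        return {s, {}};
    case Event::ServerRequest:
        if (s == AppState::Available)
            return {AppState::Waiting, {"client: WANT_COMPUTE"}};
        return {s, {}};
    case Event::ClientComputing:
        if (s == AppState::Waiting)
            return {AppState::Computing, {"server: request confirmed, computing"}};
        return {s, {}};
    case Event::WorkDone:
        if (s == AppState::Computing)
            return {AppState::Available,
                    {"server: I am done computing", "client: DONT_WANT_COMPUTE"}};
        return {s, {}};
    }
    return {s, {}};
}
```

Starting in `Unavailable` (the app has already sent DONT_WANT_COMPUTE and told the server it can't compute), replaying DONT_COMPUTE, COULD_COMPUTE, a server request, COMPUTING, and "work done" in that order produces the same app messages as the diagram.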

It's also possible that the app must stop computing before the request is finished, for example because the user suspends computing. In this case:

  • The client sends DONT_COMPUTE to the app.
  • The app notifies the server that it can't finish the request (or it might wait before doing this, in case computing is re-enabled quickly).

Thus, the app must continuously check for messages from the client, even while it's computing.
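One way to implement the optional wait is a grace period counted in polling ticks. The sketch below simulates the client's state with a vector (one entry per check), so `run_request`, `RequestOutcome`, `grace_ticks`, and the tick-based timing are hypothetical; a real app would call `boinc_sporadic_get_ca_state()` at each check and sleep briefly between checks. The enum is copied from the API declarations.

```cpp
#include <cassert>
#include <vector>

enum SPORADIC_CA_STATE {
    CA_NONE = 0, CA_DONT_COMPUTE = 1, CA_COULD_COMPUTE = 2, CA_COMPUTING = 3
};

struct RequestOutcome {
    bool finished;     // true if all work units were completed
    int abandoned_at;  // tick at which the app gave up, or -1
};

// Simulated compute loop for one request (work_units > 0). At each tick
// the app checks the client's state and does one unit of work only while
// the state is CA_COMPUTING. If the client says CA_DONT_COMPUTE for more
// than grace_ticks consecutive ticks, the app abandons the request (and
// would then tell the guest server it can't finish).
RequestOutcome run_request(const std::vector<SPORADIC_CA_STATE>& ca_states,
                           int work_units, int grace_ticks) {
    int suspended_for = 0;
    for (int tick = 0; tick < static_cast<int>(ca_states.size()); ++tick) {
        if (ca_states[tick] == CA_COMPUTING) {
            suspended_for = 0;  // computing resumed; reset the grace timer
            if (--work_units == 0) return {true, -1};
        } else if (ca_states[tick] == CA_DONT_COMPUTE) {
            if (++suspended_for > grace_ticks) return {false, tick};
        }
        // CA_NONE or CA_COULD_COMPUTE: no work is done this tick.
    }
    return {false, -1};  // ran out of ticks without finishing
}
```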

The API declarations for communicating with the client are:

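```cpp
// Client-to-app (CA) states
enum SPORADIC_CA_STATE {
    CA_NONE             = 0,
    CA_DONT_COMPUTE     = 1,    // DONT_COMPUTE
    CA_COULD_COMPUTE    = 2,    // COULD_COMPUTE
    CA_COMPUTING        = 3     // COMPUTING
};

// App-to-client (AC) states
enum SPORADIC_AC_STATE {
    AC_NONE                 = 0,
    AC_DONT_WANT_COMPUTE    = 1,    // DONT_WANT_COMPUTE
    AC_WANT_COMPUTE         = 2     // WANT_COMPUTE
};

// Tell the client whether the app wants to compute.
extern void boinc_sporadic_set_ac_state(SPORADIC_AC_STATE);
// Get the client's current state for this app.
extern SPORADIC_CA_STATE boinc_sporadic_get_ca_state();
```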
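A minimal sketch of how an app might use these two functions to handle one request from the guest server. The two `boinc_sporadic_*` definitions at the top are stand-in stubs with a scripted client (it grants COMPUTING as soon as the app asks), included only so the example compiles without the BOINC library; a real app would drop them and link against BOINC. `handle_request`, `max_polls`, and the unit-of-work loop are hypothetical.

```cpp
#include <cassert>
#include <vector>

enum SPORADIC_CA_STATE { CA_NONE = 0, CA_DONT_COMPUTE = 1, CA_COULD_COMPUTE = 2, CA_COMPUTING = 3 };
enum SPORADIC_AC_STATE { AC_NONE = 0, AC_DONT_WANT_COMPUTE = 1, AC_WANT_COMPUTE = 2 };

// ---- Stand-in stubs, NOT the real BOINC implementation ----
static SPORADIC_AC_STATE g_ac = AC_NONE;
static std::vector<SPORADIC_AC_STATE> g_ac_log;  // every state the app set

void boinc_sporadic_set_ac_state(SPORADIC_AC_STATE s) {
    g_ac = s;
    g_ac_log.push_back(s);
}
SPORADIC_CA_STATE boinc_sporadic_get_ca_state() {
    // Scripted client: grant COMPUTING whenever the app wants to compute.
    return g_ac == AC_WANT_COMPUTE ? CA_COMPUTING : CA_COULD_COMPUTE;
}
// -----------------------------------------------------------

// Handles one guest-server request of work_units units; returns the
// number of units completed. max_polls bounds the loop in this sketch.
int handle_request(int work_units, int max_polls) {
    boinc_sporadic_set_ac_state(AC_WANT_COMPUTE);
    int done = 0;
    for (int poll = 0; poll < max_polls && done < work_units; ++poll) {
        SPORADIC_CA_STATE ca = boinc_sporadic_get_ca_state();
        if (ca == CA_DONT_COMPUTE) break;  // client withdrew permission
        if (ca != CA_COMPUTING) continue;  // not granted yet; real code would sleep here
        ++done;                            // one short unit of work
    }
    boinc_sporadic_set_ac_state(AC_DONT_WANT_COMPUTE);
    return done;
}
```

Keeping each unit of work short bounds how long the app keeps computing after the client switches to DONT_COMPUTE.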

Network communication

Sporadic apps that do network communication should obey the rules for accessing the network. For example, they should not communicate when user preferences forbid it.
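A simple way to obey this is to queue outgoing traffic and send it only while networking is allowed. The `Outbox` class below is a hypothetical sketch; how the app learns whether the network is currently allowed isn't described here, so the permission is passed in as a plain `bool`.

```cpp
#include <cassert>
#include <cstddef>
#include <deque>
#include <string>
#include <utility>
#include <vector>

// Hypothetical outbox: messages to the guest server or to peers are
// queued and only released while the network is allowed.
class Outbox {
public:
    void enqueue(std::string msg) { pending_.push_back(std::move(msg)); }

    // Called periodically with the current permission. Returns the
    // messages released for sending (a real app would write them to a socket).
    std::vector<std::string> flush(bool network_allowed) {
        std::vector<std::string> sent;
        if (!network_allowed) return sent;  // keep everything queued
        while (!pending_.empty()) {
            sent.push_back(std::move(pending_.front()));
            pending_.pop_front();
        }
        return sent;
    }

    std::size_t pending() const { return pending_.size(); }

private:
    std::deque<std::string> pending_;
};
```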

Prioritization of sporadic apps

In the initial implementation of sporadic apps (BOINC client version 7.26), sporadic apps have strict priority over regular apps. Thus, if a sporadic app does a lot of computing, it can starve regular apps. If multiple sporadic apps compete for a resource (say, a GPU), the prioritization among them is fixed, so one can starve the others.
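A toy model makes the starvation concrete. It is not BOINC's scheduler code: it assumes a single GPU, apps with constant demand, and a fixed order in which sporadic apps are tried before regular ones. `App` and `run_strict_priority` are hypothetical names.

```cpp
#include <cassert>
#include <vector>

// Toy model only; NOT BOINC's scheduler.
struct App {
    bool sporadic;
    bool wants_gpu;  // constant demand, for simplicity
    int ticks_run;   // ticks during which this app held the GPU
};

// Each tick, the single GPU goes to the first sporadic app (in fixed
// order) that wants it; regular apps are considered only if no
// sporadic app wants the GPU.
void run_strict_priority(std::vector<App>& apps, int ticks) {
    for (int t = 0; t < ticks; ++t) {
        App* winner = nullptr;
        for (App& a : apps)
            if (a.sporadic && a.wants_gpu) { winner = &a; break; }
        if (!winner)
            for (App& a : apps)
                if (!a.sporadic && a.wants_gpu) { winner = &a; break; }
        if (winner) ++winner->ticks_run;
    }
}
```

With two sporadic apps and one regular app all wanting the GPU, the first sporadic app gets every tick and the other two get none.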

In a later version, sporadic apps will be scheduled using the same scheme that is used for regular apps, in which project resource share determines prioritization and starvation is eliminated.
