incorporated comments.
Daniel Bauer committed Oct 25, 2024
1 parent 4686ebe commit 5f824f3
Showing 1 changed file with 13 additions and 16 deletions.
29 changes: 13 additions & 16 deletions RFC-0009/RFC-0009-binary-exchange.md
@@ -1,7 +1,4 @@
-# **RFC0x for Presto**
-
-
-## Replacing HTTP Exchange with Binary Exchange Protocol
+# Add support for Binary Exchange Protocol

Proposers
* Daniel Bauer ([email protected])
@@ -18,7 +15,7 @@ Above protocol enhancement is integrated into the proposed binary exchange proto

The binary exchange protocol (BinX) is an alternative for the existing HTTP-based exchange protocol that
runs between Prestissimo worker nodes. It offers the same functionality and API
-but uses binary encoding that can be more efficiently parsed than HTTP nessages.
+but uses binary encoding that can be more efficiently parsed than HTTP messages.
This translates into a performance benefit for exchange-intensive queries.
BinX does not replace the control protocol that runs between the coordinator and the
worker nodes. The control protocol continues to use HTTP.
@@ -33,7 +30,7 @@ is more complex than decoding binary encoded messages.

### Goals

-The proposal is to use a binary exchange protocol as a light-weight alternative to the existinig HTTP exchange protocol.
+The proposal is to use a binary exchange protocol as a light-weight alternative to the existing HTTP exchange protocol.
A prototype implementation shows that such a protocol reduces the run-time of exchange-heavy queries by
20% to 30%.

@@ -143,8 +140,9 @@ with the HTTP exchange.

#### Implementation Notes

-The BinX server uses Wangle. It consists of the following components that are implemented in
-the file `BinaryExchangeServer.h`:
+Like Proxygen, the BinX server uses Wangle as its underlying networking library.
+The BinX server is implemented in the file `BinaryExchangeServer.h` and consists of
+several components:

* The `BinaryExchangeServer` is a controller for starting and stopping the Wangle protocol stack.
It takes the port number, the IO thread pool and the CPU thread pool as construction parameters.
@@ -159,9 +157,8 @@ service implementation on top of the stack.
The results from the TaskManager are packaged into replies and sent back to the requesting BinX exchange source.
This exchange service follows the design of the existing `TaskResource` service.

-The `TaskManagerStub` class is an implementation detail that enables the BinX server to interact with
-a mock TaskManager implementation. This is used in the unit tests and allows to test the BinX server
-implementation along with the BinX exchange source implementation.
+All of above components are templated to allow for different TaskManager implementations. In the production code,
+the Prestissimo TaskManager is used while for unit testing, a mock task manager is deployed.
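
The templating described in the added lines can be illustrated with a minimal sketch. It is not taken from `BinaryExchangeServer.h`: the names `DataRequest`, `DataReply`, `MockTaskManager`, `ExchangeServiceSketch` and the method `getResults` are hypothetical and stand in for whatever the real components use. It only assumes that the service forwards a request to a task manager supplied as a template parameter.

```cpp
#include <cstdint>
#include <string>

// Hypothetical request/reply shapes; the real BinX message layout is not shown here.
struct DataRequest {
  std::string taskId;
  std::int64_t token;
};

struct DataReply {
  std::string payload;
  bool complete;
};

// Hypothetical mock task manager of the kind a unit test might plug in.
class MockTaskManager {
 public:
  DataReply getResults(const DataRequest& request) {
    ++calls_;
    // Fabricate a payload from the token; report completion from token 2 onwards.
    return DataReply{"page-" + std::to_string(request.token), request.token >= 2};
  }

  int calls() const {
    return calls_;
  }

 private:
  int calls_{0};
};

// Service templated on the task manager type, so that production code and
// unit tests can share a single implementation.
template <typename TaskManagerT>
class ExchangeServiceSketch {
 public:
  explicit ExchangeServiceSketch(TaskManagerT& taskManager)
      : taskManager_(taskManager) {}

  // Forward the request to the task manager and return its reply unchanged.
  DataReply handle(const DataRequest& request) {
    return taskManager_.getResults(request);
  }

 private:
  TaskManagerT& taskManager_;
};
```

In a unit test, `ExchangeServiceSketch<MockTaskManager>` would be instantiated directly. Production code would instantiate the same template with the real task manager type, provided it exposes a compatible method.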

### Binary Exchange Source and Binary Exchange Client

@@ -175,7 +172,7 @@ The `PrestoServer` registers a factory method for creating exchange sources. Thi
such that `BinaryExchangeSource`s are created instead of HTTP exchanges when enabled by configuration.
The one exception is connections to the
Presto coordinator, which always use the HTTP-based exchange protocol. In a Kubernetes environment with its virtual
-networking, it is unfortunately not straight forward to detect whether the target host is the Presto connector
+networking, it is unfortunately not straight forward to detect whether the target host is the Presto coordinator
since the coordinator's service IP used in the Presto configuration doesn't correspond to the IP address used by the
pod running the coordinator. In order to circumvent this problem, a helper class called `CoordinatorInfoResolver`
uses the node status endpoint of the coordinator to retrieve the coordinator's IP address. Using this address
@@ -237,17 +234,17 @@ the additional complexity.
- There is one additional configuration option to enable BinX. Otherwise, there is no impact on session parameters, no API changes
and no changes to SQL.

-- If we are changing behaviour how will we phase out the older behaviour?
+- If we are changing behavior how will we phase out the older behavior?

- The HTTP stack is still required for the control message. The cost of keeping the HttpExchangeSource is minimal.

- If we need special migration tools, describe them here.

- No tools required.

-- When will we remove the existing behaviour, if applicable.
+- When will we remove the existing behavior, if applicable.

-- Existing behaviour will remain as the default option.
+- Existing behavior will remain as the default option.

- How should this feature be taught to new and existing users? Basically mention if documentation changes/new blog are needed?

@@ -261,5 +258,5 @@ the additional complexity.

Test plan involves running performance measurements using TPC-DS and TPC-H benchmarks that compare the performance of HTTP versus BinX.

-The TPC-DS benchmark test has been conducted using a dataset with scale factor 1000 on an on-prem cluster with 8 nodes. The results
+The TPC-DS benchmark test has been conducted using a dataset with scale factor 1000 on an on-premise cluster with 8 nodes. The results
for this 1TB dataset have shown that overall runtime for the 99 queries was ~56 minutes when using HTTP compared to ~43 minutes for BinX.
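
As a rough check, the relative improvement implied by these two totals can be worked out as follows. Both runtimes are quoted as approximate, so the result is only indicative:

$$
\frac{56 - 43}{56} \approx 0.23, \qquad \frac{56}{43} \approx 1.30
$$

This is about a 23% reduction in total runtime, or a speedup of roughly 1.3x. It falls within the 20% to 30% range stated in the Goals section for exchange-heavy queries, although the figure here covers all 99 queries rather than only the exchange-heavy ones.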
