fee based txs/chunks selector #1833

Open · wants to merge 20 commits into base: main
Conversation

@tsachiherman (Contributor) commented Dec 13, 2024

What?

This PR adds support for selecting the largest set of transactions/chunks based on their fees.

I haven't included any usage of that algorithm in this PR, just the algorithm itself (plus a unit test).

Description

LargestSet takes a slice of dimensions and a dimensions limit, and finds the
largest set of dimensions that fits within the provided limit. The returned
slice contains indices relative to the provided [dimensions] slice. Note that
the largest set is not unique: by the nature of the problem, there can be
multiple different "correct" results.

Algorithm (high level)

  1. Scaling: Each dimension is scaled relative to its respective limit
    to ensure fair comparison. This step normalizes the dimensions,
    preventing larger dimensions from dominating the selection process.
  2. Vector Sizing: Each dimensions entry is treated as a vector, and its
    magnitude is calculated. This metric is used to assess the relative
    significance of each entry.
  3. Greedy Selection: Entries are iteratively selected in order of their
    scaled magnitudes and added to the subset until the size limit is
    reached. The greedy approach packs the smallest entries first,
    maximizing the number of entries in the subset (see the sketch right
    after this list).
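
As a rough illustration (not the exact code in this PR), the selection loop could look like the following. It assumes this package's Dimensions type with the CanAdd and AddDimentions helpers that appear in the diff below; scaledMagnitudeSquared is a hypothetical helper, sketched under the implementation notes:

```go
import "sort"

// largestSetSketch returns the indices of the entries that were packed,
// relative to the input slice, mirroring the LargestSet contract above.
func largestSetSketch(dimensions []Dimensions, limit Dimensions) []int {
	// Order candidate indices by scaled squared magnitude, smallest first,
	// so many small entries are packed before a few large ones.
	indices := make([]int, len(dimensions))
	for i := range indices {
		indices[i] = i
	}
	sort.Slice(indices, func(a, b int) bool {
		return scaledMagnitudeSquared(dimensions[indices[a]], limit) <
			scaledMagnitudeSquared(dimensions[indices[b]], limit)
	})

	// Greedily accumulate entries while they still fit under the limit.
	var accumulator Dimensions
	out := make([]int, 0, len(indices))
	for _, idx := range indices {
		if !accumulator.CanAdd(dimensions[idx], limit) {
			continue
		}
		next, err := accumulator.AddDimentions(dimensions[idx])
		if err != nil {
			continue // overflow; skip this entry
		}
		accumulator = next
		out = append(out, idx)
	}
	return out
}
```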

Implementation notes:

  • Precision: To mitigate potential precision issues arising from
    small-scale dimensions, the function employs a scaling factor to
    amplify values before normalization. This ensures accurate calculations
    even when dealing with very small numbers.
  • Efficiency: The squared magnitude is used as a proxy for the Euclidean
    norm, optimizing performance without sacrificing accuracy for relative
    comparisons. This avoids the computationally expensive square root
    operation. Both ideas are sketched below.
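
A minimal sketch of both notes, assuming integer math and an illustrative amplification constant (not necessarily this PR's exact choices):

```go
const precisionScale = 1 << 20 // illustrative: amplify before normalizing

// scaledMagnitudeSquared scales each dimension against its limit and returns
// the squared magnitude, which preserves the ordering of the Euclidean norm
// while avoiding the square root.
func scaledMagnitudeSquared(d, limit Dimensions) uint64 {
	var total uint64
	for i := range d {
		if limit[i] == 0 {
			continue // skip unused dimensions to avoid division by zero
		}
		// Amplify before dividing so small values don't collapse to zero
		// under integer division.
		scaled := d[i] * precisionScale / limit[i]
		total += scaled * scaled
	}
	return total
}
```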

Motivation

The motivation behind this algorithm, in contrast to a FIFO ordering, is the following:

  1. When using a FIFO ordering (more precisely, an optimistic attempt to include the transactions in the order they were received), we are effectively using a random ordering. While this random ordering could happen to be the ideal one, it rarely would be.
  2. Any stable algorithm (such as this one) allows us to prioritize transaction throughput over chunk/block saturation.

Consider the following (single-dimension) challenge:

  1. You have 5 transactions with the following weights: [5,5,1,1,1]. The chunk capacity is 5.
  2. Using FIFO, you would generate the following chunks: [5][5][1,1,1]
  3. Using the above algorithm, you would generate the following: [1,1,1][5][5]

This doesn't seem very meaningful as is, but imagine that you keep getting a stream of 3 transactions of weight 1 once every chunk/block (a high-demand scenario). Now, you would be generating the following:

  1. In the case of FIFO: [5],[5],[1,1,1,1,1],[1,1,1,1,1]..
  2. In the case of the above algorithm: [1,1,1],[1,1,1],...,[5],[5]

Hence, the network would favor TPS over chunk/block saturation. Big transactions would get pushed down the road.
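
To make the single-dimension example concrete, here is a small self-contained sketch (plain uint64 weights rather than the Dimensions type) comparing the two packing policies:

```go
package main

import (
	"fmt"
	"sort"
)

// pack fills chunks of the given capacity, taking weights in the order
// provided and starting a new chunk when the next weight would not fit.
func pack(weights []uint64, capacity uint64) [][]uint64 {
	var chunks [][]uint64
	var current []uint64
	var used uint64
	for _, w := range weights {
		if used+w > capacity {
			chunks = append(chunks, current)
			current, used = nil, 0
		}
		current = append(current, w)
		used += w
	}
	return append(chunks, current)
}

func main() {
	weights := []uint64{5, 5, 1, 1, 1}
	fmt.Println(pack(weights, 5)) // FIFO order: [[5] [5] [1 1 1]]

	sorted := append([]uint64(nil), weights...)
	sort.Slice(sorted, func(a, b int) bool { return sorted[a] < sorted[b] })
	fmt.Println(pack(sorted, 5)) // smallest-first: [[1 1 1] [5] [5]]
}
```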

@tsachiherman tsachiherman self-assigned this Dec 13, 2024
@tsachiherman tsachiherman marked this pull request as ready for review December 13, 2024 18:30
@@ -73,6 +73,18 @@ func (d Dimensions) Add(i Dimension, v uint64) error {
return nil
}

func (d Dimensions) AddDimentions(a Dimensions) (Dimensions, error) {
Collaborator

Can we change the function signature to AddDimensions(a Dimensions, b Dimensions), since this returns a new Dimensions object rather than mutating the receiver, to match the other functions in this package (e.g. Add)?

Collaborator

Can we fix the typo AddDimensions as well?
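
A sketch of the suggested shape (the names and the overflow sentinel here are illustrative):

```go
import "errors"

var errDimensionOverflow = errors.New("dimensions sum overflows") // hypothetical sentinel

// AddDimensions takes both operands as arguments and returns a new
// Dimensions value, so nothing appears to mutate the receiver.
func AddDimensions(a, b Dimensions) (Dimensions, error) {
	var out Dimensions
	for i := range a {
		sum := a[i] + b[i]
		if sum < a[i] { // uint64 overflow check; a safe-math helper would also work
			return Dimensions{}, errDimensionOverflow
		}
		out[i] = sum
	}
	return out, nil
}
```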

fees/set.go Outdated
Comment on lines 70 to 77
var err error
for i := 0; i < len(outIndices); i++ {
dim := dimensions[outIndices[i]]
if !accumulator.CanAdd(dim, limit) {
outIndices = append(outIndices[:i], outIndices[i+1:]...)
i--
continue
}
Collaborator

Can we add a benchmark for this function? I'm curious about the performance impact of making this O(n^2) due to appending the remainder of the slice.

Contributor Author

You're correct that it would be O(n^2) in the worst-case scenario. I'll address that.
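
For reference, one standard way to avoid the quadratic splice is the in-place filter idiom (a sketch, not necessarily the fix that landed here):

```go
// Compact the accepted indices with a write position instead of deleting
// rejected entries via append, which shifts the slice tail on every removal.
kept := outIndices[:0]
for _, idx := range outIndices {
	dim := dimensions[idx]
	if !accumulator.CanAdd(dim, limit) {
		continue // rejected entries are simply not copied forward: O(1) each
	}
	// ... accumulate dim into accumulator as before ...
	kept = append(kept, idx)
}
outIndices = kept
```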

fees/set_test.go Outdated

for i := range dimensions {
d := uint64(i)
dimensions[i] = Dimensions{d % 1000, (d + 200) % 1000, (d + 400) % 1000, (d + 600) % 1000, (d + 800) % 1000}
Collaborator

Where is this coming from?

Contributor Author

It creates an arbitrary set of dimensions that are uniformly distributed and (mostly) non-identical.
I believe that for the purpose of evaluating the overall performance of the algorithm, it provides a reasonable test set. We could (naturally) pick a different test set, but I wouldn't expect any material performance difference.

Collaborator

Could we make this more obvious by using a random number modulo the desired upper bound rather than constructing it this way?

Could we also pre-generate the inputs if we're going to use the same seed, and make sure to reset the timer when starting the actual benchmark?
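
Something along these lines, perhaps (a sketch for a _test.go file; the function name, sizes, and limit values are illustrative):

```go
import (
	"math/rand"
	"testing"
)

func BenchmarkLargestSet(b *testing.B) {
	// Pre-generate the inputs with a fixed seed so runs are reproducible
	// and generation cost stays outside the measured loop.
	r := rand.New(rand.NewSource(0))
	dimensions := make([]Dimensions, 10_000)
	for i := range dimensions {
		for j := range dimensions[i] {
			dimensions[i][j] = r.Uint64() % 1000 // random value modulo the desired upper bound
		}
	}
	limit := Dimensions{100_000, 100_000, 100_000, 100_000, 100_000}

	b.ResetTimer() // measure only the selection itself
	for n := 0; n < b.N; n++ {
		LargestSet(dimensions, limit)
	}
}
```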
