Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

refactor(blend): perform conn monitoring in the ConnectionHandler level #987

Open
wants to merge 6 commits into
base: blend-membership-node-id
Choose a base branch
from

Conversation

youngjoon-lee
Copy link
Contributor

@youngjoon-lee youngjoon-lee commented Jan 20, 2025

1. What does this PR implement?

(The base branch of this PR is the PR #986.)

This PR is the 2nd step in the journey of connection management refactoring: https://www.notion.so/Connection-Management-in-Libp2p-1708f96fb65c8019a6edd2a3b89ff8e2?pvs=4#1768f96fb65c80e58f68ea5300deba40.

In the previous implementation, the logic for connection monitoring was handled at the Behaviour level. Specifically, the Behaviour periodically evaluated the status of ALL peers, disconnected malicious peers, and selected new peers to connect with. However, I've found that this approach is cumbersome. Managing the bulk result from ALL peers was more complex than I expected. Furthermore, peers actually don't need to be monitored within a shared time window; it’s sufficient to monitor each peer individually at its own interval since each node in the network operates independently, without relying on a common epoch.

To make the structure simpler, this PR moves the connection monitoring logic from the Behaviour to the ConnectionHandler level. This adjustment aligns with the responsibilities of the ConnectionHandler, which already manages data reading from a substream. Under this new structure, when an interval elapses, the ConnectionHandler evaluates the status of its assigned peer and reports the results to the Behaviour. Then, the Behaviour makes a decision what to do. considering the number of active streams. For example, the Behaviour requests the Swarm to dial new peers if needed.

This approach is well-suited with libp2p's design. Additionally, it simplifies handling peers detected as malicious or unhealthy, as the reports are processed individually, rather than in bulk. Also, now the Behaviour is much lightweight.

It’s important to note that this PR does not yet include logic for closing connections or establishing new ones, just to make your review easier. These features will be implemented in the next PR. I added TODO comment for those.

2. Does the code have enough context to be clearly understood?

Yes

3. Who are the specification authors and who is accountable for this PR?

@youngjoon-lee

4. Is the specification accurate and complete?

Yes

5. Does the implementation introduce changes in the specification?

No

Checklist

Warning

Do not merge the PR if any of the following is missing:

  • 1. Description added.
  • 2. Context and links to Specification document(s) added.
  • 3. Main contact(s) (developers and specification authors) added
  • 4. Implementation and Specification are 100% in sync including changes. This is critical.
  • 5. Link PR to a specific milestone.

nomos-blend/network/src/behaviour.rs Outdated Show resolved Hide resolved
nomos-blend/network/src/behaviour.rs Outdated Show resolved Hide resolved
nomos-services/blend/src/backends/libp2p.rs Outdated Show resolved Hide resolved
nomos-blend/network/src/behaviour.rs Outdated Show resolved Hide resolved
nomos-blend/network/src/handler.rs Outdated Show resolved Hide resolved
nomos-blend/network/src/lib.rs Outdated Show resolved Hide resolved
Comment on lines 38 to 39
pub peering_degree: usize,
pub max_peering_degree: usize,
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Unless we want a huge set of peering degree, we can constrain this a bit more. wdyt?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That's actually true. In practice, I guess u8 would be enough, but it would be safer to use u16 because it can represent the max number of network ports that OS can use.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I agree, u16 sounds good.
Are we gonna change this for the crate you found?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I updated them to u16: 2f4192b.
My investigation on https://docs.rs/libp2p-connection-limits/latest/libp2p_connection_limits/ is not done yet. I will open another PR.

@@ -149,6 +142,16 @@ where
panic!("Failed to listen on Blend network: {e:?}");
});

// Dial the initial peers randomly selected
membership
.choose_remote_nodes(&mut rng, config.peering_degree)
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Where does this method comes from? If we added this to the MembershipHandler trait and I did not realize, this should not be there.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This method is defined in the Membership struct. This struct is a different one from the membership in DA. Could you elaborate your question more?

@youngjoon-lee youngjoon-lee linked an issue Jan 22, 2025 that may be closed by this pull request
@AlejandroCabeza
Copy link
Contributor

lgtm (not approving, though, because I believe lack necessary context)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants