-
Notifications
You must be signed in to change notification settings - Fork 422
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
RATIS-998: shouldWithholdVotes() should be triggered for handling higher term #144
base: master
Are you sure you want to change the base?
Conversation
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@GlenGeng Thanks for working on this! I think this works as expected. withhold vote should be false if server term is less than candidate term and would be true if server term >= candidateTerm.
@szetszwo Hello Nicholas, could you please have a quick look for this PR ? I consider that:
Since the change is critical, could you please double confirm. Thanks! |
@@ -785,7 +785,7 @@ public RaftClientReply setConfiguration(SetConfigurationRequest request) throws | |||
} | |||
|
|||
private boolean shouldWithholdVotes(long candidateTerm) { | |||
if (state.getCurrentTerm() < candidateTerm) { | |||
if (state.getCurrentTerm() >= candidateTerm) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What is the reason behind the change? It seems to contradict the raft paper.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It does obey the paper.
4.2.3 Disruptive servers
Without additional mechanism, servers not in Cnew can disrupt the cluster. Once the cluster leader has created the Cnew entry, a server that is not in Cnew will no longer receive heartbeats, so it will time out and start new elections. Furthermore, it will not receive the Cnew entry or learn of that entry’s commitment, so it will not know that it has been removed from the cluster. The server will send RequestVote RPCs with new term numbers, and this will cause the current leader to revert to follower state. A new leader from Cnew will eventually be elected, but the disruptive server will time out again and the process will repeat, resulting in poor availability. If multiple servers have been removed from the cluster, the situation could degrade further.
Because of this scenario, we now believe that no solution based on comparing logs alone (such as the Pre-Vote check) will be sufficient to tell if an election will be disruptive.
Raft’s solution uses heartbeats to determine when a valid leader exists. In Raft, a leader is
considered active if it is able to maintain heartbeats to its followers (otherwise, another server will start an election). Thus, servers should not be able to disrupt a leader whose cluster is receiving heartbeats. We modify the RequestVote RPC to achieve this: if a server receives a RequestVote request within the minimum election timeout of hearing from a current leader, it does not update its term or grant its vote. It can either drop the request, reply with a vote denial, or delay the request; the result is essentially the same. This does not affect normal elections, where each server waits at least a minimum election timeout before starting an election. However, it helps avoid disruptions from servers not in Cnew: while a leader is able to get heartbeats to its cluster, it will not be deposed by larger term numbers.
BTW, shouldWithholdVotes
is redundant now, the cases handled by shouldWithholdVotes()
can also handled by recognizeCandidate()
. if currentTerm is larger than candidateTerm, just reject the RequestVote request, no further handling is needed.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If (state.getCurrentTerm() >= candidateTerm), the server should not vote it. Why returning false?
Honestly, I don't understand what you are trying to fix. Could you add a unit test to show it?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What changes were proposed in this pull request?
shouldWithholdVotes() should be triggered for handling higher term
What is the link to the Apache JIRA
https://issues.apache.org/jira/projects/RATIS/issues/RATIS-998?filter=allissues
How was this patch tested?
CI