Joining often gets stuck on waiting for metadata #1903
@leblowl Is it stuck forever? |
I think so, or at least quite a while. The steps to reproduce would be to start a community in the latest develop branch and try to join it. Expected: you see the general channel very quickly after Tor connects. Actual: you get stuck on one of the last steps. |
I cannot reproduce it; eventually I always manage to join. |
@leblowl any more insights on how to reproduce this? is it a problem in the latest develop? |
Is it not possible to reproduce? If it's taking more than 30 seconds, or sometimes several minutes, then I think that's an issue. I haven't looked into this further, but I will soon. |
So this is a problem with a misleading message. Anything below 20 minutes is not suspicious at all. What happens here is that you are just waiting for Tor to connect to the first peer. It's a known issue that Tor needs a lot of time to "publish" addresses and make them dialable. |
Are we sure that's what the problem is? |
Yes. If it's not stuck forever, then it's not a bug; it's just standard Tor behavior. |
If it takes 20 minutes to connect, then I think that's a pretty big issue, for internal testing and general usability. I think we should look into it more and confirm what we think is happening and document that behavior. If it is a limitation of Tor, then that's great to know as it provides another data point to consider when talking about moving away from Tor. |
I agree that we should do this, and I don't think we know yet that it's just Tor not connecting. Just because it connects eventually does not mean we know what the problem is. Also, we should ensure that the text under the progress bar accurately reflects what is happening and what we are waiting for. If we think it's wrong, we should make an issue for that. @vinkabuki do you think the message is misleading? can you make an issue for that with steps to reproduce? |
It's always been a known behavior of Tor. You can even see it in the e2e tests: last week the multiple-clients test was failing because a 15-minute timeout was not enough. Today the test passed in 8 minutes. I know that one example is not scientific evidence, but I'm just writing what we have observed during 2 years of working with Tor. We have some general data on Tor connections that we gathered (and are still gathering) on AWS; however, these tests are based on connecting to an HTTP server, i.e. the old registering mechanism with the registrar: https://s3.console.aws.amazon.com/s3/buckets/tor-connection-data?region=us-east-1&tab=objects |
imo 'waiting for metadata' doesn't explain what's actually happening. we used to have a good message. |
Any step where Tor has started and we are waiting for Tor to connect to an onion address should say 'connecting with Tor' |
Are there any decisions about steps to take in terms of this task? |
One very concrete thing is: every step in the joining process should be correctly described to the user, so that if a step fails or takes too long we know why. |
I guess we need specific guidelines on the descriptions then. Otherwise we'll get "lost in translation". I mean, each of us may have a different understanding of what a step is about. "Waiting for metadata" is actually accurate when you think about it. |
This is how we dealt with it last time: #1277 (comment)
Since we've changed some of the steps in the process, we may need to change what is reported to the user. Also, any time we are waiting for Tor to connect, we should say "Connecting via Tor..." until Tor has connected successfully. And any time we are waiting for something else, like block data, we should not mention Tor. This way, we and our users will be on the same page about the impact of Tor on the joining process. In other words, if Tor slowness is really the culprit here, let's prove it by showing status messages that clearly state when we are waiting for Tor and when we are not. |
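A minimal sketch of that rule, assuming the joining process can report what it is currently waiting on. The `JoinWait` enum, its members, and `statusMessage` are hypothetical names for illustration, not identifiers from the codebase, and the non-Tor wording is only a placeholder:

```typescript
// Hypothetical: what the joining process is currently blocked on.
enum JoinWait {
  TorBootstrap = 'tor-bootstrap',
  TorPeerConnection = 'tor-peer-connection',
  CommunityMetadata = 'community-metadata',
  Messages = 'messages',
}

// Any wait that depends on Tor reports the same Tor message;
// every other wait uses wording that does not mention Tor at all.
function statusMessage(wait: JoinWait): string {
  switch (wait) {
    case JoinWait.TorBootstrap:
    case JoinWait.TorPeerConnection:
      return 'Connecting via Tor...'
    case JoinWait.CommunityMetadata:
      return 'Downloading community metadata' // placeholder wording
    case JoinWait.Messages:
      return 'Loading messages' // placeholder wording
  }
}
```

With a mapping like this, a screenshot of a stuck join would show directly whether Tor is what we are waiting for.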
@siepra regarding the call we just had, I don't think there's any reason in particular why lucas needs to work on this. The workflow that @Kacper-RF did when he worked on these screens initially was:
I think this approach will work again, so I think anyone on the team can do this. @Kacper-RF might be the best person since he worked on it initially. |
Current state:
From my observations, there are a few steps on which the user spends the most time, and I think these are the steps where we should provide the most valuable information: 20% - Connecting to community owner via Tor |
Thanks @Kacper-RF! It's super helpful to see this list. So, just to clarify we're talking about 2.x now. I have a few general suggestions about some of these so I'm going to go through them.
This should now say "Connecting to peers"
This happens synchronously? Why do we have to wait for this at all? We already have the ability to make outgoing connections to peers so it shouldn't block anything.
This should say "Launching community", right? Because it's in progress at this stage? Or if we have already launched the community and something else is in progress, what is in progress? It's weird to be waiting on a step that is described in the past tense as having already happened. Like, if it happened, why am I waiting?
Okay, at this point it sounds like we have already made connections to peers over Tor, in 2.x, so we aren't waiting for any more connections. What is actually happening here? Should this say "Downloading community metadata"? And if all we're doing here is downloading community metadata, how do we explain "joining often gets stuck?" Above Emi says:
But we've already made connections via Tor to some peers (at "20% - Connecting to community owner via Tor") so what is our explanation for the issue leblow is seeing? |
After taking a closer look at the current joining flow, I find these steps very confusing; they are tied to the old master/production joining flow.
The previous logs are visible only very briefly and are difficult to even read, so I suggest using only 4 steps
|
Previously we showed information about Tor's startup process. Can we show that here using the existing language, if Tor has not started yet? (Sometimes it will have started already, sometimes not.)
What percentages will you display for these steps and what will you call them? I'd propose:
Is certificates replication a step that would block other steps? I don't think it should be, so I think these steps can be enough. Is there a way to show progress on "loading messages"? |
We can do something like this:
5% - Connecting process started (so the user always sees that something is in progress)
5% to 50% - Tor bootstrapping logs, for example:
I will adjust the progress bar for them somehow.
55% - Connecting to community members via Tor (a long step; maybe add 1% to the progress bar every 3 seconds, but I don't know if that's a good idea?)
80% - Loading messages (I think I can keep the same message but change the value (80%, 85%, 90%) after receiving events from OrbitDB and some backend logic)
We want to have replicated channels and certificates before showing the channel list to the user because, as far as I remember, some logic depends on certificates. I think I can start working on a draft solution and send you some videos of what it looks like. Let me know what you think. |
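A rough sketch of how the percentages proposed above could be computed. The function names, the constants, and the assumption that each relevant event bumps "Loading messages" by 5% are all hypothetical; only the 5%, 50%, 80%, 85%, and 90% values come from the proposal:

```typescript
// Hypothetical constants taken from the percentages proposed above.
const TOR_BAND_START = 5
const TOR_BAND_END = 50
const LOADING_MESSAGES_START = 80
const LOADING_MESSAGES_MAX = 90

// Maps Tor's own bootstrap percentage (0-100) onto the 5%-50% band
// of the joining progress bar. Out-of-range input is clamped.
function torBootstrapToProgress(bootstrapPercent: number): number {
  const clamped = Math.min(100, Math.max(0, bootstrapPercent))
  return Math.round(TOR_BAND_START + (clamped / 100) * (TOR_BAND_END - TOR_BAND_START))
}

// Assumption: each relevant event received while loading messages
// moves the bar by 5%, capped at 90%.
function loadingMessagesProgress(eventsReceived: number): number {
  const steps = Math.max(0, Math.floor(eventsReceived))
  return Math.min(LOADING_MESSAGES_MAX, LOADING_MESSAGES_START + steps * 5)
}
```

Clamping keeps the bar from moving backwards or overshooting if the reported values are out of range.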
This sounds good! |
After several different approaches, I finally ended up with this:
Showing Tor bootstrapping logs was very problematic because of the asynchronous start, and it sometimes broke the progress bar.
I tried to limit the number of steps so that each one is clearly visible and readable to the user.
I implemented an additional progress bar animation for when the user is on the most time-consuming step, "Connecting to community members via Tor".
The idea of adding 1% every 3 seconds was risky because the process sometimes takes a really long time if no member is online.
joining-user.mp4 |
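To put a rough number on why the timed increments were risky: the sketch below assumes the bar would start at 55% and have to stop one percent short of the 80% "Loading messages" step. The function name and that stopping rule are assumptions for illustration only:

```typescript
// Hypothetical: how long timed increments can run before the bar
// has to stop just below the next real step.
function secondsUntilStalled(
  startPercent: number,
  nextStepPercent: number,
  secondsPerPercent: number
): number {
  const increments = Math.max(0, nextStepPercent - 1 - startPercent)
  return increments * secondsPerPercent
}

// 24 increments of 3 seconds each: the bar stops moving after 72 seconds.
const stalledAfter = secondsUntilStalled(55, 80, 3)
```

Under those assumptions the bar would be frozen again after about 72 seconds, which is far shorter than the waits of up to 20 minutes described earlier in this thread.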
This looks great! Is that new animation implemented on mobile too? (Just asking because it seems non-standard) It might be more standard to switch the whole thing to an "infinite" progress bar that goes back and forth, if that's easier on mobile for some reason. But this looks great and, most importantly, the decisions about what to say to the user look great to me. |
Thanks! |
* feat: basic changes
* feat: better UX
* feature: state manager and desktop
* feat: mobile part
* fix: use enum instead hardcoded string
* fix: mobile channel list screen
* feat: debug log
* feat: trigger mobile e2e
* feat: isJoiningCompleted selector
* test: add isJoiningCompleted
Desktop: 2.0.3-alpha.15 Done. |
On develop, with two local nodes, this happens quite often. Sorry I haven't looked into it any more than that.