Better explanation of "I failed to train the fastspeech" #16

Open
mellogrand opened this issue Feb 21, 2020 · 1 comment
Comments

@mellogrand

Could you elaborate a little more and maybe propose a solution to the problem you raised?

(2020/02/10)
I was able to finish this implementation by completing the stop-token prediction and removing the concatenation of the inputs and outputs of the multi-head attention.
However, the alignments of this implementation are less diagonal, so it cannot generate proper alignments for FastSpeech.
As a result, I failed to train FastSpeech with this implementation :(

@LEEYOONHYUNG
Collaborator

According to the authors of FastSpeech, using proper alignments is important for training.

When I first implemented Transformer-TTS, I could not get it to work perfectly, so I finished it by concatenating the input and output of the multi-head self-attention.
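
For reference, the concatenation variant described here could be sketched roughly as below. This is a minimal PyTorch illustration, not the repo's actual code; the module name, dimensions, and use of `nn.MultiheadAttention` are assumptions.

```python
import torch
import torch.nn as nn

class ConcatMultiHeadSelfAttention(nn.Module):
    """Hypothetical variant: concatenate the attention input and output
    instead of the usual residual addition (names/shapes are assumptions)."""
    def __init__(self, d_model=256, n_heads=4):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Project the concatenated [input ; attention output] back to d_model.
        self.proj = nn.Linear(2 * d_model, d_model)

    def forward(self, x, key_padding_mask=None):
        # x: (batch, seq_len, d_model)
        out, weights = self.attn(x, x, x, key_padding_mask=key_padding_mask)
        # Concatenation in place of the standard residual `x + out`.
        x = self.proj(torch.cat([x, out], dim=-1))
        return x, weights
```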

I assume that, thanks to this concatenation, the encoder-decoder alignments were more diagonal, which let me use around 6,000 of the 13,100 data instances.

However, after I corrected my implementation to follow the original Transformer-TTS almost exactly, only about 1,000 data instances produced alignments usable for FastSpeech training, so the audio quality is much worse than before.
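
To make the filtering step concrete, one way to score how diagonal an encoder-decoder alignment is, and so decide whether an utterance is usable for FastSpeech duration extraction, might look like the sketch below. The metric, the `band` width, and the 0.8 threshold are assumptions for illustration only, not what this repo actually uses.

```python
import torch

def alignment_diagonality(attn, text_len, mel_len, band=3):
    """Rough diagonality score for one encoder-decoder attention map.
    attn: tensor of shape (mel_len, text_len) with attention weights.
    Higher scores mean the alignment hugs the diagonal more closely.
    Illustrative metric only; not the criterion used in this repo."""
    score = 0.0
    for t in range(mel_len):
        # Text position a perfectly diagonal alignment would attend to.
        center = t * text_len / mel_len
        lo = max(0, int(center) - band)
        hi = min(text_len, int(center) + band + 1)
        score += attn[t, lo:hi].sum().item()
    return score / mel_len

# Hypothetical filtering: keep only utterances whose alignment looks
# diagonal enough to extract reliable durations for FastSpeech.
# `alignments` is assumed to be a list of (attn, text_len, mel_len) tuples.
# kept = [i for i, (a, tl, ml) in enumerate(alignments)
#         if alignment_diagonality(a, tl, ml) > 0.8]
```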
