How to train a model that can fully extract the 44100hz frequency #35
Comments
Disclaimer: I'm not part of the original team; my collaborator role here is to update some of the documentation. I don't fully understand what you mean, but I think what you're trying to achieve here is to train models that do not have a frequency cutoff? If so, maybe take a look at their presentation slides, which mention that:
And try to change both of the related parameters accordingly.
Currently I am using the configuration below to train without a freq cutoff ↓
Although it works and there is no freq cutoff, the generated onnx/ckpt files are smaller than the pretrained vocals/bass/others files.
So what I want to ask is:
For 1, from my understanding, the smaller model size comes from the reduction of num_blocks. For 2, from the paper on MDX-Net:
So probably, for having no frequency cutoff, you would want dim_f to cover the full one-sided spectrum, i.e. dim_f = n_fft / 2.
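As a rough sketch of the arithmetic (my own illustration; the concrete dim_f/n_fft values here are assumptions, not taken from this thread):

```python
# Sketch of the frequency-cutoff arithmetic, assuming the MDX-Net-style
# convention that the model only keeps the first dim_f of the
# n_fft // 2 + 1 one-sided STFT bins.
def cutoff_hz(dim_f: int, n_fft: int, sr: int = 44100) -> float:
    bin_width = sr / n_fft        # Hz covered by one STFT bin
    return dim_f * bin_width      # highest frequency the model ever sees

# Hypothetical vocals-style config with a cutoff: dim_f < n_fft // 2
print(cutoff_hz(dim_f=2048, n_fft=6144))  # 14700.0 -> audible ~14.7 kHz cutoff
# Hypothetical full-band config: dim_f == n_fft // 2
print(cutoff_hz(dim_f=2048, n_fft=4096))  # 22050.0 -> Nyquist, no cutoff
```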
Thanks for your reply (double thanks ^_^). After changing the parameters as suggested, an error occurs.
Error stack & source code: src/models/mdxnet.py#L33
It seems that something goes wrong at that line. Sorry, I'm a layman in this field and don't know much about these complex things... I just want to get a correct config to train 😭😭
Hi @dingjibang,
could you share more details about the issue, perhaps with an audio sample? Thank you @Zokhoi for your contributions, by the way.
I found that the overlapping and broken sound were caused by too little training time; I was too impatient... After training both quickly for 10 epochs, the above problems did not exist, so things seem to end very simply 😭. The other parameters of the above configuration remain unchanged; just increase num_blocks from 9 to 11. Sorry for an extra question: does a larger n_fft make a difference for stems like bass? Thank you.
@ws-choi What is the importance of this line?
@dingjibang I think that as the harmonic series for instruments like bass is squashed into one frequency region instead of being spread across the spectrum, having a larger n_fft could help.
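To make that concrete, a back-of-the-envelope sketch (my own numbers; E1 ≈ 41.2 Hz is used as a typical low bass fundamental):

```python
# Why a larger n_fft can matter for bass: adjacent harmonics of a low
# fundamental are only ~f0 Hz apart, so the STFT can separate them only
# if the bin width is comfortably finer than that spacing.
sr = 44100
f0 = 41.2  # E1, a common lowest note on a 4-string bass

for n_fft in (4096, 6144, 8192):
    bin_width = sr / n_fft
    print(f"n_fft={n_fft}: bin width {bin_width:.2f} Hz, "
          f"{f0 / bin_width:.1f} bins between adjacent bass harmonics")
# Doubling n_fft halves the bin width, i.e. doubles the frequency
# resolution in the narrow region where most bass energy lives.
```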
Hi @Zokhoi, sorry I didn't notice this for a while. Below is the explanation of the onesided mode, quoting the PyTorch torch.stft documentation: if onesided is True (default for real input), only values for ω in [0, 1, 2, …, ⌊n_fft/2⌋ + 1] are returned, because the real-to-complex Fourier transform satisfies the conjugate symmetry.
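A minimal check of that behaviour (a sketch; with onesided=True, torch.stft returns n_fft // 2 + 1 frequency bins):

```python
import torch

n_fft, hop = 4096, 1024
x = torch.randn(44100)  # one second of mono noise at 44.1 kHz

spec = torch.stft(x, n_fft=n_fft, hop_length=hop,
                  window=torch.hann_window(n_fft),
                  onesided=True, return_complex=True)
print(spec.shape)                     # (2049, frames): 2049 = n_fft // 2 + 1
assert spec.shape[0] == n_fft // 2 + 1
```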
I want to train a 2-stem model.
I noticed that in the yaml configuration of each model there are some parameters that affect the final frequency cutoff. It seems that multigpu_drums.yaml can handle the full 44100 Hz range, but with the reduction of num_blocks (11 => 9), the model size also decreases accordingly (29 MB => 21 MB).
Although something like multigpu_drums.yaml can handle 44100 Hz in full, the model shrinks instead. Does this affect the final accuracy?
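A toy sketch of my understanding of the size drop (not the actual MDX-Net architecture, just a generic stack of conv blocks):

```python
# Toy stand-in for a block stack: the parameter count grows linearly with
# num_blocks, which is why the exported ckpt/onnx file shrinks when
# num_blocks goes from 11 to 9.
import torch.nn as nn

def toy_stack(num_blocks: int, channels: int = 64) -> nn.Module:
    blocks = [
        nn.Sequential(
            nn.Conv2d(channels, channels, kernel_size=3, padding=1),
            nn.BatchNorm2d(channels),
            nn.ReLU(),
        )
        for _ in range(num_blocks)
    ]
    return nn.Sequential(*blocks)

for n in (9, 11):
    params = sum(p.numel() for p in toy_stack(n).parameters())
    print(f"num_blocks={n}: {params:,} parameters")
```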
It seems that the parameters dim_t, hop_length, overlap, and num_blocks have a wonderful complementarity that I cannot understand. Maybe this "complementarity" is designed for the competition (mix to demucs), but I want to apply this to the real world without demucs (only mdx-net; after some testing, I think the potential of mdx-net is higher than that of demucs).
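My rough reading of how they interlock (a sketch under the assumption that the model consumes fixed-length chunks of dim_t STFT frames; the formula is an approximation, not taken from the repo):

```python
# If the network always sees dim_t STFT frames, the audio chunk it was
# trained on is roughly hop_length * (dim_t - 1) samples long. Changing
# either parameter changes the chunk length, so they have to move together,
# and `overlap` then controls how much neighbouring chunks share at inference.
def chunk_len_samples(dim_t: int, hop_length: int) -> int:
    return hop_length * (dim_t - 1)

sr = 44100
for dim_t, hop in ((256, 1024), (512, 512)):  # illustrative value pairs
    n = chunk_len_samples(dim_t, hop)
    print(f"dim_t={dim_t}, hop_length={hop}: {n} samples (~{n / sr:.2f} s)")
```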
When I try to change num_blocks from 9 to 11, the inference results have overlapping and broken voices... Do you have any good parameter recommendations for training a full 44100 Hz model without loss of accuracy (i.e. without the model shrinking)?