The input to the transformer is (B, T), and the output from the MLP is also (B, T). We only use the embeddings of the last column to predict the next token. Why can't we do something with the embeddings of the other tokens? It's my first time learning transformers.
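To make the shapes in the question concrete, here is a minimal NumPy sketch. It assumes a decoder-only (GPT-style) model with a causal mask, which the question does not actually state. The sizes `B, T, C, V`, the random `hidden` array standing in for the transformer's output, and the `lm_head` projection are all hypothetical and only there to show shapes. Under those assumptions, the other positions are not wasted: during training every position is scored against its own next token, and it is only at generation time that the last position alone is read.

```python
import numpy as np

rng = np.random.default_rng(0)
B, T, C, V = 2, 5, 8, 11  # batch, sequence length, embedding width, vocab size (toy values)

# Stand-in for the transformer body: in a real model this would be the output
# of the final block. Here it is just random numbers of the right shape.
hidden = rng.normal(size=(B, T, C))   # (B, T, C)
lm_head = rng.normal(size=(C, V))     # hypothetical output projection

logits = hidden @ lm_head             # (B, T, V): one prediction per position


def cross_entropy(logits, targets):
    """Mean cross-entropy over every (batch, position) pair."""
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for stability
    log_probs = z - np.log(np.exp(z).sum(axis=-1, keepdims=True))
    picked = np.take_along_axis(log_probs, targets[..., None], axis=-1)
    return -picked.mean()


# Training: position t is scored against the token at t + 1, so all T
# positions contribute to the loss.
targets = rng.integers(0, V, size=(B, T))  # (B, T) next-token ids (random here)
train_loss = cross_entropy(logits, targets)

# Generation: the tokens after positions 0..T-2 are already known, so only
# the last position's logits are needed to pick a new token.
last_logits = logits[:, -1, :]             # (B, V)
next_token = last_logits.argmax(axis=-1)   # (B,) greedy choice
```

In this sketch the "last column" slice appears only in the generation step; the loss is averaged over all `B * T` positions.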