There's just one thing I can't find an answer to: when feeding the output back into the transformer, we process it much like the inputs (with added masks), so is there also a sequence-length limit on that side?
Even BERT has an input limit of 512 tokens, so transformers are clearly limited in how much they can take in. Is there something that lets the output grow as long as needed, or is there a fixed maximum length?
In case I wasn't clear enough: does the network keep generating words until it emits the <end> token, or is there a token limit on the output?
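To make the question concrete, this is the kind of decoding loop I'm imagining, where generation stops at either an <end> token or some hard length cap (a minimal sketch; `next_token`, `EOS_ID`, and `MAX_LEN` are my own placeholder names, not any specific library's API):

```python
EOS_ID = 2     # hypothetical id of the <end> token
MAX_LEN = 512  # hard cap so generation cannot run forever

def next_token(prefix):
    # Stand-in for a real decoder forward pass returning the next token id;
    # here it just emits <end> after a few steps for demonstration.
    return EOS_ID if len(prefix) >= 5 else len(prefix) + 10

def generate(prompt_ids):
    output = list(prompt_ids)
    while len(output) < MAX_LEN:  # stopping condition 1: length cap
        tok = next_token(output)
        output.append(tok)
        if tok == EOS_ID:         # stopping condition 2: <end> token
            break
    return output

print(generate([1]))  # -> [1, 11, 12, 13, 14, 2]
```

So which of the two stopping conditions actually applies in practice, or is it both?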