🧠 NEURAL NETWORKS
BERT is just a single text diffusion step
💬 HackerNews Buzz: 75 comments
BUZZING
🎯 Text diffusion principles • Challenges of text diffusion • Diffusion vs. token-based generation
💬 "You can't add noise to a token, you have to work in the embedding space."
• "It feels like it would make more sense to allow the model to do Levenshtein-like edits instead of just masking and filling in the masked tokens."
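
The "masking and filling" idea in the quotes can be made concrete with a small sketch. It shows only the corruption side: tokens are replaced by a mask symbol at a chosen rate, which is one way to "noise" discrete text without touching embeddings. The names `mask_tokens` and `corruption_schedule` and the rates used are illustrative assumptions, not taken from the linked article, and the 15% rate is a simplification of BERT-style masking. Each rate is applied independently to the clean sentence rather than cumulatively.

```python
import random

MASK = "[MASK]"  # placeholder symbol; real tokenizers define their own


def mask_tokens(tokens, rate, rng):
    """Replace each token with MASK independently with probability `rate`."""
    return [MASK if rng.random() < rate else tok for tok in tokens]


def corruption_schedule(tokens, rates, seed=0):
    """Corrupt the same clean tokens at several masking rates (hypothetical helper)."""
    rng = random.Random(seed)
    return [(rate, mask_tokens(tokens, rate, rng)) for rate in rates]


tokens = "the cat sat on the mat".split()
for rate, noisy in corruption_schedule(tokens, [0.15, 0.5, 1.0]):
    # A denoising model would be trained to recover `tokens` from `noisy`.
    print(f"{rate:.2f}", " ".join(noisy))
```

Under this view, training at a single fixed rate resembles one denoising step, while sweeping the rate from low to high gives a full corruption schedule. Edit-based alternatives, such as the Levenshtein-like operations suggested above, would also need insertions and deletions, which this sketch does not cover.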