this post was submitted on 29 Dec 2024
13 points (100.0% liked)

LocalLLaMA


Seems Meta have been doing some research lately on replacing current tokenizers with new/different representations:

top 2 comments
[email protected] 2 points 3 months ago

Does this use the same attention architecture as models with traditional tokenisation? As far as I understand it, each token has a bunch of meaning associated with it, encoded in a vector.
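For anyone following along, here is a minimal sketch of the "token maps to a vector" idea, next to the byte-level alternative. Everything in it is an assumption for illustration: the toy vocabulary, the 4-dimensional vectors, the random values and the helper names are made up and are not taken from either paper. It only shows the lookup step, not attention and not how the Byte Latent Transformer processes bytes afterwards.

```python
import random

# Hypothetical toy vocabulary; real tokenizers have tens of thousands of entries.
VOCAB = {"the": 0, "cat": 1, "sat": 2, "<unk>": 3}
EMBED_DIM = 4  # made-up size, real models use hundreds or thousands


def make_embedding_table(num_rows, dim, seed=0):
    """Build a table with one vector per id (random here, learned in a real model)."""
    rng = random.Random(seed)
    return [[rng.uniform(-1.0, 1.0) for _ in range(dim)] for _ in range(num_rows)]


def tokenize(text):
    """Whitespace 'tokenizer' stand-in: map each word to an id, unknown words to <unk>."""
    return [VOCAB.get(word, VOCAB["<unk>"]) for word in text.lower().split()]


def bytes_as_ids(text):
    """Byte-level alternative: every UTF-8 byte is its own id in the range 0..255."""
    return list(text.encode("utf-8"))


def embed(ids, table):
    """Look up one vector per id; this is the 'meaning encoded in a vector' step."""
    return [table[i] for i in ids]


TOKEN_TABLE = make_embedding_table(len(VOCAB), EMBED_DIM)
BYTE_TABLE = make_embedding_table(256, EMBED_DIM, seed=1)

token_vectors = embed(tokenize("the cat sat"), TOKEN_TABLE)  # 3 vectors
byte_vectors = embed(bytes_as_ids("the cat sat"), BYTE_TABLE)  # 11 vectors
```

The same sentence becomes 3 vectors with the toy tokenizer and 11 with bytes, which is the basic trade-off: no fixed vocabulary, but longer sequences for the model to handle.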

[email protected] 2 points 3 months ago* (last edited 3 months ago)

Uh, I'm not sure. I haven't had time to read those papers yet. I suppose the Byte Latent Transformer does, since it's still some kind of transformer architecture. With the Large Concept Models, I'm not so sure. They encode whole sentences, and the researchers explore like 3 different (diffusion) architectures. The paper calls itself a "proof of feasibility", so it's more basic research into that approach than one single/specific model architecture.