LocalLLaMA

Welcome to LocalLLaMA! Here we discuss running and developing machine learning models at home. Let's explore cutting-edge open-source neural network technology together.

Get support from the community! Ask questions, share prompts, discuss benchmarks, get hyped at the latest and greatest model releases! Enjoy talking about our awesome hobby.

As ambassadors of the self-hosting machine learning community, we strive to support each other and share our enthusiasm in a positive constructive way.


[Paper] The Era of 1-bit LLMs: All Large Language Models are in 1.58 Bits (huggingface.co)

submitted 1 year ago* (last edited 1 year ago) by [email protected] to c/[email protected]


From the abstract: "Recent research, such as BitNet, is paving the way for a new era of 1-bit Large Language Models (LLMs). In this work, we introduce a 1-bit LLM variant, namely BitNet b1.58, in which every single parameter (or weight) of the LLM is ternary {-1, 0, 1}."

This would allow running larger models on limited resources. However, this isn't a quantization method you can convert existing models to after the fact. It seems models need to be trained from scratch this way, and so far they have only gone as far as 3B parameters. The paper isn't that long, and it seems they didn't release the models. It builds on the BitNet paper from October 2023.

"the matrix multiplication of BitNet only involves integer addition, which saves orders of energy cost for LLMs." (no floating point matrix multiplication necessary)
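To illustrate that quote, here is a minimal sketch (my own, not code from the paper) of a matrix-vector product where every weight is -1, 0, or 1. The function names and the toy values are made up for illustration, and the integer inputs are just an assumption to keep the example simple. The point is that each weight either adds the input, subtracts it, or skips it, so no multiplication is needed.

```python
def ternary_matvec(weights, x):
    """Compute weights @ x for ternary weights using only add and subtract.

    Hypothetical helper for illustration; not taken from the paper.
    """
    out = []
    for row in weights:
        acc = 0
        for w, xi in zip(row, x):
            if w == 1:
                acc += xi        # weight +1: add the input
            elif w == -1:
                acc -= xi        # weight -1: subtract the input
            elif w != 0:
                raise ValueError("weights must be -1, 0, or 1")
            # weight 0: the input is skipped entirely
        out.append(acc)
    return out


def reference_matvec(weights, x):
    """Ordinary multiply-accumulate version, used only as a cross-check."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in weights]


# Toy values, chosen arbitrarily.
W = [
    [1, 0, -1, 1],
    [-1, -1, 0, 1],
    [0, 0, 1, 0],
]
x = [3, -2, 5, 7]

print(ternary_matvec(W, x))    # [5, 6, 5]
print(reference_matvec(W, x))  # same result, but computed with multiplications
```

A real implementation would pack the weights and run this in optimized kernels. This only shows why the multiplications drop out.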

"1-bit LLMs have a much lower memory footprint from both a capacity and bandwidth standpoint"
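For a rough sense of the numbers, here is a back-of-the-envelope sketch (again mine, not from the paper). The "1.58 bits" in the title is log2(3), the information content of one three-valued weight. The helper name is hypothetical, and the calculation covers weights only, assuming ideal packing. Activations, the KV cache, any layers kept at higher precision, and practical storage formats would all add to this.

```python
import math

# A ternary weight carries log2(3) bits of information, roughly 1.585.
BITS_PER_TERNARY_WEIGHT = math.log2(3)


def weight_memory_gib(num_params, bits_per_weight):
    """Memory for the weights alone, in GiB (hypothetical helper)."""
    return num_params * bits_per_weight / 8 / 2**30


num_params = 3_000_000_000  # the 3B size mentioned above

fp16 = weight_memory_gib(num_params, 16)
ternary = weight_memory_gib(num_params, BITS_PER_TERNARY_WEIGHT)

print(f"FP16 weights:    {fp16:.2f} GiB")
print(f"Ternary weights: {ternary:.2f} GiB (ideal packing)")
print(f"Ratio:           {fp16 / ternary:.1f}x")
```

This is a theoretical lower bound for the weights, so measured savings on a whole model will be smaller than the ratio printed here.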

Edit: Update: additional FAQ published


[–] [email protected] 10 points 1 year ago* (last edited 1 year ago) (1 children)

This is big if true, but we'll have to see how well it holds up at larger scales.

The paper's short length is a bit worrying, but the authors are all very reputable. Several were also contributors on the RetNet and Kosmos-2/2.5 papers.

[–] [email protected] 5 points 1 year ago* (last edited 1 year ago)

As far as I understand, their contribution is to apply what has proven to work well in the Llama architecture to what BitNet does, and to add a '0' to the possible weight values. Maybe you just don't need that much text to explain it, just the statistics.

They claim it scales as an FP16 Llama model does... So unless their judgement/maths is wrong, it should hold up. I can't comment on that. But I'd like it to be true...