Smokeydope

joined 2 years ago

Please help! I'm trying to remove this metal plate to get to my GPU for cleaning. From what I saw, it had one star-pattern bolt at the end of a slope connecting to the chassis; removing it loosened the plate a little, but I don't know how to proceed. I don't want to damage anything.

Edit: I got it figured out! Had to pull a metal tab on the back as I jostled/slid the plate. Thank you!

[–] [email protected] 2 points 1 day ago

You're welcome! It's good to have a local, DRM-free backup of your favorite channels, especially the ones you come back to as a sleep aid or for comfort when stressed. I would download them sooner rather than later while yt-dlp still works (Google has been ramping up its war on ad blockers and third-party frontends). BTW, yt-dlp also works with Bandcamp and SoundCloud for extracting audio :)
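
For reference, the basic commands look something like this (channel URL, album URL, and output template are just placeholders; check the yt-dlp docs for the exact options you want):

yt-dlp --download-archive archive.txt -o "%(uploader)s/%(title)s.%(ext)s" https://www.youtube.com/@SOMECHANNEL/videos
yt-dlp -x --audio-format mp3 https://SOMEARTIST.bandcamp.com/album/SOMEALBUM

The --download-archive file keeps track of what has already been grabbed, so you can re-run the same command later and only pick up new uploads.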

[–] [email protected] 6 points 1 day ago* (last edited 1 day ago) (2 children)

It depends on how nerdy you want to get.

Generally no. That checkmark is Cloudflare's multi-million-dollar product that protects half the internet from bot abuse. If there were a quick and easy workaround, the bots would quickly learn and adopt it too. Sadly, it also has a real penchant for triggering on regular users who browse with hardened browsers and VPNs. It would be nice if network admins cared a little more about sticking it to the man and used bot-protection alternatives instead of just proxying through a Cloudflare tunnel and calling it a day.

If you're slightly tech savvy and lucky, the site allows web crawlers/indexers or has RSS feeds. That lets you use an RSS reader or a text scraper like NewsWaffle.

What you're describing with saving a cookie to prove your identity is actually doable; yt-dlp uses this to download age-restricted content. But getting that cookie exported into a text file is a non-trivial, nerdy step the average person can't be expected to do.
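
If anyone wants to try anyway, yt-dlp can pull the cookie straight from a browser profile or read a Netscape-format cookies.txt export (the browser name and video URL here are just examples):

yt-dlp --cookies-from-browser firefox "https://www.youtube.com/watch?v=VIDEOID"
yt-dlp --cookies cookies.txt "https://www.youtube.com/watch?v=VIDEOID"

Getting that cookies.txt in the first place usually means a browser extension or the --cookies-from-browser route, which is where most non-nerds tap out.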

[–] [email protected] 2 points 4 days ago* (last edited 4 days ago) (2 children)

I don't have a lot of knowledge on the topic, but I'm happy to point you toward good reference material. I first heard about tensor layer offloading from a post here a few months ago. That post links to another one on MoE expert layer offloading, which the tensor approach was based on. I highly recommend you read through both posts.

The gist of the tensor override strategy is this: instead of offloading entire layers with --gpulayers, you use --overridetensors to keep specific large tensors (particularly the FFN tensors) on the CPU while moving everything else to the GPU.

This works because:

  • Attention tensors: Small, benefit greatly from GPU parallelization
  • FFN tensors: Large, can be efficiently processed on CPU with basic matrix multiplication

You need to figure out exactly which tensors to keep on the CPU for your model by looking at the weights and cooking up a regex, as described in the post.

Here's an example of the KoboldCpp startup flags for doing this. The key part is the --overridetensors flag and the regex it contains:

python ~/koboldcpp/koboldcpp.py --threads 10 --usecublas --contextsize 40960 --flashattention --port 5000 --model ~/Downloads/MODELNAME.gguf --gpulayers 65 --quantkv 1 --overridetensors "\.[13579]\.ffn_up|\.[1-3][13579]\.ffn_up=CPU"
...
[18:44:54] CtxLimit:39294/40960, Amt:597/2048, Init:0.24s, Process:68.69s (563.34T/s), Generate:56.27s (10.61T/s), Total:124.96s

The exact specifics of how you determine which tensors to target for each model, and the associated regex, are a little beyond my knowledge, but the people who wrote the tensor post did a good job explaining that process in detail. Hope this helps.
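
If you want to poke at the tensor names yourself before cooking up a regex, I believe the gguf Python package (the same one llama.cpp's conversion scripts use) ships a gguf-dump tool that lists every tensor in the file, something like:

pip install gguf
gguf-dump ~/Downloads/MODELNAME.gguf | grep ffn

That should show the per-layer ffn_up/ffn_down/ffn_gate tensors the regex above is matching against. (Tool name is from memory, so double-check the package docs.)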

[–] [email protected] 1 points 4 days ago* (last edited 4 days ago)

I would recommend getting a cheap $10-15 wattage meter that plugs in between the wall outlet and the PSU powering your cards (the $30 name-brand Kill A Watts are overpriced and unneeded IMO). You can get rough approximations by adding up your cards' listed TDP specs, but that doesn't account for the motherboard, CPU, RAM, drives, and so on, or the real change between idle and load. With a meter you can just watch the total power draw with all that stuff factored in, and note the increase and the maximum as your rig inferences for a bit. That way you can be reasonably confident in the actual numbers. Then you can plug the values into a calculation.

[–] [email protected] 2 points 4 days ago* (last edited 4 days ago) (4 children)

I have not tried any models larger than a very low quant of Qwen 32B. My personal limit for partial-offloading speeds is 1 TPS, and the 32B models encroach on that. Once I upgrade my VRAM from 8GB to 16-24GB, I'll test the waters with higher parameter counts and hit some new limits to benchmark :) I haven't tried MoE models either, though I keep hearing about them. AFAIK they're popular because you can do advanced partial-offloading strategies between the different experts to really bump up token generation, so playing around with them has been on my ML bucket list for a while.

[–] [email protected] 2 points 4 days ago* (last edited 4 days ago) (1 children)

Oh, I LOVE to talk, so I hope you don't mind if I respond with my own wall of text :) It got really long, so I broke it up with headers.

TLDR: Bifurcation is needed because of how fitting multiple GPUs on one PCIe x16 lane works and consumer CPU PCIe lane management limits. Context offloading is still partial offloading, so you'll still get hit with the same speed penalty—with the exception of one specific advanced partial offloading inference strategy involving MoE models.

CUDA

To be clear about CUDA, it's an API optimized for software to use NVIDIA cards. When you use an NVIDIA card with Kobold or another engine, you tell it to use CUDA as an API to optimally use the GPU for compute tasks. In Kobold's case, you tell it to use cuBLAS for CUDA.

The PCIe bifurcation stuff is a separate issue when trying to run multiple GPUs on limited hardware. However, CUDA has an important place in multi-GPU setups. Using CUDA with multiple NVIDIA GPUs is the gold standard for homelabs because it's the most supported for advanced PyTorch fine-tuning, post-training, and cutting-edge academic work.

But it's not the only way to do things, especially if you just want inference on Kobold. Vulkan is a universal API that works on both NVIDIA and AMD cards, so you can actually combine them (like a 3060 and an AMD RX) to pool their VRAM. The trade-off is some speed compared to a full NVIDIA setup on CUDA/cuBLAS.
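
As a concrete example, switching backends in KoboldCpp is basically just a flag change. I believe the current flag is --usevulkan (double-check kobold's --help), so a Vulkan launch would look roughly like:

python ~/koboldcpp/koboldcpp.py --usevulkan --gpulayers 65 --contextsize 8192 --model ~/Downloads/MODELNAME.gguf

versus --usecublas for a pure NVIDIA/CUDA setup.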

PCIe Bifurcation

Bifurcation is necessary in my case mainly because of physical PCIe port limits on the board and consumer CPU lane handling limits. Most consumer desktops only have one x16 PCIe slot on the motherboard, which typically means only one GPU-type device can fit nicely. Most CPUs only have 24 PCIe lanes, which is just enough to manage one x16 slot GPU, a network card, and some M.2 storage.

There are motherboards with multiple physical x16 PCIe slots and multiple CPU sockets for special server-class CPUs like Threadrippers with huge PCIe lane counts. These can handle all those PCIe devices directly at max speeds, but they're purpose-built server-class components that cost $1,000+ USD just for the motherboard. When you see people on homelab forums running dozens of used server-class GPUs, rest assured they have an expensive motherboard with 8+ PCIe x16 slots, two Threadripper CPUs, and lots of bifurcation. (See the bottom for parts examples.)

Information on this stuff and which motherboards support it is spotty—it's incredibly niche hobbyist territory with just a couple of forum posts to reference. To sanity check, really dig into the exact board manufacturer's spec PDF and look for mentions of PCIe features to be sure bifurcation is supported. Don't just trust internet searches. My motherboard is an MSI B450M Bazooka (I'll try to remember to get exact numbers later). It happened to have 4x4x4x4 compatibility—I didn't know any of this going in and got so lucky!

For multiple GPUs (or other PCIe devices!) to work together on a modest consumer desktop motherboard + CPU sharing a single PCIe x16, you have to:

  1. Get a motherboard that allows you to intelligently split one x16 PCIe lane address into several smaller-sized addresses in the BIOS
  2. Get a bifurcation expansion card meant for the specific splitting (4x4x4x4, 8x8, 8x4x4)
  3. Connect it all together cable-wise and figure out mounting/case modification (or live with server parts thrown together on a homelab table)

A secondary reason I'm bifurcating: the used server-class GPU I got for inferencing (Tesla P100 16GB) has no display output, and my old Ryzen CPU has no integrated graphics either. So my desktop refuses to boot with just the server card—I need at least one display-output GPU too. You won't have this problem with the 3060. In my case, I was planning a multi-GPU setup eventually anyway, so going the extra mile to figure this out was an acceptable learning premium.

Bifurcation cuts into bandwidth, but it's actually not that bad. Going from x16 to x4 only results in about a 15% speed decrease, which isn't bad IMO. Did you say you're using an x1 riser though? That splits it to a sixteenth of the bandwidth; maybe I'm misunderstanding what you mean by x1.

I wouldn't obsess over multi-GPU setups too hard. You don't need to shoot for a data center at home right away, especially when you're still getting a feel for this stuff. It's a lot of planning, money, and time to get a custom homelab figured out right. Just going from Steam Deck inferencing to a single proper GPU will be night and day. I started with my decade-old ThinkPad inferencing Llama 3.1 8B at about 1 TPS, and it inspired me enough to dig out the old gaming PC sitting in the basement and squeeze every last megabyte of VRAM out of it. My 8GB 1070 Ti held me for over a year until I started doing enough professional-ish work to justify a proper multi-GPU upgrade.

Offloading Context

Offloading context is still partial offloading, so you'll hit the same speed issues. You want to use a model that leaves enough memory for context completely within your GPU VRAM. Let's say you use a quantized 8B model that's around 8GB on your 12GB card—that leaves 4GB for context, which I'd say is easily about 16k tokens. That's what most lower-parameter local models can realistically handle anyway. You could partially offload into RAM, but it's a bad idea—cutting speed to a tenth just to add context capability you don't need. If you're doing really long conversations, handling huge chunks of text, or want to use a higher-parameter model and don't care about speed, it's understandable. But once you get a taste of 15-30 TPS, going back to 1-3 TPS is... difficult.
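
Some back-of-the-envelope math behind that 16k figure, using Llama 3.1 8B's architecture (32 layers, 8 KV heads, head dim 128) as the example:

KV cache per token ≈ 2 (K and V) × 32 layers × 8 KV heads × 128 head dim × 2 bytes (fp16) ≈ 128 KiB
4 GB free ÷ 128 KiB per token ≈ 32k tokens in the best case

So ~16k is a comfortably conservative figure once you leave headroom for compute buffers and whatever else sits in VRAM, and a quantized KV cache (Kobold's --quantkv) stretches it even further.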

MoE

Note that if you're dead set on partial offloading, there's a popular way to squeeze performance through Mixture of Experts (MoE) models. It's all a little advanced and nerdy for my taste, but the gist is that you can use clever partial offloading strategies with your inferencing engine. You split up the different expert layers that make up the model between RAM and VRAM to improve performance—the unused experts live in RAM while the active expert layers live in VRAM. Or something like that.

I like to talk (in case you haven't noticed). Feel free to keep the questions coming—I'm happy to help and maybe save you some headaches.

Oh, in case you want to fantasize about parts shopping for a multi-GPU server-class setup, here are some links I have saved for reference. GPUs used for ML can be fine on 8 PCI lanes (https://www.reddit.com/r/MachineLearning/comments/jp4igh/d_does_x8_lanes_instead_of_x16_lanes_worsen_rtx/)

A Threadripper Pro has 128 PCI lanes: (https://www.amazon.com/AMD-Ryzen-Threadripper-PRO-3975WX/dp/B08V5H7GPM)

You can get dual sWRX8 motherboards: (https://www.newegg.com/p/pl?N=100007625+601362102)

You can get a PCIe 4x expansion card on Amazon: (https://www.amazon.com/JMT-PCIe-Bifurcation-x4x4x4x4-Expansion-20-2mm/dp/B0C9WS3MBG)

All together, that's 256 PCI lanes per machine, as many PCIe slots as you need. At that point, all you need to figure out is power delivery.

[–] [email protected] 4 points 4 days ago* (last edited 4 days ago) (6 children)

Thanks for being that guy, good to know. Those specific numbers were done just tonight with DeepHermes 8B q6km (finetuned from Llama 3.1 8B) with max context at 8192. In the past, before I reinstalled, I managed to squeeze ~10k context out of the 8B by booting without a desktop environment. I happen to know that DeepHermes 22B iq3 (finetuned from Mistral Small) runs at around 3 TPS partially offloaded with 4-5k context.

DeepHermes 8B is the fast, efficient general model I use for general conversation, basic web search, RAG, data table formatting/basic markdown generation, and simple computations with the DeepSeek R1 distill reasoning CoT turned on.

DeepHermes 22B is the local powerhouse model I use for more complex tasks requiring more domain knowledge or reasoning ability, for example helping break down legacy code and boilerplate simple functions for game creation.

I have a vision model + TTS pipeline for OCR scanning and narration using Qwen 2.5 VL 7B + OuteTTS + WavTokenizer, which I was considering trying to calculate too, though I'd need to add up both the LLM TPS and the audio TTS TPS.

I plan to load up a Stable Diffusion model and see how image generation compares, but the calculations will probably be slightly different.

I hear there are one or two local models floating around that work with Roo-Cline for advanced tool usage. If I can find a local model in the 14B range that works with Roo, even just for basic stuff, it will be incredible.

Hope that helps inform you; sorry if I missed something.

 

Recently I've been experimenting with Claude and feeling the burn on the premium API usage. I wanted to know how much cheaper my local LLM was in terms of cost per token of output.

Claude Sonnet is a good reference at $15 per 1 million output tokens, so I wanted to know, comparatively, how many tokens $15 worth of electricity powering my rig would generate.

(These calculations cover just raw token generation, by the way. In the real world there's the cost of the initial hardware, ongoing maintenance as parts fail, and the human time to set it all up, all of which is much harder to factor into the equation.)

So how does one even calculate such a thing? Well, you need to know

  1. how many watts your inference rig consumes at load
  2. how many tokens on average it can generate per second while inferencing (with context relatively filled up, we want conservative estimates)
  3. the cost of electricity you pay on your utility bill, in dollars per kilowatt-hour

Once you have those constants, you can work out how many kilowatt-hours of runtime $15 of electricity buys, then figure out the total number of tokens you would expect to generate over that time given the TPS.
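
As a made-up worked example (round numbers, not my actual measurements): say the rig pulls 250 W at the wall, generates 20 tokens/s, and electricity costs $0.15 per kWh.

$15 ÷ $0.15 per kWh = 100 kWh of runtime
100 kWh ÷ 0.25 kW = 400 hours
400 h × 3600 s × 20 tokens/s ≈ 28.8 million tokens

versus the 1 million output tokens that same $15 buys from Claude Sonnet.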

The numbers shown in the screenshot are for a model fully loaded into VRAM on the ol' 1070 Ti 8GB. But even with the partially offloaded numbers for 22-32B models at 1-3 TPS, it's still a better deal overall.

I plan to offer the calculator as a tool on my site and release it under a free license like the GPL if anyone is interested.

[–] [email protected] 2 points 4 days ago (3 children)

No worries :) A model fully loaded into the 3060's 12GB of VRAM will give you a huge boost, around 15-30 TPS depending on the card's memory bandwidth and tensor cores. It's really such a big difference that once you get a properly fitting quantized model you're happy with, you probably won't think about offloading to RAM again if you just want LLM inferencing. Check that your motherboard supports PCIe bifurcation before you make any multi-GPU plans. I got super lucky with my motherboard allowing 4x4x4x4 bifurcation for potentially four GPUs, but I could easily have been screwed if it didn't.

[–] [email protected] 2 points 5 days ago (5 children)

I just got a second PSU just for powering multiple cards on a single bifurcated PCIe slot for a homelab-type thing. A snag I hit that you might be able to learn from: a PSU needs to be switched on by the motherboard before it will power a GPU, so you need a ~$15 electrical relay board that passes the power-on signal from the motherboard to the second PSU, or it won't work.

It's gonna be slow as molasses partially offloaded onto regular RAM no matter what; it's not like DDR4 vs DDR3 is that much different speed-wise. It might be a 10-15% increase, if that. If you're partially offloading and not doing some weird optimized MoE type of offloading, expect 1-5 tokens per second (really more like 2-3).

If you're doing real inferencing work and need speed, then VRAM is king: you want to fit it all within the GPU. How much VRAM does the 3060 you're looking at have?

[–] [email protected] 4 points 6 days ago (7 children)

Can you talk about how you made this?

[–] [email protected] 2 points 6 days ago* (last edited 6 days ago)

points in a direction your mind can't understand, with the finger seeming to disappear as it bisects into a higher plane

"over there to the left after the bathrooms"

[–] [email protected] 6 points 6 days ago* (last edited 6 days ago)

Hi, electrical systems engineer here, signing in with an off-grid solar system powering fans I've tested with meters.

The typical fans you can buy in consumer stores are about 100W on average: a little less on low, around 80W, and a little more on high, like 110-120W.

They make more energy-efficient fans; in particular, brushless-motor DC fans meant for marine power systems are incredibly energy efficient and quiet, but they're also incredibly expensive.

Also keep in mind that consumer fans kind of suck compared to a true industrial fan, which can draw a lot more power for serious wind-speed output, which the Wikipedia page for this device says improves the efficiency of purification. You can get power-tool industrial fans that run off DeWalt-style tool batteries (low DC voltage but high amperage); they'll be more powerful than typical consumer fans too, but they run out of battery within hours.

I personally like the 10-15 watt DC fans with pass-through USB-C charging for personal cooling, but that's not what we're talking about.

28
Homelab upgrade WIP (lemmynsfw.com)
submitted 2 weeks ago* (last edited 2 weeks ago) by [email protected] to c/[email protected]
 

There's a lot more to this stuff than I thought there would be when starting out. I spent the day familiarizing myself with how to take apart my PC and swap GPUs, trying to piece everything together.

Apparently, in order for a PC to start up right, it needs a graphical output. I thought the existence of an HDMI port on the motherboard implied the existence of onboard graphics, but apparently only certain CPUs have that capability, and my Ryzen 5 2600 doesn't. The Tesla P100 has no display output either, so I've hit a snag where the PC isn't starting up because it can't find a graphical output.

I'm going to try to run multiple GPU cards together on PCIe. Hope I can mix an AMD RX 580 and an NVIDIA Tesla on the same board. Fingers crossed, please work.

My motherboard thankfully supports 4x4x4x4 PCIe x16 bifurcation, which is a very lucky break I didn't know about going into this 🙏

Strangely, other configs for splitting the x16 lanes, like 8x8 or 8x4x4, aren't in my BIOS for some reason? So I'm planning to get a 4x bifurcation board, plug both cards in, and hope the AMD one is recognized!

According to one source, the performance loss from using x4 lanes for GPUs doing the kind of compute I'm doing is 10-15%, which is surprisingly tolerable actually.

I never really had to think about how PCIe lanes work or how to allocate them properly before.

For now I'm using two power supplies: the one built into the desktop and a new Corsair 850e PSU. I chose this one as it should handle 2-3 GPUs while being in my price range.

Also, the new 12V-2x6 port supports around 600W, enough for the Tesla, and comes with a dual PCIe split, which was required for the Tesla's power cable adapter, so it all worked out nicely for a clean wiring solution.

Sadly, I fucked up a little. The plastic PCIe release latch on the motherboard was brittle, and I fat-thumbed it too hard while having trouble removing the GPU initially, so it snapped off. I don't know if that's something fixable. Fortunately it doesn't seem to affect the security of the connection too badly. I intend to get a PCIe riser extension cable so there won't be much force on the now slightly loosened PCIe connection. I'll have the GPU and bifurcation board laid out nicely on the homelab table while testing, and get them mounted somewhere nicely once I get it all working.

I need to figure out an external GPU mount system. I see people use server racks or nut-and-bolt metal chassis. Maybe I could get a thin plate of copper the size of the desktop's glass window as a base/heatsink?


 

I now do some work with computers that involves making graphics cards do computational work on a headless server. The computational work it does has nothing to do with graphics.

The name is more for consumers, based on the most common use for graphics cards and why they were first made in the '90s, but now they're used for all sorts of computational workloads. So what are some more fitting names for the part?

I now think of them as 'computation engines', analogous to an old car engine: it's where the computational horsepower is really generated. But how would RAM make sense in this analogy?

6
submitted 3 weeks ago* (last edited 3 weeks ago) by [email protected] to c/[email protected]
 

Bonus:

10
submitted 3 weeks ago* (last edited 3 weeks ago) by [email protected] to c/[email protected]
 

Previous tip:

I got the Ti Helix Tip in the sunset color to finally complete my H2. Wow, is this thing a huge improvement in several ways. Airflow is greatly increased, which I immediately noticed; the M+ tip was restrictive by comparison.

The speed to heat with an induction heater is faster due to the lower thermal mass of titanium and the helix being machined to be the lightest weight-wise. This means the tip allows for more efficient use of energy and more heat cycles per battery charge.

I like the helical twisty swirlies. The color was purple, not reddish orange, so I was surprised and thought they sent the wrong thing! It seems any amount of finger oil turns it from light purple to sunset tan-orange. I think that's bitching; I love color-changing stuff, and purple is my favorite color, so I was happy regardless.

The holes didn't have noticeable burrs, which is what I was worried about. It was machined to within my expectations; I ran it through an ultrasonic alcohol bath anyway before use, as is good practice.

 

It seems Mistral finally released their own version of Small 3.1 2503 with CoT reasoning patterns baked in. Before this, the best CoT finetune of Small was DeepHermes with DeepSeek's R1 distill patterns. According to the technical report, Mistral baked in their own reasoning patterns for this one, so it's not just another DeepSeek distill finetune.

HuggingFace

Blog

Magistral technical research academic paper

 

Setting up a personal site on local hardware has been on my bucket list for a long time. I finally bit the bullet and got a basic website running with Apache on an Ubuntu-based Linux distro. I bought a domain name, linked it up to my IP, got SSL via Let's Encrypt for HTTPS, and added some header rules until securityheaders.com and Mozilla Observatory gave it a perfect score.
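
For reference, the header rules I mean are roughly of this shape (needs mod_headers enabled; the exact values are just illustrative, tune the CSP to whatever your site actually loads):

Header always set Strict-Transport-Security "max-age=63072000; includeSubDomains"
Header always set X-Content-Type-Options "nosniff"
Header always set X-Frame-Options "DENY"
Header always set Referrer-Policy "no-referrer"
Header always set Content-Security-Policy "default-src 'self'"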

Am I basically in the clear? What more do I need to do to protect my site and local network? I'm so scared of hackers and shit I do not want to be an easy target.

I would like to make a page about the hardware it's running on, since I intend to have it run entirely off solar power like solar.lowtechmagazine.com, and I wanted to share the technical specifics. But I heard somewhere that revealing the internal state of your server is a bad idea since it can make exploits easier to find. Am I being stupid for wanting to share details like the computer model and the software running it?

 


22
submitted 1 month ago* (last edited 1 month ago) by [email protected] to c/[email protected]
 

Hello. Our community, c/localllama, has always been and continues to be a safe haven for those who wish to learn about the creation and local usage of 'artificial intelligence' machine learning models to enrich their daily lives and provide a fun hobby to dabble in. We come together to apply this new computational technology in ways that protect our privacy and build upon a collective effort to better understand how this can help humanity as an open source technology stack.

Unfortunately, we have been receiving an uptick in negative interactions from those outside our community recently. This is largely due to the current political tensions caused by our association with the popular and powerful tech companies who pioneered modern machine learning models for business and profit, as well as unsavory techbro individuals who care more about money than ethics. These actors continue to create animosity toward the entire field of machine learning and everyone associated with it, through their illegal harvesting of private data to train base models and very real threats to disrupt the economy by destroying jobs through automation.

There are legitimate criticisms to be had: the cost of creating models, how the art they produce is devoid of the soulful touch of human creativity, and how corporations are attempting to disrupt lives for profit instead of enriching them.

I did not want to be heavy handed with censorship/mod actions prior to this post because I believe that echo chambers are bad and genuine understanding requires discussion between multiple conflicting perspectives.

However, a lot of the negative comments we've received lately aren't made in good faith, with valid criticisms of the corporations or technologies grounded in an intimate understanding of them. No, instead it's base-level mudslinging by people with emotionally charged vendettas making nasty comments of no substance. Common examples are comparing models to NFTs, name-calling our community members as blind zealots for thinking models could ever be used to help people, and spreading misinformation with cherry-picked, unreliable sources to manipulatively exaggerate environmental impact and resource consumption.

While I am against echo chambers, I am also against our community being harassed and dragged down by bad actors who just don't understand what we do or how this works. You guys shouldn't have to be subjected to the same brain rot antagonism with every post made here.

So I'm updating the guidelines by adding some rules I intend to enforce. I'm still debating whether or not to retroactively remove infringing comments from previous posts, but rest assured any new posts and comments will be moderated based on the following guidelines.

RULES:

Rule: No harassment or personal character attacks on community members, i.e. no name-calling, no generalizing entire groups of people that make up our community, no baseless personal insults.

Reason: More or less self-explanatory; personal character attacks and childish mudslinging against community members are toxic.

Rule: No comparing artificial intelligence/machine learning models to cryptocurrency, i.e. no comparing the usefulness of models to that of NFTs, no claiming the resource usage required to train a model is anything close to maintaining a blockchain or mining crypto, no implying it's just a fad/bubble that will leave people with nothing of value when it bursts.

Reason: This is a piss-poor whataboutism argument. It claims something that is blatantly untrue while attempting to discredit the entire field by stapling the animosity everyone has toward crypto/NFTs onto ML. Models already do more than cryptocurrency ever has. Models can generate text, pictures, and audio. Models can view/read/hear text, pictures, and audio. Models may simulate aspects of cognitive thought patterns to attempt to speculate or reason through a given problem. Once trained, they can be copied and locally hosted for many thousands of years, which factors into the initial-energy-cost versus power-consumed-over-time equation.

Rule: No comparing artificial intelligence/machine learning to simple text prediction algorithms, i.e. statements such as "LLMs are basically just simple text prediction like what your phone keyboard autocorrect uses, and they're still using the same algorithms from <over 10 years ago>."

Reason: There are grains of truth to the reductionist statement that LLMs rely on mathematical statistics and probability for their outputs. The same can be said of humans, given the statistical patterns in our own language and how our neurons come together to predict the next word in the sentence we type out. It's the intricate complexity of the process and the way information is processed that makes all the difference. ML models involve an entire college course's worth of advanced mathematics and STEM concepts: hyperdimensional matrices that plot the relationships between pieces of information, and intricate hidden layers made of perceptrons connecting billions of parameters into vast abstraction mappings. There were also major innovations and discoveries made in the 2000s that made modern model training possible, which we didn't have in the early days of computing. All of that is a little more complicated than what your phone's autocorrect does, and the people who make the lazy reductionist comparison just don't care about the nuances.

Rule: No implying that models are devoid of purpose or potential for enriching people's lives.

Reason: Models are tools with great potential for helping people, through the creation of accessibility software for the disabled and by enabling doctors to better heal the sick through advanced medical diagnostic techniques. The perceived harms models are capable of causing, such as job displacement, are rooted in our flawed late-stage-capitalist society's pressure for increased profit margins at the expense of everyone and everything.

If you have any proposals for rule additions or wording changes I will hear you out in the comments. Thank you for choosing to browse and contribute to this space.
