Generative Artificial Intelligence

221 readers
7 users here now

Welcome to the Generative AI community on Lemmy! This is a place where you can share and discuss anything related to generative AI, which is a kind of technology that can make new things, like pictures, words, or sounds, by learning from existing things. You can post your own creations, ask for feedback, share resources, or just chat with other fans. Whether you are a beginner or an expert, you are welcome here. Please follow the Lemmy etiquette and be respectful to each other. Have fun and enjoy the magic of generative AI!

P.S. Every aspect of this community was created with AI tools. Isn't that nifty?

founded 2 years ago
1
 
 

cross-posted from: https://lemmy.sdf.org/post/28980041

Australia has banned DeepSeek from all government devices and systems over what it says is the security risk the Chinese artificial intelligence (AI) startup poses.

...

Growing - and familiar - concerns

Western countries have a track record of being suspicious of Chinese tech - notably the telecoms firm Huawei and the social media platform TikTok, both of which have been restricted on national security grounds.

...

Australia's science minister said in January that countries needed to be "very careful" about DeepSeek, citing "data and privacy" concerns.

The chatbot was removed from app stores in Italy after regulators there questioned its privacy policy. The Italian government previously blocked ChatGPT temporarily over privacy concerns in March 2023.

Regulators in South Korea, Ireland and France have all begun investigations into how DeepSeek handles user data, which it stores on servers in China.

...

Generally, AI tools will analyse the prompts sent to them to improve their product.

This is true of apps such as ChatGPT and Google Gemini as much as it is DeepSeek.

All of them gather and keep information, including email addresses and dates of birth.

...

2
 
 

cross-posted from: https://lemmy.sdf.org/post/28978937

There’s an idea floating around that DeepSeek’s well-documented censorship only exists at its application layer but goes away if you run it locally (that means downloading its AI model to your computer).

But DeepSeek’s censorship is baked in, according to a Wired investigation, which found that the model is censored at both the application and training levels.

For example, the reasoning feature of a locally run version of DeepSeek revealed to Wired that the model should “avoid mentioning” events like the Cultural Revolution and focus only on the “positive” aspects of the Chinese Communist Party.

A quick check by TechCrunch of a locally run version of DeepSeek available via Groq also showed clear censorship: DeepSeek happily answered a question about the Kent State shootings in the U.S., but replied “I cannot answer” when asked about what happened in Tiananmen Square in 1989.

3
 
 

cross-posted from: https://lemmy.sdf.org/post/28971543

Archived

DeepSeek is said to have access to tens of thousands of GPU accelerators for the development of its own AI models, including H100 GPUs, which fall under the US export bans. The reported cost of just under 5.6 million US dollars for DeepSeek V3 probably represents only a small part of the total bill.

In the paper on the V3 model, DeepSeek describes a comparatively small data center with 2,048 H800 accelerators from Nvidia. The company assumes a hypothetical rental cost of 2 US dollars per H800 GPU-hour. With a total of just under 2.8 million GPU-hours (distributed across the 2,048 GPUs), this comes to 5.6 million US dollars.
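The arithmetic behind the headline figure is just GPU-hours times the hypothetical rental rate. A minimal sketch, with the hour count approximated from the "just under 2.8 million" in the article:

```python
# Back-of-the-envelope reproduction of DeepSeek's own V3 training-cost figure.
# The GPU-hour count is an approximation; the $2 rental rate is hypothetical,
# as the paper itself notes.
num_gpus = 2048                  # H800 accelerators in the reported cluster
gpu_hours_total = 2_788_000      # ~2.8M H800 GPU-hours in total (approximate)
rate_usd_per_gpu_hour = 2.0      # assumed rental cost per H800-hour, USD

cost_usd = gpu_hours_total * rate_usd_per_gpu_hour
wall_clock_hours = gpu_hours_total / num_gpus

print(f"~${cost_usd / 1e6:.2f}M")                 # just under the quoted $5.6M
print(f"~{wall_clock_hours / 24:.0f} days on {num_gpus} GPUs")
```

Nothing in this estimate covers hardware purchase, prior experiments, or salaries, which is exactly the caveat the developers raise next.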

However, the developers themselves cite a caveat: "Please note that the above costs only include the official training of DeepSeek-V3 and not the costs associated with previous research and ablation experiments on architectures, algorithms or data."

...

Semianalysis has looked at a realistic cost breakdown. According to the analysts, DeepSeek has access to about 60,000 Nvidia accelerators through its parent company High-Flyer: 10,000 A100s from the Ampere generation before the US export restrictions came into effect, 10,000 H100s from the gray market, 10,000 H800s customized for China, and 30,000 H20s that Nvidia launched after more recent export restrictions.

...

Semianalysis calculates that the servers required for the 60,000 GPUs cost around 1.6 billion US dollars. The operating costs are on top of that. This does not include the salaries of the development teams.
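For scale, a quick sketch putting Semianalysis's fleet counts and server estimate side by side (the counts and the 1.6-billion figure are from the article; the per-GPU division is my own back-of-the-envelope step):

```python
# Semianalysis's estimated High-Flyer/DeepSeek GPU fleet (counts from the article).
fleet = {
    "A100 (pre-export-restriction)": 10_000,
    "H100 (gray market)":            10_000,
    "H800 (China-customized)":       10_000,
    "H20 (post-restriction launch)": 30_000,
}
total_gpus = sum(fleet.values())
server_capex_usd = 1.6e9   # estimated server cost, excluding operations and salaries

print(total_gpus)                                              # 60000
print(f"~${server_capex_usd / total_gpus:,.0f} per GPU slot")
```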

According to DeepSeek, 96 percent of the 5.6 million US dollars quoted went to pre-training, i.e., the training of the final base model. The paper ignores the previous development effort, including all the innovations incorporated into DeepSeek V2.

4
 
 

cross-posted from: https://lemmy.world/post/17926715

y2u.be/aVvkUuskmLY

Llama 3.1 (405B) seems 👍. It and Claude 3.5 Sonnet are my go-to large language models. I use chat.lmsys.org. OpenAI may be scrambling now to release ChatGPT 5?

5
 
 

cross-posted from: https://lemmy.world/post/16792709

I'm an avid Marques fan, but for me, he didn't have to make that vid. It was just a set of comparisons. No new info. No interesting discussion. Instead he should've just shared that Wired podcast episode on his X.

I wonder if Apple is making their own large language model (LLM) and whether it'll be released this year or next. Or are they still musing re the cost-benefit analysis? If they think that an Apple LLM won't earn that much profit, they may not make one.

6
 
 

Hey, so first off, this is my first time dabbling with LLMs, and most of the information here I found myself by rummaging through GitHub repos.

I have a fairly modest set-up: an older gaming laptop with an RTX 3060 video card with 6 GB of VRAM. I run everything inside WSL2.

I have had some success running FastChat with the Vicuna 7B model, but it's extremely slow: roughly one word every 2-3 seconds of output with --load-8bit, lest I get a CUDA OOM error. It starts faster, at 1-2 words per second, but slows to a crawl later on (I suspect because it also uses a bit of the 'shared video RAM', according to the task manager).

So I heard about quantization, which is supposed to compress models at the cost of some accuracy. I tried ready-quantized models (compatible with the FastChat implementation) from huggingface.co, but ran into an issue: whenever I'd ask something, the output would be repeated quite a lot. Say I'd say 'hello' and I'd get 200 'Hello!' in response.

I also tried quantizing a model myself with ExLlamaV2 (using some .parquet wikitext files, also from huggingface.co, for calibration) and then running it through FastChat, but the problem persists: endless repeated output. It does work faster at the actual generation, though, so at least that part is going well.

Any ideas on what I'm doing wrong?

7
 
 

Language models of code (LMs) work well when the surrounding code in the vicinity of generation provides sufficient context. This is not true when it becomes necessary to use types or functionality defined in another module or library, especially those not seen during training. LMs suffer from limited awareness of such global context and end up hallucinating, e.g., using types defined in other files incorrectly. Recent work tries to overcome this issue by retrieving global information to augment the local context. However, this bloats the prompt or requires architecture modifications and additional training. Integrated development environments (IDEs) assist developers by putting the global context at their fingertips using static analysis. We extend this assistance, enjoyed by developers, to the LMs. We propose a notion of monitors that use static analysis in the background to guide the decoding. Unlike a priori retrieval, static analysis is invoked iteratively during the entire decoding process, providing the most relevant suggestions on demand. We demonstrate the usefulness of our proposal by monitoring for type-consistent use of identifiers whenever an LM generates code for object dereference. To evaluate our approach, we curate PragmaticCode, a dataset of open-source projects with their development environments. On models of varying parameter scale, we show that monitor-guided decoding consistently improves the ability of an LM to generate identifiers that match the ground truth, and also improves compilation rates and agreement with the ground truth. We find that LMs with fewer parameters, when guided with our monitor, can outperform larger LMs. With monitor-guided decoding, SantaCoder-1.1B achieves a better compilation rate and next-identifier match than the much larger text-davinci-003 model.

8
9
 
 

With minimal tweaking, just giving relatively simple prompts to these, would you say one is measurably better than the other? In what ways? Or is it more of a subjective judgement?

10
11
 
 

Thoughts? Ideas? How do we align these systems? Some food for thought: when we have these systems do chain-of-thought reasoning, or otherwise logically work through problems and come to conclusions, we've found that they tell "lies" about their method; the stated reasoning can be coherent and make sense even though it isn't the logic they actually followed.

Here's the study I'm poorly explaining; read that instead: https://arxiv.org/abs/2305.04388