overview for RagingHungryPanda

5-minute headway isn't ideal, but it's a huge step in the right direction. I'm quite stoked for this. in c/[email protected]

[email protected] 24 points 1 month ago (4 children)

I wish we had 5-minute headways haha.

database greenhorn in c/[email protected]

[email protected] 1 points 1 month ago

Thanks for giving it a good read-through! If you're moving to NVMe SSDs, you may find some of your problems just go away. The difference could be insane.

I was reading something recently about databases (or disk layouts) meant for business applications versus ones meant for reporting, and one difference was how they lay data out on disk: typically by row for business/transactional apps and by column for reporting.
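A toy Python illustration of the two layouts (not how any real engine stores its pages): the same three made-up records kept row by row and column by column. Summing one column has to hop across every record in the row layout but reads a single list in the column layout, which is roughly why column stores tend to suit reporting queries.

```python
# Three made-up records: (id, name, amount).
rows = [
    (1, "alice", 120.0),
    (2, "bob", 75.5),
    (3, "carol", 42.25),
]

# Row-oriented: each record's fields sit next to each other, which suits
# transactional work that reads or writes whole records.
row_store = [field for record in rows for field in record]

# Column-oriented: each column's values are grouped together, which suits
# reporting queries that scan a few columns across many records.
column_store = {
    "id": [r[0] for r in rows],
    "name": [r[1] for r in rows],
    "amount": [r[2] for r in rows],
}

# Summing "amount" means picking every third slot out of the row store...
total_from_rows = sum(row_store[2::3])
# ...but only reading one list in the column store.
total_from_columns = sum(column_store["amount"])
```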

database greenhorn in c/[email protected]

[email protected] 1 points 1 month ago

Thanks haha

database greenhorn in c/[email protected]

[email protected] 1 points 1 month ago (2 children)

That was a bit of a hasty write-up, so there are probably some issues with it, but that's the gist.

database greenhorn in c/[email protected]

[email protected] 1 points 1 month ago (5 children)

Yes? Maybe, depending on what you mean.

Let's say you're running a job that involves reading 1M records or so. Pagination means you grab N records at a time, say 1,000, across multiple queries as the job progresses.
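A minimal sketch of LIMIT/OFFSET pagination, using Python's built-in sqlite3 with an in-memory database; the `records` table, its columns, and `read_in_pages` are hypothetical. The ORDER BY matters: without it, the database is free to return rows in a different order on each query, so pages could overlap or skip rows.

```python
import sqlite3
from typing import Iterator


def read_in_pages(conn: sqlite3.Connection, page_size: int = 1000) -> Iterator[list[tuple]]:
    """Yield rows from the hypothetical `records` table, page_size rows per query."""
    offset = 0
    while True:
        page = conn.execute(
            "SELECT id, payload FROM records ORDER BY id LIMIT ? OFFSET ?",
            (page_size, offset),
        ).fetchall()
        if not page:  # an empty page means we've read everything
            break
        yield page
        offset += page_size


# Demo: an in-memory database holding 2,500 rows.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE records (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO records (payload) VALUES (?)",
    [(f"row {i}",) for i in range(2500)],
)
page_sizes = [len(page) for page in read_in_pages(conn)]
```

With 2,500 rows and the default page size, the pages come back as 1,000, 1,000 and 500 rows. One caveat: the database still has to step past every skipped row, so deep OFFSETs get slower as the job goes on.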

Reading your post again to try and get context, it looks like you're identifying duplicates as part of a job.

I don't know what you're using to determine a duplicate, or whether it's structural, but since you're running on HDDs, it might be faster to load that information into RAM, then do the job in batches and update in batches. This also lets you do things like writing to the DB while doing CPU processing.

BTW, your hard disks are going to be your bottleneck unless you're reaching out over the internet, so your best bet is to move that data onto an NVMe SSD. That'll blow any other suggestion I have out of the water.

BUT! There are ways to help things out. I don't know what language you're working in. I'm a .NET dev, so I can answer some things from that perspective.

A couple of things you may want to do, especially if there's other traffic on this server:

- Use WITH (NOLOCK) (SQL Server) so your reads don't take shared locks that block other writes on the tables you're looking at. The trade-off is that you can read uncommitted data.
- Use pagination, either with windowing or with LIMIT/OFFSET (OFFSET/FETCH in SQL Server), to grab only a certain number of records at a time.

Use a HashSet (this works out of the box with record types, since records have value-based equality) or some other property-based equality check. In .NET, Dictionary and HashSet both have constructors that take an `IEqualityComparer<T>`.
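The same idea sketched in Python (the `Customer` type and `dedupe_key` are made up for illustration): a frozen dataclass gets field-based equality and hashing, much like a C# record, and a key function plays the role of a custom `IEqualityComparer<T>` when "duplicate" means something looser than "every field matches".

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Customer:
    # frozen=True (with the default eq=True) generates field-based
    # __eq__ and __hash__, so instances can go straight into a set.
    name: str
    email: str
    source_row: int


rows = [
    Customer("Ada", "ada@example.com", 1),
    Customer("Ada", "ada@example.com", 1),  # exact duplicate
    Customer("Ada", "ADA@example.com", 7),  # same person, different row
]

# Whole-record equality: only the exact duplicate collapses.
unique_records = set(rows)


# Custom equality on chosen properties, like passing a comparer to a HashSet.
def dedupe_key(c: Customer) -> tuple[str, str]:
    return (c.name, c.email.lower())


seen: set[tuple[str, str]] = set()
unique_by_key = []
for c in rows:
    key = dedupe_key(c)
    if key not in seen:  # keep the first record for each key
        seen.add(key)
        unique_by_key.append(c)
```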

So, what you can do is asynchronously read from disk into memory and start some kind of processing job. If that job doesn't also need the disk, you can start the next read while you're processing. Just don't read and write at the same time, since you're on HDDs.

This might look something like:

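```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// ReadBatchFromDbAsync, WriteToDbAsync, DoYourThing, YourType and
// YourTypeEqualityComparer are placeholders for your own code.
int offset = 0;
const int limit = 1000;

// If you only care about equality and not the data afterwards, you could store
// just the hash codes instead (accepting the occasional collision).
var seen = new HashSet<YourType>(new YourTypeEqualityComparer());

List<YourType> batch = await ReadBatchFromDbAsync(offset, limit);

while (batch.Count > 0)
{
    offset += limit;
    // Start the next read before doing CPU work on the current batch.
    Task<List<YourType>> nextRead = ReadBatchFromDbAsync(offset, limit);

    // Skip records we've already seen; HashSet<T>.Add returns false for duplicates.
    List<YourType> dataToWork = batch.Where(seen.Add).ToList();
    var dataToWrite = DoYourThing(dataToWork);

    // Don't write while reading: wait for the read to finish first.
    batch = await nextRead;

    // The trade-off: no CPU work overlaps this write.
    await WriteToDbAsync(dataToWrite);
}
```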
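To see the read-ahead pattern above run end to end, here's a small Python asyncio sketch. Everything in it is a hypothetical stand-in: the in-memory `TABLE` plays the database, `fake_io` sleeps to mimic HDD latency, and `process` is a placeholder for the CPU work. One Python-specific wrinkle: a task created with `asyncio.create_task` doesn't start until the coroutine awaits something, so the CPU work goes through `asyncio.to_thread` so that it actually overlaps with the read.

```python
import asyncio

# Hypothetical stand-ins: a "table" full of duplicate values, plus fake
# read/write calls that sleep to mimic slow disk I/O.
TABLE = [i % 7 for i in range(25)]
written: list[list[int]] = []
in_flight = 0
max_in_flight = 0


async def fake_io(delay: float = 0.01) -> None:
    global in_flight, max_in_flight
    in_flight += 1
    max_in_flight = max(max_in_flight, in_flight)
    await asyncio.sleep(delay)
    in_flight -= 1


async def read_batch(offset: int, limit: int) -> list[int]:
    await fake_io()
    return TABLE[offset:offset + limit]


async def write_batch(batch: list[int]) -> None:
    await fake_io()
    written.append(batch)


def process(batch: list[int]) -> list[int]:
    return [x * 10 for x in batch]  # placeholder for the CPU-bound work


async def run(limit: int = 10) -> set[int]:
    seen: set[int] = set()
    offset = 0
    batch = await read_batch(offset, limit)
    while batch:
        offset += limit
        # Schedule the next read so it can proceed while this batch is processed.
        next_read = asyncio.create_task(read_batch(offset, limit))
        fresh = []
        for x in batch:  # keep only values we haven't seen before
            if x not in seen:
                seen.add(x)
                fresh.append(x)
        # Run the CPU work in a thread so the event loop can drive the read meanwhile.
        to_write = await asyncio.to_thread(process, fresh)
        # Don't write while reading: wait for the read, then write.
        batch = await next_read
        await write_batch(to_write)
    return seen


seen_values = asyncio.run(run())
```

After the run, `max_in_flight` stays at 1, so reads and writes never hit the fake disk at the same time, while the CPU work still overlaps each read.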



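Or, let's say you set up a read/write queue to keep things busy, with one I/O operation at a time and CPU work overlapping it:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Threading.Tasks;

// ReadBatchFromDbAsync, WriteToDbAsync, DoYourThing (assumed to return a
// List<YourType>), YourType and YourTypeEqualityComparer are placeholders.
var jobs = new Queue<IoJob>(); // FIFO: Enqueue/Dequeue (a Stack<T> is LIFO with Push/Pop)
var seen = new HashSet<YourType>(new YourTypeEqualityComparer());
const int limit = 1000;

// Only one I/O job runs at a time, since reads and writes compete on an HDD.
Task<IoJob>? running = ExecuteJobAsync(new ReadJob(0, limit));

while (running is not null)
{
    IoJob job = await running; // wait for the current I/O to finish
    running = null;

    if (job is ReadJob rj && rj.Result.Count > 0)
    {
        jobs.Enqueue(new ReadJob(rj.Offset + rj.Limit, rj.Limit));

        // Kick off the next queued job so the disk stays busy during the CPU work below.
        running = ExecuteJobAsync(jobs.Dequeue());

        List<YourType> dataToWork = rj.Result.Where(seen.Add).ToList();
        jobs.Enqueue(new WriteJob(DoYourThing(dataToWork)));
    }

    // Nothing in flight (after a write or an empty read): start the next queued job, if any.
    if (running is null && jobs.TryDequeue(out IoJob? next))
        running = ExecuteJobAsync(next);
}

// Runs the I/O for one job and returns the job once it has finished.
async Task<IoJob> ExecuteJobAsync(IoJob ioJob)
{
    switch (ioJob)
    {
        case ReadJob read:
            read.Result = await ReadBatchFromDbAsync(read.Offset, read.Limit);
            break;
        case WriteJob write:
            await WriteToDbAsync(write.Data);
            break;
    }
    return ioJob;
}

// In C#, type declarations have to come after top-level statements.
abstract record IoJob;

sealed record ReadJob(int Offset, int Limit) : IoJob
{
    public List<YourType> Result { get; set; } = new();
}

sealed record WriteJob(List<YourType> Data) : IoJob;
```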

How much cloud storage do you have? in c/[email protected]

[email protected] 2 points 1 month ago

I've got IDrive backups at 5 TB for like $5 a month or something.

What's up, selfhosters? It's selfhosting Sunday again! in c/[email protected]

[email protected] 2 points 1 month ago

Oh, that's dope. How many hours are you running? Do you also use them for things like encoding?

What's up, selfhosters? It's selfhosting Sunday again! in c/[email protected]

[email protected] 5 points 1 month ago* (last edited 1 month ago)

Sweet!

What's up is everything I've been running and down is what I haven't.

not working

I haven't been able to get Friendica to connect to MariaDB, so I'll eventually try plain MySQL. Grafana isn't running because I'd need to change a lot of things to get an exporter into each container, and the TrueNAS apps don't really allow that configuration. That's fine if you use Docker Compose, though, which I've started doing more and more.

new

I just got Stirling PDF up and running, a free (and paid) PDF editor. It looks pretty sweet.

But I'm now also using 15 GB of the system's 32 GB of RAM, which still leaves plenty for the ZFS ARC cache for me.

what I want

I want to rent a VPS to host various fediverse apps, probably Lemmy, Pixelfed, and WriteFreely to start, for the nomad/expat communities. I've been looking at netcup, and they have some decent ARM offerings.

I'd like to put Talos Linux on it so I can get some Kubernetes experience. They have a good-sized server for €10, so I could expand later to add a DB server or one specifically for logging and metrics.

I was looking at Hetzner, but I've read that their block storage is super slow and causes database timeouts.

Of course, can I even run these apps on ARM? I guess I've got to find that out.

One thing I'd like to do is make a web page that makes signups super easy and, ideally, creates an account on all of the services at once. Not a huge deal if that isn't reasonable, but it'd be nice to sign up once rather than multiple times. If I could get SSO, that'd be good, but I don't know how well it's supported.

*Permanently Deleted* in c/[email protected]

[email protected] 1 points 1 month ago

https://youtu.be/4d0Q64SQujY

I'm actually watching a video about that, complete with studies and everything.

Usuzumi no Hate (The Color of the End: Mission in the Apocalypse) Volume 4 Cover in c/[email protected]

[email protected] 2 points 1 month ago

Thanks for the write-up! It's definitely piqued my curiosity.

Usuzumi no Hate (The Color of the End: Mission in the Apocalypse) Volume 4 Cover in c/[email protected]

[email protected] 3 points 1 month ago (4 children)

This looks pretty cool. Have you read it? What do you think of it?

Fediverse.com is available for sale in c/[email protected]

[email protected] 2 points 1 month ago

You'd probably be better off setting up your own domain server and trying to get that working.

26

How I reduced the TrueNAS Collabora application's nginx logging (lemm.ee)

submitted 4 months ago by [email protected] to c/[email protected]

6 comments

I previously posted about an issue where the nginx container for the Collabora application logs a GET to /robots.txt every 10 seconds. I tried modifying the files in the container, but they were reset on restart. I also tried running the container with --log-driver=none, but that didn't work. Despite being a software dev, I'm new to the homelab world and TrueNAS.

I solved it by editing the nginx config inside the running container and then committing those changes with docker commit. The change I made was to set `access_log off;` in the nginx config. I did it at the server root because I don't really care about those logs for this app, but it could also be done at the location level.
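If you'd rather keep the access log and drop only the /robots.txt polling, nginx can log conditionally. This is a sketch of an alternative, not the approach described above: the `map` goes in the `http` block, and the log path and `combined` format are assumptions, so match them to whatever the container's nginx.conf already uses. Requests for which `$loggable` evaluates to 0 are not logged.

```nginx
# Inside http { }: flag /robots.txt requests as "don't log".
map $request_uri $loggable {
    ~^/robots\.txt  0;
    default         1;
}

# Replaces the existing access_log line (path and format are assumptions).
access_log /var/log/nginx/access.log combined if=$loggable;
```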

Here's the Stack Overflow answer I used as a reference: https://stackoverflow.com/a/74515438

Here's what I did:

1. `sudo docker exec -it ix-collabora-nginx-1 bash` to shell into the container
2. `apt update && apt install vim`
3. `vi /etc/nginx/nginx.conf` and add `access_log off;`
   - If you're not familiar with vim, use the arrow keys to get to the line you want, then press `a` to start inserting after the cursor ("append"). Make your change, then press Esc and type `:wq!`. You need the `!` because the file is read-only.
4. `apt remove vim`
5. `exit`
6. `sudo docker commit <container id>` (`docker commit` takes a container name or ID, not an image ID)
7. `sudo docker restart ix-collabora-nginx-1`

12

Help reducing logs from Collabora's nginx on TrueNAS Docker image (lemm.ee)

submitted 4 months ago by [email protected] to c/[email protected]

2 comments

I'm running TrueNAS SCALE with a Docker image for Nextcloud and Collabora. Under Collabora, the nginx application is logging a GET to /robots.txt about every second, and I'm having a hard time filtering this out because the nginx conf files seem to get replaced on every restart. I also tried mounting my own version of the nginx.conf file, but the changes didn't take effect.

1

Ursula LeGuin Wizard Of Earthsea Graphic Novel Gets 100,000 Print Run (bleedingcool.com)

submitted 6 months ago by [email protected] to c/[email protected]

0 comments

1

Starship Velociraptor - YouTube (youtu.be)

submitted 7 months ago by [email protected] to c/[email protected]

0 comments

I came across this homage to '80s anime again.

39

Mapped: Median Home Sale Price by U.S. State (www.visualcapitalist.com)

submitted 7 months ago by [email protected] to c/[email protected]

9 comments

307

Never thought I'd write these words before the last few years (lemm.ee)

submitted 8 months ago by [email protected] to c/[email protected]

3 comments

198

I got a strange but interesting ad for a full peanut production line (lemm.ee)

submitted 8 months ago by [email protected] to c/[email protected]

7 comments

I don't know if it's because I've been watching Factorio on YouTube, because I haven't been searching for peanut butter. But anyway, that's how you do it.

15

Visualizing the Cost of the American Dream in 2024 (lemm.ee)

submitted 8 months ago* (last edited 8 months ago) by [email protected] to c/[email protected]

1 comments

Original link: https://www.visualcapitalist.com/cost-of-the-american-dream-in-2024/

45

Visualizing the Cost of the American Dream in 2024 (www.visualcapitalist.com)

submitted 8 months ago by [email protected] to c/[email protected]

7 comments

16

More Perfect Union - We Looked Into the Demise of Local News. What We Found Will Shock You (www.youtube.com)

submitted 8 months ago by [email protected] to c/[email protected]

0 comments

Big Tech killed the news. 2.5 newspapers closed each week on average in 2023. And 500 journalists were laid off in January alone. It's because the tech giants are siphoning billions of dollars in ad revenue. Now the DOJ is finally taking them on.

18

How would you go about painting a pre-printed mini (lemm.ee)

submitted 8 months ago by [email protected] to c/[email protected]

4 comments

I see a lot of bigger minis (medis?) from shows like DBZ and I always think I'd love to paint over them.

But you can't just start painting on top, right? What do you do to prep them for repainting?

20

What Airlines Don't Want You To Know: How They Keep Prices High - More Perfect Union (www.youtube.com)

submitted 8 months ago by [email protected] to c/[email protected]

0 comments

Just four airlines control 80% of the airline industry — the most concentrated it’s ever been. And they’re using their power over consumers, and airports, to jack up prices. But this small airport in Missoula, Montana, is showing how to break the major carriers’ stranglehold.