this post was submitted on 23 Mar 2025
304 points (100.0% liked)

Technology

67669 readers
4867 users here now

This is a most excellent place for technology news and articles.


Our Rules


  1. Follow the lemmy.world rules.
  2. Only tech related news or articles.
  3. Be excellent to each other!
  4. Mod approved content bots can post up to 10 articles per day.
  5. Threads asking for personal tech support may be deleted.
  6. Politics threads may be removed.
  7. No memes allowed as posts, OK to post as comments.
  8. Only approved bots from the list below, this includes using AI responses and summaries. To ask if your bot can be added please contact a mod.
  9. Check for duplicates before posting, duplicates may be removed
  10. Accounts 7 days and younger will have their posts automatically removed.

Approved Bots


founded 2 years ago
MODERATORS
you are viewing a single comment's thread
view the rest of the comments
[–] [email protected] 6 points 4 days ago (2 children)

And funded by who?

It's nice to say that it should be decentralized, but who is funding the development of that? Are you donating to IA?

[–] [email protected] 12 points 4 days ago

TBH this is an important enough resource that the UN should fund it.

They won't but they should.

[–] [email protected] 3 points 4 days ago (1 children)

I mean, yeah, like another user said, ideally it would be in the interest of groups which allege to have an interest in some form of democracy. But additionally, the ability to set up browsable partial mirrors, hosted by miscellaneous nonprofits and individuals both within and outside of the US, would be a massive first step toward preserving the information that IA stores. The fact that attacks on their servers can eradicate all access to the information they store is troubling, given how many enemies they've made simply through the work they do.

[–] [email protected] 3 points 2 days ago (1 children)

The actual volume of data is kind of insane for distribution. You start running into many scale problems.

At ~70 PB of storage, assumed to be stored redundantly as well, and at ~$15/TB JUST for HDDs alone, you're talking $2.1 million in hard drives.

Installation, hardware, and facility costs will at least quintuple that number, even being very conservative. That makes the cost to stand up an archive around $10.5 million.
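The arithmetic above can be checked quickly (all figures are this thread's assumptions, not official Internet Archive numbers):

```python
# Back-of-the-envelope cost sketch. Every constant here is an assumption
# taken from the comment above, not a verified Internet Archive figure.
ARCHIVE_PB = 70        # assumed archive size in petabytes
REDUNDANCY = 2         # assume every byte is stored twice
USD_PER_TB_HDD = 15    # assumed bulk HDD price per terabyte

raw_tb = ARCHIVE_PB * 1000 * REDUNDANCY   # 140,000 TB of raw disk
hdd_cost = raw_tb * USD_PER_TB_HDD        # drives alone
total_cost = hdd_cost * 5                 # x5 for servers, facility, installation

print(f"HDDs: ${hdd_cost:,}, total: ${total_cost:,}")
# -> HDDs: $2,100,000, total: $10,500,000
```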


During this process I found out that their finances are public and there is more reliable information out there:

  • $2/GB for permanent storage, overall ($2,000/TB)

The cost to store the data and run the archive is a whopping $36 million/year at the moment.

Which, considering what they do, is incredibly cheap, and easily fundable by even a small municipality, never mind a large nation.
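For scale, the published figures convert like this (values assumed from the thread, not re-checked against IA's actual financials):

```python
# Sanity check on the public figures quoted above (assumed values).
usd_per_gb = 2
usd_per_tb = usd_per_gb * 1000        # $2,000/TB "permanent storage" rate
annual_budget = 36_000_000            # ~$36M/year to store data and run the archive

# At that rate, one-time "permanent" pricing for the whole ~70 PB archive:
archive_tb = 70 * 1000
permanent_total = archive_tb * usd_per_tb

print(f"${usd_per_tb:,}/TB, whole-archive permanent cost ${permanent_total:,}")
# -> $2,000/TB, whole-archive permanent cost $140,000,000
```

So a few years of the current operating budget is in the same ballpark as "permanently" endowing the whole collection at their quoted rate.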

[–] [email protected] 2 points 2 days ago* (last edited 2 days ago) (1 children)

It would be interesting to have encrypted blobs scattered around volunteer computers/servers, like a storage version of BOINC / @HOME.

People tend to have dramatically less spare storage space than spare compute time, though, and it would need to be very redundant to guarantee no data is lost.
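How redundant is "very redundant"? A toy model (entirely my assumption: each replica-holding volunteer node independently disappears with some probability, and a chunk is lost only if all its replicas do) gives a feel for it:

```python
# Toy redundancy model for volunteer storage. Assumption: each node holding a
# replica is independently lost with probability q; a chunk survives unless
# ALL of its replicas are lost, i.e. loss probability is q**r for r replicas.
def replicas_needed(q: float, target_loss: float) -> int:
    """Smallest replica count r such that q**r <= target_loss."""
    r = 1
    while q ** r > target_loss:
        r += 1
    return r

# Flaky home machines: say a 30% chance any given copy is gone,
# and we want under a one-in-a-million chance of losing a chunk.
print(replicas_needed(0.3, 1e-6))  # -> 12
```

A dozen full copies of 70 PB is a lot of volunteer disks, which is why erasure coding (rather than plain replication) would probably be the realistic choice.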

[–] [email protected] 2 points 1 day ago (1 children)

Oh for sure, that's quite reasonable, though at some point you're just re-creating BitTorrent, which is actually the effect you want.

You could build an appliance on top of the protocol that enables the distributed storage, that might actually be pretty reasonable 🤔

Ofc you will need your own protocols to break the data up into manageable, consistently chunked parts, and to make content removable from the network, or at least made inaccessible, in response to DMCA claims. That's exactly the kind of exposure that currently makes the Internet Archive such a target for government entities.

[–] [email protected] 2 points 1 day ago (1 children)

Yeah, some kind of fork of the torrent protocol where you can advertise "I have X amount of space to donate" and there's a mechanism to hand you the most endangered bytes on the network. It would need to be a lot more granular than torrents, to account for the vast majority of nodes not wanting, or not being able, to get to "100%".

I don't think the technical aspects are insurmountable, and there's at least some measure of a builtin audience in that a lot of people run archiveteam warrior containers/VMs. But storage is just so many orders of magnitude more expensive than letting a little cpu/bandwidth limited process run in the background. I don't know that enough people would be willing/able to donate enough to make it viable?

~70,000 data hoarders volunteering 1 TB each would be a 1:1 backup of the current archive.org. That isn't a small number of people, and it only gets you a single parity copy, but it also isn't an outrageously large number of people.
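The "give me the most endangered bytes" idea sketches out simply enough (all names here are hypothetical, not an existing protocol):

```python
# Sketch of endangered-bytes allocation. Assumption: some tracker-like service
# knows the current replica count for every chunk; a new volunteer with a
# space budget (in chunks) takes the rarest chunks first.
def assign_chunks(replica_counts: dict[str, int], budget: int) -> list[str]:
    """Pick up to `budget` chunk IDs, fewest existing replicas first."""
    rarest_first = sorted(replica_counts, key=replica_counts.get)
    return rarest_first[:budget]

counts = {"a": 5, "b": 1, "c": 3, "d": 2}
print(assign_chunks(counts, 2))  # -> ['b', 'd']
```

Granularity falls out naturally: a node donating 50 GB just gets a budget of 50 GB worth of chunks instead of being expected to complete whole multi-TB torrents.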

[–] [email protected] 1 points 1 day ago

You might not necessarily have to fork BitTorrent at all. Instead, if you have your own protocol for grouping and breaking the data into manageable chunks of a particular size, each chunk can be a complete torrent of its own. Then you don't have to worry about completion levels on those torrents, and you can rely on the protocol to do its thing.

Instead of trying to modify the protocol, modify the process you use the protocol with.
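The grouping step could be as simple as greedy bin-packing: bundle files up to a fixed size and publish each bundle as its own ordinary, complete torrent, so standard BitTorrent needs no changes. (Bundle size and file names below are made-up illustrations.)

```python
# Sketch of grouping files into fixed-size bundles, each of which would be
# published as a standalone torrent. The 4 GiB bundle size is an assumption.
BUNDLE_SIZE = 4 * 1024**3  # bytes per bundle/torrent

def make_bundles(files: list[tuple[str, int]]) -> list[list[str]]:
    """Greedy packing of (name, size_bytes) pairs into bundles, in order."""
    bundles: list[list[str]] = []
    current: list[str] = []
    used = 0
    for name, size in files:
        if current and used + size > BUNDLE_SIZE:
            bundles.append(current)   # start a new bundle when this one is full
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        bundles.append(current)
    return bundles

files = [("site1.warc", 3 * 1024**3),
         ("site2.warc", 2 * 1024**3),
         ("site3.warc", 1 * 1024**3)]
print(make_bundles(files))  # -> [['site1.warc'], ['site2.warc', 'site3.warc']]
```

Each bundle then seeds and completes like any normal torrent, which is exactly the "modify the process, not the protocol" point.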