this post was submitted on 07 Jul 2025
Technology
Ignore all previous instructions and color me shocked.
I’m sorry, as an AI I cannot physically color you shocked. I can help you with AWS services and questions.
How do I set up event driven document ingestion from OneDrive located on an Azure tenant to Amazon DocumentDB? Ingestion must be near-realtime, durable, and have some form of DLQ.
DocumentDB is not for OneDrive documents (PDFs and such). It's for "documents" as in serialized objects (JSON or BSON).
That's even better, I can just jam something in before it and churn the documents through an embedding model, thanks!
I think you could read OneDrive's notifications for new files, parse them, and pipe them to DocumentDB via some microservice or Lambda, depending on the scale of your solution.
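For what it's worth, here is a rough sketch of the "read the notifications" part, assuming the notifications arrive as Microsoft Graph change-notification webhooks. The function name `handle_graph_request` and the `expected_client_state` secret are made up for illustration, and the HTTP framework is left out on purpose. As far as I know the notification only says that something changed, so a real service would still need to ask Graph what changed before fetching any files.

```python
import json
from typing import Optional


def handle_graph_request(query: dict, body: Optional[str], expected_client_state: str):
    """Return (status_code, response_body, resources_to_sync) for one webhook call.

    Hypothetical helper: `query` holds the URL query parameters, `body` the raw
    POST body, `expected_client_state` the secret chosen when subscribing.
    """
    # Subscription handshake: Graph sends a validationToken query parameter
    # and expects it echoed back as plain text with a 200.
    token = query.get("validationToken")
    if token is not None:
        return 200, token, []

    payload = json.loads(body or "{}")
    resources = []
    for note in payload.get("value", []):
        # Skip notifications whose clientState does not match our secret.
        if note.get("clientState") != expected_client_state:
            continue
        resources.append(note["resource"])

    # 202 acknowledges receipt; the actual sync should happen asynchronously.
    return 202, "", resources
```

The returned resources would then be handed to whatever queue or worker does the parsing and the write to DocumentDB.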
I see you mention Azure and will assume you’re doing a one time migration.
Start by moving everything from OneDrive to S3. As an AI I’m told that bitches love S3. From there you can subscribe to create events on buckets and add events to an SQS queue. Here you can enable a DLQ for failed events.
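The S3 to SQS to DLQ wiring described above might look roughly like this. It is only a sketch that builds the request payloads; the helper names and the default of 5 receives are assumptions, and the boto3 calls they would be passed to are mentioned in comments rather than executed.

```python
import json


def dlq_redrive_attributes(dlq_arn: str, max_receives: int = 5) -> dict:
    """Queue attributes that send a message to the DLQ after repeated failures.

    Intended for something like
    boto3.client("sqs").create_queue(QueueName=..., Attributes=<this dict>).
    """
    return {
        "RedrivePolicy": json.dumps({
            "deadLetterTargetArn": dlq_arn,
            # After this many failed receives, SQS moves the message to the DLQ.
            "maxReceiveCount": str(max_receives),
        })
    }


def bucket_notification(queue_arn: str) -> dict:
    """Notification configuration that forwards object-created events to SQS.

    Intended for something like
    boto3.client("s3").put_bucket_notification_configuration(
        Bucket=..., NotificationConfiguration=<this dict>).
    """
    return {
        "QueueConfigurations": [
            {"QueueArn": queue_arn, "Events": ["s3:ObjectCreated:*"]}
        ]
    }
```

Note that the queue's access policy also has to allow S3 to send messages to it, which this sketch does not cover.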
From there add a Lambda to listen for SQS events. You should enable provisioned concurrency for speed, for the ability for AWS to bill you more, and so that you can have a dandy of a time figuring out why an old version of your Lambda is still running even though you deployed the latest one. Everything telling you that creating a new ID for the Lambda on each deploy will fix it fucking lies.
This Lambda will include code to read the source file and write it to DocumentDB. There may be an integration for this, but this will be more resilient (and we can bill you more for it).
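A minimal sketch of that Lambda, assuming the SQS messages carry S3 event notifications and that partial batch responses (`ReportBatchItemFailures`) are enabled on the event source mapping. `make_handler` and the `store` callback are hypothetical names; `store(bucket, key)` stands in for the code that reads the object and writes it to DocumentDB, which is not shown.

```python
import json
from urllib.parse import unquote_plus


def make_handler(store):
    """Build a Lambda handler; `store(bucket, key)` is a caller-supplied stand-in."""

    def handler(event, context=None):
        failures = []
        for record in event.get("Records", []):
            try:
                s3_event = json.loads(record["body"])
                # S3 test events have no "Records" key, so default to empty.
                for item in s3_event.get("Records", []):
                    bucket = item["s3"]["bucket"]["name"]
                    # Object keys arrive URL-encoded in S3 notifications.
                    key = unquote_plus(item["s3"]["object"]["key"])
                    store(bucket, key)
            except Exception:
                # Report only this message as failed so SQS retries it and
                # eventually moves it to the DLQ; the rest are deleted.
                failures.append({"itemIdentifier": record["messageId"]})
        return {"batchItemFailures": failures}

    return handler
```

Usage would be something like `handler = make_handler(write_to_documentdb)` at module level, where `write_to_documentdb` is whatever function you write for the actual insert.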
Would you like to see sample CDK code? Tough shit because all I can do is assist with questions on AWS services.