Hello everyone! It's been a minute, and this place is still quite empty. So I just wanted to share some of my experiments here! (The exact filter and command for you to copy and paste are at the bottom of this post.)
The goal of this particular project was to create a "robotic" overlay for organic speech by stretching the number of actual audio samples over a much larger span of time. But it works on any and all audio.
Demonstration
Using this video
Let's apply a complex filter:
ffmpeg -i coughing_baby_meme.mp4 -af afftfilt=real='hypot(re,im)*sin(0)':imag='hypot(re,im)*cos(0)':win_size=512:overlap=0.75 output.mp4
Here's what we end up with: https://cdn.imgchest.com/files/4z9cvjvov37.mp4
How does it work?
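What we're doing here is setting the real and imaginary parts of every bin in the frequency domain signal. Since sin(0) is 0 and cos(0) is 1, the real part becomes 0 and the imaginary part becomes the bin's magnitude, hypot(re,im). So each bin keeps its loudness but ends up with the same fixed phase. You can mix and match different values if you like, to achieve different effects.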
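If you want to see what those two expressions do to a single bin, here's a tiny sketch using awk instead of ffmpeg. The bin values (re=3, im=4) are made up for illustration, and sqrt(re*re + im*im) stands in for hypot(re,im).

```shell
# One made-up frequency bin: re=3, im=4, so its magnitude is 5.
re=3
im=4
result=$(awk -v re="$re" -v im="$im" 'BEGIN {
    mag = sqrt(re * re + im * im)   # same value as hypot(re,im)
    # The two expressions from the filter: real and imag
    printf "real=%g imag=%g", mag * sin(0), mag * cos(0)
}')
echo "$result"   # prints: real=0 imag=5
```

Whatever re and im you start with, the real part comes out as 0 and the imaginary part as the magnitude, which is why every bin ends up pointing the same way.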
The most important part of this filter for our purposes, however, is the win_size=512 portion.
Normally, your audio's samples will be stretched out across the timeline as efficiently as possible. With the psychoacoustic effect in mind, audio samples are placed at regular intervals with a set size. Now, what does changing win_size do anyway?
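In short, it allows you to sacrifice timeline density for sample resolution, or vice versa. In FFT terms, a bigger window gives you finer frequency resolution but coarser time resolution, and a smaller window does the opposite.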
Think of it as spreading salt on your driveway in the winter. Normally, you'd sprinkle a fair amount of salt across the pavement. You might sprinkle a little bit more in the spots where ice is particularly thick, or a little bit less where it's thinner, until all of the salt is gone and your driveway is covered uniformly. Your neighbor, on the other hand, dumps huge piles of salt on only a few spots along the length of the driveway. When you walk across your driveway, you don't notice anything odd because the salt has melted the ice evenly. If you were to walk across your neighbor's driveway, you might feel your feet slipping in some places and be anchored far too firmly to the ground in others.
And this is (kind of) what creates those eerily canned, synthetic-sounding vocals.
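To put rough numbers on that trade-off, here's a back-of-the-envelope sketch. It assumes a 44100 Hz sample rate (check your own file with ffprobe), and it assumes the hop between windows is win_size × (1 − overlap), which is how I understand the overlap option to work.

```shell
sample_rate=44100   # assumption: your file may differ
win_size=512
overlap=0.75
summary=$(awk -v sr="$sample_rate" -v win="$win_size" -v ov="$overlap" 'BEGIN {
    hop = win * (1 - ov)            # samples between window starts (assumed)
    printf "window=%.1fms hop=%.1fms bin=%.1fHz", win / sr * 1000, hop / sr * 1000, sr / win
}')
echo "$summary"   # prints: window=11.6ms hop=2.9ms bin=86.1Hz
```

Bump win_size up and the bins get narrower while each window covers more time. Shrink it and you get the reverse.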
Try It Yourself!
- Make sure to enclose the portion after -af in double quotation marks (")! I chose not to do this here because it messes with the syntax highlighting on Lemmy.
ffmpeg -i your_media.* -af afftfilt=real='hypot(re,im)*sin(0)':imag='hypot(re,im)*cos(0)':win_size=512:overlap=0.75 your_output_media.*
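For reference, here's roughly what the fully quoted version could look like as a small script. The file names and the win_size variable are placeholders of mine. As far as I understand it, the double quotes let the single quotes reach ffmpeg intact, and ffmpeg needs those so the comma inside hypot(re,im) isn't read as a separator between filters.

```shell
input="your_media.mp4"            # placeholder: swap in your own file
output="your_output_media.mp4"    # placeholder
win_size=512                      # try other sizes and compare

# Double quotes outside, single quotes inside around each expression.
filter="afftfilt=real='hypot(re,im)*sin(0)':imag='hypot(re,im)*cos(0)':win_size=${win_size}:overlap=0.75"
printf '%s\n' "$filter"

# Only run ffmpeg if it is installed and the input file exists.
if command -v ffmpeg >/dev/null 2>&1 && [ -f "$input" ]; then
    ffmpeg -i "$input" -af "$filter" "$output"
fi
```

Keeping the filter in a variable also makes it easy to change win_size without retyping the whole thing.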
yes. and hurry up with it too