Seems like a bit of a stretch to call 4 seconds per frame, on a 3060, "realtime" / "as fast as you can type".
I tried it on a 6900 XT recently and generation time was well under half a second.
Results are not as good as with SDXL but for the time it needs it's very impressive.
The author can't type very quickly
A rapid dark-tan mammalian with a bushy tail, propels itself upward off the ground, to an elevation above (or greater) than that of the canine resting below, whom has a disposition contrary to productivity.
I'd guess that the 'realtime' is a quote from StabilityAI and of course they're running that stuff on an A100. A couple of seconds is still interactive rate as generally speaking you want to think about the changes you're making to your conditioning.
Haven't tried it yet, but if individual steps of XL Turbo take roughly as much time as LCM steps then... well, it's four to eight times faster. As quality isn't production-ready, we're generally talking about rough prompt prototyping, testing out an animation pipeline, that kind of thing. The caveat is that increasing the step count often leads to markedly different results (a complete change of composition, not just details), so the information you gain from those preview-quality images is limited.
Oh, "production ready quality": image quality being roughly on par with 4-step LCM means that it's nowhere near production grade. For the final render you still want to give the model more steps. OTOH I've found that some LCM-based merges do in 30 steps what other models need 80 steps for, so improvements are always welcome. But I'm also worried about these distilled models being less flexible, pruning only slightly trodden paths that you actually might want the model to take.
EDIT: Addendum: I'm not seeing anything about using this stuff as a LoRA. The nice thing about LCM is that you can take any model you have on your disk and turn it pretty much instantly into a model that can generate fast previews. Also, VAE decoding can already be slower than generation with LCM, so, yeah. I guess having something in between the full VAE and TAESD would be nice. TAESD is fast but quite limited when it comes to details, so much so that you might not even be able to see what kind of texture SD generated. Oh, and it also tends to get colours wrong; at least in my experience it tends to be oversaturated.
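To put rough numbers on the step-count argument above, here is a minimal back-of-the-envelope sketch. It assumes one image costs (number of steps × per-step time) plus a single VAE decode, and that a Turbo step costs the same as an LCM step. The timing constants and function names are made-up placeholders, not measurements from any card.

```python
# Back-of-the-envelope latency model: total = steps * per_step + vae_decode.
# PER_STEP_MS and VAE_DECODE_MS are hypothetical placeholders, not benchmarks.

def total_latency_ms(steps: int, per_step_ms: float, vae_decode_ms: float) -> float:
    """Time for one image: all denoising steps plus one VAE decode."""
    return steps * per_step_ms + vae_decode_ms


def speedup(baseline_steps: int, fast_steps: int,
            per_step_ms: float, vae_decode_ms: float) -> float:
    """How many times faster the low-step run is than the baseline run."""
    baseline = total_latency_ms(baseline_steps, per_step_ms, vae_decode_ms)
    fast = total_latency_ms(fast_steps, per_step_ms, vae_decode_ms)
    return baseline / fast


PER_STEP_MS = 100.0    # hypothetical cost of one denoising step
VAE_DECODE_MS = 300.0  # hypothetical cost of the full VAE decode

for lcm_steps in (4, 8):
    ideal = speedup(lcm_steps, 1, PER_STEP_MS, 0.0)
    with_vae = speedup(lcm_steps, 1, PER_STEP_MS, VAE_DECODE_MS)
    print(f"{lcm_steps} LCM steps vs 1 Turbo step: "
          f"{ideal:.1f}x ignoring VAE, {with_vae:.2f}x including VAE decode")
```

With these placeholder values the "four to eight times" figure only holds if you ignore the decode. Once the VAE decode costs more than a step, it dominates and the end-to-end gain shrinks, which is why a decoder somewhere between the full VAE and TAESD would matter.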
I'm on a 3060 and with 4x upscaling it takes about a second and a half.
XL Turbotastic Mega Ginormous, etc. Hate naming schemes like this. Why not just make it v2.0 or the Pro version instead? Why use multiple words that make it sound bigger and better? Marketing BS that just sounds dumb.
Why not just make it v2.0 or the Pro version instead?
"Pro version" is equally cringe.
Yeah I get that. It would just have made more sense given that it's widely used. Though I've been told why the name is so weird and it makes some sense now.
I agree with you in general, but for Stable Diffusion, "2.0/2.1" was not an incremental direct improvement on "1.5" but was trained and behaves differently. XL is not a simple upgrade from 2.0, and since they say this Turbo model doesn't produce as detailed images it would be more confusing to have SDXL 2.0 that is worse but faster than base SDXL, and then presumably when there's a more direct improvement to SDXL have that be called SDXL 3.0 (but really it's version 2) etc.
It's less like Windows 95->Windows 98 and more like DOS->Windows NT.
That's not to say it all couldn't have been better named. Personally, instead of 'XL' I'd rather they start including the base resolution and something to reference whether it uses a refiner model etc.
(Note: I use Stable Diffusion but am not involved with the AI/ML community and don't fully understand the tech -- I'm not trying to claim expert knowledge, this is just my interpretation.)
AFAIU SDXL is actually an erm genetic descendant of SD1.5, with its architecture expanded, weights transferred from 1.5, and then trained on bigger inputs (512x512 in the end is awfully small). SD2.0 is a completely new model, trained from scratch, and as far as I'm aware no one's actually using it. Also no one is using the SDXL refiner. If you go to civitai it's all models with detailer capabilities baked in; what you do see is workflows that generate an image, add some noise at the very end and repeat the last couple of steps. Using the base SDXL refiner on the output of other SDXL models is sometimes outright comical because it sometimes has no idea what it's looking at and then produces exquisite surface texture details of the wrong material. Say a silk keyboard, because it doesn't realise that it's supposed to be ABS and, well, black silk exists.
Yeah I got some good replies to my comment explaining it. Makes more sense now.
I'm just glad we're moving away from purposely misspelled product SEO hacks.
This isn't free BTW folks
I haven't messed with any AI imaging stuff yet. Any free recommendations to just have some fun?
Bing Image Creator if you just want to create some images quickly (free, Microsoft account required). It's using DALL-E 3 behind the scenes, so it's pretty much state-of-the-art, but rather limited in terms of features otherwise and rather heavy on the censorship.
If you wanna generate something locally on your PC with more flexibility, try Automatic1111 along with one of the models from CivitAI. It needs a reasonably modern graphics card and enough VRAM (8GB+) to be enjoyable, and installation can be a bit fiddly (check YouTube & Co. for tutorials). But once past that you can create some pretty wild stuff.
Bing and OpenAI still have free stuff. Bing’s is actually really good.
Great, even more online noise that I can look forward to.
And the resulting faces still all have lazy eyes, asymmetric features, and significantly uncanny issues.
Humans have asymmetric features. No one is symmetrical
These features are abnormally asymmetric to the point of being off-putting. General symmetry of features is a significant part of what attracts people to one another, and why facial droops from things like Bell's palsy or strokes can often be psychologically difficult for the patient who experiences them.
General symmetry, not exact symmetry.
Anecdote: I think Denzel Washington is supposed to have one of the most symmetrical faces.
I've tried to install this multiple times but always manage to fuck it up somehow. I think the guides I'm following are outdated or pointing me to one or more incompatible files.
Tough luck running any code published by people who put out models, it's research-grade software in every sense of the word. "Works on my machine" and "the source is the configuration file" kind of thing.
Get yourself ComfyUI. They're always very fast when it comes to supporting new stuff, and the thing is generally faster and easier on VRAM than A1111. Prerequisite is a torch (the Python package) build with CUDA (Nvidia) or ROCm (AMD) support, or whatever Intel uses. Fair warning: getting ROCm to run on cards that aren't officially supported is an adventure in itself. I'm still on torch-1.13.1+rocm5.2; newer builds just won't work, because the GPU I'm telling ROCm I have (so that it runs in the first place) supports instructions that my actual GPU doesn't, and they started using them.
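If you're not sure which kind of torch build you ended up with, a quick check along these lines can save some guessing before you start debugging the UI itself. This is only a sketch: `describe_torch_backend` is a made-up helper name, it only distinguishes CUDA, ROCm and CPU-only builds (Intel's backend is not covered), and it reports what torch itself says rather than proving that generation will work.

```python
def describe_torch_backend() -> str:
    """Report which GPU backend, if any, the installed torch build targets."""
    try:
        import torch
    except (ImportError, OSError):
        # Missing package, or a build whose shared libraries fail to load.
        return "torch is not installed (or failed to import)"

    # ROCm builds set torch.version.hip; CUDA builds set torch.version.cuda.
    hip = getattr(torch.version, "hip", None)
    cuda = getattr(torch.version, "cuda", None)
    if hip:
        build = f"ROCm/HIP {hip}"
    elif cuda:
        build = f"CUDA {cuda}"
    else:
        build = "CPU-only"

    # ROCm builds are also queried through the torch.cuda namespace.
    usable = torch.cuda.is_available()
    return f"torch {torch.__version__} ({build} build), GPU usable: {usable}"


print(describe_torch_backend())
```

If it reports a CPU-only build, or a GPU build with "GPU usable: False", the problem is in the torch install or the driver stack rather than in ComfyUI or A1111.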
That's impressive
This is great news for people who make animations with deforum as the speed increase should make Rakile's deforumation GUI much more usable for live composition and framing.
This is the best summary I could come up with:
Stability detailed the model's inner workings in a research paper released Tuesday that focuses on the ADD technique.
One of the claimed advantages of SDXL Turbo is its similarity to Generative Adversarial Networks (GANs), especially in producing single-step image outputs.
Stability AI says that on an Nvidia A100 (a powerful AI-tuned GPU), the model can generate a 512×512 image in 207 ms, including encoding, a single de-noising step, and decoding.
This move has already been met with some criticism in the Stable Diffusion community, but Stability AI has expressed openness to commercial applications and invites interested parties to get in touch for more information.
Meanwhile, Stability AI itself has faced internal management issues, with an investor recently urging CEO Emad Mostaque to resign.
Stability AI offers a beta demonstration of SDXL Turbo's capabilities on its image-editing platform, Clipdrop.
The original article contains 553 words, the summary contains 138 words. Saved 75%. I'm a bot and I'm open source!
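For a sense of scale, the 207 ms figure from the summary and the per-frame times people reported earlier in the thread can be turned into images per second. A small sketch follows; the numbers are copied from the comments as stated, and they are not like-for-like (different cards, resolutions and pipelines, and one includes 4x upscaling), so treat the comparison as rough.

```python
def images_per_second(ms_per_image: float) -> float:
    """Convert a per-image latency in milliseconds to a throughput."""
    return 1000.0 / ms_per_image


# Figures as quoted in the thread; settings behind them are unknown.
reported_ms = {
    "A100, 512x512 (Stability AI figure)": 207.0,
    "3060 (top comment, 4 s per frame)": 4000.0,
    "3060 with 4x upscaling (about 1.5 s)": 1500.0,
}

for label, ms in reported_ms.items():
    print(f"{label}: {images_per_second(ms):.2f} images/s")
```

On those numbers the A100 figure works out to a little under five images per second, while the 4-second report is one image every four seconds, which is where the "realtime" disagreement comes from.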