Cycles X Project Update

The rendering improvements from the Cycles X project will be in the upcoming Blender 3.0 release. Since the announcement, developers have been working to complete and stabilize the code, as well as to add new features and improve performance.

We’ll give a quick overview of recent developments.

GPU Performance

GPU rendering performance has been further improved. Here’s where we stand compared to 2.93.

Render time on an NVIDIA Quadro RTX A6000 with OptiX

This is an accumulation of many incremental changes. For details, see the GPU kernel documentation and GPU performance development tasks.

At the time of the initial announcement there was no volume rendering support. Since then we have restored volume rendering, and found that GPU rendering performance improved 3-5x in various volume scenes.


Hair and Shadow Improvements

While most benchmark scenes were rendering faster with Cycles X, a few involving many layers of transparent hair were showing performance regressions compared to 2.93.

One issue we found is that in GPU rendering, if only a small subset of the whole image is slow to render (like a character’s hair), GPU occupancy would be low. We improved this by making the algorithm that estimates the number of samples to render in one batch smarter. Previously we’d end up rendering 1 sample at a time. Now we detect low GPU occupancy and adaptively increase the number of samples batched together, which increases occupancy.
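As a rough illustration of the batching idea, here is a minimal Python sketch. It is not the Cycles scheduler: the function name, the 50% occupancy threshold, the doubling rule and the cap of 64 samples are all assumptions made for this example.

```python
def next_samples_per_batch(current_samples, active_paths, max_paths,
                           low_occupancy=0.5, max_samples=64):
    """Return how many samples to batch in the next GPU launch.

    Hypothetical heuristic: if fewer than `low_occupancy` of the available
    path slots were busy, double the batch size (capped at `max_samples`).
    """
    occupancy = active_paths / max_paths
    if occupancy < low_occupancy:
        return min(current_samples * 2, max_samples)
    return current_samples


# Only a small region (for example hair) is still rendering, so few path
# slots are busy even as the batch size grows.
samples = 1
for _ in range(4):
    samples = next_samples_per_batch(samples,
                                     active_paths=10_000 * samples,
                                     max_paths=1_000_000)
```

With these made-up numbers the batch size grows from 1 to 16 samples over four launches, because occupancy stays far below the threshold each time.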

Another part of the solution was to change the shadow kernel scheduling. Previously, continuing to the next bounce would have to wait for all light and shadows to be resolved at the previous bounce. Now this is decoupled, and shadow tracing work for many bounces can be accumulated in a queue. This gives a larger number of shadow rays to trace at once, improving GPU occupancy. This matters especially when only a small number of pixels are going 64 bounces deep into transparent hair, as in the Spring scene.
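The effect of queuing shadow rays across bounces can be pictured with a toy model. This sketch only counts kernel launches and is not the actual Cycles kernel scheduling. The function name, the one-shadow-ray-per-path-per-bounce rule and the flush threshold are assumptions.

```python
from collections import deque


def trace_with_shadow_queue(num_paths, max_bounces, batch_size):
    """Return the size of each shadow-kernel launch.

    Hypothetical model: every path emits one shadow ray per bounce, and the
    queue is flushed only once it holds at least `batch_size` rays.
    """
    queue = deque()
    launches = []
    for bounce in range(max_bounces):
        # Paths continue to the next bounce without waiting for shadows.
        queue.extend((path, bounce) for path in range(num_paths))
        if len(queue) >= batch_size:
            launches.append(len(queue))
            queue.clear()
    if queue:
        # Flush whatever is left after the last bounce.
        launches.append(len(queue))
    return launches


decoupled = trace_with_shadow_queue(num_paths=100, max_bounces=64, batch_size=1600)
per_bounce = trace_with_shadow_queue(num_paths=100, max_bounces=64, batch_size=1)
```

In this model, 100 paths going 64 bounces deep produce 4 launches of 1,600 shadow rays when queued, compared with 64 launches of 100 rays when every bounce is resolved immediately.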

Further, we found that transparency in hair is usually quite simple, either a fixed value or a simple gradient to fade out from the root to the tip. Instead of evaluating the shader for every shadow intersection, we now bake transparency at hair curve keys and simply interpolate them. Render results are identical in all scenes we tested. Below are two sample images to compare the results.
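To make the baking idea concrete, here is a small Python sketch under simplifying assumptions: the curve is parameterized from 0 (root) to 1 (tip), the "shader" is a plain function, and interpolation is linear. The function names are hypothetical and do not correspond to Cycles code.

```python
def bake_transparency(keys, shader):
    """Evaluate the (possibly expensive) shader once per curve key."""
    return [shader(t) for t in keys]


def interpolated_transparency(keys, baked, t):
    """Linearly interpolate the baked transparency at curve parameter t."""
    for i in range(len(keys) - 1):
        t0, t1 = keys[i], keys[i + 1]
        if t0 <= t <= t1:
            w = (t - t0) / (t1 - t0)
            return (1.0 - w) * baked[i] + w * baked[i + 1]
    raise ValueError("t is outside the curve's key range")


def root_to_tip_fade(t):
    """Example shader: opaque at the root (t=0), transparent at the tip (t=1)."""
    return t


keys = [0.0, 0.25, 0.5, 0.75, 1.0]
baked = bake_transparency(keys, root_to_tip_fade)

# A shadow ray hits the curve at t=0.6: no shader evaluation is needed.
value = interpolated_transparency(keys, baked, 0.6)
```

For a constant value or a linear fade like this one, interpolating between keys reproduces the shader result, which is consistent with the observation that simple hair transparency loses nothing from baking.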

For the statistics enthusiast, here are some memory and timing results for a few well-known scenes, so that you can see the effect of the transparent hair baking and the shadowing optimizations (ref is the reference without any optimizations).

Shadow Optimization Results

Distance Scrambling aka Micro-Jittering

Sobol & Progressive Multi-Jitter (PMJ) can now use distance scrambling (or micro-jittering) to improve GPU rendering performance by increasing the correlation between pixels. There is also an automatic scrambling option to automatically choose a scrambling distance value. These are available in the advanced settings in the render properties.

To render the above images the scrambling distance was set to zero to maximize the correlation between pixels. This should not be used in practice and was only done to make it easier to see the correlation introduced by the micro-jittering (notice the girl’s shoulder in the images above to the right). In a real setting you would generally use a larger distance to hide these artifacts. This technique can result in less noisy images and, in some cases, improved performance in the range of 1% to 5% depending on your rendering setup (it’s only beneficial for GPU rendering). Below are some performance results using the automatic scrambling distance, which currently does not work so well for CUDA due to the tile sizes. Work is currently underway to choose better tile sizes for CUDA, which should result in better performance.
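One way to picture what the scrambling distance controls is the following Python sketch. It assumes a Cranley-Patterson style shift, where each pixel offsets a shared sample by its own random amount, scaled by the distance. This is an illustrative model only, not the Cycles implementation, and the function name is hypothetical.

```python
import random


def pixel_sample(base_sample, pixel_offset, scrambling_distance):
    """Shift a shared base sample by a per-pixel offset scaled by the distance.

    Hypothetical model: a distance of 0 gives every pixel the same sample
    (maximum correlation), a distance of 1 gives fully independent shifts.
    """
    return (base_sample + scrambling_distance * pixel_offset) % 1.0


rng = random.Random(0)
offsets = [rng.random() for _ in range(4)]  # one random offset per pixel
base = 0.3                                  # one value from the shared sequence

correlated = [pixel_sample(base, o, 0.0) for o in offsets]
jittered = [pixel_sample(base, o, 0.1) for o in offsets]
```

With a distance of zero all four pixels receive exactly the same sample, which corresponds to the visible correlation in the example images. With a small distance such as 0.1 the samples stay close together but are no longer identical.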


Ambient Occlusion

Ambient occlusion did not take transparency into account in the initial version of Cycles X. We have now restored this, taking advantage of the shadow kernel improvements that also helped with hair.

Additive ambient occlusion (AO) is now also available through the Fast GI settings: a new “Add” option adds the AO result, alongside the “Replace” operation that was already available. Below are a few images to compare the results.


Denoising Improvements

We improved denoising for volumes. Previously these were mostly excluded from the albedo and normal passes used by denoisers. While there is no exact equivalent to albedo and normals on surfaces, we make an estimate. This can significantly help the denoiser to preserve volume detail.

The denoising depth pass has also been restored, which was previously removed along with NLM.


AMD HIP

We’ve worked with AMD to bring back AMD GPU rendering support. This is based on the HIP platform. In Blender 3.0, it is planned to be supported on Windows with RDNA and RDNA2 generation discrete graphics cards. This includes Radeon RX 5000 and RX 6000 series GPUs.

We are working with AMD to add support for Linux and to investigate earlier generation graphics cards for the Blender 3.1 release. While we would have liked to support more in 3.0, HIP for GPU production rendering is still very new.

However, we think it is the right choice going forward. It lets us share the same GPU rendering kernels and features with CUDA and OptiX, whereas previously the OpenCL implementation was always lagging behind and had more limitations and bugs.

To test the HIP release you need to get the Blender 3.1 alpha and download the latest AMD drivers. (See this blog post for more information.)

Blender 2.93 vs Blender 3.0 on AMD

Apple Metal

We also recently announced a collaboration with Apple. They are contributing a Metal backend for Cycles, planned for Blender 3.1.


Future Work

Now that the new Cycles X architecture is in place, we expect that adding various new production features will be easier. This will start in 3.1 and continue through the 3.x series.

Download the latest Blender 3.1 Alpha builds to try out the new features.

43 comments
  1. Is there any possibility that blender will utilize slightly older intel dedicated graphic cards in the future?

  2. Hi there, you guys do a great job and I thank you for it, but I was wondering if in the future blender will have a cad section so you can model based on measurements.
    Thanks

  3. OK, now let’s wait for 10 years and then see if AMD is still committed to HIP and if it can actually be relied upon for production. Until then, outside of testing, I’m probably not very likely to use it.

  4. What CPU config is being used in the CPU tests?

    RTX A6000 is a top of the line GPU. Is it going against a 5 year old 4-core CPU or a dual 64-core AMD Epyc system?

  5. Is there any improvement in CPU rendering, or for built-in graphics cards on laptops…

    • There have been various improvements to the Cycles rendering engine, such as the PMJ sampler and the hair improvements, which should improve rendering on CPU as well. As for the laptop GPU, it depends on what you have. AMD is working hard to support as many cards as possible, and if you have an Nvidia card that supports a recent CUDA version it should work too.

  6. Very cool! Any ideas when we’ll be seeing improvements to the Shadow Catcher in Cycles X?

    • Many improvements to the shadow catcher have been committed and there are more to come.

  7. I wish Blender had more controls for texture or geometry compression. When rendering a complex scene on the GPU, it’s often possible to exceed available VRAM, and there are no tools available to get a quick memory usage breakdown, and how to deal with that. You can spend a lot of time debugging your scene.

    Also, you don’t know that you exceeded available VRAM until the rendering process has already spent a couple of minutes processing the scene.

    This can also be important, if you intend to render your scene on a different GPU with less VRAM than the one you are currently using.

    I know Cycles isn’t entirely at fault here, but there would be *many* productivity speedups from having such tools.

    • Yes I agree, this was something I was looking into earlier using compression or tiling to control the amount of RAM used for textures. Unfortunately, I don’t have a time frame for this, if I remember correctly I think someone started a patch but I can’t seem to find it now 🙁

  8. I would love to see tiles brought back, as I am unable to render high resolution images anymore. I’ve tried rendering the same scenes with the old version of Blender (tile rendering) and they render just fine. However, when I try the beta (progressive rendering) I cannot set my resolution higher than 6000px before it overloads the VRAM on my RTX 3060.

  9. Hello William, thanks for the update, any chance you’ve got a link that explains scrambling distance ? I have no idea what “correlation between pixels” means

  10. Amazing job, it’s so good to have an open source software evolving like this! Thanks a lot! 🙌

  11. In an upcoming version, I hope to have the same things as UDM and Nanite. They’re the new standard for efficiently running large and detailed scenes.

  12. Hello William, thanks for all nice improvements

    I have a question. Today I downloaded the RC version, and testing the well-known BMW27 scene I got these results:
    OptiX RTX3060+5800H = 13.8 seconds
    OptiX RTX3060 only = 12.8 seconds

    Why is the R5800H detrimental to the rendering speed? I would expect it to help, not the inverse.

    PS: i am willing to help test anything you need.

    • Unfortunately multi-device support is still a work in progress. I believe that in this case the optimal tile sizes for each device differ and this is causing a slow down as it causes the program to wait on the slower device on each iteration.

      Thank you for offering to help, all help is greatly appreciated. If you go to blender.chat you can link up with the developers to offer your help. For more details, check out here.

  13. That performance and development pace!

  14. I wonder how devs take this “i want lumen / nanite ” thing

  15. Great work and I’m looking forward for AMD GPU support on Linux.

  16. Put mesh shader on blender like UE5 As nanite 🙂

  17. Really excited to test out my 5700xt! Thank you devs for showing amd users some love! Fingers crossed for simulated volume rendering on the gpu (smoke sims don’t show up in cycles on my card as of 2.93)

  18. Pretty cool!!! Great rock-solid work, Cycles-X is already fantastic!

  19. will branched path tracing be added back, this was a huge feature for optimising renders

  20. Good work ! Couldn’t get back to 2.93 since i started using 3.0 Alpha, the new reactivity in the viewport is a big deal + the general speed up

  21. Fantastic Work by Brecht and William (sorry if I’m misspelling their name) and community devs.

    Thx so much for this awesome piece of software!

    I wish I had something like Blender when I was young (when 3D software used to cost in the thousand).

    Cheers !

  22. I really hope that Blender 3.1 will support AMD RX5XX Polaris Series.
    Although not as fast as Nvidia GPUs, at least AMD GPU users don’t need to buy GPUs anymore.

    • I do believe AMD is actively working to support as many cards as possible so hopefully you’ll get your wish.

    • Especially important to have as wide GPU support as possible at the moment, given how absurdly expensive it is to buy a new GPU with the ongoing chip shortages. This would be the worst possible timing to impose a GPU upgrade on a user.

    • In a very small amount of time to directly translate the cuda code to ensure that it does not crash is good, want to improve performance enough also need to remove the cuda code stock, open hardware ray tracing, but hip early many bugs and light errors are not like cuda, but closer to optix, unbelievable ah

  23. Great work as always guys, lovely to finally see AMD supported again, even though I don’t currently own any.

    Hoping to eventually see this pave the way for a better shadow catcher implementation, and many-light-sampling optimizations.

    Blender devs are awesome!

    • The shadow catcher in 3.0 is much better. Supports a full color shadow pass and denoising. Read the release logs!

  24. It says that Apple joined as a patron level supporter. But yet, it does not show up on the dev fund page. Is there any reason for this?

  25. Hello when will we have Path guiding with caustics?

  26. Cycles X is a game changer. Everything got better in 3.0. Blender devs are incredible and we are so grateful! 🙂

  27. 🔥🙏🔥 got nothing to say but thank you guys for making ma carrier possible ☺️
