RTX speed increase?


  • #46
    A lot of interesting information in this thread, guys.
    I am more on Joelaff's side when it comes to V-Ray settings accessibility. Usually, as Muhammed_Hamed says, everything works out of the box, but from time to time you hit a case where you want control over all the knobs. I am facing some slow render time problems at the moment (for anyone interested, I posted about it here: https://forums.chaosgroup.com/forum/...ow-render-time).
    Back when I used 3ds Max I loved the idea of having Normal / Advanced / Expert subcategories in the settings window, and yes... it is the user's problem if they switch to Expert without knowing what they are doing (I went through it myself and I don't regret it).


    Originally posted by Muhammed_Hamed View Post

    The GPU IPR starts instantly for me similar to Octane or Fstorm, you guys need to follow up with support to see what is causing the slowdowns in your scenes.
    It is important to follow along and make sure the issues are solved,
    I have been using GPU for most projects since 2017.. there are issues, but I think it is in a pretty good spot right now, it is fair to say it is production ready at least for what we do
    I believe it could be an issue with high-resolution textures being reloaded into VRAM every time you launch the render; I have to find out. And okay... I agree that these days it is totally possible to do even heavy archviz on GPU, though it is difficult, as it requires the whole team to be very attentive when collaborating on the same big project. That is something I am slowly trying to introduce in our studio, so maybe within 2 years we can go full GPU.

    When working on personal projects I go GPU 90% of the time (I only have an i7-7700HQ + GTX 1060 in a laptop, so the speed gain is very big).

    My Artstation
    Whether it is an advantageous position or a disadvantageous one, the opposite state should be always present to your mind. -
    Sun Tzu



    • #47
      I have to say, I'm only doing small residential renders, admittedly with proxy trees, but the IPR on GPU just grinds to a halt... on a single 2080 Ti. The startup time is pretty poor, constantly 'building gpu dynamics' and absolutely not starting instantly. I've attached a video to show the speed when I move an object... and it's generally a lot slower than what's shown.
      Attached Files
      e: info@adriandenne.com
      w: www.adriandenne.com



      • #48
        Adrian, this looks really painful... it is not the intended behavior.
        When I get a chance I will test this with proxies.

        karol, we can disagree on that part, it is all good.
        I replied to your other thread, btw.
        Muhammed Hamed
        V-Ray GPU product specialist


        chaos.com



        • #49
          Originally posted by francomanko View Post
          I have to say, I'm only doing small residential renders, admittedly with proxy trees, but the IPR on GPU just grinds to a halt... on a single 2080 Ti. The startup time is pretty poor, constantly 'building gpu dynamics' and absolutely not starting instantly. I've attached a video to show the speed when I move an object... and it's generally a lot slower than what's shown.
          That's my complaint with GPU: stop your render, wait 5 minutes, make a small change, and wait 5 more minutes for the render to start again. Frustrating as heck, and do this a few times and your morning is gone in wait time.
          Bobby Parker
          www.bobby-parker.com
          e-mail: info@bobby-parker.com
          phone: 2188206812

          My current hardware setup:
          • Ryzen 9 5900x CPU
          • 128gb Vengeance RGB Pro RAM
          • NVIDIA GeForce RTX 4090
          • Windows 11 Pro



          • #50
            The point about the example setup (which is purely an example, not my setup) is that you do need specific hardware to make good use of it. We have a good sized farm of many fast CPUs, but most nodes have older GPUs because it made no sense to buy them in the past.

            I could never work with just GPU as our scenes are usually too big and complex for the best GPUs out there now. So it makes no financial sense to buy a bunch of GPUs for the much more limited GPU renderer, at least for us on our projects.

            If I could get away with GPU for 90% of my work it would be a different story, but for now it is about 5%. I am excited that we have gotten this far, and I figure it will increase over time.


            Given that most people do not have setups purely for GPU, we need user-controllable block sizes. There is no downside to adding these controls: you can continue using the big blocks, and those who need the control can adjust them. One size never fits all!

            How are you rendering one frame per GPU on a single machine? Multiple instances of Max or Maya? We used to do that for CPU, but now so many scenes use so much memory that it isn't worth it for us anymore.



            • #51
              Fair enough, that parameter exists in some V-Ray plugins anyway... V-Ray for SketchUp is one example. I don't know why it is exposed in their UI (and with an oddly small default GPU bucket size of 32, so people change it to 256).
              If you have older GPUs, they are going to slow down your rendering. Same for slow CPUs, so it is best to avoid these to make life easier on yourself.

              How are you rendering one frame per GPU on a single machine? Multiple instances of Max or Maya?
              No, this is not efficient for the task.
              You can use something like this:
              https://vrscenegui.babylondreams.de/
              Export multiple .bat files and use V-Ray Standalone for rendering... pretty much everyone who renders animation on GPUs uses this workflow.
              Usually a frame will take less than a minute to render on one GPU, but it will take more than a minute to load into VRAM. This means that if you use 7 GPUs, for example, to render a single frame, your GPUs will sit idle waiting for the scene to load into VRAM.
              So if you render one frame per device, you avoid this issue.
              Some people have their own in-house tools to deal with this, and I believe you can do it with Deadline (in Houdini at least).
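              A minimal sketch of that one-frame-per-GPU idea, assuming V-Ray Standalone is on PATH, the CUDA engine honors the standard CUDA_VISIBLE_DEVICES variable, and the scene path and frame range below are placeholders:
              Code:
              # Render one frame per GPU with V-Ray Standalone, so no GPU sits idle
              # while another frame's scene is still loading into VRAM.
              import os
              import subprocess
              import time

              SCENE = "shot010.vrscene"     # placeholder path
              FRAMES = list(range(1, 101))  # placeholder frame range
              GPUS = [0, 1, 2, 3]           # one worker per physical GPU

              def render_frame(frame, gpu):
                  # Restrict this standalone process to a single GPU.
                  env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
                  return subprocess.Popen(
                      ["vray", f"-sceneFile={SCENE}", f"-frames={frame}", "-display=0"],
                      env=env,
                  )

              pending = list(FRAMES)
              active = {}  # gpu index -> running process
              while pending or active:
                  for gpu in GPUS:
                      if gpu not in active and pending:
                          active[gpu] = render_frame(pending.pop(0), gpu)
                  for gpu, proc in list(active.items()):
                      if proc.poll() is not None:  # frame finished, free the GPU
                          del active[gpu]
                  time.sleep(2)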

              There is no downside to adding these controls.
              We still disagree on this part, which is fine. This is possible in other engines, and people end up messing it up a lot. I talked to Blago and he said they decided to hide these controls in V-Ray for good.
              With good rendering devices, either CPU or GPU, you should be fine with the current bucket size in my view. GPUs and CPUs are quite affordable now; most people will have a Threadripper or a 2060 Super at least.

              I could never work with just GPU as our scenes are usually too big and complex for the best GPUs out there now
              Did you try with a 2080 Ti, for example, or a Titan card?
              On my 2 Titans I can fit up to 900 million polygons with NVLink; Maya usually crashes before I can crash V-Ray with out-of-memory errors.
              With 2x 2080 Tis and NVLink you have access to nearly 20 GB of VRAM, so around 400-500 million polys.
              Even before getting the Titan cards I would barely run out of VRAM on any project. V-Ray GPU is quite efficient at using memory (except for displacement).
              People assume a scene that takes 60 GB of RAM on CPU will need 60 GB of GPU memory, which is not the case. Converting some of my scenes that use 60-70 GB of RAM, they usually fit in the VRAM of my 1080 Tis.
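              As a rough back-of-envelope check on the 2080 Ti figure above (approximate only; per-polygon cost varies with topology, UV sets, motion blur data and instancing):
              Code:
              # Implied geometry cost from "nearly 20 GB of VRAM, around 400-500 million polys".
              vram_bytes = 20 * 1024**3
              for polys in (400e6, 500e6):
                  print(f"{polys / 1e6:.0f}M polys -> ~{vram_bytes / polys:.0f} bytes per polygon")
              # -> roughly 40-55 bytes per polygon with these figures.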
              Last edited by Muhammed_Hamed; 12-08-2020, 11:46 AM.
              Muhammed Hamed
              V-Ray GPU product specialist


              chaos.com



              • #52
                Originally posted by Muhammed_Hamed View Post
                you can use something like this
                https://vrscenegui.babylondreams.de/
                Does this do anything Deadline can't do?
                Website
                https://mangobeard.com/
                Behance
                https://www.behance.net/seandunderdale



                • #53
                  Originally posted by Muhammed_Hamed View Post
                  If you have older GPUs, they are going to slow down your rendering. Same for slow CPUs, so it is best to avoid these to make life easier on yourself.
                  Of course. But when you have been doing this job for decades you have a lot of hardware that gets gradually rotated out as it becomes more obsolete/not worth the electricity. There are plenty of machines 2-3 years old that still work great for CPU.

                  you can use something like this
                  https://vrscenegui.babylondreams.de/
                  OK. I figured something like that or Deadline.


                  With good rendering devices, either CPU or GPU, you should be fine with the current bucket size in my view. GPUs and CPUs are quite affordable now; most people will have a Threadripper or a 2060 Super at least.
                  For workstations, yes. For rendering, most places rotate out hardware gradually like we do. You want to get the most out of your investment and not have to replace your entire farm every year. Hardware is getting faster more quickly again thanks to AMD; there were a few years there with Intel where it might take 3 years to double the speed. Amazing what competition can do!

                  Did you try with a 2080 Ti, for example, or a Titan card?
                  Those would be "the best cards out there now," like I said. So of course. Workstations all have the latest and greatest.

                  I still fill those all the time. I too am always amazed at how a scene that takes 70 GB on CPU might render on a 2080 Ti with only 11 GB (or whatever it has). Looking forward to improvements in out-of-core.

                  Every time I try to work in GPU right now I usually spend more time messing with it than I save in the render. And with a large farm the render time is not even that big of a deal. Maybe someday we can do everything on GPU, and then we would buy a bunch of GPUs instead of CPUs.

                  If you have multiple GPUs on one machine does each one need a render node license? If not, that could be a savings, until developers decide to yank that out from under you.



                  • #54
                    Oh, and the removal of a user-settable bucket size from CPU would be sheer lunacy. If Chaos does this then they are totally out of touch with production. (Which is why they have not yet done it.) The auto-split does not work anywhere near perfectly, and everyone complains about last-block syndrome. Smaller blocks rarely make any speed difference except in the fastest scenes, whereas large blocks make us wait for that last block over and over.

                    Only the user knows what is best for their scene.

                    Edit:

                    And here is the reason why we need user controllable block sizes for CPU and for GPU.

                    In VFX at least half of our renders look like this: (lots of blank space with a small CG element that is going into a live action plate)
                    [Attached image: Why.jpg]

                    Now tell me how a block size of 256 is more efficient for rendering that scene! Maybe GPU uses all the cores for each block (I don't know), but CPU sure as heck doesn't. With some lame hard-coded block size this scene would render on ONE core, rather than letting us set a block size appropriate for this type of scene (likely 16 pixels, perhaps even 8, depending on how complex that teapot is; imagine it is a 20-million-poly spaceship with lots of vector displacement).
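                    To put rough numbers on it (a sketch with made-up values, not measured data): if the only live pixels are a 256x256 patch of CG element in an otherwise empty frame, the number of buckets covering that patch, and therefore the number of CPU cores that can work on it, collapses as the block size grows.
                    Code:
                    # How many buckets cover a 256x256 active region for different block sizes.
                    # With V-Ray CPU each bucket is rendered by one thread, so the bucket count
                    # caps how many cores can stay busy on the pixels that actually matter.
                    import math

                    active_w, active_h = 256, 256   # assumed on-screen size of the CG element
                    cores = 64                      # assumed core count of a render node

                    for block in (256, 64, 32, 16, 8):
                        buckets = math.ceil(active_w / block) * math.ceil(active_h / block)
                        print(f"block {block:>3}px -> {buckets:>4} buckets, "
                              f"{min(cores, buckets)}/{cores} cores busy")
                    # block 256px ->    1 buckets, 1/64 cores busy
                    # block  16px ->  256 buckets, 64/64 cores busy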

                    One size does not fit all for block sizes. Maybe for full frame arch-vis renders, but surely not for VFX.

                    (Sure, you could use a render region, but that is a pain to animate. You could use a render mask, but that is also a pain to animate. Both of these also affect any Light Cache you might be using. Note that although our scenes frequently render only something small, there is a full environment around it, invisible to camera, built to handle lighting and shadows.)

                    And of course, if Chaos implements a method that keeps all cores busy until the very end, continuously subdividing, taking stochastic samples, etc., then block size would be much less of an issue. But for now it is a big issue that we need to be able to control.
                    Last edited by Joelaff; 12-08-2020, 02:31 PM.



                    • #55
                      Originally posted by seandunderdale View Post

                      Does this do anything Deadline can't do?
                      Yes, a few things here and there (they list all their features in the link I posted).
                      But honestly, Deadline gets the job done (especially for the part that Joelaff talked about).


                      Muhammed Hamed
                      V-Ray GPU product specialist


                      chaos.com



                      • #56
                        Originally posted by Joelaff View Post
                        You want to get the most out of your investment and not have to replace your entire farm every year.
                        These slower devices (whether they are CPUs or GPUs) will have a negative impact on performance. As others mentioned earlier in the thread, combining CPUs and GPUs in hybrid mode can sometimes slow down rendering.
                        You don't have to replace the entire farm, just stick to the good devices that can really benefit this kind of rendering. Any machines with dual Xeons or Threadrippers, and any 1080 Tis or above, should be great.
                        Don't add outdated cards (Maxwell or earlier); they will cause driver issues and slowdowns like I explained. Don't add quad-core CPUs, only stick to the beefy ones.

                        Originally posted by Joelaff View Post
                        how a scene that takes 70 GB on CPU might render on a 2080 Ti with only 11 GB (or whatever it has). Looking forward to improvements in out-of-core.
                        This is using RTX mode, by the way, without out-of-core.
                        GPU renderers use tricks to compress geometry and textures, and something like mipmapped textures can save a lot of GPU memory.
                        Take a look at this:
                        FStorm can handle 200+ million polys in only 8 GB of GPU memory.
                        https://www.facebook.com/groups/FSto...2709633692111/

                        The same magic applies to displacement and high-resolution textures.
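                        A quick back-of-envelope on the mipmapping point (numbers are illustrative, not V-Ray's actual memory layout): an 8K texture on an object that only covers a few hundred screen pixels never needs its top mip level resident, so the VRAM cost drops by orders of magnitude.
                        Code:
                        # VRAM for a full-resolution 8K RGBA texture vs. the mip level that is
                        # actually sampled when the object covers only ~500 pixels on screen.
                        def texture_mb(width, height, channels=4, bytes_per_channel=1):
                            return width * height * channels * bytes_per_channel / (1024 ** 2)

                        full = texture_mb(8192, 8192)   # full-res upload: 256 MB
                        needed = texture_mb(512, 512)   # mip level actually sampled: 1 MB
                        print(f"full res: {full:.0f} MB, mip actually sampled: {needed:.0f} MB")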
                        As for out-of-core, it is the most overrated thing about GPU rendering, in my humble view. It has been around for so long in Redshift, Octane, Cycles and whatnot.
                        No one uses OOC in production; it slows rendering down to CPU-level speeds, and you will have your GPUs at 40 or 50% utilization.
                        It can also cause instability. I don't know anyone who uses it in production.
                        For GPU rendering, the rule of thumb is to fit the scene in GPU memory; beyond that, you are better off using CPU rendering at this point. (Maybe Chaos Group's implementation could be different? Who knows.)

                        Originally posted by Joelaff View Post
                        If you have multiple GPUs on one machine does each one need a render node license? If not, that could be a savings, until developers decide to yank that out from under you.
                        You can have up to 20 GPUs per machine (with near 100% linear scaling) and use one single license; it is one advantage of GPU rendering.
                        My previous employer had a couple of nodes with 10 and 11 GPUs.
                        Last edited by Muhammed_Hamed; 13-08-2020, 06:30 AM.
                        Muhammed Hamed
                        V-Ray GPU product specialist


                        chaos.com



                        • #57
                          Originally posted by Joelaff View Post
                          Oh, and the removal of a user-settable bucket size from CPU would be sheer lunacy
                          My point was only about GPU bucket size; this doesn't apply to CPU rendering (I shouldn't need to say that, honestly).
                          GPU buckets never get stuck. GPUs have thousands of cores, and I have never come across a case where V-Ray GPU buckets get stuck (unless maybe you are using an outdated GPU, then it could happen).
                          For CPU rendering, 3Delight and RenderMan use even smaller bucket sizes than V-Ray. V-Ray uses 64 in Maya/Houdini/Modo and 48 in 3ds Max by default; 3Delight uses smaller buckets of 8 pixels I think, or maybe 16 (to avoid stuck buckets).

                          I get your point about controlling bucket size; I still stick to my original argument of hiding these and keeping the UI cleaner.
                          I have been doing GPU rendering for years, and I have used every GPU renderer on the market.
                          In a production environment, bigger bucket sizes (like 256) are necessary to get the best utilization of your devices, especially if you have multiple cards.
                          Muhammed Hamed
                          V-Ray GPU product specialist


                          chaos.com

                          Comment


                          • #58
                            Originally posted by Muhammed_Hamed View Post

                            My point was only about GPU bucket size; this doesn't apply to CPU rendering (I shouldn't need to say that, honestly).
                            GPU buckets never get stuck. GPUs have thousands of cores, and I have never come across a case where V-Ray GPU buckets get stuck (unless maybe you are using an outdated GPU, then it could happen).
                            For CPU rendering, 3Delight and RenderMan use even smaller bucket sizes than V-Ray. V-Ray uses 64 in Maya/Houdini/Modo and 48 in 3ds Max by default; 3Delight uses smaller buckets of 8 pixels I think, or maybe 16 (to avoid stuck buckets).

                            I get your point about controlling bucket size; I still stick to my original argument of hiding these and keeping the UI cleaner.
                            I have been doing GPU rendering for years, and I have used every GPU renderer on the market.
                            In a production environment, bigger bucket sizes (like 256) are necessary to get the best utilization of your devices, especially if you have multiple cards.
                            I am very optimistic about GPU as a whole. Your comments about out-of-core in the previous post are interesting; I have found it slows things down a lot. Sad that it doesn't look like a GPU savior. I am sure GPUs will continue to get more memory.

                            I know GPUs have tons of cores. I wonder how those are used with respect to the buckets. In other words, if you have 1000 cores and ten buckets, does each bucket get 100 cores? What about once there is only one bucket remaining? Is that bucket still using only 100 cores, or does it get all cores? Since you say you don't see a lot of stuck buckets, I wonder if that is due to some method like this, or if it is just your scenes. When our scenes have buckets that are all about the same complexity, they don't get stuck. However, at least with CPU, when you have a few small areas with super-bright highlights, displacements, SSS, or other complex effects, then you get stuck buckets unless they are very small.

                            I have been using mipmapped tiled textures for years with GPU for any scenes that run into memory issues. Great technology.

                            I look forward to learning more about how to get the most out of GPU. Thanks for the info.



                            • #59
                              Hey guys, a little late, but let me chime in on the discussion.
                              First of all, I think all of this is very valuable input. We've been doing our best, and continue to do so, to make the V-Ray GPU engine as fast and stable as possible.
                              What I hear from users all around is that V-Ray GPU as it stands today is more stable than ever, and we are putting a massive amount of work into making it even more stable.
                              We've also done a lot of work to make IPR updates fast, reduce the time to first pixel, and reduce the amount of CPU RAM and VRAM used by V-Ray GPU, so you can fit bigger scenes into the existing memory pool.
                              And with every generation of GPUs you can see that NVIDIA also makes it easier for artists to render bigger and bigger scenes on the GPU, by constantly increasing the memory pool and introducing great new tech such as NVLink.

                              That being said, we are of course not done developing V-Ray GPU, and we look forward to more input like this so we can better understand the issues you face, address them, and make your lives easier.

                              Originally posted by francomanko View Post
                              I have to say, I'm only doing small residential renders, admittedly with proxy trees, but the IPR on GPU just grinds to a halt... on a single 2080 Ti. The startup time is pretty poor, constantly 'building gpu dynamics' and absolutely not starting instantly. I've attached a video to show the speed when I move an object... and it's generally a lot slower than what's shown.
                              The IPR updates in this video are indeed painful. I have not seen such update speeds recently. I know we've done a lot to improve this particular aspect, so please, if you can, share a scene with us that we can investigate. This is an issue and will be fixed ASAP if we can reproduce it.


                              Originally posted by glorybound View Post
                              That's my complaint with GPU: stop your render, wait 5 minutes, make a small change, and wait 5 more minutes for the render to start again. Frustrating as heck, and do this a few times and your morning is gone in wait time.
                              This is also absolutely not expected. Can you send a scene over for us to profile and investigate what's going on? Sometimes it's a little corner-case thing that takes an hour to fix, but it never gets reported. Only recently I found a problem where, in certain situations, it could take V-Ray 10 minutes to start rendering a scene because of a minor issue; once fixed, the scene started rendering in just 10 seconds. So... scenes that we can repro with are always welcome. The worst offenders even make it into our nightly test suite, so we can run tests on them daily and make sure the problem never returns.


                              Originally posted by Joelaff View Post
                              I know GPUs have tons of cores. I wonder how those are used with respect to the buckets. In other words, if you have 1000 cores and ten buckets, does each bucket get 100 cores? What about once there is only one bucket remaining? Is that bucket still using only 100 cores, or does it get all cores?
                              The reason the buckets are so big is that every bucket is assigned to a single GPU, and the scheme is designed to achieve 100% utilization across all available GPUs. We usually fire 6 buckets per GPU, meaning we target 600% utilization. This is actually good for the GPU: it makes sure that even when the GPU is done rendering a bucket and it is the CPU's turn to assemble the rendered pixels into the VFB, the GPU does not stay idle and always has work to do, so you get a flat 100% utilization curve. We are considering ways to expose control over the bucket size for the CPU when rendering in hybrid mode, but I still don't think it is a good idea to expose it to users for the GPU. I fear a lot of people would not know how to set it right and would complain that GPU utilization is not at 100%. This is especially true for mixed setups where a single machine has multiple GPUs from different generations inside, for example a 750 Ti, a 1080 and an RTX 2070.
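                              A quick sketch of that scheduling math (frame size, bucket size and GPU count below are illustrative assumptions): a typical frame still contains far more 256 px buckets than the handful in flight per GPU, so every GPU stays fed until only the last few buckets remain.
                              Code:
                              # Buckets in a frame vs. buckets in flight with ~6 per GPU.
                              import math

                              width, height, bucket = 3840, 2160, 256   # assumed frame and bucket size
                              gpus, per_gpu = 4, 6                      # assumed GPUs, buckets in flight each

                              total = math.ceil(width / bucket) * math.ceil(height / bucket)
                              print(f"{total} buckets total, {gpus * per_gpu} in flight at once")
                              # -> 135 buckets total, 24 in flight: each GPU has queued work while the
                              #    CPU composites finished buckets into the VFB.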
                              The idea of controlling it with an environment variable for the DR case is worth considering though, so I will bring this matter up for internal discussion.
                              Alexander Soklev | Team Lead | V-Ray GPU



                              • #60
                                Originally posted by a7az0th View Post
                                The reason the buckets are so big is that every bucket is assigned to a single GPU, and the scheme is designed to achieve 100% utilization across all available GPUs. We usually fire 6 buckets per GPU, meaning we target 600% utilization. This is actually good for the GPU: it makes sure that even when the GPU is done rendering a bucket and it is the CPU's turn to assemble the rendered pixels into the VFB, the GPU does not stay idle and always has work to do, so you get a flat 100% utilization curve. We are considering ways to expose control over the bucket size for the CPU when rendering in hybrid mode, but I still don't think it is a good idea to expose it to users for the GPU. I fear a lot of people would not know how to set it right and would complain that GPU utilization is not at 100%. This is especially true for mixed setups where a single machine has multiple GPUs from different generations inside, for example a 750 Ti, a 1080 and an RTX 2070.
                                The idea of controlling it with an environment variable for the DR case is worth considering though, so I will bring this matter up for internal discussion.
                                First off, thanks for the reply, and thanks for continuing to improve GPU, and for caring what users think.

                                I do understand the idea, and it is good to know that each bucket can indeed use the entire GPU. This seems like a very efficient approach for GPU-only rendering.

                                It sounds like you understand my concerns over bucket size in a mixed render node environment, especially with hybrid, where you may have a fast CPU and a slow GPU (this really often is the case on render nodes at any shop that is not jumping heavily into GPU). Perhaps you could do some kind of speed test (e.g. the old "bogomips") of the GPU on first setup to determine the optimum block size, knowing that a smaller block size, or fewer concurrent blocks, may be better with slower GPUs.

                                An environment variable or other method to configure per-node bucket sizes would be really great! I think it would take care of any needs without causing confusion for the casual user, who may not even know what an environment variable is. Please note this would ideally be a multiplier of the scene's block size (for CPU), and there should be independent settings for GPU and CPU (both in hybrid/GPU mode, but also in normal V-Ray CPU (V-Ray Adv)), please.

                                How are you currently handling it for mixed setups on the same machine (like your example of a 750 Ti, a 1080 and an RTX 2070)? Surely large blocks on all of these would result in waiting for last blocks on the slower GPUs, no?

                                Thanks.

