Hello all!
We are rendering a large animated scene on a local render farm. The farm consists of workstations and nodes. All machines on the network have at least 64gb RAM.
9 of the machines on the network are newer - they have 96gb RAM. All other machines have 64gb of RAM.
We are using Pulze Render Manager to distribute jobs amongst the machines on the network.
The job fails on the 9 newest machines with this error:
[V-Ray] Building Embree Voxel Tree failed: [EmbreeTree<0,1,class VUtils::StaticTreeDelegateParam,struct VUtils::SimpleStaticTriangle,struct VUtils::SDTreeBunchDelegetorFace>::fillFaces] 4: Setting index buffer failed (number of triangles 1150328
As I understand it from historical posts on this forum, this error suggests a lack of RAM. But, as I've described, that can't be the whole story. The 9 machines that fail on this job all have more RAM than the other machines.
I can remotely log into all machines on the network. When I watch one of the "problem" machines load the job, it goes through the familiar sequence: Receive File (load job), update instances, load bitmaps, etc.
They make it through the "building static embree accelerator" phase and begin to build the light cache. The light cache loads for a few seconds, and then the program crashes. All nine of the new machines behave exactly the same way. The other machines (with 64gb RAM) load the job and run through the same sequence, but they don't crash. The workstations and older nodes can render a frame in about 30 minutes and move on to the next frame without issues.
Watching resource utilization in Task Manager, I see the machines with less RAM top out at their maximum capacity during rendering: 63-64gb used. However, the machines with 96gb of RAM get past 70 or 80gb of RAM used before crashing. So, for some reason, the machines with 64gb of RAM are better able to handle this scene's memory requirements.
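To make that comparison concrete, here is a minimal Python sketch of the arithmetic, taking the Task Manager readings above at face value. The helper name `headroom` and the `observations` table are only illustrative, and the usage ranges are rough readings, not logged measurements.

```python
def headroom(installed_gb, used_gb):
    """Return (free GB, fraction of installed RAM in use)."""
    return installed_gb - used_gb, used_gb / installed_gb


# Approximate Task Manager readings from this post: (installed, low, high) in GB.
# These are eyeballed ranges, not logged data.
observations = {
    "64gb machines (render OK)": (64, 63, 64),
    "96gb nodes (crash)": (96, 70, 80),
}

for label, (installed, low, high) in observations.items():
    free_max, frac_min = headroom(installed, low)
    free_min, frac_max = headroom(installed, high)
    print(f"{label}: {frac_min:.0%}-{frac_max:.0%} of RAM used, "
          f"{free_min}-{free_max}gb apparently free")
```

On these rough numbers, the 64gb machines finish frames with essentially no physical RAM free, while the 96gb nodes crash with something like 16-26gb still showing as unused. That is why a plain "ran out of RAM" explanation doesn't seem to fit.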
We have three varieties of machine on the network. These are the configurations:
1. Workstation: i9 9900k, 64gb RAM @ 2667mhz (DDR4)
2. Node (old): 2x Intel Xeon 2680, 64gb RAM @ 2933mhz
3. Node (new): 2x Intel Xeon Gold 6230, 96gb RAM @ 2933mhz <---- Issue happens with this group
Does anything stand out to anyone as being an obvious conflict for large scenes? Why would a memory-related error show up on only the machines with more RAM?
For what it's worth, all machines are running version-identical software packages, Windows updates, and basic configuration schemes.
Thank you for any insight!