Perplexing Memory issue on render farm - nodes with more ram throw errors

    Hello all!

    We are rendering a large animated scene on a local render farm made up of workstations and dedicated render nodes. All machines on the network have at least 64 GB of RAM.

    Nine of the machines are newer and have 96 GB of RAM; all the others have 64 GB.

    We are using Pulze Render Manager to distribute jobs amongst the machines on the network.

    The job fails on the 9 newest machines with this error:

    [V-Ray] Building Embree Voxel Tree failed: [EmbreeTree<0,1,class VUtils::StaticTreeDelegateParam,struct VUtils::SimpleStaticTriangle,struct VUtils::SDTreeBunchDelegetorFace>::fillFaces] 4: Setting index buffer failed (number of triangles 1150328


    From older posts on this forum, I understand that this error usually points to a lack of RAM. But, as described above, that can't be the whole story: the 9 machines that fail on this job all have more RAM than the other machines.

    I can log into all machines on the network remotely. When I watch one of the "problem" machines load the job, it goes through the familiar sequence: Receive File (load job), update instances, load bitmaps, etc.

    They make it through the "Building static Embree accelerator" phase and begin building the light cache. The light cache runs for a few seconds, and then the program crashes. All nine machines behave exactly the same way. The other machines (with 64 GB of RAM) load the job and run through the same sequence, but they don't crash: the workstations and older nodes render a frame in about 30 minutes and move on to the next frame without issues.

    Watching resource utilization in Task Manager, the machines with less RAM top out at their full capacity during rendering (63-64 GB used). The 96 GB machines, however, climb past 70-80 GB before crashing. So, for some reason, the 64 GB machines handle this scene's memory requirements better.
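Putting the reported peaks side by side makes the anomaly concrete. The short Python sketch below is only arithmetic on the Task Manager figures quoted in this post, not a diagnostic tool: the 64 GB machines survive at close to 100% of installed RAM, while the 96 GB machines crash with roughly 16 to 26 GB of physical RAM still unused. That is consistent with the point that simple exhaustion of physical RAM can't be the whole story.

```python
# Reported peak RAM in GB, read off Task Manager as described in the post.
# The 96 GB crash point was given as "70 or 80gb", so both ends are checked.

def utilization(used_gb: float, installed_gb: float) -> float:
    """Fraction of installed physical RAM in use."""
    return used_gb / installed_gb


def headroom_gb(used_gb: float, installed_gb: float) -> float:
    """Physical RAM still unused, in GB."""
    return installed_gb - used_gb


# 64 GB machines: ran at 63-64 GB and kept rendering.
for used in (63, 64):
    print(f"64 GB node at {used} GB: {utilization(used, 64):.0%} used, "
          f"{headroom_gb(used, 64)} GB free (no crash)")

# 96 GB machines: crashed somewhere in the 70-80 GB range.
for used in (70, 80):
    print(f"96 GB node at {used} GB: {utilization(used, 96):.0%} used, "
          f"{headroom_gb(used, 96)} GB free (crash)")
```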

    We have three varieties of machine on the network. These are the configurations:

    1. Workstation: Intel i9-9900K, 64 GB DDR4 @ 2667 MHz
    2. Node (old): 2x Intel Xeon 2680, 64 GB RAM @ 2933 MHz
    3. Node (new): 2x Intel Xeon Gold 6230, 96 GB RAM @ 2933 MHz <---- Issue happens with this group


    Does anything stand out to anyone as being an obvious conflict for large scenes? Why would a memory-related error show up on only the machines with more RAM?

    For what it's worth, all machines are running version-identical software packages, Windows updates, and basic configuration schemes.

    Thank you for any insight!

  • #2
    Hello,

    Some things you might try:
    1. Try running the animation directly on some of the problem machines - there might be some issue when distributing the jobs.
    2. If the render starts building the light cache and only then runs out of memory, some dynamic geometry is involved. What is your dynamic memory limit? Does anything change if you lower it (rendering might be slower)?
    3. Do you have any third-party plugins that might generate a lot of geometry, such as Forest Pack or RailClone? What happens if you disable them?
    4. The newer machines have many more threads (80, if I'm reading it right). Each thread needs some extra RAM; ideally only a few megabytes, but there might be an issue there. Could you try limiting the number of threads to a lower value? You can do this with MAXScript in the scene:
      Code:
      renderers.current.system_numthreads=32
      This limits rendering to 32 threads. Setting it to 0 removes the limit. This setting is per scene.
    5. If you send the scene to our support team, they can test it here too.

    Best regards,
    Yavor
    Yavor Rubenov
    V-Ray for 3ds Max developer
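The 80-thread figure follows from the new nodes' hardware: each Xeon Gold 6230 has 20 cores with two hardware threads per core, and each node has two sockets. The Python sketch below does that arithmetic and shows how total per-thread memory would scale if each thread cost more than the ideal "few megabytes". The 4, 64 and 256 MB per-thread values are hypothetical placeholders, not V-Ray figures. The old nodes are left out because the post doesn't say which generation of Xeon 2680 they use, and core counts differ between generations.

```python
# Logical thread counts behind the "80 threads" estimate, plus a rough
# scale for per-thread memory. The MB-per-thread values are HYPOTHETICAL.

def logical_threads(sockets: int, cores_per_socket: int, threads_per_core: int = 2) -> int:
    """Logical processors = sockets x physical cores x hardware threads per core."""
    return sockets * cores_per_socket * threads_per_core


def thread_memory_gb(threads: int, mb_per_thread: float) -> float:
    """Total per-thread memory in GB, taking 1 GB = 1024 MB."""
    return threads * mb_per_thread / 1024


new_node = logical_threads(sockets=2, cores_per_socket=20)    # 2x Xeon Gold 6230
workstation = logical_threads(sockets=1, cores_per_socket=8)  # i9-9900K
capped = 32                                                   # system_numthreads=32

for mb in (4, 64, 256):  # hypothetical per-thread costs
    print(f"{mb:>3} MB/thread: "
          f"{thread_memory_gb(new_node, mb):6.2f} GB at {new_node} threads, "
          f"{thread_memory_gb(capped, mb):6.2f} GB capped at {capped}, "
          f"{thread_memory_gb(workstation, mb):6.2f} GB at {workstation} threads")
```

At a genuine few megabytes per thread the total stays well under 1 GB even on 80 threads, so capping the thread count mainly serves as a quick test of whether per-thread memory is misbehaving.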

    • #3
      Hello! Thank you for the ideas. I'll try a few of those out and report back.

      To answer your questions: our dynamic memory limit is set to the default of 0 MB. And yes, we are using a lot of Forest Pack in this scene, which is undoubtedly causing the high memory usage. I can try rendering with those disabled to confirm.

      Thanks for the tip on reducing the number of threads, I'll give that a try too. Am I correct in assuming that will slow render times on the newer machines? Thanks again!

      • #4
        Am I correct in assuming that will slow render times on the newer machines?
        Yes, it will most likely slow down rendering, but it will show whether the issue is related to the number of threads.
        Yavor Rubenov
        V-Ray for 3ds Max developer

        • #5
          Ok, thanks.

          I just started a render sequence on one of the problem machines, executed locally, and it maxes out at 75 GB of RAM but makes it through the light cache and begins rendering! This suggests to me that there is some issue with the way the machines load or receive the scene from the distribution software (Pulze Render Manager).

          • #6
            Ok! After a few tests, I was able to mitigate the issue by enabling the "Conserve memory" option and selecting "Dynamic" under geometry mode. I also simplified a few things in the scene to make it lighter. Now all nodes load and render just fine: the 64 GB machines max out at about 53 GB used, and the 96 GB machines at about 60 GB. Changing the thread count in the scene had no effect, and I didn't try setting a maximum dynamic memory limit. So I think this issue is closed for us!
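The before and after peaks reported in this thread can be compared with a little arithmetic. This Python sketch only restates those numbers. Because several changes were made at once (Conserve memory, dynamic geometry mode, and scene simplification), the saving can't be attributed to any single setting. The 96 GB "before" value uses the successful local run at 75 GB, since the distributed runs crashed partway through.

```python
# Peak RAM in GB reported in the thread, before and after the fix.
# installed_gb: (peak before, peak after)
reports = {
    64: (64, 53),  # before: "63-64gb used"; upper end taken
    96: (75, 60),  # before: the local (non-Pulze) run that peaked at 75 GB
}


def summarize(installed_gb: float, before_gb: float, after_gb: float):
    """Return (GB saved, fraction saved, fraction of installed RAM used after)."""
    saved = before_gb - after_gb
    return saved, saved / before_gb, after_gb / installed_gb


for installed, (before, after) in reports.items():
    saved, frac_saved, util = summarize(installed, before, after)
    print(f"{installed} GB node: {before} -> {after} GB "
          f"({saved} GB or {frac_saved:.0%} less), "
          f"now {util:.1%} of installed RAM")
```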