Houdini + Vray + Deadline

  • Houdini + Vray + Deadline

    My team and I have spent the past month working on tour visuals with some pretty crazy Houdini sims. Now we're trying to render out the 7,000-frame sequence with Vray + Houdini on AWS, using Deadline's AWS Portal system.

    It took us about a week to get to where we are now, but our render times on a g3.8xlarge are still over an hour and a half, with 75%+ of that time spent with nothing happening. In the Vray logs, I can see that while rendering this frame it spent about 42 minutes doing nothing:

    2019-10-10 04:55:02: 0: STDOUT: VFH [Progress] V-Ray: Prefiltering light cache... 100%
    2019-10-10 05:37:09: 0: STDOUT: VFH [Info] V-Ray: Average rays per light cache sample: 7.51 (min 1, max 398 )
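
A quick way to find stalls like this in a full render log is to scan for the largest jump between consecutive timestamps. The sketch below is only an illustration: it assumes every relevant line starts with a `YYYY-MM-DD HH:MM:SS` timestamp, as in the two lines above, and the function name `largest_gap` is made up for this example.

```python
from datetime import datetime

def largest_gap(log_lines):
    """Return (gap, line_before, line_after) for the longest pause
    between consecutive timestamped lines, or None if fewer than two
    timestamped lines are found."""
    stamped = []
    for line in log_lines:
        line = line.strip()
        try:
            # Assumption: the first 19 characters are the timestamp.
            ts = datetime.strptime(line[:19], "%Y-%m-%d %H:%M:%S")
        except ValueError:
            continue  # skip lines without a leading timestamp
        stamped.append((ts, line))

    best = None
    for (t0, l0), (t1, l1) in zip(stamped, stamped[1:]):
        gap = t1 - t0
        if best is None or gap > best[0]:
            best = (gap, l0, l1)
    return best

# The two log lines quoted in this post.
sample = [
    "2019-10-10 04:55:02: 0: STDOUT: VFH [Progress] V-Ray: Prefiltering light cache... 100%",
    "2019-10-10 05:37:09: 0: STDOUT: VFH [Info] V-Ray: Average rays per light cache sample: 7.51 (min 1, max 398 )",
]
gap, before, after = largest_gap(sample)
print(gap)  # 0:42:07
```

Run against the whole log, this would point at the pair of lines surrounding the longest silent stretch, which here falls between the light cache prefilter finishing and the next message.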

    On top of that, I'd love to take advantage of hybrid rendering, since we have 32 cores sitting there (and billing us per core hour). The issue is that Vray says the C++ CUDA implementation needs to be "set explicitly" to use the processor, but we don't have matching machines in our office to submit from with that checkbox checked.

    I'm under a lot of pressure to get this done (we're already a week late due to rendering issues), and would appreciate any help I can get!

    Attached is a full render log for one frame where you can see the big slowdown.

    Thanks in advance.


    Additional info:

    Vray Build number: 4901 hash: 95595c9 from 01 Oct 2019 04:25
    Houdini build number: 17.5.290 (or .360 - can't remember)
    Last edited by jeffh; 10-10-2019, 11:15 AM. Reason: added build numbers

  • #2
    Figured out the hybrid rendering part thanks to this thread: https://forums.chaosgroup.com/forum/...c2-spot-fleets.

    Testing out Brute Force GI rather than Light Cache, to see if that fixes my other issue.


    • #3
      Turns out the side effect of turning on hybrid rendering is losing the second GPU on this instance, according to the logs...

      2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Initializing CUDA renderer...
      2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Querying for CUDA devices...
      2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Device 0 is C++/CPU on Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz
      2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Querying NVLINK ...
      2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Device 0 does not use NVLINK
      2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Initializing environment kernel
      2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Number of CUDA devices: 1
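
To check which devices V-Ray actually picked up on each worker, the device lines can be pulled out of the log automatically. This is a minimal sketch that assumes the message wording is exactly as in the excerpt above (`Device N is ...` and `Number of CUDA devices: N`); other V-Ray builds may word these differently, and the function name `cuda_devices` is made up for this example.

```python
import re

# Patterns taken from the log excerpt in this post; treat them as assumptions.
DEVICE_RE = re.compile(r"V-Ray: Device (\d+) is (.+)$")
COUNT_RE = re.compile(r"V-Ray: Number of CUDA devices: (\d+)")

def cuda_devices(log_lines):
    """Return ({device_index: description}, reported_count) from V-Ray log lines.

    reported_count is None if no 'Number of CUDA devices' line is present."""
    devices = {}
    count = None
    for line in log_lines:
        line = line.strip()
        m = DEVICE_RE.search(line)
        if m:
            devices[int(m.group(1))] = m.group(2)
            continue
        m = COUNT_RE.search(line)
        if m:
            count = int(m.group(1))
    return devices, count

# Relevant lines from the excerpt above.
sample = [
    "2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Device 0 is C++/CPU on Intel(R) Xeon(R) CPU E5-2686 v4 @ 2.30GHz",
    "2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Device 0 does not use NVLINK",
    "2019-10-10 19:22:34: 0: STDOUT: VFH [Info] V-Ray: Number of CUDA devices: 1",
]
devices, count = cuda_devices(sample)
for index, name in sorted(devices.items()):
    print(index, name)
print("reported:", count)
```

On this excerpt the only device listed is the C++/CPU one, and the reported device count is 1.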


      • #4
        I've not had a great experience with graphics card drivers on AWS. Gave it a shot recently and it was slow as hell. Miserable experience.

        Have you tried going CPU-only with some of the c5.xx's? The 5.12's are around the same price, or you can step up to the 18's for 72 threads.
        Last edited by Neilg; 10-10-2019, 12:59 PM.


        • #5
          Thanks for the tip, I'll give it a shot.


          • #6
            Neilg, it made a huge difference! Got my frame times down to about 10-12 minutes from the 1hr 30mins - 3hrs they were previously on GPU.
            Last edited by jeffh; 11-10-2019, 04:04 PM.
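
To put those frame times in perspective for the 7,000-frame sequence, here is a rough back-of-the-envelope calculation. It is only a sketch: it assumes every frame takes the same time, ignores instance startup and failed tasks, and does not account for the different hourly prices of GPU and CPU instances. The node count of 50 is purely illustrative.

```python
FRAMES = 7000  # sequence length from the first post

def node_hours(minutes_per_frame, frames=FRAMES):
    """Total machine-hours needed if every frame takes the same time."""
    return frames * minutes_per_frame / 60

def wall_clock_hours(minutes_per_frame, nodes, frames=FRAMES):
    """Elapsed hours with `nodes` machines, assuming perfect scaling."""
    return node_hours(minutes_per_frame, frames) / nodes

# GPU frame times reported in this thread: 1.5 to 3 hours per frame.
print(node_hours(90), node_hours(180))   # 10500.0 21000.0

# CPU frame times reported in this thread: 10 to 12 minutes per frame.
print(round(node_hours(10)), node_hours(12))   # 1167 1400.0

# Illustrative only: elapsed time at 12 min/frame on 50 nodes.
print(wall_clock_hours(12, 50))   # 28.0
```

Under these assumptions the whole sequence drops from roughly 10,500 to 21,000 machine-hours to roughly 1,200 to 1,400.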


            • #7
              Glad it helped! That on the 5.18's?
              That's a huge result.


              • #8
                I'm doing a blend of the 5.12s, 5.18s, and 5.24s. Deadline is not good at running 50 of the same machine, even though we got our limit increased. I ended up with 300+ errors and a lot of wasted compute and credits on stalled workers, because Deadline kept killing and restarting my nodes for some reason (with 50 nodes at once, this has proven to be very expensive).

                But if I let Deadline make a pool that combines the three instance types, it does much better, and I haven't really had any stalled workers recently.


                • #9
                  Shit, we had a similar issue. We had Amazon/Thinkbox tech support trying to solve it for us too. They were completely stumped and said it shouldn't be happening. When monitoring it through remote desktop, it would suddenly pop up a 'Windows is shutting down' message, as if someone had hit the power switch, and Deadline would lock up its updates for 10 minutes before it realized the machine was not running and killed it.

                  We didn't think of making a pool that combined 3 instance types... We had 2 tech support guys from Amazon claim it should not be happening, and it was so intermittent that we just worked around it. Once it started, we dropped to 10 machines or so; after 12hrs we found we could safely ramp back up to 40+ for another half day.

                  Well, thanks for the tip!
