VIDEO of using vrayspawner.exe on a 72-thread Xeon E5 - slight set-up problem


  • #16
    Here are my results in a chart.

    And here is the download link for my scene.

    https://www.dropbox.com/s/zz0wpl1sxw...yscene.7z?dl=0

    [Chart: time in seconds]
    Last edited by Pete3d; 23-06-2015, 04:17 AM.


    • #17
      Running your scene on the two beefiest machines we have:
      Code:
      * E5-2687W 0 @ 3.10GHz
      32 threads display=1			173.27
      32 threads display=0			164.23
      16 threads node=0, display=0		305.71
      
      * E5-2680 v2 @ 2.80GHz
      40 threads display=0		129.09
      20 threads display=0		245.13
      It seems that V-Ray scales just fine on 32 and 40 cores.
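To put "scales just fine" into numbers, the timings above can be converted into speedup and parallel efficiency. A minimal awk sketch, assuming the usual strong-scaling definitions (speedup = base time / new time; efficiency = speedup divided by the thread ratio); the function name `scaling` is hypothetical. The 16-thread run was also pinned to node 0, so that pair is not a pure thread-count comparison.

```shell
#!/bin/sh
# Strong-scaling check for two timings of the same scene.
# Usage: scaling <threads_base> <time_base> <threads_more> <time_more>
#   speedup    = time_base / time_more
#   efficiency = speedup / (threads_more / threads_base)
scaling() {
  awk -v n1="$1" -v t1="$2" -v n2="$3" -v t2="$4" 'BEGIN {
    s = t1 / t2
    e = s / (n2 / n1)
    printf "speedup=%.2f efficiency=%.0f%%\n", s, e * 100
  }'
}
```

For example, `scaling 16 305.71 32 164.23` prints `speedup=1.86 efficiency=93%`, and `scaling 20 245.13 40 129.09` prints `speedup=1.90 efficiency=95%`; both pairs stay above 90%, in line with the conclusion above.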

      One thing you could try is testing with display=0 and a smaller-resolution image with more samples, or even with more geometry and more complex shaders.
      V-Ray developer


      • #18
        Interesting.
        Yeah, I can see that it scales fine with your 32 and 40 threads.
        Beyond that, the scaling seems to drop for some reason. Maybe my Linux setup is not optimized.
        But if you compare my 32-thread time to your 40-thread time, they come quite close. It's only beyond 40 threads that the values get really bad.
        What's also interesting is comparing your 16 threads with my 18 threads: there is a huge gap!

        Do you think the Linux distribution makes a difference?
        Do you have a scene I could test?


        [Chart: time in seconds (lower is better)]

        Last edited by Pete3d; 23-06-2015, 07:15 AM.


        • #19
          I didn't install the OSes on either machine myself, so I don't know whether something special needs to be done.
          We have no CentOS 7.x machines around, so I cannot rule out that it is related to the distro/kernel.

          Both machines I've tested are CentOS 6.x.

          Can you run one last test with -display=0 and all threads?
          V-Ray developer


          • #20
            I was able to improve it by another 10 s with the DR setup, using 36 threads per instance.
            $ numactl -N 0 ./vray -sceneFile=/home/user/Downloads/vrscene/myscene.vrscene -distributed=1 -renderHost=localhost -numThreads=36 -display=0
            Frame took 84.64 sec (display=0) vs 95.04 sec (display=1)
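The client command above covers only one half of the split. A minimal sketch of starting both halves from one script: a `-server` instance pinned to NUMA node 1 and the distributed client pinned to node 0, using `numactl -N` (`--cpunodebind`) and only the V-Ray flags already used in this thread. The `VRAY` path variable, the 5-second start-up wait and the `DRY_RUN` switch (print the commands instead of running them) are assumptions for illustration, not V-Ray features.

```shell
#!/bin/sh
# Sketch: split render server (node 1) and DR client (node 0) on one box.
# Assumptions, not V-Ray features: VRAY path variable, 5 s wait, DRY_RUN switch.
VRAY=${VRAY:-./vray}

# Print the command in dry-run mode, otherwise execute it.
run() {
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}

# Usage: split_render <scene.vrscene> [threads per NUMA node]
split_render() {
  scene=$1
  threads=${2:-36}
  server_pid=
  if [ -n "${DRY_RUN:-}" ]; then
    run numactl -N 1 "$VRAY" -server -numThreads="$threads"
  else
    # Render server restricted to the CPUs of NUMA node 1, in the background.
    numactl -N 1 "$VRAY" -server -numThreads="$threads" &
    server_pid=$!
    sleep 5  # assumption: enough time for the server to start listening
  fi
  # DR client restricted to NUMA node 0, using the local server as render host.
  run numactl -N 0 "$VRAY" -sceneFile="$scene" -numThreads="$threads" \
    -distributed=1 -renderHost=localhost -display=0
  status=$?
  # Stop the background server once the client has finished.
  if [ -n "$server_pid" ]; then
    kill "$server_pid" 2>/dev/null
    wait "$server_pid" 2>/dev/null
  fi
  return "$status"
}
```

For example, `DRY_RUN=1 split_render /home/user/Downloads/vrscene/myscene.vrscene 36` prints the two command lines; without `DRY_RUN` it runs them and stops the server when the client exits. `-N` only restricts CPUs; memory placement is left to the kernel's default local-allocation policy.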

            It looks like display=0 doesn't help as promised with more than 40 threads.
            I ran three times with all threads and -display=0, with results of 138.30 sec, 141.20 sec and 141.96 sec.
            I ran again with -display=1 and got 141.10 sec (135.86 yesterday), so it makes virtually no difference.

            $ ./vray -sceneFile=/home/user/Downloads/vrscene/myscene.vrscene
            Frame took 138.30 s, 141.20 s, 141.96 s
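With about 3 s of run-to-run spread, a single extra run cannot really separate display=1 from display=0. A minimal awk sketch for summarizing repeated timings; `run_stats` is a hypothetical helper name.

```shell
#!/bin/sh
# Summarize repeated render timings passed as arguments (seconds).
# Usage: run_stats <t1> <t2> ...
run_stats() {
  printf '%s\n' "$@" | awk '
    NR == 1 { min = $1; max = $1 }
    {
      sum += $1
      if ($1 < min) min = $1
      if ($1 > max) max = $1
    }
    END {
      printf "n=%d mean=%.2f min=%.2f max=%.2f spread=%.2f\n",
        NR, sum / NR, min, max, max - min
    }'
}
```

For example, `run_stats 138.30 141.20 141.96` prints `n=3 mean=140.49 min=138.30 max=141.96 spread=3.66`. The display=1 run (141.10 s) falls inside that range, which fits the "virtually no difference" reading.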


            Any more ideas?


            Last edited by Pete3d; 23-06-2015, 07:49 AM.


            • #21
              Hm, the difference is still 25-30%. Can you post the render times from similar test cases run on Windows with the same build?

              I'm interested in seeing the times for 36+36 threads.

              To test this you have to use the following commands:
              Code:
              $ start /node 1 vray.exe -server -numThreads=36
              $ start /node 0 vray.exe -sceneFile=<path to scene>.vrscene -numThreads=36 -distributed=1 -renderHost=localhost
              And the time when rendering with 36 threads only:
              Code:
              $ vray.exe -numThreads=36 -sceneFile=<path to scene>.vrscene
              $ start /node 0 vray.exe -numThreads=36 -sceneFile=<path to scene>.vrscene
              V-Ray developer


              • #22
                Hi Petrov,

                I managed to render with your command line on the latest build (30 June).

                start /node 0 vray.exe -sceneFile=new.vrscene -numThreads=36 -distributed=1 -renderHost=localhost
                Rendertime: 111.89s, 109.27s, 106.99s
                Note that display was on. When I set display=0, the Windows cmd window unfortunately disappeared and I couldn't read the value.

                It looks like the render times are not much better than on Linux. They even seem to be worse by up to 10-20 s per frame.
                Check out the chart.

                To summarize for everyone:

                At this point it looks like Linux with the slave split off onto a separate NUMA node (basically DR to a local slave with 32 threads) and display=0 produced the fastest render, at around 85 s.

                I didn't put this number into the chart, since all the numbers there are with display=1.
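For a relative figure, the ~85 s result (84.64 s) can be compared with the single-process all-thread Linux runs at display=0 (138.30 to 141.96 s) and with the same split at display=1 (95.04 s). A small sketch; `time_saved` is a hypothetical helper name.

```shell
#!/bin/sh
# Relative render-time reduction going from an old timing to a new one.
# Usage: time_saved <old_seconds> <new_seconds>
time_saved() {
  awk -v t_old="$1" -v t_new="$2" 'BEGIN {
    printf "%.1f%%\n", (t_old - t_new) / t_old * 100
  }'
}
```

For example, `time_saved 138.30 84.64` prints `38.8%`, and `time_saved 95.04 84.64` prints `10.9%`.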

                [Chart: time in sec (lower is better)]
                Last edited by Pete3d; 30-06-2015, 01:48 AM. Reason: Attached chart


                • #23
                  Can you please use the same build as on Linux?
                  Your results are rather strange.
                  Can you repeat the test where the render time was ~1:07?
                  V-Ray developer


                  • #24
                    Hi t.petrov,

                    Sorry about that. I have now used the same build as for the earlier Linux renders (23 June).

                    =============================================
                    start /node 1 vray.exe -server -numThreads=36
                    start /node 0 vray.exe -sceneFile=new.vrscene -numThreads=36 -distributed=1 -renderHost=localhost
                    Rendertime: 100.92s
                    Rerendertime: 104.74s, 97s, 108.83s

                    What I noticed is that while the CPU is at 100% overall, one CPU node (18 cores) does not stay at 100% like the other node.
                    You can see little jumps here and there on that node. I captured it in a video: I rendered once and then rerendered.
                    The output window says 97 s the first time and then, for no apparent reason, 108 s.


                    How come the History shows only a 1 s difference between the two runs when the log shows around 10 s? See the video for proof.

                    (Video at 200% speed.)



                    ============================================
                    And some more here:

                    vray.exe -numThreads=36 -sceneFile=<path to scene>.vrscene
                    Rendertime: 151.91s
                    Rerendertime: 146.58s, 147.45s

                    ===========================================

                    start /node 0 vray.exe -sceneFile=new.vrscene -numThreads=36
                    Rendertime: 148.09s
                    Rerendertime: 154.88s



                    • #25
                      Interesting and a bit disappointing.

                      Can you try to set the environment variable VRAY_USE_THREAD_AFFINITY=1 and see if there is any change in the performance.
                      The nightly build you're using should support it. It affects both Linux and Windows.
                      V-Ray developer


                      • #26
                        Hi t.petrov,

                        I tried your variable and didn't notice any performance change. The affinities get assigned correctly
                        to the individual NUMA nodes with the value set to 0 as well.
                        What I mean is that the threads always ran on CPU numbers (listed below) belonging to their assigned NUMA node.

                        $ numactl --hardware
                        available: 2 nodes (0-1)
                        node 0 cpus: 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 51 52 53
                        node 0 size: 32643 MB
                        node 0 free: 30023 MB
                        node 1 cpus: 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 54 55 56 57 58 59 60 61 62 63 64 65 66 67 68 69 70 71
                        node 1 size: 32768 MB
                        node 1 free: 26272 MB
                        node distances:
                        node 0 1
                        0: 10 21
                        1: 21 10
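One possible way to double-check that per-thread placement is to read the `psr` column (the CPU a thread last ran on) from procps `ps -L` and map it through the layout printed by `numactl --hardware`. In this listing CPUs 36-71 are most likely the hyperthread siblings of 0-35. A sketch under those assumptions; `cpu_to_node` and `thread_nodes` are hypothetical helper names, and the parser relies only on the `node N cpus: ...` lines shown above.

```shell
#!/bin/sh
# Sketch: map logical CPU ids to NUMA nodes using `numactl --hardware` output.

# Read `numactl --hardware` text on stdin and print the node that owns CPU $1.
# Exits with status 1 if the CPU is not listed.
cpu_to_node() {
  awk -v cpu="$1" '
    $1 == "node" && $3 == "cpus:" {
      for (i = 4; i <= NF; i++)
        if ($i == cpu) { print $2; found = 1; exit }
    }
    END { if (!found) exit 1 }'
}

# For a running process (PID $1), print: thread id, CPU it last ran on, node.
# psr is only a snapshot, so sample several times during a render.
thread_nodes() {
  hw=$(numactl --hardware)
  ps -L -o tid=,psr= -p "$1" | while read -r tid psr; do
    printf '%s cpu=%s node=%s\n' "$tid" "$psr" \
      "$(printf '%s\n' "$hw" | cpu_to_node "$psr")"
  done
}
```

For example, running `thread_nodes "$(pgrep -n vray)"` during a render started under `numactl -N 0` should list only node 0 if the binding holds.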
