Ability to use more than 64 threads

  • Ability to use more than 64 threads

    It would be great if V-Ray could work across several Windows processor groups by default. I know it's possible with two DR spawners, but this is not optimal. I understand it also requires twice as much memory?
    Broadwell-based Xeon E5-2600 v4 CPUs with up to 22 cores will be out by the end of this year. Once more than 64 threads is possible on dual-socket machines, it suddenly becomes MUCH more common than when it is restricted to machines with four sockets or more.
    I think I have read somewhere that the problem is not V-Ray itself, but rather what happens if the user changes the processor group for 3ds Max, or something like that. Anyway, it would be very helpful if it was possible to enable this feature in V-Ray, even if it has restrictions that the user has to be aware of.

  • #2
    We do have a build where this works, however I'm really not certain if performance will scale well with more than 64 threads in one single process. Multiple processes with a separate DR spawner for each processor group might be (way) more effective.

    Best regards,
    Vlado
    I only act like I know everything, Rogers.

    • #3
      I see... Say I want to render a range of images on a render farm with, for example, Deadline. Is it "straightforward" to render on several DR spawners using render farm software to control it? And is it correct that you need twice the memory? (That sounds logical.)

      Also, it's very cool that you take the time to answer on the forums, Vlado. It really builds confidence in your product!
      Last edited by CAMERON_SENSE; 24-08-2015, 06:22 AM.

      • #4
        Well, I suspect that building a render farm of machines with more than 64 logical cores could be a serious waste of performance... It would probably end up much more efficient to buy more, but less powerful, machines.

        You do need twice the memory, yes (or more, depending on how many processor groups there are). It is relatively straightforward to run a DR spawner on multiple groups at the same time - we added options for that in the recent builds on the website. Each spawner gets a different port number, so each machine ends up with a range of DR ports, rather than a single one.
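A minimal sketch of the one-spawner-per-processor-group arrangement described above, assuming the Windows limit of 64 logical processors per group. The function name `plan_spawners`, the consecutive port numbering, and the default `base_port` are illustrative assumptions, not V-Ray options. The sketch also assumes the minimum number of groups, whereas Windows may split processors differently (for example along NUMA node boundaries).

```python
import math

MAX_LOGICAL_PER_GROUP = 64  # Windows limit on logical processors per processor group


def plan_spawners(logical_processors, scene_memory_gb, base_port=20204):
    """Return one (group_index, port, memory_gb) tuple per processor group.

    Assumptions (not taken from V-Ray itself):
    - the machine is split into the minimum number of groups;
    - ports are consecutive, starting at base_port;
    - every spawner loads its own full copy of the scene.
    """
    groups = math.ceil(logical_processors / MAX_LOGICAL_PER_GROUP)
    return [(g, base_port + g, scene_memory_gb) for g in range(groups)]


# Hypothetical dual-socket node with 72 threads and a 48 GB scene.
plan = plan_spawners(logical_processors=72, scene_memory_gb=48)
for group, port, mem in plan:
    print(f"group {group}: spawner on port {port}, ~{mem} GB")
print(f"total memory: ~{sum(mem for _, _, mem in plan)} GB")
```

With two groups the scene is held in memory twice, which is where the "twice the memory (or more)" figure comes from.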

        Best regards,
        Vlado
        Last edited by vlado; 24-08-2015, 06:23 AM.
        I only act like I know everything, Rogers.

        • #5
          Deadline has the ability to limit tasks to one CPU and to run a set number of concurrent tasks on a node. I will talk to Deadline support to find out whether this means it also distributes the jobs correctly when there are multiple CPU groups.

          About the trade-off between fewer powerful nodes and more, less powerful nodes: running powerful nodes has many advantages when it comes to the number of licenses for all kinds of software (the nodes are used for more than just V-Ray), and it reduces the amount of administration that is needed. It can also be the cheapest option when you look at total system cost.

          • #6
            Originally posted by CAMERON_SENSE
            About the ratio between fewer powerful or more, less powerful nodes: Running powerful nodes has many advantages when it comes to the number of licenses for all kinds of software (they are used for more than just vray)
            That's true; V-Ray is pretty cheap compared to the cost of a machine, but I do realize that other software might be different.

            In any case, V-Ray licensing is per node, so even if you run multiple render jobs on one machine, it is still one license (this with regards to assigning multiple jobs to a machine).

            It can also be the cheapest option when you look at total system cost.
            Hmm, it would be interesting to see some numbers for that. Last time I checked, the price of many-core multi-CPU machines was quite large compared to what a few less powerful machines would cost.

            Best regards,
            Vlado
            Last edited by vlado; 24-08-2015, 07:49 AM.
            I only act like I know everything, Rogers.

            • #7
              Buying the most expensive 18-core CPUs is past the sweet spot when it comes to getting the most GHz for total system cost. But when you factor in software licenses and administration costs, you are closing in on the high end of the Xeon E5-2600 range when it comes to bang for the buck. Remember that we need to have everything from the OS to 3ds Max (= expensive) on each node. I actually did some calculations on this as recently as last week. I don't have those anymore, as it was just to get a general idea for myself.

              Some hardware vendors know how to "get paid". This can skew the picture somewhat. I use a very reasonably priced local vendor here in Norway which uses Supermicro equipment.
              The other thing that can skew the picture is memory cost. If you can make do with 16 or 32GB per node, you will cut the price per node quite a lot. But if you need 128GB, you are increasing the price substantially. (This might have changed now, given that DDR4 prices have gone down a lot lately.) In my case, the company I work for makes subsea oil drilling equipment, which means we visualize large oil drilling rigs or ships, with little time to optimize. This is what makes the memory requirements so high.
              I realize that using two DR spawners will double the memory requirements, which I did not factor in before, as I did not know about this :P

              But say that you have everything set up the way that V-Ray needs to get it to work across multiple CPU groups: you seem to still recommend using several DR spawners instead? Is there something that slows things down when one job runs on multiple CPU groups? I have done some testing on an array of different machines at work. What I found is that both V-Ray and Mental Ray scale almost perfectly as long as the machine has one or two CPU sockets. The only optimization I did was to lower the bucket size on machines with more cores. The machines had 16, 32 and 48 threads total, all on dual sockets.
              I also did some testing on four-socket Xeon E5-4650 machines (8 cores per CPU, 64 threads combined). Mental Ray dies completely on these, which I reckon is because of memory traffic over the CPUs' QPI links or something. V-Ray, on the other hand, handles those pretty well. Scaling was generally above 80% of theoretical.
              This has led me to believe that V-Ray could run just fine, with close to theoretical performance, on machines with for example 72 threads on one job, as long as it is a dual-socket machine. But I don't exaggerate if I say that you are the expert here :P
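A figure like "above 80% of theoretical" can be sketched as parallel scaling efficiency: the render time ideal (linear) scaling would predict from a smaller reference machine, divided by the measured time. The function name and the timings below are made up for illustration, and the sketch assumes the same per-thread speed on both machines, which real CPUs with different clock speeds will not satisfy exactly.

```python
def scaling_efficiency(ref_threads, ref_seconds, threads, seconds):
    """Fraction of ideal (linear) speedup achieved relative to a reference machine.

    Assumes both machines have the same per-thread speed, so that render
    time would ideally be inversely proportional to the thread count.
    """
    ideal_seconds = ref_seconds * ref_threads / threads
    return ideal_seconds / seconds


# Hypothetical timings: a 16-thread machine renders a frame in 600 s,
# a 64-thread machine renders the same frame in 180 s.
eff = scaling_efficiency(ref_threads=16, ref_seconds=600.0, threads=64, seconds=180.0)
print(f"scaling efficiency: {eff:.0%}")  # prints "scaling efficiency: 83%"
```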
              Last edited by CAMERON_SENSE; 24-08-2015, 09:00 AM.

              • #8
                Originally posted by CAMERON_SENSE
                But say that you have everything set up the way that Vray needs to get it to work across multiple CPU groups: You seem to still recommend using several DR spawners instead?
                From our tests a while back, this seems to give better performance and this is my only consideration. However, we tested a 4-CPU AMD machine and it was a while ago, so maybe things are different now.

                I can get you a V-Ray build that has up to 256 threads enabled, and you can test that if you want. I guess it would settle the argument.

                Best regards,
                Vlado
                I only act like I know everything, Rogers.

                • #9
                  I would like that very much.
                  I don't have any machines to try it on yet, though. When I order the render farm nodes, I am going to have at least one dual-socket machine with as many cores as possible to test this. I will request the build at that time, if that is okay?

                  When it comes to scaling, four sockets seem to involve a lot of memory traffic or something, which killed Mental Ray but which V-Ray handled reasonably well (with some exceptions). I read an article at theplatform.net (which I am trying to find again) saying that if you want to run a database on a machine with four sockets, you should use Xeon E7, not E5, because of poor scaling on the latter. I know that E7 CPUs have more QPI links between CPUs, which I guess is the reason for the better scaling. I would guess that this is something that affects an old AMD CPU even more...

                  • #10
                    We are looking at upgrading our farm to dual 18-core CPUs, specifically E5-2699s. Does anyone know whether this is something that V-Ray can handle in 3ds Max, and how it scales?
                    Ramy Hanna

                    TILTPIXEL

                    • #11
                      Originally posted by ramy02
                      We are looking at upgrading our farm to dual 18-core CPUs, specifically E5-2699s. Does anyone know whether this is something that V-Ray can handle in 3ds Max, and how it scales?
                      It depends on how you use your farm. If it's render jobs through bb or another render manager, your best bet is to run multiple instances of the slave on a single node. A single instance won't scale well, and you will most likely run into problems. The same goes for DR, for that matter.
                      Dmitry Vinnik
                      Silhouette Images Inc.
                      ShowReel:
                      https://www.youtube.com/watch?v=qxSJlvSwAhA
                      https://www.linkedin.com/in/dmitry-v...-identity-name

                      • #12
                        Originally posted by ramy02
                        We are looking at upgrading our farm to dual 18-core CPUs, specifically E5-2699s. Does anyone know whether this is something that V-Ray can handle in 3ds Max, and how it scales?
                        So far we're usually seeing the expected performance boost from a single process. Some frames slow down to match a much cheaper machine, although Vlado said that ForestPack might be the culprit and that the new update could possibly help those frames.

                        I'm going to write a long blog post on our render node shopping, with a huge Excel spreadsheet. The quick basics, though, broke down as follows over 5 years:

                        Relative Value (5 yr costs, relative to CPUMark/$):

                        E5-2680 v3 x2: 100%
                        E5-2650 v3 x2: 99% <--What we ended up buying.
                        E5-2690 v3 x2: 97%

                        i7-5960x : 70%
                        i7-5930k : 66%
                        i7-5820k : 65%

                        i7-3930k : 63%

                        Zync: 32%
                        Azure: 23%-25%
                        Rebus: 7%-15%

                        Those are based on theoretical numbers. We aren't quite hitting them. Sometimes we're getting as low as 50-60% of the performance "expected" from benchmarks, but that's less than 50% of the time, so I still feel confident that the dual E5s are the way to go. You have to look at all of your licenses too, and at long-term recurring costs (networking, labor, real estate, etc.). By our calculations the tail-end costs over 5 years for a render node are about $2,796.46 per machine. So yeah, you could build an i7-5820k for $1,231 by our estimates, but your total 5-year costs are closer to $4,000. By comparison, a dual E5 costs $3,500 up front (nearly 3x as much), but once you add licensing and maintenance labor you're looking at 'only' $6,300 over 5 years. So for about 50% more you get nearly 156% more raw performance. And, as alluded to, you can always run two Max processes. Even over first-year expenses the dual E5-2650 theoretically bests everything else pretty handily, thanks to license expenses.
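The 5-year totals quoted above can be reproduced with a few lines of arithmetic. The dollar figures come from the post. The last step rests on an assumption that is not stated there: that the "relative value" percentages mean performance per 5-year dollar, so the implied performance ratio need not match the rounded figures in the text.

```python
TAIL_COST = 2796.46  # 5-year recurring cost per node (licenses, labor, etc.), from the post


def five_year_total(hardware_cost, tail_cost=TAIL_COST):
    """Up-front hardware price plus 5-year recurring costs for one render node."""
    return hardware_cost + tail_cost


i7_total = five_year_total(1231.00)  # i7-5820k build
e5_total = five_year_total(3500.00)  # dual E5 build
cost_ratio = e5_total / i7_total

# Assumption: relative value = performance per 5-year dollar
# (dual E5-2650 v3 at 99%, i7-5820k at 65%).
perf_ratio = cost_ratio * (99 / 65)

print(f"i7-5820k 5-year total: ${i7_total:,.2f}")
print(f"dual E5 5-year total:  ${e5_total:,.2f}")
print(f"cost ratio: {cost_ratio:.2f}x, implied performance ratio: {perf_ratio:.2f}x")
```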
                        Last edited by im.thatoneguy; 25-08-2015, 08:30 PM.
                        Gavin Greenwalt
                        im.thatoneguy[at]gmail.com || Gavin[at]SFStudios.com
                        Straightface Studios

                        • #13
                          *Double post
                          Last edited by CAMERON_SENSE; 25-08-2015, 11:00 PM.

                          • #14
                            PassMark CPU benchmark results do not equate to rendering performance. Machines with more cores and lower frequency get much closer to theoretical performance during rendering than they do in PassMark. Instead, you should use tests like POV-Ray or Cinebench. Or even better: a typical test scene in your chosen renderer. I have gone through all this myself.

                            • #15
                              True. And if you want to build out 12 test machines and a wide array of example production scenes to test it out on I would welcome the results and happily punch them into my spreadsheet.

                              Sometimes the challenges are more practical than technical. Unfortunately, finding reliable benchmarks for a wide range of CPUs in Cinebench or POV-Ray is not easy. Also, neither is necessarily a perfect analog to V-Ray, since their multithreading performance may or may not match V-Ray's. Nor would either of them necessarily match Nuke's multithreading performance, nor do I even know exactly what kind of scene we will most likely render over the next 5 years. PassMark seems as good a consistent-if-flawed benchmark to use as any other. For instance, on some benchmarks there was more than a 20% difference between Windows and Linux. That would be an example of a benchmark just being poorly optimized for one OS. If a benchmark didn't take advantage of SSE4 but V-Ray or Nuke or Arnold or some other application running on the farm did, then the results would also be skewed. I learned a long time ago that benchmarking usually just ends in a trail of accusations and tears, haha.

                              Benchmarking the cloud was even harder since a supposedly identical VM would sometimes run on completely different CPU platforms from one spin-up to the next.

                              Machines with more cores and lower frequency get much closer to theoretical performance during rendering than they do in PassMark.
                              Are you sure you didn't mean to say the opposite? I've found that higher frequencies and fewer cores generally render faster on average. If anything, PassMark overstates the performance of slower mega-core machines.
                              Last edited by im.thatoneguy; 26-08-2015, 12:17 AM.
                              Gavin Greenwalt
                              im.thatoneguy[at]gmail.com || Gavin[at]SFStudios.com
                              Straightface Studios
