Nube ? Renderfarm

  • Nube ? Renderfarm

    So... when I try to render on my Supermicro 5037m-h12trf 12-blade render farm, with various shaders and 500 MB of textures, it takes an hour for all nodes to receive the data. Each node has 2 x RJ45 at 1 Gb/sec, the workstation has 2 x RJ45 at 1 Gb/sec, and the switch is a Cisco with 48 x RJ45 at 1 Gb/sec.

    You would presume a bottleneck while 12 nodes all try to access the same subfolder at once, but the first node only takes a few minutes to start, and the last node takes an hour.

    But it's only 500 MB, maybe 10 files. In theory it should transfer all that in about 7 seconds per node, with 2 lines each doing 1 Gb a second, even with a bottleneck.

    Any suggestions?
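
    For reference, a rough sanity check of that estimate. This is only a sketch: it assumes ideal links with no protocol or disk overhead, assumes the two workstation ports really do aggregate to 2 Gb/s, and includes a hypothetical link that has negotiated down to 100 Mb/s for comparison.

```python
# Back-of-the-envelope transfer times. Ignores protocol overhead and disk speed.
PAYLOAD_MB = 500   # megabytes of textures per node (figure from the post)
NODES = 12         # blades in the farm


def transfer_seconds(payload_megabytes: float, link_megabits_per_s: float) -> float:
    """Ideal time to move a payload over a link of the given speed."""
    payload_megabits = payload_megabytes * 8  # 8 bits per byte
    return payload_megabits / link_megabits_per_s


# One node pulling 500 MB over a single 1 Gb/s port.
one_node = transfer_seconds(PAYLOAD_MB, 1000)
# All 12 nodes sharing the workstation's 2 x 1 Gb/s (assumes both ports are used).
all_nodes = transfer_seconds(PAYLOAD_MB * NODES, 2000)
# Hypothetical degraded case: a single link running at only 100 Mb/s.
degraded = transfer_seconds(PAYLOAD_MB, 100)

print(f"one node at 1 Gb/s:       {one_node:.0f} s")
print(f"12 nodes sharing 2 Gb/s:  {all_nodes:.0f} s")
print(f"one node at 100 Mb/s:     {degraded:.0f} s")
```

    Even the pessimistic cases come out at seconds to a few minutes, nowhere near an hour, so raw link speed alone does not account for the delay.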

  • #2
    What kind of server are they reading the info from? It sounds like the issue is most likely with the network. One way you can double-check this is to copy a large file to all the nodes at once. I usually have a command-line script that copies a 20 GB file to all nodes at once while I monitor the network load.

    You have 12 machines accessing a server at 1 Gig each, and I suppose your server can only support 2 Gig with two lanes at best, meaning 12 machines are asking for about 1200 MB/s from roughly 200 MB/s of capability. It may be that your server is serving the files in a queue, so the first machine gets priority, then the second, and so on, if it thinks it has encountered too many connections and can't serve all that data at once.

    It is also worse for many smaller files than for a single large file. You can even check that: if you copy one large file over, you will get around 100 MB/s. But if you copy 100 files that are 5 MB each or whatever, you will only get 50 MB/s or something similar. Compound that into hundreds of files over dozens of machines and you will get a slowdown for sure, since the server is spending a lot of time just managing the file packets.
    Dmitry Vinnik
    Silhouette Images Inc.
    ShowReel:
    https://www.youtube.com/watch?v=qxSJlvSwAhA
    https://www.linkedin.com/in/dmitry-v...-identity-name
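
    The script mentioned above is not shown in the thread. As a minimal sketch of the same idea in Python (the function names `copy_and_time` and `copy_to_all` are made up for illustration, and the share paths you pass in are up to you), it starts one copy per node at the same moment and reports the rate each one achieved:

```python
import shutil
import time
from concurrent.futures import ThreadPoolExecutor
from pathlib import Path


def copy_and_time(source: Path, dest_dir: Path) -> float:
    """Copy source into dest_dir and return the achieved rate in MB/s."""
    start = time.perf_counter()
    shutil.copy(source, dest_dir / source.name)
    elapsed = time.perf_counter() - start
    size_mb = source.stat().st_size / 1_000_000
    # Guard against a zero reading from the timer on very small files.
    return size_mb / elapsed if elapsed > 0 else float("inf")


def copy_to_all(source: Path, dest_dirs: list[Path]) -> dict[Path, float]:
    """Run one copy per destination concurrently; map destination -> MB/s."""
    with ThreadPoolExecutor(max_workers=len(dest_dirs)) as pool:
        rates = pool.map(lambda dest: copy_and_time(source, dest), dest_dirs)
        return dict(zip(dest_dirs, rates))
```

    Call `copy_to_all` with one large test file and a list of each node's network share, then again with a folder's worth of small files copied one by one, and compare the MB/s figures while watching the network load on the server.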

    • #3
      I think it is my cables... I got a copy of 'LAN Speed Test Lite' and it said I am only getting 100mb/sec. Also, my second RJ45 port is just sitting there idle, not helping.

      So I am ordering new cables. I was thinking of upgrading to 10 gigabit for like $5,000.- but now I wonder if I will need it since I am not using 1 gigabit yet.

      I just wondered if there was a software lag in the Vray slave, since it counts down the % of the scene file and then ray packets and stuff before it starts on the texture files. I assume not, since it starts the GI pass and buckets while the slave window still claims to be downloading textures.

      • #4
        You can start Task Manager, and under Networking it will show what speed you are getting on your machine. 100 MB/sec is about 1 gig, though. Watch the units here: roughly 100 MB/sec (megabytes) is what a 1 Gigabit network delivers in practice. However, if Task Manager says 100 Mbits, then you are only getting about 10 megabytes per second.
        Dmitry Vinnik
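
        To make the megabit versus megabyte distinction concrete, here is a small conversion sketch. The helper name `mbit_to_mbyte` is made up for illustration, and the results are theoretical maximums; real transfers land somewhat lower because of protocol overhead.

```python
def mbit_to_mbyte(megabits_per_s: float) -> float:
    """Convert megabits per second (Mb/s) to megabytes per second (MB/s)."""
    return megabits_per_s / 8  # 8 bits per byte


# Nominal link speeds in Mb/s and what they mean in MB/s.
for label, mbit in [("Fast Ethernet", 100), ("Gigabit", 1000), ("10 Gigabit", 10000)]:
    print(f"{label}: {mbit} Mb/s = {mbit_to_mbyte(mbit):g} MB/s theoretical")
```

        So a reading near 100 MB/s means a gigabit link is working, while a reading near 100 Mb/s means the link is running at a tenth of that.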


        • #5
          Hmm... The plot thickens: the blade server is getting speeds of 700mb-900mb up and down with itself.
          It must be the workstation cable to the networked drive.
          It's OK, the drive is full. I just debugged my new 120 TB fileshare server, and I will make sure to get gigabit cables instead of using the crap that came with my cable box.
          So you don't know how the slave process handles the texture files?

          I just thought it was weird: the window was claiming it was still downloading files an hour later, while the render was clearly working.

          • #6
            Success!
            Yeah, it was the cable. The speed test now shows 700-900mb to the networked drive. I still have to test distributed rendering, maybe on a day when I start before noon, since it takes a while to debug a scene and the network.

            • #7
              This might have been it: my 12x Java-based remote desktop was sucking up 42 of my 100mb of speed.

              • #8
                WTF, that is quite odd. Sounds like some kind of Java exploit? It should not be using any network at all.
                Dmitry Vinnik

                • #9
                  It's a realtime screencast. Well, 12 of them, so about 3 mb/sec each sounds about right.

                  • #10
                    I was going to test my network today...
                    So I got a scene debugged, got the network fired up and debugged, and made an IR map on the machine with 256 GB of RAM. Then I saved the lot and sent it to the network.

                    You were right: total network activity never went above 7mb, even with all 12 machines downloading at once on a texture-heavy scene. And the scene started rendering even though the slave node still claimed to be downloading textures an hour later.

                    But I can't show you a snazzy beauty pass because some jerk messed with my server, and then it crashed about 2 hours in.
                    It said the IR map was incomplete, crashed the Vray slave, and logged me out of my user account. (That's not supposed to happen.) So I ran the scene again with no GI: same thing. It crashed an hour in and logged me out.

                    It's like WTF!?

                    I didn't go to college for 5 years and spend $17k on a server so someone could mess with it. And if you are lucky enough to track them down, what do they say? "Oh, I thought it was a video game."

                    So... that's what I did today.

                    It doesn't count, as I rendered this one locally... So I can't show you screenshots of how awesome it works now with 2 Gigabit, or do a time comparison. The local render took about 2.5 hrs; the network render crashed about an hour in.

                    • #11
                      Sounds like you had some fun. What kind of network switch do you have? The switch could also be to blame. You really have to pay attention to the specs when a switch claims 1 Gig speeds: it should support 1 Gig per port, not 1 Gig in total shared across all ports.
                      Dmitry Vinnik

                      • #12
                        It's a Cisco 2300 series. I tested it: 900mb/sec, not quite a gig but a far cry from a meg.

                        • #13
                          So when you got 900 mb/s, which test was it that did it? Was it one machine reading from your server?
                          Dmitry Vinnik

                          • #14
                            It was a program called LAN Speed Test Lite.

                            It showed something like 500 write and 700-900 read depending on the machine: blade to blade, blade to server, server to blade. All about the same.

                            I saw that LTT on YouTube tested MikroTik switches, which offer a 24 x 10 gigabit switch for $500 US.

                            But as I said, when uploading textures for real I barely hit 7mb.

                            The benefit would be in writing the nearly 4 GB .exr files of an animation after the render. But my server crashed, so I didn't get a chance to test it yesterday.

                            • #15
                              I had some extra time the other day, so I took a look at the new trial for Next, as I am still using Vray 3.6.

                              My network kept dumping nodes because of RAM issues. It said it was trying to use 300 gigs of RAM and then crashed, but the nodes that did render were strangely using no RAM, none at all. Other observations: when using GI from file, it would dump the light and only render the GI, and it kept complaining about adaptive lighting being incompatible with GI or single lights. I also checked out 'render selected' in RT mode, and again I was disappointed: when it did isolate a polygon, it did not render reflections or refractions of the surrounding polygons, which makes it no different from Maya's own isolate-selected polygon. I thought I saw an isolate material function in the commercial, but I couldn't find it in the new UI. I imagine it would also not render reflections or refractions, making it fairly useless for debugging a material.

                              I am skeptical about spending $4,200 to upgrade my network when, after 18 months, update 1.1 is still showing many of the problems I observed in the Next beta, which makes me wonder whether it is the solution for me. I work mainly in science fiction, and my current scene has 7,900 lights with a projected 12,000 at finish. I paid $1,200 to upgrade to 3.6 because I wanted to see what was under the hood with probabilistic lighting... and after 2 years I have still not tested it (adaptive lighting) because of network problems.

                              One of the problems I have with 3.6 is that the security of my WIBU keeps locking me out. About 1 time in 3 it crashes because of security (which is written into the foundation of the software), and every 3 months it prompts me to pay Chaos Group money for an upgrade, and crashes.

                              I never had these problems with 3.0, which did not have this new security written into the binary code. I could just work all day long. Love the software, keep it up.
