DR and fast networking

  • DR and fast networking

    Anyone who saw my previous post regarding my new workstation will know I'm experimenting with 10Gb InfiniBand interconnects (point-to-point between two machines at the moment; if it's reliable I'll get a switch).

    I am now regularly maxing out the SSD in each test machine, sending over 500 MB/sec using the IP over InfiniBand protocol (IPoIB).


    My initial tests with DR seemed promising, but I've just tested it with an extremely heavy scene and I'm getting some odd results.

    The network monitor shows that while preparing to render I get no more than 0.3% network utilisation, and the host machine finishes all the lighting calculations before the slave kicks in.

    Now, since both machines are pulling gigabytes of proxies and maps from a standard mechanical HDD, I'd expect some bottlenecks there, but a 3 MB/sec transfer rate is a bit embarrassing.

    Considering that rate, it actually started rendering surprisingly quickly (after about 5-6 minutes).

    But I don't really understand how the DR system sends and receives data. Surely a large part of it would go directly from the memory of the host to the slave and vice versa (for example the buckets being rendered, the imap data, etc.), and for that data I'd expect to max out my network connection, at least momentarily?

    However, the most I saw even after starting the render was 2% usage (~20 MB/sec), and now that it's right in amongst the buckets, the network usage is around 0.01% (there's a quick conversion from those percentages to MB/s sketched at the end of this post).

    Basically I have no idea whether a) it's the whole InfiniBand thing that's causing problems, b) it's the HDD I'm streaming from (could this be improved? i.e. load the data for the host machine, then send directly from RAM to the slaves? They would kick in a lot faster in that case), or c) it's just the way DR works and I'll never see it utilising a fast network connection. In which case it's a real shame, as I had hoped a fast pipe between the machines would bring me closer to the responsiveness of a dual-CPU machine.

    I understand DR is designed to avoid network bottlenecks, but it would be nice if I could adjust some settings to take advantage of my fat pipe - at those speeds I'd get just as good a result over my ADSL connection.
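
    For reference, here is a quick conversion from Task Manager's utilisation figures to actual throughput (a minimal Python sketch, assuming the percentage is reported against the nominal 10 Gbit/s IPoIB link speed):

    LINK_GBPS = 10                    # nominal IPoIB link speed, gigabits per second
    LINK_MBPS = LINK_GBPS * 1000 / 8  # = 1250 MB/s theoretical ceiling

    for pct in (0.3, 2.0, 50.0):
        print(f"{pct:>5.1f}% of a {LINK_GBPS} Gb link = {LINK_MBPS * pct / 100:7.1f} MB/s")

    # 0.3% = ~3.8 MB/s -> matches the ~3 MB/s seen while the scene is sent
    # 2.0% = ~25 MB/s  -> roughly the brief peak after the render starts
    # 50%  = ~625 MB/s -> roughly the speed of a plain file copy between the SSDs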

  • #2
    Quick update: just to eliminate the HDD as a bottleneck, I copied everything to a temp folder on my main SSD and mapped it as a drive on the slave. This is the drive I can get 500 MB/sec from over the connection. I remapped all the paths to point to this new folder.

    Another test, and I still only get 0.3% network utilisation while it's sending the scene to the slave, according to Task Manager. Basically it has made no difference at all to the time taken for the slave to pick up the job. I did notice that around the time the V-Ray log noted "rendering scene started" for the slave, I got one spike up to 8% network usage, but that was the high point. Something odd is happening here.
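
    If the link itself is in doubt, one way to take V-Ray and Windows file sharing out of the picture is to time a raw TCP transfer between the two machines. A rough sketch (the port number is arbitrary; run the server on the slave and the client on the host, pointing it at the slave's IPoIB address):

    # raw_net_test.py - crude point-to-point TCP throughput check over the IPoIB link.
    #   on the slave:  python raw_net_test.py server
    #   on the host:   python raw_net_test.py client <slave-ip>
    import socket, sys, time

    PORT  = 5201                 # arbitrary free port
    CHUNK = 4 * 1024 * 1024      # 4 MB per send
    TOTAL = 2 * 1024 ** 3        # push 2 GB in total

    def server():
        with socket.create_server(("", PORT)) as srv:
            conn, addr = srv.accept()
            with conn:
                received, start = 0, time.perf_counter()
                while True:
                    data = conn.recv(CHUNK)
                    if not data:
                        break
                    received += len(data)
            secs = time.perf_counter() - start
            print(f"received {received / 1e6:.0f} MB in {secs:.1f} s "
                  f"= {received / 1e6 / secs:.0f} MB/s from {addr[0]}")

    def client(host):
        payload = b"\0" * CHUNK
        with socket.create_connection((host, PORT)) as conn:
            start, sent = time.perf_counter(), 0
            while sent < TOTAL:
                conn.sendall(payload)
                sent += CHUNK
        secs = time.perf_counter() - start
        print(f"sent {sent / 1e6:.0f} MB in {secs:.1f} s = {sent / 1e6 / secs:.0f} MB/s")

    if __name__ == "__main__":
        server() if sys.argv[1] == "server" else client(sys.argv[2])

    If this consistently shows hundreds of MB/s while the DR scene transfer stays at 3-5 MB/s, the link and drivers are fine and the limit is in how the data is being sent.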

    • #3
      OK, stranger and stranger. I copied the whole job to an identical folder on the C drive of the slave and pointed the mapping to the local folder (it took 6 seconds to copy the whole lot over).

      It still takes almost the same time to start - I'm guessing because it copies the .max file over before opening it - but I'd have expected some speedup since it would pull the maps from the local drive? It seemed to make very little difference.

      Also, and possibly related: the master and slave are identical-spec machines with identical software setups. However, the master is rendering at 100% CPU usage, while the slave is bumbling along between 8% and 100%, varying wildly and not actually helping a great deal with render times.

      What could that be? As I said, the maps are all stored locally, the machines are identical, and this is in the middle of a half-hour render when the whole scene appears to be loaded. I've got bags of free RAM, despite the dynamic memory limit being set to 0.

      It's a masterplan model with a load of ForestPack stuff in there, and a load of proxies. Any suggestions greatly appreciated.
      Last edited by super gnu; 24-09-2012, 11:22 AM.
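
      One way to put numbers on that "bumbling along" behaviour is to log the slave's CPU load once a second during the render and line it up with the timestamps in the V-Ray messages window. A small sketch (assumes the psutil package is installed on the slave; stop it with Ctrl+C):

      # cpu_log.py - sample overall CPU usage once a second so the slave's stalls
      # can be matched against the V-Ray log afterwards.
      import csv, time
      import psutil  # pip install psutil

      with open("slave_cpu_log.csv", "w", newline="") as f:
          writer = csv.writer(f)
          writer.writerow(["time", "cpu_percent"])
          while True:
              pct = psutil.cpu_percent(interval=1.0)  # blocks for one second, then reports
              writer.writerow([time.strftime("%H:%M:%S"), pct])
              f.flush()

      Long stretches near 0-10% that line up with imap or scene-transfer messages would point at data starvation rather than a rendering problem.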

      • #4
        Are you using LC as your second bounce? If you are not using LC then just ignore my info.
        I have noticed that it is almost pointless to render a scene with DR without the LC already done and saved locally.
        For whatever reason the slaves just take forever and don't do much until well after the LC has finished.
        Try doing your LC first on your local machine and then render with a "from file" LC map. When I do it that way the slaves load and render super quick.
        Someone correct me if I'm wrong, but I read that LC can't use DR anyway, so it's sort of pointless to use DR without a pre-loaded light cache as it just slows things down.

        Hope this helps
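
        For anyone who wants to script that two-pass workflow, here is a very rough sketch using 3ds Max's Python interface (pymxs). The V-Ray property names and mode values below are assumptions and differ between V-Ray versions - check them with "show renderers.current" in the MAXScript listener before relying on this:

        # lc_prepass.py - sketch of "bake the LC locally, then DR-render from file".
        # Property names (lightcache_mode, lightcache_autoSaveFileName, ...) are
        # assumed, not guaranteed - verify against your V-Ray version.
        from pymxs import runtime as rt

        vr = rt.renderers.current                    # the active V-Ray renderer
        lc_path = r"C:\temp\scene_prepass.vrlmap"    # hypothetical light-cache file path

        # Pass 1: calculate and auto-save the light cache on the local machine only.
        vr.lightcache_mode = 0                       # assumed: 0 = single frame
        vr.lightcache_autoSave = True
        vr.lightcache_autoSaveFileName = lc_path
        vr.system_distributedRender = False          # assumed DR on/off property
        rt.render(vfb=False)

        # Pass 2: load the saved LC "from file" so the DR slaves skip the LC step.
        vr.lightcache_mode = 2                       # assumed: 2 = from file
        vr.lightcache_loadFileName = lc_path
        vr.system_distributedRender = True
        rt.render(vfb=False)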

        • #5
          Yes, I am using LC, but I often do and it's not normally a problem, as long as you factor in the time it takes the LC to calculate on the slave (it does the whole thing on each node before it will start rendering, and since the slaves start later than the master, this causes a delay).

          In this case I'm pretty sure this isn't causing my issues: the slowdowns and lack of CPU usage happen well after the lighting calcs are finished and both machines are well stuck into the main render, and the scene transfer with minimal network bandwidth happens before any calculations start.

          • #6
            I'm not sure there is a problem here; the initial scene transfer does not include any textures or other assets - just the .max file, which is read from disk, not RAM - and once the render buckets start rendering, the network load will be pretty minimal as long as each bucket takes a sufficiently long time to render.

            In any case, it would make more sense to compare render times on a single machine vs. DR, over both InfiniBand and regular Ethernet.

            Best regards,
            Vlado
            I only act like I know everything, Rogers.
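
            A back-of-envelope check of that last point (the bucket size and buffer format below are assumptions, not figures from the thread):

            # Why finished buckets barely register on a 10 Gb link: each one is tiny
            # compared with the time it takes to render.
            bucket_px = 64 * 64        # assumed 64x64 bucket
            bytes_px  = 4 * 4          # assumed RGBA, 32-bit float per channel
            bucket_mb = bucket_px * bytes_px / 1e6

            for render_secs in (1, 10, 30):   # plausible per-bucket times in a heavy scene
                kb_per_s = bucket_mb / render_secs * 1000
                print(f"{bucket_mb:.3f} MB per bucket, {render_secs:>2} s each -> ~{kb_per_s:.0f} KB/s")

            # Even at one second per bucket that is only ~66 KB/s, i.e. well under 0.01%
            # of a 10 Gb link - consistent with the figures reported above.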

            • #7
              Hmm, well in my case, even when transferring the .max file (it's over 600 megabytes) I still get a transfer rate of around 3-5 MB/sec from a drive that is capable of 550 MB/sec. That's the first issue. The second is that at no time during the render do I get much higher than this, despite pulling the 3 GB of maps and proxies from the same drive (and as further tests showed, strangely I seem to get no benefit from having them on the local drive either).

              The final and perhaps more pressing issue is why the slave is only running a very intermittent CPU load throughout the whole render. It's as if the dynamic memory limit is too low or it is somehow data-starved. This happens with the maps pulled over the network -or- with them mapped to the local drive.

              As a comparison, a net render through Deadline pegs the CPU on both machines, and they both start in roughly identical times, despite the job being submitted locally from one of them.

              • #8
                Originally posted by super gnu
                The final and perhaps more pressing issue is why the slave is only running a very intermittent CPU load throughout the whole render. It's as if the dynamic memory limit is too low or it is somehow data-starved.
                Do you get this with a regular Ethernet connection?

                Best regards,
                Vlado
                I only act like I know everything, Rogers.

                • #9
                  That will be a touch difficult to test, as I fried the Ethernet port on my motherboard in a thunderstorm - which was the spur for the whole experiment! I intend to get a gigabit card to pop in there soon, so I'll test then.

                  • #10
                    OK, so I'm bumping this as I still have no luck with my 10Gb InfiniBand and DR. I'm rendering a massive scene, and doing the imap calc via DR with one other machine is illustrating some odd limitations.

                    When I first hit render, I can see it sending the scene, and it seems capped at 0.4% network utilisation - that's a 4 MB/sec transfer rate. Once the file has been sent, it appears to pull a chunk of stuff (resources, I presume) and hits peaks of 12%, averaging around 8-9%, so that's a maximum of about 100 MB a second. Then, when it's sending the imap data between machines at the end of each frame, it's again capped at 0.4%. After 50 frames of imap calc and a 1 GB imap, it's taking forever to synchronise the imap and I wait half a frame for the slave to kick in and help. All the data is stored on SSDs on each machine, apart from a few maps.

                    By contrast, as I've mentioned before, when copying 12 GB of files from the SSD of one machine to the SSD of the other I regularly get 500 MB/sec - basically the max of the SSDs, and 50% network utilisation.

                    I can't compare to gigabit Ethernet at the moment, as I don't have a working Ethernet port on one board.
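
                    To put that imap synchronisation in perspective, here is a quick size/rate calculation using the ~1 GB map and the rates mentioned above:

                    # Time to move a ~1 GB irradiance map at the observed DR rate
                    # versus plain file-copy speed over the same link.
                    imap_mb = 1000.0
                    for label, mb_per_s in (("observed DR transfer (~0.4%)", 4),
                                            ("plain file copy over IPoIB", 500)):
                        secs = imap_mb / mb_per_s
                        print(f"{label:>30}: {secs:6.0f} s  (~{secs / 60:.1f} min)")

                    # Roughly four minutes per sync at 4 MB/s versus a couple of seconds at
                    # file-copy speed - which is why the slave sits idle for so much of each frame.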

                    • #11
                      Have you tested it out with either irradiance map/brute force or BF/BF?

                      • #12
                        Yes, I get the same slow data transfer...

                        • #13
                          I have no idea what might be causing it; I would need to sit down and debug the code to see what's going on, but for that we would need InfiniBand here, and I don't know if/when that will happen.

                          Best regards,
                          Vlado
                          I only act like I know everything, Rogers.

                          • #14
                            Yes, it would perhaps be rather exaggerated customer service if you wired up a new network just to test. However, it is by far the cheapest high-speed networking method: I picked up my InfiniBand cards for 35 euros each on eBay, and the cable was more expensive than the cards at 45 euros. Compare that to 10-gigabit Ethernet, where you can't get a card for less than £350, and usually nearer £500-£600.


                            Is there any difference between the code that sends and receives the scene and irradiance data, and the code that sends the textures etc.? That could be a place to start, as I get a much faster transfer rate in that stage (still 5x slower than manual file transfers, but 20x faster than the scene/imap transfer).

                            • #15
                              Originally posted by super gnu
                              Is there any difference between the code that sends and receives the scene and irradiance data, and the code that sends the textures etc.?
                              Yes, there is... but I don't know which part is the problem.

                              Best regards,
                              Vlado
                              I only act like I know everything, Rogers.
