Distributed rendering MAX 2014 and V-Ray 2.40.4 - random "busy" message


  • Distributed rendering MAX 2014 and V-Ray 2.40.4 - random "busy" message

    We are running Max 2014 and V-ray 2.40.4, Windows 7 and 8 64-bit.

    This appears to affect Windows 7 and 8 users.

    Our users are experiencing random "busy" messages from DR machines. It seems to be random as to which machines throw the error.
    As a rule each person is allotted 3 DR machines to use. We typically wait a minute or so before rendering to ensure each machine loaded the spawner and so on.
    You may get 1, 2 or none of them. We are using Deadline and its DR tool but this also happens with Deadline off and directly rendering to the machines with vray only.

    Are there any network specific concerns/configuration issues known to cause problems with vray DR?
    Could something be happening too fast or slow with regards to the network to cause this? Any protocols or such? The "busy" message is pretty much instant when you start the job of course.

    The complication is that we recently started using Deadline, and we have a new IT guy who has been changing tons of things on the network.
    Since the problem also exists when Deadline is not running, that eliminates Deadline from the suspect pool for me.

    We have double checked that no two (or more) users are trying to DR to the same machines and that no one is sending DR jobs to the render queue.

    Any suggestions here would be great.
    "It's the rebels sir....They're here..."

  • #2
    Hey Dman,
    Here are some things to check that have hindered our DR setup at a larger company. Most of the issues I have experienced came up when the company rolled out updates to security policies, which changed the firewall settings and basically blocked V-Ray communication between the master and slave nodes. I have had to set up the full 8 machines at different times for various reasons, so hopefully some of these tips help, especially given you have a new IT guy who may not be aware of the flow-on effect of some changes.

    Check that you can ping the machines across the network outside of Max and V-Ray, just using the DOS prompt.
    Turn OFF the entire firewall on the render nodes, run a DR test and see if it goes through. If it does, then your issue is to do with "allowing a program through the network" or creating Windows firewall exemptions. I just set up our farm again after moving to V-Ray 3.0 from 2.4 and NONE of the nodes picked up until I created the right exemptions. While the default V-Ray install does a good job of attempting to create the correct exemptions and installing the drSpawner and RTserver as Windows services, it is not bulletproof.
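
    A rough way to script the "can I reach the node at all" check is to try a plain TCP connection to the DR port on each render node. This is only a sketch: the host names in the usage note are made up, and the default port 20204 is the V-Ray 3.0 port mentioned in this thread. A successful connection only shows that something is listening and the firewall lets you through. It does not tell you whether the spawner will report "busy".

```python
import socket


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts and failed name lookups.
        return False


def probe_nodes(hosts, port=20204, timeout=2.0):
    """Map each host name to True/False depending on whether the port answers."""
    return {host: port_is_open(host, port, timeout) for host in hosts}
```

    For example, `probe_nodes(["rendernode01", "rendernode02"])` (hypothetical names) returns a dict you can print before and after switching the firewall off, to see whether the result changes.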

    If the issue is firewall related and all of the DR nodes respond correctly with the firewalls turned OFF, then work through the following steps.

    Make sure that any apps that V-Ray and Max need to communicate are allowed through the firewall. Go to the Start menu, type "allow a program through the firewall" and make sure that Max and anything V-Ray related are allowed on your local and domain networks by ticking the boxes. In most cases checking through this step on the master and render nodes should fix the issue. However, at the company I work in (3000+ people), IT rolls out global policy updates and overrides, which either don't allow a normal user to change the firewall settings or, if I could change them, would revert them to the standard firewall settings after 2 hours.

    So from here you need to get your IT guy to exempt your machines from any such policy updates (if that is operating) or just get them to create incoming and outgoing firewall rules like the ones shown here.

    Go to the Start menu, type "firewall", click the advanced settings link on the left-hand side and change the incoming and outgoing rules.



    V-Ray DR talks to the render nodes on TCP port 20204 (this is what our V-Ray 3.0 64-bit spawner uses).
    Port 30304 is the V-Ray license server port, so make sure that one is reachable as well.
    [Attached image: Capture.JPG]
    Make sure these are allowed in both the incoming and outgoing connections on the render farm nodes.
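
    If your IT guy prefers the command line to the firewall GUI, the same two rules can be written as `netsh advfirewall` commands. Below is a small sketch that only builds the command strings so they can be reviewed before anyone runs them from an elevated prompt. The rule name is made up, and port 20204 is the one from this thread, so adjust both to your own setup.

```python
def firewall_rule_commands(port=20204, rule_name="V-Ray DR"):
    """Build netsh commands that allow TCP traffic on `port` in both directions.

    The commands are returned as strings and are NOT executed here.
    """
    commands = []
    for direction in ("in", "out"):
        commands.append(
            'netsh advfirewall firewall add rule '
            f'name="{rule_name} ({direction})" '
            f"dir={direction} action=allow protocol=TCP localport={port}"
        )
    return commands
```

    Printing the result of `firewall_rule_commands()` gives one inbound and one outbound rule for the render node.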

    Let me know if you need any further help on this. I hope this helps.

    thanks
    J



    • #3
      Thanks for your time and will do. I cannot say for sure how he has any of the policies set up. One frustrating thing is that it seems intermittent: one machine craps out, then another time it works. I think the permanent farm count is around 48 machines and ~12 workstations.

      We have also put just about everything (for whatever reason) on virtual servers with regards to the license managers of various programs, including V-Ray. The argument was redundancy etc., but I have no idea what issues, if any, it may cause. Again, when and which ones are or are not there seems very random. Everyone gets a license and you can render locally, but it's the wild west with the frigging DRs.

      I just remoted in while no one is there and had the same problem. It seems like there is some sort of variable lag. For example, I DR to 4 machines. 3 of them pick up and the 4th says it's busy, although all the machines just showed as idle. Let it render a few minutes. Cancel the render. Wait about a minute and try again. 3 of them report busy. Cancel and wait several minutes, try to DR and two of them report busy. Try 4 or 10 other machines, same issue. There never seems to be any time where all of them work. Our licenses are good and should easily support that many machines. But I will triple check the dongle again and double check with IT about any firewalls or the like to be sure.

      It is a bit of a "one of these things is not like the others" game at the moment. Before Deadline and the network changes, no problem. I can't imagine it has anything to do with Deadline. We also had assigned specific machines to specific users so no one would cross over when DR'ing and gank another user's resources.

      Tomorrow I will check with IT, then isolate a group of machines again and test with and without Deadline, and then noodle the IT guy to death.
      Come to think of it he has a thing for security and defending us from techno ninjas and kaiju attacks, no telling what is going on in there.
      Last edited by Dman3d; 18-06-2014, 05:50 PM.
      "It's the rebels sir....They're here..."



      • #4
        Hey Dman,
        OK, so from what I can see the firewall side of things should be the issue as they all seem to be communicating fine with the licence server.
        As a check though, jump on to the licence server, open http://localhost:30304 and get the server status to see exactly how many licences are being tied up during a DR test across all the machines. That is a lot of machines (totally jealous).

        With V-Ray DR I have had so many issues of renders being stopped because of missing assets, or even a certain computer not being able to access those assets over the network. This is even after all of the Max preset paths are saved correctly via the scene and "should" have transferred correctly.
        Jumping from V-Ray 2.4 to V-Ray 3.0 licences I have found that our DR has been much more stable, as you now have the option to transfer the missing scene textures with the scenes.
        [Attached image: 2014-06-19 12_57_37-Render Setup_ V-Ray Adv 3.00.07.jpg]

        Even with this option I have still found one or two nodes that are playing up a bit in terms of not transferring all the scene assets, so I too will have to go back and check the firewall settings and Max user paths to make sure everything is correct.

        One other thing to question (as I have never had the fortune of having that many render nodes): what is your network load like when you DR across the whole farm, especially considering you have 12 workstations / users? Is this problem worse during the day, when everyone is putting load on the network and the network drives where your assets are stored, compared to at night when everyone has gone home? For example, if you have 40 machines all trying to read and load say 40 high-res image maps, HDRIs and proxy assets from a network path, what is that doing to the load on your internal network? Is it peaking out? also how does the
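
        To put a rough number on that load question, here is a back-of-the-envelope sketch. It assumes every node pulls every asset through one shared link to the file server at the full theoretical rate, which is optimistic. The 50 MB average asset size and the 1 Gbit/s link in the example are made-up figures, not measurements from this farm.

```python
def transfer_seconds(nodes, assets_per_node, avg_asset_mb, link_gbit_per_s):
    """Estimate how long it takes for all nodes to pull their assets.

    Assumes one shared link to the file server and no caching or overhead.
    """
    total_mb = nodes * assets_per_node * avg_asset_mb
    link_mb_per_s = link_gbit_per_s * 1000 / 8  # Gbit/s -> MB/s
    return total_mb / link_mb_per_s
```

        With 40 nodes, 40 assets each and 50 MB per asset, that is 80,000 MB in total. Over a 1 Gbit/s link (125 MB/s) it comes to 640 seconds, so more than ten minutes of saturated network just for loading under these assumptions.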

        You mentioned that it is not consistent across certain machines, so there is probably no point in isolating just one machine and sending the DR only to that machine via the V-Ray DR settings tab. I have used that in the past to help point out if, say, one machine hasn't been set up with Forest or RailClone correctly.
        Sometimes if I have a DR node playing up that had been working, I actually open the scene on that local machine to see if Max starts up and comes in with any issues or missing maps in the asset tracker or project manager. More often than not it will come up with the random error of needing to add the Forest or RailClone maps to the default user paths again, so for some reason with network logins etc. it seems to be randomly losing its Max plugin and user path settings, not sure why. But once I open the scene on the suspect node, make the change and exit, that node seems to pick up.


        OK, one last thought springs to mind, but this is more about running a standard configuration of 3ds Max and V-Ray. V-Ray (if the option is ticked) saves the DR hosts with the scene. If one or two of your members open that scene at the same time, then effectively they are both going to be pointing to the same render nodes to pick up and use in DR. Whoever gets there first will get the use of that node and the second user will come up with the node "busy" message. I have run into this problem a few times where someone else will open an old version of a scene that saved the DRs with it. However, I am not sure if that should happen in the case of using specialist DR software. And you did mention that the comp wasn't at load at all, so that still sounds like it is pointing to a latency issue perhaps.
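
        If you keep a list of which user is supposed to DR to which machines, a quick cross-check for overlaps could look like the sketch below. The user and node names in the usage note are made up. The idea is just a set intersection over every pair of users.

```python
from itertools import combinations


def find_shared_hosts(assignments):
    """Return {(user_a, user_b): [shared hosts]} for every pair that overlaps.

    `assignments` maps a user name to the list of DR hosts in their setup.
    """
    conflicts = {}
    for (user_a, hosts_a), (user_b, hosts_b) in combinations(
        sorted(assignments.items()), 2
    ):
        shared = set(hosts_a) & set(hosts_b)
        if shared:
            conflicts[(user_a, user_b)] = sorted(shared)
    return conflicts
```

        For example, if "alice" has node01 to node03 and "bob" has node03 and node04, the result flags node03 as shared between the two, which is exactly the case where the second user would see "busy".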

        Have you tried looking at the verbose log at all in the ? This has much more in-depth info than the normal V-Ray log.



        • #5
          Also, on the machine that you remote into, just go in and unregister the V-Ray DR spawner and V-Ray RT spawner, exit the DR spawner via the icon in the task tray, and then re-register and restart the service again. That has also helped, and you should be able to tell straight away in the V-Ray log on your workstation whether that node has picked up.

          J



          • #6
            The most likely reason for this issue is that the spawners are still not ready to receive a new job.
            Even if the rendering has already finished, vray-spawner needs some time to finish things like freeing memory, and depending on the size of the scene this time might vary.
            The best option would be to turn on the Restart Servers At Render End checkbox in the Distributed Rendering options. With that option checked, V-Ray will automatically restart the spawner after each completed job and the next one should be received without any issues.

            Let us know if that helps or not.
            Last edited by svetlozar.draganov; 19-06-2014, 01:03 AM.
            Svetlozar Draganov | Senior Manager 3D Support | contact us



            • #7
              Ok checking into all of this.

              Svetlozar - this also happens initially, the first time you hit render. I had counted on the unloading/loading. I will re-try by increasing the length of time between hitting render. We have been doing this for years; it's only recently that this has become such an issue. I suppose the below could be the issue also.

              I suppose another scenario is that there are several machines in the mix with some sort of problem. One or two don't load, so the user hits cancel then render again, nothing has had time to unload and all show busy. They cancel, wait, hit render, then all but the original bad one run, giving the impression that more is wrong than simply waiting for an unload.

              I think I will have to wait until this weekend to test when no one is around. That way I can see if it is indeed specific machines and it's not as random as it appears.
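
              One way to keep that weekend test honest is to write down, for every attempt, which nodes picked up and which said busy, then tally the busy rate per node. A minimal sketch, assuming the results are noted by hand as "ok" or "busy" (the node names in the usage note are made up):

```python
from collections import Counter


def busy_rate_per_node(trials):
    """Return {node: fraction of attempts that came back 'busy'}.

    `trials` is a list of dicts, one per render attempt, mapping node -> status.
    """
    attempts = Counter()
    busy = Counter()
    for trial in trials:
        for node, status in trial.items():
            attempts[node] += 1
            if status == "busy":
                busy[node] += 1
    return {node: busy[node] / attempts[node] for node in attempts}
```

              If a handful of nodes sit near 1.0 while the rest are near 0.0, it points at specific machines. If the rates are spread evenly, it really is random and the network or timing is the more likely suspect.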

              followup:
              Checked the dongle and we have 220 2.4 dr licenses and only about 10-11 in use at any given time.
              Save nodes with scene is NOT checked.
              Machines can see server.
              Tried with and without deadline being loaded, direct vrayspawner with nothing between.
              I have tried waiting at 5, 10 and 15 minute intervals, all with the same problem, with or without restart on render end checked.
              Last edited by Dman3d; 19-06-2014, 10:27 AM.
              "It's the rebels sir....They're here..."



              • #8
                Would you please send us the vraylog.txt file from one of the problematic render nodes right after the issue appears?
                The file is located in the Windows temp folder by default.
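
                For a quick look at the log before sending it, something like the sketch below could pull out the lines worth reading first. It assumes the temp folder is the one Python reports for the current user, which may not be the account the spawner runs under. The keywords are guesses, since the exact wording of V-Ray log messages is not shown in this thread.

```python
import os
import tempfile


def default_log_path():
    """Guess where vraylog.txt lives: the current user's temp folder."""
    return os.path.join(tempfile.gettempdir(), "vraylog.txt")


def find_log_lines(log_path, keywords=("busy", "error", "warning")):
    """Return the lines of the log that contain any keyword (case-insensitive)."""
    matches = []
    with open(log_path, "r", errors="replace") as handle:
        for line in handle:
            lowered = line.lower()
            if any(word in lowered for word in keywords):
                matches.append(line.rstrip("\n"))
    return matches
```

                Calling `find_log_lines(default_log_path())` on a render node right after a failed attempt gives a short list to scan. The full file is still what support asked for.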

                Thank you.
                Technical Support
                Chaos Group
