Yet another DR workaround?

  • Yet another DR workaround?

    So I have been through about a million threads and half a million hours of struggling - and it just seems there is no clean and easy solution to some of the DR rendering issues.

    What I'm finding is that running the spawner manually works great, while running it as a service does not (at least not predictably), and when submitting multiple renders through Backburner, nodes occasionally stop responding. Manually quitting and restarting vraySpawner90.exe works fine to get them back in action.

    I'm now trying to find a software solution that would check for low processor activity and then quit and restart the spawner (effectively: kill the 3dsmax.exe and vrayspawner90.exe processes, then re-launch vrayspawner90.exe).

    I think this would solve most, if not all, of the DR networking issues I have run into and read about. Is anyone aware of a utility that could do that? I've done some googling but can't find anything that fits the bill; maybe there is something custom or more esoteric out there?
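
    Roughly what I'm picturing is a little watchdog like the sketch below (untested, just to illustrate the idea; it needs the third-party psutil module installed, and the spawner path and thresholds are guesses you'd have to adjust for your setup):

    Code:
    # Watchdog sketch: if the box sits idle too long, kill max + spawner and relaunch the spawner.
    import subprocess
    import time

    import psutil

    SPAWNER_PATH = r"C:\Program Files\Chaos Group\V-Ray\vrayspawner90.exe"  # guess - adjust to your install
    TARGETS = {"3dsmax.exe", "vrayspawner90.exe"}
    IDLE_CPU = 5.0       # percent CPU we treat as "doing nothing"
    IDLE_MINUTES = 10    # how long it must stay that low before we restart

    idle_since = None
    while True:
        cpu = psutil.cpu_percent(interval=60)   # average CPU over the last minute
        if cpu < IDLE_CPU:
            idle_since = idle_since or time.time()
            if time.time() - idle_since > IDLE_MINUTES * 60:
                # Kill 3dsmax.exe and vrayspawner90.exe, then bring the spawner back up.
                for proc in psutil.process_iter(["name"]):
                    if (proc.info["name"] or "").lower() in TARGETS:
                        try:
                            proc.kill()
                        except psutil.Error:
                            pass
                time.sleep(5)
                subprocess.Popen([SPAWNER_PATH])
                idle_since = None
        else:
            idle_since = None

    The obvious catch is that it can't tell "hung mid-render" apart from "legitimately idle", so it would also bounce the spawner on nodes that simply have no work at the moment - probably fine for my case, but worth knowing.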

    Thanks in advance,

    b
    Brett Simms

    www.heavyartillery.com
    e: brett@heavyartillery.com

  • #2
    We have to find the root problem with DR!

    I think it's most important to pinpoint why nodes are dropping. What is the reason for this problem? That has to be done by the coders, and I would greatly appreciate it!

    All this workarounding only deals with the symptoms; we have to find the cause! Any ideas at ChaosGroup?

    I have found no workaround that makes DR work properly. My current workaround is to run the spawner as a service and have the nodes restart automatically after the rendering stops. That ensures that a clean new instance of max is running with the spawner, but it doesn't prevent nodes from dropping. And in my experience: the harder the render job, the more nodes drop. That's fatal: with simple scenes I don't need DR, but with heavy jobs, where DR is needed, nodes drop.
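
    For reference, the restart part of that workaround boils down to something like this (just a sketch, not what I actually run; the service name is a guess and depends on how the spawner service was registered on your machines):

    Code:
    # Stop and restart the V-Ray spawner service so a fresh max instance comes up.
    import subprocess
    import time

    SERVICE_NAME = "VRaySpawner 2009"   # guess - check the real name in services.msc

    def restart_spawner_service():
        subprocess.run(["net", "stop", SERVICE_NAME], check=False)   # tears down the spawner and its max instance
        time.sleep(10)                                               # give it a moment to shut down cleanly
        subprocess.run(["net", "start", SERVICE_NAME], check=False)  # fresh spawner/max pair

    if __name__ == "__main__":
        restart_spawner_service()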

    What I'm interested in is: Vlado, do you have a DR environment that works properly? If yes, how did you set it up? What is your network configuration?

    I think it's a network-related problem. It seems that at some point the master loses contact with some of the nodes. Maybe the number of nodes involved also has something to do with it; it feels like the more nodes are involved, the higher the risk of losing them all. Maybe the whole problem comes from one master having to communicate with many nodes simultaneously.

    Maybe this is related: I also have problems with BB where the slaves sometimes lose contact with the manager: "Assuming manager is down. Application is terminating" (the error message from the BB server is something like that). The weaker the machine and the heavier the rendering, the higher the risk of this error. I have two types of render slaves: some old P4 single-core 2.8 GHz machines and some Core 2 Duo 2.0 GHz machines. If a frame has to render for several hours, the old P4 machines always fail with this error.

    Given the importance of DR, I beg you to solve these problems. Especially for hard render jobs at very high resolution, it's a deadline killer if one machine has to render for days when the job could be done overnight with DR.

    So please, do your very best at ChaosGroup! Thank you.

    Sascha



    • #3
      I agree with Sascha: DR is hugely important, and once you get beyond single renders launched directly, it runs into a ton of problems. Is there anything to be done at the Chaos Group end of things?

      b
      Brett Simms

      www.heavyartillery.com
      e: brett@heavyartillery.com



      • #4
        We have to find the root problem with DR!

        Originally posted by Sascha Selent
        I think it's most important to pinpoint why nodes are dropping. [...] So please, do your very best at ChaosGroup!

        Vlado, what do you say?

        Sascha

