Swarm: Too many master nodes and black buckets

  • Swarm: Too many master nodes and black buckets

    We're using Swarm in our office to render on 7 dedicated servers from V-Ray for Rhino 6 (and occasionally Revit). On the most recent project, we've been having a great deal of trouble with Swarm nodes leaving black buckets.

    Looking at the Swarm UI, I can see that the nodes are confused about who is in charge - most of the nodes think they are the master, as shown in the first screenshot. The Swarm service is started and stopped by Deadline on the machines (per this Thinkbox article), and I suspect this is what's making them confused:

    When going to any of the starred nodes' Swarm config page (except one), they seem to think they are in a Swarm of one machine, as seen in the second screenshot.

    I tried setting the Swarm master node manually, but when I do that, machines only seem to be able to see themselves - they don't even see the master (V-Ray Swarm is allowed through the firewall, and the machines can ping each other).

    Is there some way of manually purging whatever data these nodes have cached so they can hold a real election and all get on the same page again?
    Attached Files

  • #2
    Update: I think I fixed it.

    I found a forum post (but NOT the documentation) stating that a specified master node needs to have its own configuration set to "Auto-detect". I think this qualifies as a bug; at the very least it should be documented, and it should probably be caught by the UI and explained to the user.

    Once I went to every single node and set a specific machine to be the master, they all agreed about what to do and the black buckets stopped.

    I do think this problem points to a flaw in how Swarm nodes conduct elections when in auto-detect mode, however.



    • #3
      Thank you for the note; I'll review the documentation and add some useful info to it. In your situation, several things might have caused the issue - a node with network issues can disrupt the auto-detect mechanism. Also, nodes on different networks are not visible to each other by default, since the TTL for the UDP packets is set to 1 (this can be adjusted).
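      For reference, the TTL value lives under the discover section of config.yaml (shown in full later in this thread); a minimal fragment raising it to 2 - the surrounding keys are copied from that example:

```yaml
discover:
  autoDiscover: true
  ttl: 2   # default is 1; raise only if nodes must cross a router
```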
      Ivan Slavchev

      SysOps

      Chaos Group



      • #4
        When the Swarm service stopped on the Master machine yesterday (not sure why; I was out of the office), everything went right back to all the servers electing themselves to be in charge again.

        Originally posted by ivan.slavchev View Post
        a node with network issues can disrupt the auto-detect mechanism.
        How would one go about diagnosing this? As far as I know, everything is working as expected on our LAN.

        Some of the servers have multiple NICs, each with its own IP. Would that confuse Swarm?

        Originally posted by ivan.slavchev View Post
        Also, nodes on different networks are not visible to each other by default, since the TTL for the UDP packets is set to 1 (this can be adjusted).
        All machines are on the same subnet. Does the TTL need to be set to 2 anyway?



        • #5
          One more question: a machine I am trying to use as a non-rendering Master node has two NICs on different subnets. In its Swarm configuration it lists the wrong IP as its address, and it appears the other nodes can't see it. How can I force Swarm to use a particular local IP to listen/talk on?



          • #6
            Originally posted by jskt View Post
            When the Swarm service stopped on the Master machine yesterday (not sure why; I was out of the office), everything went right back to all the servers electing themselves to be in charge again.

            How would one go about diagnosing this?
            SWARM has a log located in "%APPDATA%\Chaos Group\vray-swarm\work\vray-swarm\vray-swarm.log". If the service is running under the local system account, the log will be located in C:\Windows\System32\config\systemprofile\AppData\Roaming\Chaos Group\vray-swarm\work\vray-swarm
            Check for error messages there. Also, the above means that if you change the user running the service, SWARM will create a new folder in the new user's %APPDATA% with default settings, e.g. auto-discovery instead of a predefined master node.
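            If you want to resolve the log location from a script, a small Python sketch (paths as quoted above; the fallback to the LocalSystem profile when %APPDATA% is unset is an assumption about the service account):

```python
import os

# Resolve %APPDATA%; fall back to the LocalSystem profile path
# mentioned above when the variable is absent (e.g. service context).
appdata = os.environ.get(
    "APPDATA",
    r"C:\Windows\System32\config\systemprofile\AppData\Roaming",
)
log_path = os.path.join(
    appdata, "Chaos Group", "vray-swarm", "work", "vray-swarm", "vray-swarm.log"
)
print(log_path)
```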

            Originally posted by jskt
            Some of the servers have multiple NICs, each with its own IP. Would that confuse Swarm?
            It might. By default, SWARM will pick the interface with the highest metric - e.g. the first one listed in the ipconfig /all output.
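            On a multi-NIC machine, a quick way to list every IPv4 address the host resolves for itself (a plain Python sketch, not part of Swarm, just a diagnostic aid):

```python
import socket

# Resolve the host's own name to all of its IPv4 addresses; on a
# multi-NIC machine this shows the candidates Swarm may choose between.
hostname = socket.gethostname()
_, _, addresses = socket.gethostbyname_ex(hostname)
print(hostname, addresses)
```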

            Originally posted by jskt
            All machines are on the same subnet. Does TTL need to be set to 2 anyway?
            No, you can leave it at 1.

            Originally posted by jskt
            A machine I am trying to use as a non-rendering Master node has two NICs on different subnets. In its swarm configuration it lists the wrong IP as its address, and it appears the other nodes can't see it. How can I force Swarm to use a particular local IP to listen/talk on?
            You can set it in the "C:\Program Files\Chaos Group\V-Ray Swarm\config.yaml" file on the machine. To do that, add interface: 'Interface Name' as shown below; the word "interface" should start at the same indentation level (same column) as the "port" entry above it. After setting the value, restart the V-Ray Swarm service.

            Code:
            network:
              port: 24267
              interface: 'Ethernet0'
            discover:
              autoDiscover: true
              masterNodes:
                - ""
                - ""
                - ""
              ttl: 1
            logger:
              level: info
            firstRun:
              configFilePath:
            Ivan Slavchev

            SysOps

            Chaos Group



            • #7
              Thanks for the detailed reply. It's now working.

              Can you verify that both the primary and fallback master nodes should be set to Auto-detect?

              Also, Swarm was not writing logs to any of the places you listed - it was running as LocalSystem and the folder you mentioned did not exist, and when I changed it to a domain user with admin rights, the folder was not created in their AppData.

              For the sanity of future searchers:

              1. The formatting of the config file is extremely picky - each entry must have the proper number of leading spaces to create the indentation shown in Ivan's example (YAML is whitespace-sensitive).
              2. When you open the file in Notepad it may appear as one long line - use Notepad++ to see the line breaks.
              3. If your interface name has spaces in it, you need double quotes
              4. The CMD command to get the correct interface name in a nice copy-pasteable format is: getmac /FO csv /V
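              For point 3, a minimal fragment of the network section, assuming a hypothetical adapter named "Ethernet 2":

```yaml
network:
  port: 24267
  interface: "Ethernet 2"   # quotes needed because of the space
```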
              Last edited by jskt; 20-09-2018, 01:17 PM.



              • #8
                Originally posted by jskt View Post
                Thanks for the detailed reply. It's now working.

                Can you verify that both the primary and fallback master nodes should be set to Auto-detect?
                Auto-detect is recommended as long as there are no network issues - for example, forbidden UDP multicasts. If there are, you can set the master nodes to be masters to one another. The most important thing is NOT to assign a node as a master to itself.

                Originally posted by jskt View Post
                Also, swarm was not writing logs to any of the places you listed - it was running as LocalSystem and the folder you mentioned did not exist, and when I changed it to a domain user with admin rights, the folder was not created in their appdata.
                Usually a restart of the service fixes this. The log starts being written when a given instance gets a build (the first time a render job is submitted).

                Originally posted by jskt View Post
                1. The formatting of the config file is extremely picky - each entry must have the proper number of leading and trailing spaces to create indentation as shown in Ivan's example.
                Unfortunately, .yml files are that sensitive.

                Originally posted by jskt View Post
                2. When you open the file in Notepad it will appear as one long line - use Notepad++ to see line breaks
                WordPad might work too, but Notepad++ is much nicer to work with.

                Originally posted by jskt View Post
                3. If your interface name has spaces in it, you need double quotes
                That's a nice addition.

                Originally posted by jskt View Post
                4. The CMD command to get the correct interface name in a nice copy-pasteable format is: getmac /FO csv /V
                This PowerShell command does a nice job too:
                Code:
                Get-NetAdapter

                As a whole, thanks for initiating the discussion. I'm thinking of wrapping this up and placing it in the docs; hopefully there will be enough time next week.
                Ivan Slavchev

                SysOps

                Chaos Group



                • #9
                  Originally posted by ivan.slavchev View Post
                  Usually a restart of the service fixes this. The log starts being written when a given instance gets a build (the first time a render job is submitted).
                  In the case where you want a machine to be a non-rendering Swarm Master (for example if it's your Deadline repository machine, your license server, your PDC, your Citrix server, etc.), this means logs will never be generated.

                  More broadly, if network issues prevent renders from starting, it sounds like logs won't ever be generated.

                  Can one add an argument to the service startup to make it start logging immediately?



                  • #10
                    In case the Master node doesn't render at all, you can check "C:\Windows\System32\config\systemprofile\AppData\Roaming\Chaos Group\vray-swarm\work\vray-swarm\service-controller\service-controller.log".
                    It logs only the service events, not the V-Ray instance's behavior.
                    Ivan Slavchev

                    SysOps

                    Chaos Group

