V-Ray 7: Swarm 2 setup issue: "Too Many Origin Nodes"


    Hello,

    I’ve been assisting a user (giulio_perosino) privately with a Swarm setup issue, and I’d like to continue the discussion here for visibility and further support from the development team.
    Here’s a summary of the situation so far:

    The user initially encountered an "Unable to communicate with ULA" error on the Swarm admin page.
    Running the origin.bat script resolved this, but it introduced a new error: "Too many origin nodes."

    Steps already taken:
    1. Ran origin.bat to fix the ULA communication issue.
    2. Stopped Swarm, deleted %TEMP%\swarm.state, and restarted Swarm. The "Too many origin nodes" error persisted.
    3. Checked the logs, which showed two IPs being treated as origin nodes.
    Further investigation revealed that the second IP is associated with the same machine (Windows 11).

    Regards,
    Konstantin
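
The reset in step 2 can be scripted. The following is a minimal sketch, assuming Swarm has already been stopped by other means and that the state file sits at the %TEMP%\swarm.state path quoted above. The helper name remove_state_file is hypothetical and not part of Swarm.

```python
import os
from pathlib import Path


def remove_state_file(path):
    """Delete the Swarm state file if present; return True if a file was removed."""
    # os.path.expandvars expands %TEMP%-style variables on Windows only;
    # on other systems the path is used as given.
    state = Path(os.path.expandvars(str(path)))
    if state.is_file():
        state.unlink()
        return True
    return False
```

On the origin machine this would be called as remove_state_file(r"%TEMP%\swarm.state") between stopping and restarting Swarm. A return value of False means no state file existed at that path.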

  • #2
    giulio_perosino Hello Giulio!

    Could you please post ALL your log files from %ProgramData%\Chaos\V-Ray\Swarm 2\logs (I suppose you have swarm.log, swarm1.log, swarm2.log, ...) in an archive/zip? Also, please post the output of http://localhost:1113/debug/ptbl and http://localhost:1113/debug/ptblText (you can save it with CTRL-S or "Save as...").

    I also have a couple of questions about your setup. Do you have a virtual machine on this PC? If yes, did you try to run Swarm on both the host machine and the guest machine (inside the virtual one)? If so, the best solution is to run the origin instance on a separate physical machine, because a virtual machine is a completely isolated environment, which can lead to problems.

    Best regards, Pavel
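
As an alternative to saving the two debug pages by hand, they can be fetched with a short script. This is a sketch only: the two URLs are the ones quoted above, while the function names, the .txt extension, and the injectable fetch parameter are assumptions made for illustration.

```python
import urllib.request
from pathlib import Path

# The two debug endpoints named in the post above.
DEBUG_URLS = (
    "http://localhost:1113/debug/ptbl",
    "http://localhost:1113/debug/ptblText",
)


def output_name(url):
    """Derive a local file name from the last path segment of a URL."""
    return url.rstrip("/").rsplit("/", 1)[-1] + ".txt"


def save_debug_pages(out_dir, urls=DEBUG_URLS, fetch=None):
    """Download each debug page into out_dir and return the written paths."""
    if fetch is None:
        def fetch(url):
            # Plain HTTP GET with a timeout so a stopped Swarm fails fast.
            with urllib.request.urlopen(url, timeout=10) as response:
                return response.read()
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for url in urls:
        target = out_dir / output_name(url)
        target.write_bytes(fetch(url))
        written.append(target)
    return written
```

Calling save_debug_pages("swarm-debug") on the machine that runs Swarm writes ptbl.txt and ptblText.txt into that folder, ready to be zipped together with the logs.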



    • #3

      Hi, I'm experiencing some issues with Swarm and need your help.

      I encountered the message "Unable to communicate with ULA" on the admin page while trying to resolve an issue with Swarm.

      Steps Taken:
      1. Downloaded and installed the official V-Ray 7 and followed the Swarm setup guide, which includes configuring the origin with an executable script.
      2. After running the origin.bat script, I received a new error: "Too many origin nodes."
      3. I followed these steps:
        • Stopped Swarm on the origin node.
        • Deleted the %TEMP%\swarm.state file.
        • Restarted Swarm. Unfortunately, the error persisted.

      The attached logs in swarm.zip (covering my attempts from yesterday) indicate two origin nodes (192.168.11.1 and 172.31.0.1). It turns out the second origin node is my WSL vEthernet adapter on the same machine. I have also attached ptbl.zip and ptblText.zip.
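
To check whether the IPs reported as origin nodes belong to the same PC, they can be compared with the addresses the machine itself owns. The sketch below uses only the Python standard library. The function names are hypothetical, and hostname resolution may not list every adapter, so the output of ipconfig is worth comparing as well.

```python
import ipaddress
import socket


def local_ipv4_addresses():
    """Return the IPv4 addresses that this machine's hostname resolves to."""
    infos = socket.getaddrinfo(socket.gethostname(), None, socket.AF_INET)
    return sorted({info[4][0] for info in infos})


def split_local_and_remote(candidate_ips, local_ips):
    """Split candidate origin IPs into those owned by this machine and the rest."""
    local = {ipaddress.ip_address(ip) for ip in local_ips}
    mine = [ip for ip in candidate_ips if ipaddress.ip_address(ip) in local]
    others = [ip for ip in candidate_ips if ipaddress.ip_address(ip) not in local]
    return mine, others
```

For example, split_local_and_remote(["192.168.11.1", "172.31.0.1"], local_ipv4_addresses()) returns two lists. If both reported IPs land in the first list, the two "origins" are adapters of one machine rather than two separate nodes.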


      Originally posted by pavel_yosifov View Post
      Also I have couple of questions about your setup. Do you have a virtual machine on this PC? If yes, did you try to run Swarm on the host machine and on the guest machine (inside virtual one)?
      I have some virtual machine software (WSL, Docker, VMWare, Windows Sandbox) installed on this PC, but I'm trying to run Swarm from the Windows 11 host machine. I installed Swarm on two other separate machines, and when I open the web UI from one of them I see the error "Too many origin nodes".
      I hope this info helps you troubleshoot, thanks for the support!

      EDIT: Before updating, I ran the previous version of Swarm successfully and was able to distribute renderings. My internal network hasn't changed.
      Last edited by giulio_perosino; 11-12-2024, 01:31 AM.



      • #4
        Hi again!

        So, as I understand it, you run
        Code:
        origin.bat
        on the host Windows 11 machine, which has several virtual machines. What happens if you run origin.bat on ONE (any) of those additional (separate) machines (of course, origin.bat on the host Windows 11 machine should be stopped/killed first)?

        PS. Unfortunately, the logs and ptbl.zip/ptblText.zip don't match each other by timestamp, and the log is very short. If you can, could you please collect logs over a longer period, maybe 10-15 minutes while all machines run Swarm and you see the "Multiple Origins" situation, and then save /debug/ptbl and /debug/ptblText again together with these longer logs?

        Also, if possible, in addition to the logs, /debug/ptbl, and /debug/ptblText, please send us screenshots of 192.168.11.1:1333/network (Network Page) and the same for 172.31.0.1:1333/network.
        Last edited by pavel_yosifov; 11-12-2024, 04:26 AM.



        • #5
          Hello,

          This is my current setup:
          • 192.168.20.47: Physical workstation with V-Ray for Rhino, Swarm, and License Server installed (origin node)
          • 192.168.20.14: Physical workstation with V-Ray for Rhino and Swarm installed
          • 192.168.20.98: Physical workstation with V-Ray for Rhino and Swarm installed

          After installing Swarm on the "47" workstation, I encountered the "Unable to communicate with ULA" error. I resolved this issue by running the origin.bat script.

          However, when I start the Swarm UI on the other two PCs, I see the error "Too many origin nodes."

          Originally posted by pavel_yosifov View Post
          What happens if you run the origin.bat on ONE (any) of those additional (separate) machines (make sure origin.bat on the host Windows 11 is stopped/killed first)?
          I don't understand what you mean by running the script on VMs. I cannot run the script on the virtual machines because they are either turned off or are just installed VM software such as VMWare.

          I also want to point out that the logs show many IPs belonging to the origin node workstation (192.168.20.47). For example, 172.21.128.1 is the Ethernet adapter vEthernet (WSL (Hyper-V firewall)), which changes every time I start the PC but gets caught in the Swarm setup.

          I attached the logs you asked for; I hope this helps.

          Thanks!



          • #6
            Aha, thanks for the logs!

            I don't understand what you mean by running the script on VMs. I cannot run the script on the virtual machines because...
            I mean on one of 192.168.20.14 or 192.168.20.98.

            I also want to point out that the logs show many IPs belonging to the origin node workstation (192.168.20.47). For example, 172.21.128.1 is the Ethernet adapter vEthernet (WSL (Hyper-V firewall)), which changes every time I start the PC but gets caught in the Swarm setup
            It's OK

            I see in the sent file ptblText.txt:
            Code:
            PEER 1:1 DATA (2)
            --------------------------------------------------
            * 490638e3-abd1-4684-b77c-671a7987ce9b: worker+adminPanel STBL 4906..ce9b pexHash:2d148677025c49c441a1f1704b0eb5f6459881725f50ee6db1e79d4a86858db7
            * 8ffe516a-30e6-46ef-af73-b676dd25f6db: DCC+worker+adminPanel+origin STBL 8ffe..f6db pexHash:49882798e1b61a09f54b9a8c0adfb77f4c404e85d25d86e30648bcc886cbe507
            which means that 192.168.20.47 (its GUID is 8ffe516a-30e6-46ef-af73-b676dd25f6db) is the only origin. So the current conclusion is that 192.168.20.47 sees only 1 origin node, which is expected.
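
The same check can be repeated on each PC by counting the peers whose role list contains "origin". This is a rough sketch that assumes the /debug/ptblText line layout matches the excerpt above ("* <GUID>: role+role ..."). The regular expression and the function name are guesses based on that excerpt alone, not on a documented format.

```python
import re

# One peer per line: "* <36-character GUID>: role+role+... <rest>".
PEER_LINE = re.compile(r"^\s*\*\s+([0-9a-fA-F-]{36}):\s+(\S+)")


def origin_guids(ptbl_text):
    """Return the GUIDs of all peers whose role list includes 'origin'."""
    guids = []
    for line in ptbl_text.splitlines():
        match = PEER_LINE.match(line)
        if match and "origin" in match.group(2).split("+"):
            guids.append(match.group(1))
    return guids
```

A result with exactly one GUID corresponds to the expected single-origin situation. More than one GUID on a given PC would point to the machine that sees the extra origin.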

            My current guess is that one of 192.168.20.14 or 192.168.20.98 (or both!) sees an additional, wrong origin. We can handle it as follows:
            • Stop these 2 PCs
            • Uninstall Swarm 2 there completely (the %ProgramData%\Chaos\V-Ray\Swarm 2\data\swarm.state file should not exist afterwards)
            • Restart them
            • Install Swarm 2 again on these 2 PCs (BTW, we have a standalone version of Swarm 2 here: https://download.chaos.com/downloads...lone-2015-free i.e., a version that can be started from a console: unzip -> start in a console)
            If you could send me the logs and /debug/ptblText from these 2 PCs (after about 5 minutes of running, once they detect "Multiple Origin"), that would be amazing!

            PS. You once sent me a log file with the DEBUG level turned on; do you know how to do that? If not, and if you can/want to, you can change all "level: INFO" to "level: DEBUG" in the settings files. They are located in %SystemRoot%\System32\config\systemprofile\AppData\Roaming\Chaos\V-Ray\Swarm 2 when you run Swarm AS A SERVICE, and in the folder with the binary, %ProgramFiles%\Chaos\V-Ray\Swarm 2, when you run it as a standalone application. However, if you run Swarm 2 as origin with origin.bat, then the settings file is origin-swarm.yaml in the folder of origin.bat.
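
The INFO to DEBUG change can also be made with a small text substitution. This is a sketch under the assumption that each setting appears on a line of its own as "level: INFO", as quoted above. The function name is hypothetical, and the rest of the file is left untouched.

```python
import re


def set_log_level(settings_text, old="INFO", new="DEBUG"):
    """Replace every 'level: <old>' line with 'level: <new>', keeping indentation."""
    pattern = re.compile(
        r"^([ \t]*level:[ \t]*)" + re.escape(old) + r"[ \t]*$",
        re.MULTILINE,
    )
    return pattern.sub(lambda m: m.group(1) + new, settings_text)
```

A typical use is to read the settings file in text mode (for example origin-swarm.yaml), pass its content through set_log_level, write the result back, and restart Swarm. Keeping a backup copy of the original file makes it easy to return to INFO later.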

            BTW, you can easily prepare the archive with logs from the Web UI: About > Troubleshooting > Collect Diagnostic Data (blue button)



            • #7
              Okay, I think I made it, sort of.

              I completely uninstalled Swarm 2 from the different PCs, then reinstalled it and changed all "level: INFO" to "level: DEBUG" to export complete logs. When I ran the Swarm UI on each PC, I saw the "No origin node" error.

              I then ran origin.bat as an administrator on the origin PC (192.168.20.47), and the error disappeared. In the UI, I can see the PCs with a green icon and no "Too many origin nodes" error!

              I started a rendering in Rhino, and it successfully downloaded the correct standalone V-Ray version on the other PCs. However, it cannot start the distributed rendering because the state in the UI gets stuck on "Reserved." In the V-Ray log, I see this:
              Code:
              [14:21:51.730] Render host 192.168.20.14:20212 added to waiting list
              [14:21:51.730] Render host 192.168.20.8:20212 added to waiting list
              [14:21:56.739] Render host 192.168.20.98:20212 added to waiting list

              You can find the logs attached.



              • #8
                Oh, good news!

                And thank you for the logs!

                I will need some time to analyze them and maybe to consult with colleagues from the V-Ray team, and then I will notify you.

                Best regards and have a nice day,
                Pavel



                • #9
                  Hello, Giulio!
                  Analyzing the swarm-logs (192.168.20.9.zip) logs, I see this timeline:

                  Code:
                             VRay state          Swarm Worker state
                  -------------------------------------------------
                  14:18:30 : StoppedMVS          FreeSWS
                  14:18:35 :                     RendezvousSWS
                  14:18:40 :                     ReservedSWS
                  14:18:42 : DownloadingMVS
                  14:19:35 : StoppedMVS
                  14:19:38 : StartingMVS
                  14:19:40 : StartedMVS
                  14:21:01 <<< 192.168.20.47 sent StopDVC! >>>
                  14:21:03 : StoppedMVS          FreeSWS
                  14:21:24 :                     RendezvousSWS
                  14:21:29 :                     ReservedSWS
                  14:21:31 : StartingMVS
                  14:21:34 : StartedMVS
                  14:24:34 : StoppedMVS          FreeSWS
                  I found that VRay AppSDK 7 really was downloaded and started, but at 14:21:01 it was stopped by 192.168.20.47. I suppose you waited for something to happen while it was in the "Reserved" state, then stopped the DCC (closed its window?), which stopped V-Ray. Is that right?
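
To read such a timeline more easily, the time V-Ray spends in each state can be computed from the table. This is a sketch that assumes the hand-written layout shown above, with V-Ray states ending in "MVS" and worker states ending in "SWS". Both function names are hypothetical.

```python
from datetime import datetime


def parse_timeline(text):
    """Parse 'HH:MM:SS : <VRay state> <worker state>' rows into tuples."""
    events = []
    for line in text.splitlines():
        parts = line.split()
        # Skip headers, separators, and annotation rows such as '<<< ... >>>'.
        if len(parts) < 3 or parts[1] != ":":
            continue
        try:
            stamp = datetime.strptime(parts[0], "%H:%M:%S")
        except ValueError:
            continue
        vray = next((p for p in parts[2:] if p.endswith("MVS")), None)
        worker = next((p for p in parts[2:] if p.endswith("SWS")), None)
        events.append((stamp, vray, worker))
    return events


def state_durations(events, state):
    """Return the seconds spent in each visit to a V-Ray state."""
    durations = []
    current, since = None, None
    for stamp, vray, _ in events:
        if vray is None:
            continue  # Row only changes the worker state.
        if current == state:
            durations.append((stamp - since).total_seconds())
        current, since = vray, stamp
    return durations
```

Applied to the table above, state_durations(parse_timeline(table), "StartedMVS") lists how long each V-Ray run lasted before it was stopped, which helps to tell a manual stop from a timeout.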
                  The VRay log also reports a similar timeline:

                  Code:
                  .....
                  [2024/Dec/11|14:19:39] [INFO] Entering server mode - waiting for render requests on port 20212.
                  [2024/Dec/11|14:21:01] [INFO] Stopping server.
                  [2024/Dec/11|14:21:01] [INFO] Exiting server mode.
                  ...


                  But there is something strange in the VRay log as well: it starts again afterwards. I suppose it was a retry...

                  We can try 2 things:
                  1. Repeat the same but wait longer
                  2. Try to run VRay manually; maybe it fails, and that is the reason why VRay did not start the rendering (the cause can be some problem in the network, or a bug)
                  To run VRay manually (to check whether the VRay executable caused the problem), you can run VRay on the worker/render side and then VRay on the DCC side (which will send the real work to the worker/render). The syntax is:

                  Worker/render side (if its IP, as I see from the log, is 192.168.20.47 in this case, it will be used as <worker-IP> in the DCC-side command below):
                  Code:
                  vray.exe -server -portNumber=20204 -verboseLevel=3
                  DCC side:

                  Code:
                  vray.exe -distributed=2 -renderHost="<worker-IP>" -sceneFile="<path-to-scene>" -display=1 -portNumber=20204
                  where worker-IP is the IP of the render/worker (synonyms), i.e., 192.168.20.47; no DCC here.

                  You can find vray.exe for the render/worker on Windows under %ProgramData%\Chaos\V-Ray\Swarm 2\data\ (in your case ...\data\VRay\2458\...; it should be somewhere in there). On the DCC side, it's somewhere in Program Files\Chaos...\<SketchUp-OR-similar>\... You can locate it with the Windows search tool in Explorer.

                  So, you run VRay on the render/worker PC, where it waits for a job; then you run it on the DCC PC. The output of both VRay instances is what we are interested in.
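
Since the console output of both runs is what matters, one way to keep it is to start vray.exe from a small wrapper that writes everything to a file. The sketch below only reuses the command-line flags quoted above. The function names and the log file handling are assumptions for illustration, not an official Chaos tool.

```python
import subprocess


def worker_command(vray_exe, port=20204, verbose_level=3):
    """Build the worker/render-side command line as an argument list."""
    return [str(vray_exe), "-server", f"-portNumber={port}",
            f"-verboseLevel={verbose_level}"]


def dcc_command(vray_exe, worker_ip, scene_file, port=20204):
    """Build the DCC-side command line as an argument list."""
    return [str(vray_exe), "-distributed=2", f"-renderHost={worker_ip}",
            f"-sceneFile={scene_file}", "-display=1", f"-portNumber={port}"]


def run_and_log(command, log_path):
    """Run a command, write stdout and stderr to log_path, return the exit code."""
    with open(log_path, "wb") as log:
        completed = subprocess.run(command, stdout=log, stderr=subprocess.STDOUT)
    return completed.returncode
```

On the worker PC, run_and_log(worker_command(path_to_vray), "worker.log") starts the server and blocks until it exits. The DCC PC then runs run_and_log(dcc_command(path_to_vray, worker_ip, scene), "dcc.log"). The two log files can be attached to the thread afterwards.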

                  Detailed help on VRay options is available here: https://docs.chaos.com/display/VNS/V...d+Line+Options

                  PS. Meanwhile I will continue to analyze logs for something strange...

                  Best regards,
                  Pavel.
                  Last edited by pavel_yosifov; 12-12-2024, 04:06 AM.



                  • #10
                    Hello pavel_yosifov, thanks for the support!

                    I repeated the same process as before and let it run all night. I collected the logs, but the zip files are around 200 MB each. You can find them on my personal OneDrive at this link: vray-distributed-rendering-logs

                    I also tried running V-Ray manually on two PCs: 192.168.20.47 as my render/worker and 192.168.20.98 as the node.

                    On the worker/render PC, I ran this command:
                    Code:
                    V-Ray for Rhinoceros\vrayappsdk\bin
                    ❯ .\vray.exe -distributed=2 -renderHost="192.168.20.98" -sceneFile="<path-to-scene>" -display=1 -portNumber=20204
                    On the node I ran:
                    Code:
                    C:\ProgramData\Chaos\V-Ray\Swarm 2\data\VRay\24528\bin> .\vray.exe -server -portNumber=20204 -verboseLevel=3
                    It works! I can see the rendering progress in the VFB and in the terminal. I would like to send you these logs as well, but I can't find them. Can you point me to the right path?

                    Also, a question: I saved my .vrscene file with GPU rendering activated, but the distributed rendering started as CPU rendering. Is that correct?

                    Thanks!
                    Last edited by giulio_perosino; 13-12-2024, 02:06 AM.



                    • #11
                      Giulio, hello again,

                      yes, if you want to force it to use GPU, you can use this option:
                      -rtEngine=0/1/3/5/7/8/9 – Specifies which rendering engine is going to be used:
                      0 – The regular render engine is used.
                      1 – The CPU RT engine is used.
                      3 – (Deprecated) The GPU RT engine running on OpenCL is used.
                      5 – The GPU RT engine running on CUDA is used.
                      7 – The GPU RT engine running on RTX is used.
                      8 – macOS: The GPU engine running on Metal is used (can be used for hybrid rendering running on CPU and GPU).
                      9 – macOS: The GPU RT engine running on Metal RT is used.
                      The default is 0.
                      -rtTimeOut=fff – Specifies a floating-point render time value (in minutes) when using the RT engine. The default is 0.0, no time limit.
                      -rtNoise=fff – Specifies floating-point noise threshold for a frame when using the RT engine. The default is 0.001.
                      -rtSampleLevel=nnn – Specifies maximum number of paths per pixel for a frame when using the RT engine. The default is 0, no limit.

                      These options are described here: https://docs.chaos.com/display/VNS/V...d+Line+Options

                      Thank you very much for all the logs!!

                      I will need time to analyze them.

                      Best regards, Pavel
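
For repeated tests, the -rtEngine option listed above can be appended to an existing argument list by a small helper. This is a sketch only: the engine codes are copied from the list above, the helper name is hypothetical, and whether the flag is needed on the DCC side, the worker side, or both is not stated in this thread.

```python
# Engine codes as listed in the option description above.
RT_ENGINES = {
    0: "regular render engine (default)",
    1: "CPU RT",
    3: "GPU RT on OpenCL (deprecated)",
    5: "GPU RT on CUDA",
    7: "GPU RT on RTX",
    8: "GPU on Metal (macOS)",
    9: "GPU RT on Metal RT (macOS)",
}


def with_rt_engine(command, engine):
    """Return a copy of the argument list with exactly one -rtEngine=<n> flag."""
    if engine not in RT_ENGINES:
        raise ValueError(f"unknown -rtEngine value: {engine}")
    kept = [arg for arg in command if not arg.startswith("-rtEngine=")]
    return kept + [f"-rtEngine={engine}"]
```

For example, with_rt_engine(["vray.exe", "-distributed=2"], 5) yields an argument list that requests the CUDA engine, and calling it again with 7 replaces the flag instead of adding a second one.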
