Is there a way to specify an automatic retry when there are no licenses available?

  • Is there a way to specify an automatic retry when there are no licenses available?

    Is there a way to specify an automatic retry when there are no licenses available? What I'm looking for is a retry in the V-Ray licensing code that, if it fails to obtain a license, pauses for a predetermined amount of time and then tries again. Ideally this would be a command-line flag that lets you specify the number of retries and the wait time between them.

    Why am I asking for this when one could simply launch the job again if there are no licenses available? Between the time we check whether free licenses are available and the time the job starts, another render job could start and grab the license. Since we have hundreds of V-Ray licenses, it's kind of a whack-a-mole situation. So when we check the available license count, we pretend we have slightly fewer than actually exist. As a result, we're not using our full complement of licenses. The alternative is to let the render job fail, but that tends to happen at night on the farm. Since we won't check the status until morning, we might find a few jobs never started because there were no licenses, which means the job has to be resubmitted and the artist has to wait.

    So what we are interested in is something along the lines of a command-line flag that lets you specify the number of retries and the wait time between retries, e.g. "-license_retries 5 60". In this example, if a license were unavailable it would wait 60 seconds and try again, repeating up to 5 times until it either grabs a license or fails after the 5th attempt. But maybe there is a callback that can do this? Another alternative is to create a wrapper, but we would need to somehow detect that the job failed due to licensing and not for some other reason.
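For illustration only, here is a minimal sketch of that wrapper idea in Python. It is not a V-Ray feature: `run_with_license_retry` and its parameters are hypothetical names, the render command is whatever the farm already launches, and the only thing taken from V-Ray is the "Could not obtain a license" text that appears in its log output. Whether V-Ray also returns a distinct exit code for licensing failures is not something this sketch assumes.

```python
import subprocess
import sys
import time

# Substring taken from the V-Ray log lines quoted in this thread.
LICENSE_ERROR_MARKER = "Could not obtain a license"


def run_with_license_retry(cmd, retries=5, wait_seconds=60, sleep=time.sleep):
    """Run cmd, rerunning it only when its output shows a licensing failure.

    `retries` is the maximum number of attempts and `wait_seconds` the pause
    between them, mirroring the proposed "-license_retries 5 60".
    Returns (exit_code, attempts_used).
    """
    returncode = 1
    for attempt in range(1, retries + 1):
        result = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # scan stdout and stderr together
            text=True,
        )
        sys.stdout.write(result.stdout)  # pass the render log through
        returncode = result.returncode
        if LICENSE_ERROR_MARKER not in result.stdout:
            # Success, or a failure unrelated to licensing: never retry.
            return returncode, attempt
        if attempt < retries:
            sleep(wait_seconds)
    # Every attempt hit the licensing error; report failure even if the
    # process itself exited with 0.
    return (returncode or 1), retries
```

Calling `run_with_license_retry(render_cmd, retries=5, wait_seconds=60)` with the farm's existing command list would then behave like the flag described above. The sketch buffers each attempt's output in memory before scanning it, which keeps it short but means the log only appears once the attempt has finished.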

    Is there any mechanism to do anything like this already from the V-Ray command line?
    Thanks,
    - vanpixelguy
    Last edited by vanpixelguy; 17-08-2020, 06:18 PM.

  • #2
    There isn't such a command. I don't understand exactly how the lack of licenses is happening. Are you sending more jobs to the farm than there are licenses, so they have to wait for each other?
    Aleksandar Hadzhiev | chaos.com
    Chaos Support Representative | contact us



    • #3
      Hi Aleksandar,
      We had 8 jobs fail last night. We are not submitting more jobs than we have licenses: we set the limit in our render farm software to 73 less than our available V-Ray license count. I am also tracking, minute by minute, the number of licenses reported in use on the chaosgroup Online server. At the time of the two example failures I cite below, we had around 429 licenses in use on the server. We have, I believe, a total of 493 available.

      Here are two example cases from last night. Both failed to obtain a license due to a "slow connection to the ChaosGroup license server".

      5858544_10.out:[2020/Aug/19|02:39:29] [189 MB] error: Could not obtain a license (1001): Slow connection to the ChaosGroup License Server.

      5863106_6.out:[2020/Aug/19|02:38:39] [115 MB] error: Could not obtain a license (1001): Slow connection to the ChaosGroup License Server.
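As a rough sketch, lines like the two above can be picked apart with a regular expression so that a script can tell licensing failures from other errors. The pattern is inferred only from the log lines quoted in this thread, not from any documented V-Ray log format, and `parse_license_error` is a hypothetical helper name.

```python
import re
from datetime import datetime

# Month abbreviations as they appear in the quoted logs (e.g. "Aug").
MONTHS = {name: number for number, name in enumerate(
    ["Jan", "Feb", "Mar", "Apr", "May", "Jun",
     "Jul", "Aug", "Sep", "Oct", "Nov", "Dec"], start=1)}

# Inferred from the quoted lines: "[YYYY/Mon/DD|HH:MM:SS] ... error: Could
# not obtain a license (CODE): message". The code may be negative.
LICENSE_ERROR = re.compile(
    r"\[(?P<year>\d{4})/(?P<month>[A-Za-z]{3})/(?P<day>\d{1,2})\|"
    r"(?P<hour>\d{2}):(?P<minute>\d{2}):(?P<second>\d{2})\]"
    r".*?error: Could not obtain a license \((?P<code>-?\d+)\)?[:.]?\s*"
    r"(?P<message>.*)"
)


def parse_license_error(line):
    """Return (timestamp, code, message) for a licensing error, else None."""
    match = LICENSE_ERROR.search(line)
    if match is None or match.group("month") not in MONTHS:
        return None
    stamp = datetime(
        int(match.group("year")), MONTHS[match.group("month")],
        int(match.group("day")), int(match.group("hour")),
        int(match.group("minute")), int(match.group("second")),
    )
    return stamp, int(match.group("code")), match.group("message").strip()
```

The timestamp it returns is naive, that is, in whatever local time the render node wrote to the log.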

      I don't know about the other six failures last night because the artists retry any failed jobs when they come in in the morning. I also track the license count on the server via polling, and here are the results from the log file. The first column is the sample time, the second is the Online count, the third is the render-farm resource count, and the fourth is the render-farm resource max, which we clamped to 430.

      2020-08-19 02:38:12.701000, 428, 430, 430
      2020-08-19 02:39:13.136000, 427, 430, 430
      2020-08-19 02:40:13.614000, 429, 429, 430

      As you can see, according to my polling of the Online server at one minute intervals, our license usage was no more than 429 licenses (unless there was a spike between samples).
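A small sketch of how that polling log can be checked for its peak, using the three samples quoted above as input. The column order follows the description in this post; `peak_online_usage` is a hypothetical helper, not part of any Chaos tool.

```python
import csv
from datetime import datetime

# The three polling samples quoted in this post:
# sample time, Online count, farm resource count, farm resource max.
SAMPLE_LOG = [
    "2020-08-19 02:38:12.701000, 428, 430, 430",
    "2020-08-19 02:39:13.136000, 427, 430, 430",
    "2020-08-19 02:40:13.614000, 429, 429, 430",
]


def peak_online_usage(lines):
    """Return the (time, online, farm_count, farm_max) row with the
    highest Online license count."""
    rows = []
    cleaned = (line.strip() for line in lines if line.strip())
    for sample_time, online, farm_count, farm_max in csv.reader(
            cleaned, skipinitialspace=True):
        rows.append((
            datetime.strptime(sample_time, "%Y-%m-%d %H:%M:%S.%f"),
            int(online), int(farm_count), int(farm_max),
        ))
    return max(rows, key=lambda row: row[1])


when, online, farm_count, farm_max = peak_online_usage(SAMPLE_LOG)
print(f"peak Online usage: {online} at {when}")
```

This only sees what the one-minute samples caught, so a short spike between samples would still be missed.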

      Based on the error message "... slow connection to the ChaosGroup license server", the problem appears to be one of the following: the server is down or overloaded, your internet connection is not responding, or our internet connection is not responding. I checked your status portal and it shows online licensing has been at 100% for the past week (https://status.chaos.com/). I checked with our IT department and we did not have any downtime alerts on our end last night.

      Anyway, IF the problem is the server being overloaded on your end, I would either like to have a mechanism to retry on license failure, or bring the server in-house.
      Thanks.
      Last edited by vanpixelguy; 19-08-2020, 01:37 PM.



      • #4
        Hi Aleksandar,
        One more thing: we're on the West Coast of North America (Los Angeles time zone), so if you are checking server logs on your side, the times I show above, e.g. 02:39, are in Pacific Daylight Time (PDT). That corresponds to 12:39 Eastern European Summer Time (EEST). We had 6 other failures last night, and I don't know the times because the logs got overwritten when the artists retried their jobs.
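For anyone matching these times against server-side logs, the conversion can be sketched with fixed offsets. This assumes PDT is UTC-7 and EEST is UTC+3, which holds for August; `pdt_to_eest` is a hypothetical helper name.

```python
from datetime import datetime, timedelta, timezone

# Fixed summer-time offsets; valid for the August dates in this thread.
PDT = timezone(timedelta(hours=-7), "PDT")
EEST = timezone(timedelta(hours=3), "EEST")


def pdt_to_eest(naive_local):
    """Interpret a naive timestamp as PDT and convert it to EEST."""
    return naive_local.replace(tzinfo=PDT).astimezone(EEST)


failure = pdt_to_eest(datetime(2020, 8, 19, 2, 39, 29))
print(failure.strftime("%Y-%m-%d %H:%M:%S %Z"))  # 2020-08-19 12:39:29 EEST
```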
        Thanks,
        - Bill



        • #5
          It was suggested that we upgrade to v5.5.0 or later of the chaosgroup license server software and then "borrow" all our licenses so they are held locally. We are trying this out and will update this forum post with our findings.



          • #6
            Just to update this post, we have so far (1) upgraded to v5.5.0 of the license server software, (2) borrowed virtually all our licenses so they are "local", and (3) raised the resource counter in our farm management software to allow almost all the V-Ray licenses we are paying for to be used. The net result has been:
            1. The "Could not obtain a license (1001): Slow connection to the ChaosGroup License Server" message seems to be totally eliminated. It's been a week since we did the borrowing, and there have been no reports of this error.
            2. We are getting many failures due to "Could not obtain a license (-9) There are no available licenses of this type on the ChaosGroup License Server." This is despite having our farm resource set to 13 licenses below the number of paid licenses. Our farm software should not be launching jobs on more than (NumLicenses minus 13) machines simultaneously, and thus we should not be exhausting our license count.
            It's puzzling why we are running out of licenses. Our thought is that checking a license in and out takes some time, and that this takes longer than the turnaround between one V-Ray job shutting down and the next one starting up.



            • #7
              For an update on this issue, it was determined that we had a secondary license server running. We disabled it and we dropped from having hundreds of license failures per night down to a handful. Here's a summary of what we changed:
              1. Switched off the secondary license server (which was possibly misconfigured, not sure).
              2. Upgraded to v5.5.0
              3. Borrowed all licenses.
              4. Raised our resource counter to match the number of licenses we actually have.
              Borrowing seems to have eliminated the "slow connection" errors. Switching off that secondary server pretty well eliminated all the "No licenses of that type available" errors. Now we're not seeing any failures on quiet nights, and at most 8 failures on very busy nights when we are using all of our licenses. So this is working well now.

