  • Swarm goal slider

    I'm curious as to the specifics behind the "goal" slider for Swarm. It would seem that the higher I move the slider, the more resources I get for my local rendering. However, when a simultaneous rendering is taking place, the behavior is odd. Some examples:

    - When I'm rendering via Swarm, but am the only one rendering network-wide and my slider is set to 50%, I only use 50% of the available computers. The rest are doing nothing.
    - Just now, my Swarm was set at 100%, my coworker's was at 80%, and I was getting 100% of the resources and him, nothing.
    - When we switched it so we are both at 50%, he gets 50% of the power, but what if we add a third coworker rendering? Do we have to coordinate and then drop it down so all percentages add up to 100%?

    In my opinion, it would be most useful as a toggle switch with two positions: normal and high priority. On normal, all resources are evenly divided by the number of renderings currently happening (i.e. if it's just me, I get 100%; if it's two people, we each get 50%, etc.). If it's set to "High Priority", I get 100% no matter what (and this could be controlled by admins via the Swarm web UI). A rough sketch of the idea follows below.
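    To make the idea concrete, here's a rough sketch of the allocation rule I'm picturing (Python, purely illustrative; the names and numbers are made up, not actual Swarm code):

    Code:
    # Hypothetical allocation rule, not actual V-Ray Swarm code.
    # Each active job gets an even share of the nodes, unless one job is
    # marked high priority, in which case it takes everything.
    def allocate(total_nodes, jobs):
        """jobs: list of dicts like {"name": "me", "high_priority": False}."""
        high = [j for j in jobs if j["high_priority"]]
        if high:
            # The first high-priority job takes all nodes; admins would police this.
            return {high[0]["name"]: total_nodes}
        share = total_nodes // len(jobs)  # even split per active render job
        return {j["name"]: share for j in jobs}

    # Two of us rendering, nobody on high priority -> 50/50:
    print(allocate(10, [{"name": "me", "high_priority": False},
                        {"name": "coworker", "high_priority": False}]))
    # {'me': 5, 'coworker': 5}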

    Thoughts?

  • #2
    +1 for this
    Regards,
    Mousa Abu Doush
    Architect | 3D Artist
    www.sketchuparchive.com



    • #3
      1. If the Goal slider is set to 10%, it will use that percentage of the CPUs/GPUs out of the total number. To use all available resources, simply set it to 100%. Otherwise you can set a limit on your usage so other render jobs can occupy the rest of the machines which are standing by.

      2. Currently a render node machine can only participate in a single render job. If you open the Network page of the V-Ray Swarm manager, you can see the IPs of the machines which issue each job. ( https://docs.chaosgroup.com/display/...orking-Network ).
      In your case, you've initiated your render process before your coworkers and have occupied all available machines. You can interactively change the slider and this will take immediate effect - set it back to, let's say 30%, and your coworkers will be able to use the other 70%. You don't need to stop your render process to do this.

      3. The Resource usage bar indicates in red how many of the total available render node machines are currently occupied and thus unavailable. You and your coworker can each decrease your goal % without stopping your render job and your other coworker will be able to automatically hook up idle machines as long as they have enabled Swarm and set the goal above 0%.

      To sum it up:
      - Render node machines are managed interactively, no need to restart your render process when you make changes.
      - Render node machines can do one job at a time.
      - The Resource usage bar displays interactively in red and blue the current usage (hover with your cursor for detailed info).
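      To illustrate with your example, here is a rough sketch of the timeline (Python, for illustration only; the machine count is arbitrary and this is not actual Swarm code):

      Code:
      # Toy timeline of the behavior described above (illustration only).
      # Machines are granted first-come, first-served; one render job per machine.
      total = 10
      free = total

      # You start first with your goal at 100% -> you occupy all 10 machines.
      mine = min(total * 100 // 100, free); free -= mine
      # Your coworker starts afterwards at 80% -> nothing is free, so he gets 0.
      coworker = min(total * 80 // 100, free); free -= coworker
      print(mine, coworker)  # 10 0

      # Without stopping your render, you lower your goal to 30%:
      released = mine - total * 30 // 100
      mine -= released; free += released
      # His job can now hook up the released machines (up to his 80% goal).
      coworker += min(total * 80 // 100 - coworker, free)
      print(mine, coworker)  # 3 7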


      I've asked one of our V-Ray Swarm developers to take a look at your suggestion.
      Let us know if you have any other feedback or questions!

      Kind regards,
      Peter
      Peter Chaushev
      V-Ray for SketchUp | V-Ray for Rhino | Product Owner
      www.chaos.com



      • #4
        I appreciate the response, but I don't think that actually addresses my complaint. I know and understand we can interactively change the percentage; however, that doesn't take away from the fact that if I start my rendering at 100% and then my co-worker starts his, he gets 0 resources. Zero.

        It also doesn't take away from the fact that it seems a bit nonsensical to set the slider to 10% and only use 10% of the resources, even though no other machines are rendering at the time. This means that I simply cannot leave my slider at 50%, "set it and forget it", and trust Swarm to auto-juggle. Instead, it means that I have to set mine to 100%, then be clued in by my co-worker that he is rendering, and set my slider to a lower value.

        The whole design just doesn't make a ton of sense. If the slider remains, then some tweaks need to be made so that it auto-juggles based on how many jobs are requesting resources at that time. Although I think I prefer the 'normal vs. high priority' switch - much simpler, much easier to understand, set it and forget it.







        • #5
          Right now, once you take a machine for rendering, it will be occupied with that rendering until it completes. I guess you are asking about dynamic load balancing where some of the machines can cancel your job to join your coworker's render. We have not implemented that yet.

          Best regards,
          Vlado
          I only act like I know everything, Rogers.



          • #6
            Originally posted by vlado
            Right now, once you take a machine for rendering, it will be occupied with that rendering until it completes. I guess you are asking about dynamic load balancing where some of the machines can cancel your job to join your coworker's render. We have not implemented that yet.

            Best regards,
            Vlado
            I'm confused. Are you saying that a Swarm node may start a job, but won't jump over to my co-worker's when he starts (assuming we both set our sliders to 50%)? Because that is not my experience with Swarm. They do in fact stop or drop jobs depending on the position of the slider.

            Two examples which I can provide documentation of if needed (note: this is using the latest version of standalone swarm and buckets):

            - I start a job at 100% and get all the buckets. I want to render something else locally on a machine, so I remove that tag from Swarm and that particular computer stops rendering on my job.
            - I start a job at 100% and get all the buckets. My co-worker also wants to render, so I move my slider down to 50%; it drops roughly 50% of the buckets and my co-worker gets those.

            All that being said, I'm not sure we are understanding the issue. I understand that there may be reasons as to why it was designed that way, but in production environment, it behaves more like a bug, not a feature.



            • #7
              Hello,

              Thank you for your feedback.

              Making better use of resources is something we are aware of and want to improve, but we chose to err on the side of simplicity for many reasons, mostly time constraints and the desire to get it right.

              I'll try to clarify the slider's behavior and why we chose to implement it the way it is:

              1. At the moment Swarm doesn't support automatic resource management. Instead, resources (machines) are assigned on a first-come, first-served basis. The only changes to machine usage come from manual adjustment of the sliders or adding/removing tags in the UI.

              The meaning of the percentages on the slider is "Try to use no more than X% of the available resources". That's why, when you set a goal of 10%, it uses no more than 10% of the CPUs/GPUs in the Swarm network, regardless of whether more free machines are available.

              Also, if the network only has 20% free CPUs available, and you want 40%, you'll only get the 20% and won't receive additional resources until someone frees some machines.
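              In other words, the goal is a cap, not a reservation. A minimal illustration of the arithmetic (Python, illustration only, not Swarm code):

              Code:
              # Illustration only: the goal percentage acts as a cap, not a reservation.
              def machines_granted(goal_percent, total_machines, free_machines):
                  # "Use no more than goal% of ALL machines", limited by what is currently free.
                  wanted = int(total_machines * goal_percent / 100)
                  return min(wanted, free_machines)

              total = 10
              print(machines_granted(10, total, free_machines=10))  # 1 -> a 10% goal uses 1 machine out of 10
              print(machines_granted(40, total, free_machines=2))   # 2 -> only 20% is free, so that is all you get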

              2. Swarm does not try to solve the problem of equal/prioritized distribution of resources to all users in the network. The problem that Swarm currently solves is the discovery of machines for DR in the local network. If, for example, you had to put in a list of machine IPs for DR and claimed all machines, your colleague would still be unable to use any machines unless enough instances of V-Ray Standalone are killed to free up machines for new jobs. At the moment Swarm is machine-oriented, not job-oriented.

              3. Automatic resource management gotchas (or why we didn't do it in 1.x):

              Equal distribution of cores for all users will sooner or later lead to weird edge cases.

              For example, say you have a total of 2 machines in the Swarm network and you start rendering with both of them because they are free. A colleague comes in and tries to use DR, so the Swarm network stops your DR session on one of the machines and gives it to the colleague; now each of you uses 50% of the resources. When a third colleague tries to join, he would not receive any machines, because there is at most one DR job per machine and both machines are busy. The example scales up; for instance, 9+ users and 8 machines.
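              To make the edge case concrete, here is a small sketch (Python, purely illustrative, not part of Swarm):

              Code:
              # Sketch of the "equal distribution" edge case (illustration only).
              # One DR job can occupy a machine at a time, so shares are whole machines.
              def equal_split(machines, users):
                  share, leftover = divmod(machines, len(users))
                  # Earlier users pick up any leftover machines.
                  return {u: share + (1 if i < leftover else 0) for i, u in enumerate(users)}

              print(equal_split(2, ["you", "colleague1"]))                # {'you': 1, 'colleague1': 1}
              print(equal_split(2, ["you", "colleague1", "colleague2"]))  # {'you': 1, 'colleague1': 1, 'colleague2': 0}
              # The same happens with 9 users and 8 machines: someone always ends up with zero.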

              In those cases we will need some form of a "super-user" who can manually drop clients from the network. This still requires manual intervention and communication between users.

              If there is a "High priority" checkbox in the UI, the result will probably be that everyone will try to use it, and it may end up annoying everyone on the network. The control of that feature would also be manual ("controlled by admins via the Swarm web UI").

              In most situations the only solution Swarm can provide is some UI that allows manual resolution of disputes over resources.

              This is not exactly "set it and forget it" most of the time.

              We are thinking about how to improve the current workflow, that is, to allow optimal usage of resources while keeping the UI simple and consistent. That will take some planning to get it right, so bear with us.

              Your feedback is really appreciated, so thanks again!

              We'd be happy to take these ideas further, so any suggestions are welcome.

              Best regards,
              Plamen Stoev,
              Software developer
              Last edited by psstoev; 15-08-2017, 06:49 AM.

