I'm in contact with Chaos Group support and they are investigating the scene I sent them. I've got a question for you guys, as it seems the one thing we have in common is the QNAP. Do you know which kind of configuration you are running? We have a RAID5 of HDDs and a RAID10 SSD cache for reads and writes. I had the idea that it's maybe related to the SSD cache. Are you guys using it?
Backburner render, proxies disappear randomly
-
Always a sigh of relief realizing you're not the only one. In return I may have a sigh of relief for you all with a possible solution.
So here's my scenario which I believe is identical to all of yours.
Long Version:
Send a Vray animation via Backburner to our 20+ node render farm (Win 10 and Win 7), check some completed frames, everything's working perfectly, head home and sleep soundly. Next day, check the frames and... all the proxies are gone after about 4 or 5 successful frames from each node. "What?" I've never seen anything like this before. "Why just the proxies, and not the textures in the same network location?"
Now this is an impressively heavy scene: massive max files, a tonne of xrefs, hundreds of proxies, so my mind goes to optimization, maybe the nodes ran low on RAM. Spend some time optimizing, get things lighter and moving quicker, send it off again, a few successful frames, head home, sleep less soundly. Next morning, exact same problem. "OK, let's start checking things off the list." Spend some time making sure all software versions are the same, they are. All plugins are installed correctly, they are. All network locations are accessible, they are.
"OK, now it's time to start ruling things out." Convert everything to UNC paths: no change, same problem. Collapse proxies to mesh: scene crashes, way too heavy (that's why they're proxies in the first place), undo...undo...undo. Copy all proxies locally: success, everything works perfectly, but this is a workaround, not a solution. So with the network being the possible issue, I go back and check which computers had the issue and what they have in common: Windows 10!
Now this is where I leave my area of expertise and our IT Supervisor steps in. I lay out the problem to him and after "turning it off and on again?", he does some serious dark web digging and comes up with the following -
"In your GPO try to replace the action on the faulty network drive. In my case, I was using the action "Update", I changed it for "Replace" and it works." Now, I have no idea what that meant but who cares, it worked, onward with the renders!
So it seems Windows 10 rolled out an automatic update that broke the mapped drive system. From this I'm assuming proxies are loaded every frame, whereas textures load once and live in RAM. So after a few frames, Windows 10 decides it's had enough of mapped drives and disconnects them, and all of a sudden Vray can't see any proxies anymore and doesn't render them.
Long post, but it was a long process.
Short Version:
If proxies fail to render after a few frames on Windows 10 machines, do the following: In your GPO try to replace the action on the faulty network drive. In my case, I was using the action "Update", I changed it for "Replace" and it works.
Further reading here: https://social.technet.microsoft.com...in10itprosetup
Hope that helps folks.
James Hall, 3d Artist, Technical Director. Based in New Zealand.
http://jameshall.nz
-
As I mentioned, I'm a little out of my league on the network end, so I'm passing on the info from our IT legend.
“We use mapped drives for our assets/outputs and that solution worked for us as per the tech net thread. Our storage back-end is a Synology NAS (all flash). Try upgrading to the latest build of windows.”
"It’s a known issue in the build we’re using, but if you update to Build 1903, it fixes it....allegedly".
Hope that helps.
James Hall, 3d Artist, Technical Director. Based in New Zealand.
http://jameshall.nz
-
Thanks jhall.nz, will get onto IT and give it a try.
Although as dreiddesign mentioned this would only work with mapped drives, no?
The problem has been there for at least 6 months. I have told the production team to not use proxies as a result.
We had to use them on a recent project and ran into problems again.
I ran more tests. As you can see in the attachment, I cloned the text UNC.vrmesh and linked that proxy three ways: via a mapped drive (T:), a UNC path (\\ucqnap03) and a UNC path with the full domain name (\\ucqnap03.uc.local).
It rendered fine for 300-plus frames, then on frame 335 it dropped the mesh linked via the mapped drive and the mesh using \\ucqnap03.uc.local, but not the one using \\ucqnap03.
I'm at the edge of my IT knowledge too, but this makes no sense to me, as I understand \\ucqnap03.uc.local and \\ucqnap03 to be exactly the same thing, right?
It seems to happen less when you use UNC paths rather than mapped drives.
Hopefully the group policy change will fix it. Will check back in if it does.
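For anyone wanting to log exactly when each path drops out, here is a minimal sketch in Python. It is not what I used for the test above; the names `check_path` and `poll_paths` and the example paths in the comment are made up for illustration, so swap in your own proxy locations. It just tries to open and read the same file through each path and records a timestamp whenever one fails. A successful read could in principle be served from a local cache, so treat a clean log as a hint rather than proof.

```python
import datetime
import time


def check_path(path):
    """Return None if the file can be opened and read from, else the error text."""
    try:
        with open(path, "rb") as handle:
            handle.read(1)
        return None
    except OSError as exc:
        return str(exc)


def poll_paths(paths, interval_seconds=5.0, rounds=3, log=print):
    """Check every path once per round; return a list of (timestamp, path, error)."""
    failures = []
    for round_index in range(rounds):
        for path in paths:
            error = check_path(path)
            if error is not None:
                stamp = datetime.datetime.now().isoformat(timespec="seconds")
                failures.append((stamp, path, error))
                log(stamp + " FAILED " + path + " : " + error)
        # Wait between rounds, but not after the last one.
        if round_index < rounds - 1:
            time.sleep(interval_seconds)
    return failures


# Example call (hypothetical share and file names, adjust to your network):
# poll_paths(
#     ["T:\\proxies\\test.vrmesh",
#      "\\\\ucqnap03\\share\\proxies\\test.vrmesh",
#      "\\\\ucqnap03.uc.local\\share\\proxies\\test.vrmesh"],
#     interval_seconds=60.0,
#     rounds=600,
# )
```

Run on a render node overnight, the timestamps of any failures can then be lined up against the frame times in the render log.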
Last edited by pg1; 06-08-2019, 05:58 AM.
-
Hey guys,
some progress here. Support asked me if I could test another network storage. I therefore ran a test on a custom-built NAS running FreeNAS. Got a heavy scene with lots of proxies, xrefs, xrefs with proxies in them, and so on.
Rendered 180 frames on our machines overnight, and no more missing proxies!
I forgot to repath the proxies in one xref in that scene, so those particular proxies were still loaded from our QNAP. And these proxies failed just like before. But the other proxies on the FreeNAS were running smoothly.
The proxies were repathed to the FreeNAS via UNC as well, just like on the QNAP.
So it seems the QNAP is the issue.
-
We have tried reproducing the issue using different machines, hardware and network configurations, but in our environment it works as expected. It looks like a network issue, but we could not say for sure why only the proxy files are missing randomly. We will forward that issue to our developers for investigation and we will notify you, when we have any news. If you were able to try the group policy fix suggested here, please let us know how it goes. We will also take a note for improving logging with V-Ray proxies, so there should be more information in the logs.
Edit: dreidesign It looks like the issue is reproducible only with Qnap nas. Will update our developers about this. Thank you for the feedback.
Last edited by Martin.Minev; 07-08-2019, 01:20 AM.
-
I got IT to change GPO as suggested and changed proxies to use mapped drives and got no errors last night.
The problem is intermittent so not yet convinced that this is anything other than coincidence and will continue to retest over the next 4-5 days.
I actually believe that there are 2 separate issues that result in the same outcome, i.e. mapped drives were failing temporarily so the proxies get lost, plus some other issue, e.g. a sync kicks off on the NAS and the proxy falls off.
The reason I say this is that in my tests mapped drives were way worse than UNC paths, although UNC paths also had some issues.
I also have a job setup to source proxies on a windows based server and will see if that works, meaning the qnaps are the culprit.
Again this doesn't make a huge amount of sense to me as they are just hard drives and using a different nas is just avoiding the issue.
Do any of you guys have either syncs or virtual machines running on your NAS systems? We do, and I think that when another process accesses the .vrmesh file it falls over.
-
Hey Martin Minev
I believe that the qnap we are using has a sync that is activated every 10 minutes and syncs the files to another qnap as part of back up and disaster recovery.
The sync itself is called qnap hybrid sync. I don't know much about it but believe it is a file level sync rather than block level.
Depending on how many files have changed in 10 minutes the sync itself may have a backlog that takes longer than 10 minutes to sync.
This will get longer as the day goes on. I believe that while Vray is rendering, the sync eventually gets to that file and touches it to see if it needs copying, which could be why the render falls over.
When you guys were testing did you test to see what happens to a vray proxy when another process like backup hits the file during render?
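One rough way to try this without waiting for the real sync is sketched below in Python. The function name `touch_like_backup` is a placeholder, and the idea that QNAP Hybrid Sync stats and then reads a file in this way is purely my assumption. The script repeatedly checks the file's metadata and reads it through in chunks, roughly what a file-level backup might do.

```python
import os
import time


def touch_like_backup(path, passes=10, pause_seconds=1.0, chunk_size=1024 * 1024):
    """Stat and fully read the file `passes` times; return the total bytes read."""
    total = 0
    for pass_index in range(passes):
        # Metadata check, similar to a sync deciding whether the file changed.
        os.stat(path)
        # Read the whole file in chunks, as a file-level copy would.
        with open(path, "rb") as handle:
            while True:
                chunk = handle.read(chunk_size)
                if not chunk:
                    break
                total += len(chunk)
        # Pause between passes, but not after the last one.
        if pass_index < passes - 1:
            time.sleep(pause_seconds)
    return total
```

Running it against a .vrmesh on the NAS from a machine that is not rendering, while a test render is in progress, would show whether a second reader alone is enough to make the proxy drop.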
-
Guys,
We tried the possible solution put forward by jhall.nz but to no avail; last night the problem with proxies occurred again. This is an absolute nightmare, we are missing deadlines now because of this.
Hoping and praying someone at chaosgroup has managed to replicate the issue. I've no idea why this is occurring now, and not for the past few years - nothing major in our network has changed.
[EDIT] Ugghhhhh, ignore previous - we had that random, sporadic issue where proxies show up as small boxes in the viewport (because for reasons unknown vray/max won't find them) until you reload/remap them - so they didn't render correctly because of that. Attempting the test again.
Last edited by Macker; 14-08-2019, 01:49 AM.
Check out my (rarely updated) blog @ http://macviz.blogspot.co.uk/
www.robertslimbrick.com
Cache nothing. Brute force everything.
-
Hey Macker,
We have seen this problem for the past year and a half.
We really love proxies, but we're not using them as a result.
I have done a LOT of testing on this over the past few weeks, including changing the group policy, which didn't help.
There may also be a mapped drive issue that causes a similar outcome (proxies fall over), but there is definitely something else that is also a problem.
We resolved it in 2 ways.
1. Moving the proxies off the QNAP onto a Windows Dell server, which works every time.
2. Copying the proxies locally to each machine en masse. We did this using a simple Python file and used Deadline to execute it on each machine; it copies the proxies to an accessible folder, e.g. user/public/documents.
This doesn't fix the issue, and I would love to know why QNAPs seem to be a problem, but at least it will keep you hitting your deadlines.
I'm getting IT to run firmware updates on all QNAPs and will continue to test them, but for us, we will likely replace all QNAPs in the next few months.
-
Hey guys,
just a quick update here:
I installed the latest firmware on our QNAP (4.3.6.0895), but no success, proxies are still failing. I have now also copied the proxies onto our FreeNAS server, which worked without any issues in my previous test.
Hopefully the chaosgroup guys find a way together with the qnap guys.
-
Hey
If you do need to copy proxies locally this is a python script that will copy files.
To run it in deadline for example first you need to call python and then run the script.
You can do this by selecting the slave and right click - remote control - execute command - then in the dialogue box enter:
"C:\Program Files\Thinkbox\Deadline9\bin\dpythonw.exe" "\\ucdlnprd01\DeadlineRepository9\scripts\UC\Management\copy_testing_files.py"
The copy_testing_files.py is below. You will need to change src (source) and dst (destination) to suit your network.
import os

# Change src (source) and dst (destination) to suit your network.
src = '\\\\ucqnap01\\Asset Library\\z_deadline_testing'
dst = 'c:\\users\\public\\documents\\z_deadline_testing'

# Copy the folder with robocopy; /E includes subdirectories,
# which the check below walks through as well.
os.system('robocopy "' + src + '" "' + dst + '" /E')

# Walk the source tree and confirm every file has a copy at the destination.
errors = False
for root, dirs, files in os.walk(src):
    for name in files:
        src_path = os.path.join(root, name)
        dst_path = src_path.replace(src, dst)
        if not os.path.exists(dst_path):
            print("ERROR: missing: " + dst_path)
            errors = True

if errors:
    print("FAILED")
else:
    print("Success")