**super gnu** · 18-06-2015, 08:20 AM

thanks vlado, i'll try that, but i don't understand what remote desktop has to do with it. i get the same issue if remote desktop isn't connected.. i just use that to see how the other machine is doing.

**vlado** · 18-06-2015, 08:21 AM

Originally posted by super gnu

thanks vlado, i'll try that, but i don't understand what remote desktop has to do with it. i get the same issue if remote desktop isn't connected.. i just use that to see how the other machine is doing.

It used to be that OpenCL and CUDA did not work through remote desktop (don't ask me why, I didn't write the drivers). Most drivers now support this, but your nVidia drivers are too old. Because the OpenCL implementations are chained, it may be that the nVidia OpenCL implementation doesn't want to run through remote desktop and it is preventing the Intel driver from running too.

Best regards,
Vlado

**super gnu** · 18-06-2015, 08:34 AM

ok, thanks for the explanation. i still don't get how i'm doing rt "through" remote desktop, since it communicates directly with my main system via the network, and i'm just using RD to see the screen there..

however, i will accept this is likely down to terminology or lack of knowledge on my part, and update my damn drivers!

another alternative would be to remove the display drivers altogether, but maybe that would cause more issues.

cheers, Robin.

**chriserskine** · 18-06-2015, 05:52 PM

higher-end quadros will work over standard remote desktop. i'm finding the k4000 works fine, but the k2200 doesn't.

but you can get around this by using TeamViewer, which will let any of them work.

**super gnu** · 19-06-2015, 07:05 AM

ok, so i'm having a crappy time trying to get opencl working. (i know, i know, use cuda... but i want to test cpu/gpu together as i have an underpowered gpu)

first i tried upgrading to the 353.06 nvidia driver.. still get the error:

ptxas fatal :memory allocation failure
error invalid binary (-42) at line 1514 , in file ./src/opencl_main.cpp !!!
warning:failed to compile opencl kernels,falling back to cpu mode.
error:buildprogram() failed for device 0 (-42)!
warning: initdevices() failed.

so then i tried setting the ptx optimisation environment variable as suggested.

no change.. in fact, maybe coincidence, but i got the same errors, plus it maxed out my 32gb of ram.

so i downgraded to driver 347.88

now it crashes even quicker!

this error is:

error program build failure (-11) at line 1514, in file .src/opencl_main.cpp !!!

error:buildprogram()failed for device 0 (-11)!

the only thing left to try is a different build of vray.. maybe the official build is different in this regard from 3.25.01?
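
For readers decoding the numbers in parentheses above: they are standard OpenCL status codes from the Khronos `cl.h` header. The sketch below is only an illustration, not part of V-Ray; the `explain` helper and the regular expression are hypothetical names made up for this example, and the table is a small subset of the full list.

```python
import re

# Subset of the standard OpenCL status codes defined in the Khronos cl.h header.
CL_ERROR_NAMES = {
    -5: "CL_OUT_OF_RESOURCES",
    -6: "CL_OUT_OF_HOST_MEMORY",
    -11: "CL_BUILD_PROGRAM_FAILURE",
    -42: "CL_INVALID_BINARY",
}

# Hypothetical pattern: a negative integer in parentheses, e.g. "(-42)".
CODE_PATTERN = re.compile(r"\((-\d+)\)")


def explain(log_line):
    """Return the OpenCL status name for the code found in a log line, or None."""
    match = CODE_PATTERN.search(log_line)
    if match is None:
        return None
    code = int(match.group(1))
    return CL_ERROR_NAMES.get(code, "unknown OpenCL status %d" % code)


if __name__ == "__main__":
    print(explain("error:buildprogram() failed for device 0 (-42)!"))
    print(explain("error:buildprogram()failed for device 0 (-11)!"))
```

So the 353.06 attempt ended in an invalid-binary status and the 347.88 attempt in a build failure; the names alone do not say why the kernel compiler gave up.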

**savage309** · 19-06-2015, 07:06 AM

Yes, it can be different.
The only one tested to be working is 3.20 official, please test with it (353.06 should be fine).

**super gnu** · 19-06-2015, 09:57 AM

OK, so i moved to 3.20.02.. now the opencl trace program completes, and i can render on both cpu and gpu locally.

however i still have the issue on my renderslave (cpu only)

this error:

error:clcreatecontext() failed for device 0 (-6)!
warning: initdevices()failed

i've got the same vray version on there, and the same intel opencl runtime.

i've uninstalled the old nvidia driver in case it was causing the issue (i now have no display drivers.. but then i have no gpu, and remote desktop still works)

i've tried with and without remote desktop connected in case that was an issue.. i get the same error every time..

**vlado** · 19-06-2015, 10:00 AM

Can you run any other OpenCL applications?

Best regards,
Vlado

**super gnu** · 19-06-2015, 10:03 AM

hm. can you suggest one that's a quick download and easy setup?

just for reference, in my initial test, on a personal benchmark, i get a time of 1m and 8 seconds with cuda, and 38 seconds in opencl using my gtx 670 and my 3930k at 4.5 ghz.. so for me at least, opencl seems the best choice, especially if i can get dr and my other cpu in the game too.

i'm wondering, since Blagovest said that adding the cpu in cuda makes an insignificant addition to processing power, why is this different with opencl? does your internal cuda-on-cpu tool use avx/sse? i've read that this is a massive benefit when doing opencl on cpu.

**vlado** · 19-06-2015, 10:25 AM

Our CUDA on CPU implementation does not use AVX; it does use SSE to some extent, but probably not to the fullest extent possible. You are right that ideally an OpenCL driver may be able to do a better job of this on the CPU, since it can use those instructions.

The nVidia GPU computing toolkit used to come with some OpenCL samples; the simpler ones (at least the device query one) used to work on other OpenCL implementations too - maybe you can try that. If you can't find it, I can probably upload that sample for you somewhere.

Best regards,
Vlado
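
If the nVidia sample is hard to track down, a rough stand-in for the simplest part of a device query can be sketched with Python's `ctypes`, assuming Python is available on the machine. It only asks the OpenCL loader how many platforms it can see, through the standard `clGetPlatformIDs` entry point. The library file names tried below are assumptions about a typical Windows or Linux install, and `count_opencl_platforms` is a made-up helper name.

```python
import ctypes
import sys


def count_opencl_platforms():
    """Return (status, platform_count), or None if no OpenCL loader could be loaded."""
    if sys.platform == "win32":
        candidates = ["OpenCL.dll"]          # assumed loader name on Windows
        loader = ctypes.WinDLL               # OpenCL uses stdcall on 32-bit Windows
    else:
        candidates = ["libOpenCL.so.1", "libOpenCL.so"]  # assumed names on Linux
        loader = ctypes.CDLL

    lib = None
    for name in candidates:
        try:
            lib = loader(name)
            break
        except OSError:
            continue
    if lib is None:
        return None

    # cl_int clGetPlatformIDs(cl_uint num_entries, cl_platform_id *platforms,
    #                         cl_uint *num_platforms)
    get_platform_ids = lib.clGetPlatformIDs
    get_platform_ids.argtypes = [
        ctypes.c_uint32,
        ctypes.POINTER(ctypes.c_void_p),
        ctypes.POINTER(ctypes.c_uint32),
    ]
    get_platform_ids.restype = ctypes.c_int32

    count = ctypes.c_uint32(0)
    status = get_platform_ids(0, None, ctypes.byref(count))
    return status, count.value


if __name__ == "__main__":
    result = count_opencl_platforms()
    if result is None:
        print("no OpenCL loader library could be loaded")
    else:
        print("status %d, %d platform(s) reported" % result)
```

A status of 0 (`CL_SUCCESS`) with at least one platform suggests the runtime itself is installed and loadable; anything else points at the driver or runtime install rather than at the renderer.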

**savage309** · 19-06-2015, 11:16 AM

Originally posted by super gnu

hm. can you suggest one that's a quick download and easy setup?

just for reference, in my initial test, on a personal benchmark, i get a time of 1m and 8 seconds with cuda, and 38 seconds in opencl using my gtx 670 and my 3930k at 4.5 ghz.. so for me at least, opencl seems the best choice, especially if i can get dr and my other cpu in the game too.

i'm wondering, since Blagovest said that adding the cpu in cuda makes an insignificant addition to processing power, why is this different with opencl? does your internal cuda-on-cpu tool use avx/sse? i've read that this is a massive benefit when doing opencl on cpu.

I think the tests I did were made using an i7 2600 and a GTX 690. But your results are encouraging; we should probably consider the CUDA CPU option.

For the OCL app, try ocldeviceselect.exe (it comes with every V-Ray installation). It just asks the OS for OCL devices and shows those that are supposed to run fine. You should also use the same tool to deselect the devices that you don't want the OCL program compiled for.

**super gnu** · 19-06-2015, 12:10 PM

finally cracked it.. i downloaded an nvidia opencl device query. when i tried to run it on my renderslave, i got an error about a missing file, MSVCP110.dll.

a quick googling led me to download the visual studio redist package here:

http://www.microsoft.com/en-au/downl....aspx?id=30679
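
A quick way to check for this kind of missing runtime before launching a render is to ask the OS loader directly. This is only a sketch, assuming Python is installed on the slave; `can_load` is a made-up helper name, and the check says nothing about which package provides the DLL.

```python
import ctypes


def can_load(library_name):
    """Return True if the OS loader can find and load the named library."""
    try:
        ctypes.CDLL(library_name)
    except OSError:
        return False
    return True


if __name__ == "__main__":
    # MSVCP110.dll is the file named in the error message above.
    print("MSVCP110.dll loadable:", can_load("MSVCP110.dll"))
```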

and BANG! opencl on my second cpu is working fine, and my benchmark time has gone from 38 seconds down to 28 seconds. not earth shattering, but i can see the cpu usage is all over the place, spiking to 70-80% and dropping into the low 30s. i've tried turning the ray bundle size up to 512 and the RPP to 64, which helped a bit, but i'm still not getting anywhere near 100% usage.

i have an infiniband connection between my two machines, which in theory should provide excellent low-latency communication.. but the windows infiniband drivers are a bit funky, so it's possibly causing me bottlenecks. even sending a large job with loads of textures via DR (in normal vray) i rarely get more than 5% network usage.. and that's from a 1GB/sec ssd array. i should be maxing it out.. so probably infiniband was a bad idea. :P
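
To put the benchmark times quoted so far on one scale (68 s with cuda, 38 s with opencl on the local gpu and cpu, 28 s once the slave cpu joined), the arithmetic can be sketched as below. The variable and function names are made up for illustration; the three timings are the ones reported in the thread.

```python
def speedup(before_s, after_s):
    """How many times faster the 'after' run is than the 'before' run."""
    return before_s / after_s


def time_saved_percent(before_s, after_s):
    """Reduction in render time, as a percentage of the 'before' time."""
    return (before_s - after_s) / before_s * 100.0


cuda_s = 68.0               # 1 m 8 s with cuda
opencl_local_s = 38.0       # opencl, gtx 670 + local 3930k
opencl_with_slave_s = 28.0  # opencl, plus the render slave cpu

if __name__ == "__main__":
    print("cuda -> opencl local: %.2fx" % speedup(cuda_s, opencl_local_s))
    print("opencl local -> with slave: %.2fx" % speedup(opencl_local_s, opencl_with_slave_s))
    print("time saved by adding the slave: %.1f%%"
          % time_saved_percent(opencl_local_s, opencl_with_slave_s))
```

That works out to roughly 1.79x from switching to opencl and a further 1.36x (about 26% less time) from adding the second cpu.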

**super gnu** · 19-06-2015, 03:29 PM

very strange behaviour with the cpu and gpu usage. i've been playing round with RPP and RBS to try to get better utilisation.

it seems no sensible combination will peg the cpus and gpus.

if i use the defaults, i get good gpu usage, and the local cpu is running between 80 and 90%.

the renderslave cpu, however, is between 40 and 60 percent.

if i turn up the rpp and rbs, i get gradually improving usage from the slave, and slightly decreasing usage on the local cpu. when i turn them up high (512 rbs, 128 rpp) i get decreasing gpu usage but pretty high cpu usage on both machines, still jumping around, but normally both staying in the 80s and 90s percent.

strangely though, at certain points in the render, the cpus seem to go on holiday.. i can get 5-10 second periods when the cpu usage drops to 5-10% and sits there. after a while they kick back in again..

also strange: if it was a network issue, surely my local cpu should be pegged?

also, my network connection (10 Gbps) barely makes it above 0.5% utilisation during rendering.

seems odd that it's so hard to get full usage of all the devices. do the different types of device ideally need different independent rpp and rbs settings?
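
For a sense of scale on the network figures quoted in the thread (a 10 Gbps link, 0.5% utilisation while rendering, about 5% when sending a job, and a 1 GB/sec ssd array), the conversion can be sketched as follows. It assumes decimal units (1 GB = 10^9 bytes) and ignores protocol overhead; the function names are made up for this example.

```python
LINK_GBPS = 10.0  # link speed quoted in the thread


def utilisation_to_mb_per_s(percent, link_gbps=LINK_GBPS):
    """Convert a link utilisation percentage to megabytes per second."""
    bits_per_s = link_gbps * 1e9 * percent / 100.0
    return bits_per_s / 8 / 1e6


def mb_per_s_to_utilisation(mb_per_s, link_gbps=LINK_GBPS):
    """Convert a transfer rate in megabytes per second to link utilisation in percent."""
    return mb_per_s * 1e6 * 8 / (link_gbps * 1e9) * 100.0


if __name__ == "__main__":
    print("0.5%% of the link is %.2f MB/s" % utilisation_to_mb_per_s(0.5))
    print("5%% of the link is %.1f MB/s" % utilisation_to_mb_per_s(5.0))
    print("a 1 GB/s array could fill %.0f%% of the link" % mb_per_s_to_utilisation(1000.0))
```

Under those assumptions, 0.5% is only about 6 MB/s, while the ssd array alone could in principle drive the link to around 80%.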

**vlado** · 20-06-2015, 12:45 AM

Originally posted by super gnu

seems odd that it's so hard to get full usage of all the devices. do the different types of device ideally need different independent rpp and rbs settings?

Yes. The CPUs and GPUs have very different performances and there is not a single set of settings that works well for both (which is why we have separate settings for these for RT CPU and RT GPU). Blago coded a dynamic load balancing scheme that seemed to work relatively well; this did not make it into any builds, but we might get back to it at some point.

Best regards,
Vlado
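
The thread does not describe how that load balancing scheme worked, so the following is not it. It is only a generic sketch of one common idea: hand each device a share of the next batch in proportion to the throughput it showed on the previous one. The device names and rates are invented for illustration, and `split_work` is a hypothetical helper.

```python
def split_work(total_items, throughput):
    """Split total_items across devices in proportion to their measured throughput.

    throughput maps a device name to a positive rate (any consistent unit).
    Leftover items from rounding down go to the largest fractional remainders,
    so the returned counts always sum to total_items.
    """
    total_rate = sum(throughput.values())
    shares = {name: total_items * rate / total_rate
              for name, rate in throughput.items()}
    counts = {name: int(share) for name, share in shares.items()}

    leftover = total_items - sum(counts.values())
    by_remainder = sorted(shares, key=lambda name: shares[name] - counts[name],
                          reverse=True)
    for name in by_remainder[:leftover]:
        counts[name] += 1
    return counts


if __name__ == "__main__":
    # Invented rates, e.g. samples per second measured on the previous batch.
    measured = {"gpu": 50.0, "local_cpu": 30.0, "slave_cpu": 20.0}
    print(split_work(1000, measured))
```

Re-measuring the rates after every batch and calling `split_work` again is what would make such a split dynamic; a slow or stalled device then simply receives less work on the next round.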

amd APP for opencl on cpu?
