Problem with render nodes - maybe someone knows...

  • Problem with render nodes - maybe someone knows...

    Hi,

    So I've built a small farm for my small studio space (four 1U servers with a twin setup, so 8 Xeon nodes in total). It was all working spiffy, then I had a long quiet period, so I shut the farm off. I came back 6 months later, and when I powered it on it worked and rendered as before, but now some machines BSOD with the message "Uncorrectable error in DIMM 1" (or 2; it could be any DIMM). Then Windows says the machine experienced an unexpected shutdown due to Kernel-Power.

    It happens at random, on random machines. What's odd is that absolutely nothing has changed between then and now. They can render fine for 24 hours, and then something happens.

    My first thought was to reseat the RAM... but perhaps it could be something else? Maybe there are some real bugs living in there )
    Dmitry Vinnik
    Silhouette Images Inc.
    ShowReel:
    https://www.youtube.com/watch?v=qxSJlvSwAhA
    https://www.linkedin.com/in/dmitry-v...-identity-name

  • #2
    This is a tough one to track down. Here are some thoughts...

    - When the machines BSOD were they rendering higher memory scenes?
    - Run your favorite memory test multiple times to verify all memory passes...(memtest is my go to)
    - Most of my HDD failures have come from machines that have sat idle for long periods of time. Maybe a bad HDD? Run a chkdsk to see if there are any bad sectors on the HDD.

    That's just a few items to get you started. Anything related to Kernel is typically RAM or HDD related, but not always. The last time I got these kinds of errors, I just wiped the machine and reinstalled fresh and all problems went away. (I didn't have too much time to troubleshoot the problems).
    Troy Buckley | Technical Art Director
    Midwest Studios


    • #3
      Originally posted by Donald2B View Post
      This is a tough one to track down. Here are some thoughts...

      - When the machines BSOD were they rendering higher memory scenes?
      - Run your favorite memory test multiple times to verify all memory passes...(memtest is my go to)
      - Most of my HDD failures have come from machines that have sat idle for long periods of time. Maybe a bad HDD? Run a chkdsk to see if there are any bad sectors on the HDD.

      That's just a few items to get you started. Anything related to Kernel is typically RAM or HDD related, but not always. The last time I got these kinds of errors, I just wiped the machine and reinstalled fresh and all problems went away. (I didn't have too much time to troubleshoot the problems).

      Thanks! Neither do I. It seems it's the same set of machines. For example, I have render1-8, and it's always render 3, 6 and 7 that exhibit this issue, and sometimes 2. 3 is the worst, though.

      I'll check em out for sure. I'm gonna pull one out and open it up too.
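
      To pin down which nodes and which DIMM slots keep coming up, it may help to write each crash down and tally them. A minimal sketch in Python, assuming a hand-kept list of (node, DIMM slot) pairs; the entries and function names below are made-up placeholders, not real data from the farm:

```python
from collections import Counter

# Hypothetical crash log: (node name, DIMM slot reported by the BSOD).
# These entries are illustrative placeholders, not data from the actual farm.
crashes = [
    ("render3", 1),
    ("render3", 2),
    ("render6", 1),
    ("render7", 2),
    ("render3", 1),
    ("render2", 1),
]


def tally_by_node(entries):
    """Count crashes per node, most frequent first."""
    return Counter(node for node, _slot in entries).most_common()


def tally_by_node_and_slot(entries):
    """Count crashes per (node, DIMM slot) pair, most frequent first."""
    return Counter(entries).most_common()


if __name__ == "__main__":
    for node, count in tally_by_node(crashes):
        print(f"{node}: {count} crash(es)")
    for (node, slot), count in tally_by_node_and_slot(crashes):
        print(f"{node} DIMM {slot}: {count} crash(es)")
```

      If the same slot on the same node keeps showing up, that would point at one stick or one slot. If the reported DIMM moves around, then seating, power, or the board itself seem more likely.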
      Dmitry Vinnik
      Silhouette Images Inc.
      ShowReel:
      https://www.youtube.com/watch?v=qxSJlvSwAhA
      https://www.linkedin.com/in/dmitry-v...-identity-name


      • #4
        While you have them out, I would definitely reseat the RAM. It doesn't hurt anything, just to rule that out.
        Troy Buckley | Technical Art Director
        Midwest Studios
