VFB History and Hard Disk Behavior

  • #16
    Cool! (no pun intended)

    Overheating is definitely something to check for. I think I mentioned it somewhere among the many things I said earlier, lol. Some hard disks can actually generate quite a lot of heat, especially if they are stacked on top of each other in the case with little or no space between them. This could also cause them to "protect themselves" and shut down through their built-in thermal protection. So yes, it could most definitely be the cause!

    Do you have (software-)tools to check & monitor temperature of your system components?



    • #17

      I don't know any software tool to check this.
      Which one should I install?
      for my blog and tutorials:
      www.alfasmyrna.com



      • #18
        I really like this one: http://www.almico.com/speedfan.php

        It allows you not only to monitor temps but also to change fan speeds and track temps over time using graphs, and you can also check the SMART statistics of your hard disks, which might reveal some interesting info in your case (with regard to errors).



        • #19
          Hi John,
          Finally I am running the memory test from the boot CD. The test started, but I didn't understand anything. As far as I understand, it made one pass, found no errors, and is making the 2nd pass now.
          Does it make one pass for each memory module?
          Will it display a report when the test ends?

          By the way, I installed the Speedfan program too but don't understand that either.
          Will it display a message when there is a problem?


          • #20
            Yeah, I'm sorry. I probably should have explained a bit about what it does. The memtest program especially is quite 'technical'.

            What it does is run indefinitely: it never stops cycling through the test phases. Each pass tests all memory modules using several different test types. It basically writes and reads data sequentially until it reaches the end of writable memory.

            Typically, if the first pass is done and no errors are displayed on the screen (they are shown in red below the ERRORS column), it's safe to assume the memory has no errors. But to be on the safe side, it is best to let it run at least three passes. This way the memory gets a lot of load and unload stress, which roughly resembles a heavy render load. If there are no errors after three passes, you are a very lucky person and you can safely say the memory is top notch. I usually get an error or two, but that doesn't necessarily mean the memory is faulty; in my experience this is quite common. This is why several passes are best: errors detected in one pass might disappear in another, which in that case is a good thing. No need to go into the technical background of why this happens. Just accept it for what it is.

            So please let it run at least 3 passes, and if all is well you can remove the CD and restart the computer without having to worry about the memory anymore.
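As a rough illustration of the "write, then read back and compare" idea behind each pass, here is a toy sketch in Python. It is not how memtest actually works (memtest boots on its own and tests physical RAM with many more patterns and addressing tricks); it only runs a few fixed byte patterns over an ordinary buffer, and the names `pattern_pass` and `run_passes` are made up for this example.

```python
def pattern_pass(buf: bytearray, pattern: int) -> list[int]:
    """Write `pattern` to every byte, then read everything back.
    Returns the offsets whose value did not match."""
    # Write phase: fill the buffer sequentially from start to end.
    for i in range(len(buf)):
        buf[i] = pattern
    # Read phase: verify every byte in the same order.
    return [i for i in range(len(buf)) if buf[i] != pattern]


def run_passes(size: int, passes: int = 3) -> int:
    """Run several passes over a buffer of `size` bytes; return the total error count."""
    buf = bytearray(size)
    patterns = [0x00, 0xFF, 0x55, 0xAA]  # all zeros, all ones, alternating bits
    errors = 0
    for _ in range(passes):
        for p in patterns:
            errors += len(pattern_pass(buf, p))
    return errors


print(run_passes(1 << 16))  # prints 0: an ordinary Python buffer won't show faults
```

Each "pass" here is just one sweep of every pattern over the whole buffer, which mirrors the idea of letting memtest run several passes rather than stopping after the first.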


            For our purposes, the Speedfan program is mainly a temperature monitor. If you take a look at the READINGS tab, you see an overview of your system's components, and from there you can configure normal and warning temperatures for each device (click the CONFIGURE button). The flame icon indicates a device that has gone above the set threshold. I have set mine lower than usual to show an example of this:

            [Image: 001 - Readings.jpg]

            The other tab of interest is the CHARTS tab. Simply enable the checkboxes of the devices you want to monitor and the graph will start running. As you can see in the example below, the green line goes pretty high. It's my main graphics card (GTX285) driving two monitors. The red line is my second card, driving a single (third) monitor, which obviously has much less to deal with. The grey line is the CPU and the blue line a hard disk. All are behaving pretty well.

            [Image: 001 - Charts.jpg]

            You will have to check for yourself if the values are within the safe margin. The application doesn't know what temp is safe for a device.

            For example, you have an Intel Q9550 processor. The maximum temperature at which this processor works optimally is about 60 to 65 degrees Celsius. So if the temperature breaks that threshold, you should be worried and start checking whether something is wrong with the thermal paste (between the processor and the heatsink), the cooling fan itself (dust buildup), or whether the airflow in your computer case is inadequate. Many times people add extra fans thinking that's a good thing because there will be more airflow. But what they fail to realize is that by doing so they can break the 'natural' airflow the case was designed for, actually causing tremendous heat buildup. Obviously this can seriously damage the delicate hardware inside.

            Now, most processors and mainboards these days have built-in thermal sensors that cause the system to shut down automatically before any damage occurs. So basically you don't need to worry too much about heat damaging the main components. However, if you fail to recognize these signs or ignore them, damage is sure to happen.

            That's why it's always a good thing to monitor temps, especially if you're really giving your computer a run for its money.

            Now, hard disks are a totally different story! They have large, mechanically moving parts that are severely affected by heat. There is no hard data on what the maximum temperature for a HD should be, because all disks have different mechanical properties. Most manufacturers, I believe, recommend no higher than 55C. But one thing is for sure: each 5-8C you lower the disk's temperature doubles its MTBF (mean time between failures)! Meaning, it will last WAY longer if you work on lowering the temperature. So spending a couple of euros/dollars on disk fans is money well spent. And because of their relatively small size, these fans do not really affect the airflow within the case either, so no worries there.
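To put that rule of thumb into numbers: if MTBF doubles for every k degrees of cooling, then running a disk ΔT degrees cooler multiplies its MTBF by 2^(ΔT/k). The sketch below simply evaluates that. The 5-8C doubling step is the rule of thumb from this post, not a figure from any disk manufacturer, and `mtbf_factor` is a name made up for the example.

```python
def mtbf_factor(delta_c: float, doubling_step_c: float) -> float:
    """Relative MTBF after running `delta_c` degrees C cooler, assuming
    (rule of thumb) that MTBF doubles every `doubling_step_c` degrees."""
    return 2 ** (delta_c / doubling_step_c)


# Cooling a disk by 10C, at both ends of the 5-8C rule of thumb:
print(mtbf_factor(10, 5))  # 4.0
print(mtbf_factor(10, 8))  # roughly 2.38
```

So under this assumption, a 10C drop from a couple of disk fans would mean somewhere between roughly 2.4x and 4x the expected lifetime.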

            Graphics cards are again a tough subject, because each card behaves differently. Even cards of the same brand and type tend to have different fans installed. But basically, go with what the manufacturer says. In your case, with the 8800GT, I'd say the safe maximum is about 80C. I had one too; I overclocked it, fitted it with an aftermarket cooler, and it ran at 90-100C continuously for a year. So it can take quite a beating. Great card, by the way!! I loved it!
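Speedfan does the "reading versus warning threshold" comparison for you once you set the limits via CONFIGURE. Just to spell out that logic, here is a small Python sketch using the rough limits mentioned in this post (about 65C for the Q9550, 55C for a hard disk, 80C for the 8800GT). These are the thread's rules of thumb rather than official specifications, and the readings are invented example values, not output from Speedfan.

```python
# Rough upper limits quoted in this thread (degrees Celsius); adjust for your own hardware.
LIMITS_C = {
    "CPU (Q9550)": 65,
    "Hard disk": 55,
    "GPU (8800GT)": 80,
}


def over_limit(readings: dict[str, float], limits: dict[str, float]) -> list[str]:
    """Return the names of devices whose reading exceeds their configured limit.
    Devices without a configured limit are ignored."""
    return [name for name, temp in readings.items()
            if name in limits and temp > limits[name]]


# Hypothetical readings, typed in by hand for the example
readings = {"CPU (Q9550)": 58.0, "Hard disk": 57.0, "GPU (8800GT)": 72.0}
print(over_limit(readings, LIMITS_C))  # ['Hard disk']
```

This is the same thing the flame icon tells you at a glance; the point is only that the program knows nothing about safe values until you supply the limits yourself.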


            Just remember: the cooler, the better.


            Anyway, I'm sorry for the LONG story, but I hope it helps you understand some of what's going on.



            • #21
              Hi John =)
              Thanks for this detailed information on a Saturday night!!!

              I ran the memory tests for about 4-5 hours. I think it made 3.5 or 2.5 passes. Below, it showed 3 (I suppose this is the number of completed passes; the current pass was around 80% when I canceled it).
              There were no errors in any pass.
              By the way, I learned that my memory modules are not Kingston but Mushkin.

              Now for the last 3-4 days, after unplugging the 2 hard drives, the PC has been running fine, without any errors.
              And now we know the RAM is fine.
              I am planning to plug the hard disks back in one by one to see if the problems recur. What do you think, do you recommend the same?

              If the problems recur, then either the hard disk is corrupted, or the cause is the hard disk controller or the fan speed.
              We will be able to monitor fan speeds and see if that is the cause.
              And if not, how will I know whether it is the hard disk controller or the hard disk itself?
              In the meantime there are only 2 hard disks: one has 2 partitions, C and D, and the other hard disk is S. I rarely use S because it is like storage for my archive. So if the hard disk controller is broken, maybe it doesn't show because I rarely use S. However, it has the Downloads folder on it, which gets active frequently during the day, so maybe the controller does get used. Honestly, I don't know how to judge this.

              Anyway, thanks a lot again for this great help and feedback =)


              • #22
                Originally posted by pixela:
                Hi John =)
                Thanks for this detailed information on a Saturday night!!!

                lol! No problem, my pleasure!

                Originally posted by pixela:
                I ran the memory tests for about 4-5 hours. I think it made 3.5 or 2.5 passes. Below, it showed 3 (I suppose this is the number of completed passes; the current pass was around 80% when I canceled it).
                There were no errors in any pass.
                By the way, I learned that my memory modules are not Kingston but Mushkin.

                That's great news!! And Mushkin indeed IS top-notch memory! Excellent! We can rule out the memory modules as the source of the issues.

                Originally posted by pixela:
                Now for the last 3-4 days, after unplugging the 2 hard drives, the PC has been running fine, without any errors.
                And now we know the RAM is fine.
                I am planning to plug the hard disks back in one by one to see if the problems recur. What do you think, do you recommend the same?

                Sounds like an excellent plan! I would do the same.

                Originally posted by pixela:
                If the problems recur, then either the hard disk is corrupted, or the cause is the hard disk controller or the fan speed.
                We will be able to monitor fan speeds and see if that is the cause.
                And if not, how will I know whether it is the hard disk controller or the hard disk itself?

                Alright. Basically, it means there are still a few common problem areas to investigate:

                1. Hard disk(s)
                2. Motherboard (disk controllers primarily)
                3. Power Supply
                4. Operating System

                Number 1 is already in progress: you are reattaching the disconnected disks one at a time and checking whether the problems recur. If you do get a problem, switch the disk to another connector on the motherboard (preferably, if possible, one not directly next to its current one) and see if the problem persists.

                Number 2 is unfortunately quite hard to check by yourself. The only way to do this is to have an expert check the components of the board and measure them with specialized equipment. There are no software tools that will let you specifically test the disk controllers (or any other component of the motherboard, for that matter).

                Number 3 (Power Supply) could also still be an issue, but you could test this once you have finished test number 1. If you find the system has problems again with a certain disk reattached, try disconnecting other hardware components, such as the DVD drive and another hard disk (if that's possible; maybe the S drive?). Then see if the problem persists.
                If it does, then the disk(s) is (are) most likely causing the issues. You could then try to format it (do a full format, not a quick format!) and see if that solves the problem.
                If no problem occurs, then we can be about 90% certain that the disk is NOT the culprit. Instead, things start to look like a power supply issue.

                Number 4 (Operating System) is another obvious candidate for investigation if none of the above gives us any clues about what's going on. A fresh install would be the only way to check this. Corrupt or incorrect drivers, including ones installed from Windows Update, are notorious for causing all kinds of issues. I never, ever install drivers from Windows Update, because they are usually fairly generic drivers, not specifically aimed at the brand and model of the hardware component in your system.


                Originally posted by pixela:
                In the meantime there are only 2 hard disks: one has 2 partitions, C and D, and the other hard disk is S. I rarely use S because it is like storage for my archive. So if the hard disk controller is broken, maybe it doesn't show because I rarely use S. However, it has the Downloads folder on it, which gets active frequently during the day, so maybe the controller does get used. Honestly, I don't know how to judge this.

                If the S drive isn't used much, then it could be sitting on a broken disk controller port. But that's hard to say. It's an intermittent problem, which makes it hard to troubleshoot. So the only way to really know whether the controller or the hard disk is broken is to make a lot of use of it. Have you been using the S drive these last few days while the other disks were disconnected? If you haven't, then it might be a good idea to get some activity going on that drive too, just to see if something goes wrong when you're using it.
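One way to get some activity going on the S drive is to repeatedly write files to it and read them back. The Python sketch below does that and reports any file whose contents don't match what was written. The function name `exercise_drive`, the file names and the sizes are assumptions chosen for the example. Note that Windows may serve the reads from its file cache, so this mostly exercises writes; it is a way to generate load while you watch for the original problems, not a replacement for the disk's SMART data or the manufacturer's diagnostic tool.

```python
import hashlib
import os
from pathlib import Path


def exercise_drive(target_dir: str, files: int = 10, size_mb: int = 64) -> list[str]:
    """Write `files` random files of `size_mb` MB to target_dir, read them back,
    and return the names of any files whose contents did not match."""
    root = Path(target_dir)
    root.mkdir(parents=True, exist_ok=True)
    expected: dict[str, str] = {}

    # Write phase: fresh random data for every file, so each one is different.
    for n in range(files):
        data = os.urandom(size_mb * 1024 * 1024)
        path = root / f"stress_{n:03d}.bin"
        path.write_bytes(data)
        expected[path.name] = hashlib.sha256(data).hexdigest()

    # Read phase: read each file back and compare its SHA-256 hash with the original.
    failures = []
    for name, digest in expected.items():
        if hashlib.sha256((root / name).read_bytes()).hexdigest() != digest:
            failures.append(name)

    # Clean up so repeated runs don't fill the drive.
    for name in expected:
        (root / name).unlink()

    return failures
```

For example, `exercise_drive("S:/stress_test")` would write ten 64 MB files to a (hypothetical) S:\stress_test folder, check them, delete them and return an empty list if everything matched. Running it in a loop for a few hours keeps the drive, and the controller port it sits on, busy.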


                Originally posted by pixela:
                Anyway, thanks a lot again for this great help and feedback =)

                You're very welcome!


                P.S. Did you ever get a reply to the error message you sent to Vlado? The one from your third post in this thread.



                • #23
                  Hi John,
                  Thanks again soooooo much for your reply.
                  It's very clear and detailed information about what I should do. I will try all of these one by one and let you know the results.
                  Vlado told me that it could be a hard disk error, and that his hard disks behaved strangely 1-2 weeks before they died.
                  Cheers, Happy Sunday =)