stupid dell raid card..
By Jesse Morgan
so the devs are having a hell of a time with the new dev server- it’s constantly locking up on certain threads for 30-45 seconds. apache mainly, but occasionally grep and other things. It appears to be completely random. I’ve been pulling my hair out trying to find the problem, and I think I got it nailed down thanks to a post on the gentoo forums (http://forums.gentoo.org/viewtopic-t-189180-highlight-writethru.html) that said to change some settings in the raid card’s firmware… I tested this on the prod machine since it’s not in use yet(they’re both dell 2850’s with the lsi megaraid cards) and benchmarked before and after with bonnie++… here are the results(- is original, + is new):
Version 1.93c ------Sequential Output------ --Sequential Input- --Random-
Concurrency 1 -Per Chr- --Block-- -Rewrite- -Per Chr- --Block-- --Seeks--
Machine Size K/sec %CP K/sec %CP K/sec %CP K/sec %CP K/sec %CP /sec %CP
-prod 7G 331 98 27139 10 17554 4 1156 99 65858 7 371.8 5
-Latency 104ms 41538ms 804ms 19824us 46554us 631ms
+prod 7G 335 98 54977 23 27923 7 1075 99 61596 6 342.9 4
+Latency 76755us 331ms 590ms 12104us 24805us 414ms
Notice the second column under Latency? 41538ms vs 331ms – a 40 second latency when using the dell default settings…
I think this is where my lag issue is coming from… here’s hoping.
I’ll report back when I make the change on dell and see if the devs are still having lag problems.
Update-
so I switched the write policy to ‘write-thru’ instead of write back and changed the readpolicy to normal rather than adaptive… changed the benchbark, but not the problematic behavior- well, actually now, ALL of the devs are locking up at the same time…