Wednesday, April 11, 2012

A few essentials to focus on


Dr. Neil Gunther brought out clearly, in a series of articles on his blog, the importance of the stretch factor (R/S) considered alongside the OS run queue and CPU utilization.
In fact there have been surprisingly similar attempts and approaches: a book on queueing theory by another gentleman, Dr. Leonid Grinshpan, mentioned elsewhere, caught my attention recently. All this is good and reinforces the concepts clearly in my mind.

But I do find Dr. Neil Gunther's blog articles very clear and to the point. (Being a mathematician who has done some thesis work on probability, I am usually quick to catch anything on queueing theory with the math involved, along with my technical background in telecom and database systems.)

For a performance analyst, the following checklist of items may be useful:

Whenever your scalability/performance test workloads clearly stretch some of the resources in your setup (viz. CPU and/or disk storage) so that the stretch factor (R/S) goes well beyond acceptable SLA values, it is time to stop and think a little along the following lines:
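As a back-of-the-envelope aid (a minimal sketch assuming a single M/M/1 queue, the simplest baseline case; the utilization figures below are hypothetical), the stretch factor can be checked in a few lines of Python:

```python
# Stretch factor is residence time over service time: R/S.
# For a single M/M/1 queue, R = S / (1 - rho), so R/S = 1 / (1 - rho),
# where rho is the utilization of the resource (0 <= rho < 1).

def stretch_mm1(rho):
    """Predicted stretch factor for an M/M/1 queue at utilization rho."""
    if not 0.0 <= rho < 1.0:
        raise ValueError("utilization must be in [0, 1)")
    return 1.0 / (1.0 - rho)

def stretch_measured(residence_time, service_time):
    """Measured stretch factor from observed R and S."""
    return residence_time / service_time

print(stretch_mm1(0.5))  # -> 2.0: at 50% busy, requests take twice their service time
print(stretch_mm1(0.9))  # ~10: at 90% busy, roughly a tenfold stretch
```

Comparing the measured R/S against your SLA-derived threshold is what tells you it is time to stop and investigate.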

i) Do just a very quick check of the hardware/software setup to spot the low-hanging fruit. This need not be too invasive, and some things, such as centralized storage (NAS/filer storage, if you are using it), may be beyond your reach, along with your server's internal bus bandwidth, etc. All you can do is briefly document what you see and move on: is it a single-headed NAS with the NVRAM write cache on? How many LUNs did you use? How were the LUNs carved out of the filesystem at the filer end?
If you don't have clues or answers, do not worry; move on. Keep in mind that, with the latest trends, storage should behave much like a memory read, limited only by the network topology on the path to it (software/hardware adaptors, HBA, NIC, mode of transport, congestion, etc.). Of course, database workloads are a little tricky, as some of them have subtle dependencies on storage; more on that later.

Also, do not worry if the stretch factor (R/S) is too large to be accounted for by a single queued resource. This only points out that, in addition to that resource, another resource or network traffic congestion is coming into play. This is where you can add further stages to the queueing model of your transaction.
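One way to sketch such a multi-stage model (a hypothetical tandem of independent M/M/1 stages; real stages can interact, so treat this as an approximation, and the numbers below are made up):

```python
# When the measured R/S is too large for one queue to explain, model the
# transaction as a series of queueing stages; ignoring inter-stage
# effects, the per-stage residence times simply add up.

def residence_tandem(stages):
    """stages: iterable of (service_time, utilization) per M/M/1 stage."""
    return sum(s / (1.0 - rho) for s, rho in stages)

# Hypothetical numbers: a CPU stage at 60% busy and a disk stage at 80% busy.
stages = [(0.010, 0.60), (0.005, 0.80)]
total_service = sum(s for s, _ in stages)
overall_stretch = residence_tandem(stages) / total_service
print(overall_stretch)  # combined stretch factor across both stages
```

If the combined prediction still falls short of the measured R/S, that is a hint there is yet another stage (or congestion) you have not modelled.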


ii) Having done a quick check of the CPU, network, storage, and memory subsystems, consider the modelled application workload carefully to see whether any part of it can safely be turned off (ideally you would like no more than two transaction types mixed). Tuning away the unnecessary parts of the application workload, viz. software bugs that place an extra burden on resources, is the most beneficial step, and one you can work on with development. Avoiding some of the fired SQL/PLSQL units, network round trips, and meta SQLs in transactions would bring a great advantage.


iii) A very important side effect when the CPUs in one of your servers get stretched beyond the knee (for M/M/2 it is roughly 0.65 or 65%, for M/M/4 about 0.8; I need to reverify, as I am writing these values quickly off the top of my head) is that at some point the OS/kernel may take your application or backend/DB off the CPU for a brief period while it is operating well beyond the knee utilization. You can spot this easily with any decently sampled OS monitoring tool. For example, I have seen Linux kernels, under sustained >80% CPU usage, sometimes take the application/DB off the CPU. This is usually a bug in the OS kernel.
Overall, to be winning you always want to be on the CPU.
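Rather than quoting knee values from memory, you can derive them from the standard Erlang-C formula for M/M/m. The sketch below (assuming Poisson arrivals and exponential service, with a hypothetical 2x-stretch threshold standing in for the SLA) shows how the tolerable per-server utilization rises with the number of CPUs:

```python
from math import factorial

def erlang_c(m, rho):
    """Probability that an arrival must queue in an M/M/m system
    at per-server utilization rho (0 <= rho < 1)."""
    a = m * rho  # offered load in Erlangs
    top = (a ** m) / factorial(m) / (1.0 - rho)
    return top / (sum(a ** k / factorial(k) for k in range(m)) + top)

def stretch_mmm(m, rho):
    """Stretch factor R/S for M/M/m: 1 + Pq / (m * (1 - rho))."""
    return 1.0 + erlang_c(m, rho) / (m * (1.0 - rho))

# Utilization at which the stretch factor first exceeds 2x, per server count:
for m in (1, 2, 4, 8):
    knee = next(u / 100 for u in range(1, 100) if stretch_mmm(m, u / 100) > 2.0)
    print(m, knee)
```

The threshold utilization climbs with m, which is the usual M/M/m result: boxes with more CPUs tolerate higher utilization before response time balloons.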

Saturday, March 10, 2012

Thought process skills for SPE

Dr. Neil Gunther rightly mentions that hardware systems are increasingly becoming commodity "black boxes", with a lot of hidden complexity in each subsystem (CPU, memory, disk, and network), and the onus is on the performance analyst to understand the application, at least its basics if not fully, for software performance and scalability. William Louth (JXInsight CTO) has also recently been stressing the same point indirectly, while presenting QoS concepts for applications in the cloud ("the application is the network"). No doubt this means a somewhat steeper learning curve for anyone trying to bring about significant performance and scalability results in the shortest possible time, but any such application-centric effort (be it in knowledge acquisition or in understanding runtime behavior from a software performance engineering perspective) is more beneficial in terms of cost-benefit analysis. Glass-box, if not white-box, testing holds the key, and the right choice of, and freedom in choosing, tools and models for experienced people saves time and cost in the long run by doing the right, relevant work.

Thursday, March 08, 2012

Joy of sharing knowledge and being seen as ignorant

It is always good to share what you know, for sharing allows you to know more, and the knowledge shared grows as everyone contributes to it.
However, sometimes it is best to keep quiet and let others think they know all about what you know, in discussions that could otherwise turn into unnecessary debates.
You are in fact doing good by not disturbing or agitating them, and by allowing them to go away with a peaceful, happy mind.
Moreover, you can get on with your own thoughts and ideas, and with what you wanted to do, in a peaceful manner, without minds harming each other.
This is the purest form of non-violence, for I believe that any form of conflict is, in essence, nothing but violence.
I have learnt from experience that not reacting in such situations does a world of good, rather than trying to unsettle each other's minds.
In these days of too much emphasis on self-managing applications, people do not realize the true potential of self-awareness among humans, and of the pattern recognition that humans are better at than machines.