Friday, April 05, 2013

Retrograde Throughput factors

Trying to keep posts without real data as short as possible and so a quick update on what are the main factors that can cause retrograde throughput in load tests esp on db layer and worth checking quickly through the aid of various tools as needed for the case.
Main causes:
  • Unix/OS syscall code path
    • Recent 64bit OS versions on x86 hardware have improved/faster syscall entry into kernel mode to address this to some extent.
  • CPU Cache to Cache communication
    • This is where hardware coherency factor shows up. eg: CPU crosscalls and CPU interconnects play important role here.
  • Main memory to Cache data movement/transfer
    • Cache misses,memory stalls play important role here. cpustat,dtrace(cpc probes/PAPI metrics), linux perf hardware events to analyze can aid here. Measuring CPI(cycles per instruction) is a useful metric along with your critical code path length.
  •  Waiting on I/O DMA request
    • Waiting for I/O DMA request to complete
  • Spinning on a latch/mutex
    • Adaptive/ticketed spinlocks  for getting a mutex play important role here.
Sometimes there is no single tool/methodology would work if you want to arrive at a logical conclusion&measuring things/important metrics in analyzing issues such as why throughput falls off or becomes retrograde so badly against expectations in a given setup.

Saturday, January 05, 2013

Exploring Parallel Processing with Oracle-part 1

 There are a lot of web content on parallel processing in general out there and i just wanted to post some of the useful links/references related to Oracle (i am not  touching Exadata or other Oracle fusion platform related technologies here) database/plsql in general as an introduction/quick reference.
The main motivating factor would of course be dependent on the use case you have in hand and a correct understanding of your existing codebase/application would help in terms of where you stand with respect to your performance/scalability requirements.
  • There are 2 basic options to parallel processing using Oracle in particular with PL/SQL.
    • You can use the Parallel Query (PQ) feature of Oracle. This parallelises SQLs by breaking large scans into a number of smaller scans and running these in parallel. You can also run PL/SQL via PQ by defining a parallel enabled PL/SQL pipeline table function.  
    • This second method is rolling your own parallel processing in PL/SQL. With 11g you can use DBMS_PARALLEL_EXECUTE. You can use DBMS_JOB to run parallel processes. You can use message queues and database pipes (or even plain SQL tables) for IPC. You can use DBMS_LOCK for implementing semaphores and mutexes.

     
  • Also factor in your main motivating need to do parallel processing: i.e Speed-up or ScaleUp .i.e 
    • SpeedUp means : If your current single threaded process/code takes T(1) time and you see this unacceptable and want to reduce it i.e speed up with "p" processes/threads then your new improved time would be T(p). In general T(1)/T(p)=S(p)  has an upper bound called Amdahl bound which is 1/(sigma) as p increases to higher values. This is called diminishing returns for more/higher values of p. Here this "sigma" factor is the serial factor/portion in your current workload T(1).                                  i.e T(1) = (1-sigma)*T(1) + (sigma)*T(1)  and T(1)/T(p) = S(p) = p/(1 + (p-1)*sigma) . Hence as p -> infinity the speedup factor curve with p additional threads would hit a aymptotic limit of 1/(sigma). Note: Here the sigma portion represents that serial portion in your current workload which cannot be parallelized at anytime eg: a setup and tear down steps which can run only in one thread. 
    • ScaleUp means: Here you have or foresee some scalability issues in scaling to more workload with your current single threaded code/process. If you have a workload input parameter say N with your throughput for this input as T(N). Now you want to scale to higher workloads of this input workload parameter N as linear as possible i.e closer to linear scalability i.e The chart/graph of T(N) vs N is close to linear as possible. Though linear scalability of throughput T(N) vs N is ideal and you cannot achieve you would want to stay close to it. Contention and coherency are 2 factors that affect scalability . In real world your throughput would never increase forever in a given hardware setup and you would be interested to see your boundary limit beyond which throughput may fall off i.e become retrograde. This is something you can advise your customers i.e on a given hardware you can go upto this N(max) value.
    • Sometimes you want both Speedup as well as Scaleup i.e you want the batch completion time to stay as close to same as T(1) even with increasing workloads. This may or maynot be practical always and it depends.

General  Parallel Processing Patterns http://parlab.eecs.berkeley.edu/wiki/patterns/patterns  
 Above link is a very useful link to have a look at all kinds of parallel processing patterns at one shot/glance for interested folks.
1 ) Data Parallelism  In Oracle:
  • It is always advisable to read /understand that SQL level parallelism via  PDML/PDDL can be used wherever possible in Oracle EE db. i.e enterprise versions whenever your use cases demand. Along with this query level parallelism one would also be using some sort of parallel procedural processing (if you plan for multiple threads of procedural processing) to break your main job/process into sub jobs/tasks. 
Note on How Oracle implements dbms_parallel_execute: In Oracle database from 11gR2 onwards the parallel framework provided by dbms_parallel_execute package  used to submit/implement the parallel sub-tasks/jobs  via dbms_scheduler job processes governed by job_queue_processes init.ora parameter(prior to 11g one may also have to write your own parallel framework (DoItYourselfParallelism as Tom refers often and then also submit/implement the tasks via dbms_jobs package.(  dbms_jobs  is very similar to dbms_scheduler though dbms_scheduler is more sophisticated and better integration with RAC,node-affinity etc) . Loosely you can refer to these launched slave processes as different threads  processing these sub tasks (but behind the scenes they may be separate OS processes but this implementation is transparent to you and it is upto Oracle to use process or threads based on the particular port of OS. Except on windows on most unix like OS ports,Oracle may use separate OS shadow processes only for dbms scheduler jobs spawned by the subtasks . Again this is immaterial to you/transparent as Oracle would give you control over sub-tasks status,start/stop mechanisms along with knobs to adjust the degree of processing generally needed by you.

Things/Factors to consider in "data parallelism" pattern are following things:
  • Initial Setup:What is the setup needed for breaking down the data to be processed into buckets to suit your needs. The breakup may need some understanding of data so that the buckets/sub groups are balanced. May need fine-tuning or further fine grained functional breakup depending on better failure control/balance needs.
  • What is the atomic unit of work in any of your typical iteration/task. i.e It could end up being a INSERT/UPDATE i.e a DML or some simple numerical calculation steps in PLSQL( since we have already assumed data-parallelism pattern these numerical calculations would not be compute intensive or some complex mathematical ones but rather some simple ones ). 
  • Balance of PQ and parallel processing How to balance or rather mix cleverly Parallel DML/DDL also in your processing . One thing to note here is esp in RAC/cluster database setups you need to watch for inter-node overheads. i.e to limit movement of data over RAC interconnect so that data is as local as possible.(I think dbms_scheduler used by dbms_parallel_execute here would use RAC service names and you have better control over which nodes participate in your parallel processing etc. One has to explore this further and test it)
  • Knobs/End-User interface:  Some useful terminologies and patterns for reference
    • Oracle Fusion Applications have a UI Design Pattern for this here.