The current 'Analysis Service Statistics' describe quantitatively "what is happening", but they do not quantify the available capacity of the analytics engine. As a result, managing a large and busy AF Server involves far too much guesswork about how much additional throughput is available.
At any given moment, hundreds of analytics are running in parallel across dozens of calculation threads, using anywhere from several to dozens of ParallelDataPipes - but as far as I can find, there is no meaningful way to monitor the overall activity levels.
Bottlenecks can occur in many different places: available CPU, available RAM, available threads, available data pipes, available data rates on the network. Are sudden spikes associated with changes in the volume of real-time data, or with periodically scheduled analytics? Currently the official toolset is too limited to track down every eventuality. Users see latency escalating and can lose a lot of time trying to understand why.
The following Performance Counters should be made available:
- Evaluation Thread Utilization (%)
(percentage of threads currently in use)
- Evaluation Thread Utilization by Scheduled Analytics (%)
(percentage of threads currently in use by scheduled analytics)
- Evaluation Thread Utilization by Event-triggered Analytics (%)
(percentage of threads currently in use by event-triggered analytics)
- Evaluation Thread Utilization by Backfilling (%)
(percentage of threads currently in use by backfilling analytics)
- ParallelDataPipe Utilization (%)
(percentage of ParallelDataPipes currently in use)
Whether exposed as discrete numbers or percentages, these counters are *essential* for understanding the forward capacity of AF and for identifying bottlenecks in Analysis Service performance as they occur.
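
To make the request concrete, here is a minimal Python sketch of how the proposed percentages could be derived from raw counts. It does not read anything from AF: the `EngineSnapshot` type, its field names and the sample numbers are all hypothetical, and it assumes each busy evaluation thread can be attributed to exactly one category (scheduled, event-triggered or backfilling).

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class EngineSnapshot:
    """Hypothetical point-in-time counts; not a real AF SDK or PI type."""
    total_threads: int
    scheduled_busy: int        # threads running scheduled analytics
    event_triggered_busy: int  # threads running event-triggered analytics
    backfilling_busy: int      # threads running backfilling analytics
    total_data_pipes: int
    data_pipes_busy: int


def percent(part: int, whole: int) -> float:
    """Return part/whole as a percentage, or 0.0 if nothing is configured."""
    return 0.0 if whole <= 0 else 100.0 * part / whole


def utilization_counters(s: EngineSnapshot) -> Dict[str, float]:
    """Compute the five proposed counters from one snapshot."""
    # Assumption: the three categories are disjoint, so their sum is the
    # total number of evaluation threads in use.
    busy = s.scheduled_busy + s.event_triggered_busy + s.backfilling_busy
    return {
        "Evaluation Thread Utilization (%)": percent(busy, s.total_threads),
        "Evaluation Thread Utilization by Scheduled Analytics (%)":
            percent(s.scheduled_busy, s.total_threads),
        "Evaluation Thread Utilization by Event-triggered Analytics (%)":
            percent(s.event_triggered_busy, s.total_threads),
        "Evaluation Thread Utilization by Backfilling (%)":
            percent(s.backfilling_busy, s.total_threads),
        "ParallelDataPipe Utilization (%)":
            percent(s.data_pipes_busy, s.total_data_pipes),
    }


if __name__ == "__main__":
    # Made-up sample values, for illustration only.
    sample = EngineSnapshot(
        total_threads=40,
        scheduled_busy=10,
        event_triggered_busy=15,
        backfilling_busy=5,
        total_data_pipes=8,
        data_pipes_busy=6,
    )
    for name, value in utilization_counters(sample).items():
        print(f"{name}: {value:.1f}")
```

Splitting thread utilization by category is what would let an administrator tell a backfill-driven spike apart from one caused by real-time event volume, which is exactly the question the current statistics cannot answer.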