Acquiring high-resolution time stamps
Brief Summary
This topic covers day010. We will introduce some basic methods of measuring our program’s performance.
This will cover the QueryPerformanceCounter function and the RDTSC instruction.
We will then give a brief summary of low-level hardware clock characteristics.
How Does the Computer Get the Time Stamps?
All x86 processors since the Pentium have a 64-bit register called the Time Stamp Counter (TSC). It counts the number of cycles since reset. We can use the RDTSC instruction to get the TSC’s value. This instruction reads the current value into the EDX:EAX register pair: EDX is loaded with the high-order 32 bits, and EAX with the low-order 32 bits.
On Windows you can use the __rdtsc intrinsic to get the TSC value. It returns an unsigned 64-bit value.
The RDTSC instruction is not a serializing instruction; that is to say, it does not necessarily wait until all previous instructions have been executed before reading the counter. Similarly, subsequent instructions may begin execution before the read operation is performed. So the measured cycle count may include or miss some of the work you intended to time.
Relying on the TSC also reduces portability, as other processors may not have a similar feature. That’s why Microsoft strongly discourages using RDTSC or RDTSCP directly.
On POSIX systems, you can read the value of the CLOCK_MONOTONIC clock using the clock_gettime function. On Windows, use QueryPerformanceCounter and QueryPerformanceFrequency.
Linux API
Linux provides a function called clock_gettime(2) (the (2) indicates that it belongs to the System Calls section of the manual; you can also find clock_gettime(3) in the manual, in the C Library Functions section).
The clock_gettime() system call allows the calling process to retrieve the time of the clock specified by clock_id. The clock_id can be a predefined clock such as CLOCK_MONOTONIC, or a CPU-time clock ID obtained from clock_getcpuclockid(3) or pthread_getcpuclockid(3).
For example, you can read the CLOCK_MONOTONIC time. Refer to this to get a complete example.
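As a small sketch of the call described above, the helper below reads CLOCK_MONOTONIC and folds the result into a single nanosecond count. The name monotonic_ns and the choice to return 0 on failure are assumptions made for this example, not part of the POSIX API.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L  /* expose clock_gettime in strict C modes */
#endif
#include <assert.h>
#include <stdint.h>
#include <time.h>

/* Read CLOCK_MONOTONIC and return it as nanoseconds.
   Returns 0 if clock_gettime fails (hypothetical convention). */
static uint64_t monotonic_ns(void)
{
    struct timespec ts;
    if (clock_gettime(CLOCK_MONOTONIC, &ts) != 0)
        return 0;
    return (uint64_t)ts.tv_sec * 1000000000ull + (uint64_t)ts.tv_nsec;
}
```

To time a piece of work, call monotonic_ns() before and after it and subtract the two values. On older glibc versions you may need to link with -lrt.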
Windows API
The primary API for native code, which we will use, is QueryPerformanceCounter (QPC). You can also use the kernel-mode API KeQueryPerformanceCounter, or the System.Diagnostics.Stopwatch class in managed code.
QPC is independent of and isn’t synchronized to any external time reference.
That is to say, QPC doesn’t rely on any external time reference (UTC, for example), so any measurement error comes mainly from your own computer.
Typically, QPC is the best method to use to time-stamp events and measure small time intervals that occur on the same system or virtual machine.
Using QueryPerformanceCounter(QPC)
When you need time stamps with a resolution of 1 microsecond or better and you don’t need the time stamps to be synchronized to an external time reference, you can use QueryPerformanceCounter (or KeQueryPerformanceCounter or KeQueryInterruptTimePrecise for kernel use).
QPC basically uses something similar to __rdtsc. However, when it finds that the TSC can’t be used, it handles this correctly by falling back to another hardware counter, for example on large server systems with multiple clock domains. QPC is also independent of any external time reference, so it is not affected by time zones, system time changes and the like. QPC is not even affected by processor frequency changes (caused by power management, for example).
Example of Using QPC in Native Code
LARGE_INTEGER StartingTime, EndingTime, ElapsedMicroseconds;
LARGE_INTEGER Frequency;
QueryPerformanceFrequency(&Frequency);
QueryPerformanceCounter(&StartingTime);
// Activity to be timed
QueryPerformanceCounter(&EndingTime);
ElapsedMicroseconds.QuadPart = EndingTime.QuadPart - StartingTime.QuadPart;
//
// We now have the elapsed number of ticks, along with the
// number of ticks-per-second. We use these values
// to convert to the number of elapsed microseconds.
// To guard against loss-of-precision, we convert
// to microseconds *before* dividing by ticks-per-second.
//
ElapsedMicroseconds.QuadPart *= 1000000;
ElapsedMicroseconds.QuadPart /= Frequency.QuadPart;
QPC returns a LARGE_INTEGER, which is a union.
If your compiler has built-in support for 64-bit integers, use the QuadPart member to store the 64-bit integer. Otherwise, use the LowPart and HighPart members to store the 64-bit integer.
In this example the compiler supports 64-bit integers, so we just use QuadPart.
QPC’s computational cost
The computational cost of calling QPC is determined primarily by the underlying hardware platform. If the TSC register is used as the basis for QPC, the cost is determined primarily by how long the processor takes to process an RDTSC instruction.
If the TSC can’t be used, the system will select a different hardware time basis. Because these time bases are located on the motherboard, the per-call computational cost is higher and is frequently in the vicinity of 0.8-1.0 microseconds. This cost is dominated by the time required to access the hardware device on the motherboard.
On platforms that can’t use the TSC register, acquiring high-resolution time stamps can be significantly more expensive, so consider a lower-resolution API when it suffices. If a resolution of 10 to 16 milliseconds is sufficient, you can use GetTickCount64, QueryInterruptTime, etc.
About QueryPerformanceFrequency
The frequency of the performance counter is fixed at system boot and is consistent across all processors, so you only need to query the frequency when the application initializes, and then cache the result.
Other Topics:
How to avoid loss of precision when converting to double or float?
- Integer division loses the remainder, so consider multiplying before dividing.
- Conversion between 64-bit integers and floating point (double) can cause loss of precision because the floating-point mantissa can’t represent all possible integral values.
- Multiplication of 64-bit integers can result in integer overflow.
As a general principle, delay these computations and conversions as long as possible to avoid compounding the errors introduced.
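One way to balance the points above is to stay in integer math and split the tick count into whole seconds plus a remainder before scaling. The sketch below is only an illustration of that idea, and the function name ticks_to_us is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Convert a tick count to microseconds using integer math only.
   Multiplying `ticks` by 1,000,000 directly can overflow for large counts,
   so the whole seconds and the sub-second remainder are scaled separately.
   The remainder is always smaller than `frequency`, so its product stays
   in range as long as frequency is below roughly 1.8e13 Hz. */
static uint64_t ticks_to_us(uint64_t ticks, uint64_t frequency)
{
    assert(frequency > 0);
    uint64_t whole_seconds = ticks / frequency;
    uint64_t remainder     = ticks % frequency;
    return whole_seconds * 1000000u + (remainder * 1000000u) / frequency;
}
```

With a 10 MHz counter, ticks_to_us(15, 10000000) gives 1 microsecond, because the multiplication happens before the final division and the result is truncated from 1.5.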
Low-Level hardware clock characteristics
Resolution, Precision, Accuracy and Stability
Basic concept
QPC uses a hardware counter as its basis. Hardware timers consist of three parts: a tick generator, a counter that counts the ticks, and a means of retrieving the counter value. The characteristics of these three components determine the resolution, precision, accuracy and stability of QPC.
The rate at which the ticks are generated is called the frequency, and is expressed in Hertz (Hz). The reciprocal of the frequency is called the period or tick interval and is expressed in an appropriate International System of Units (SI) time unit (for example, millisecond, microsecond).

The resolution of the timer is equal to the period. Resolution determines the ability to distinguish between any two time stamps and places a lower bound on the smallest time intervals that can be measured. This is sometimes called the tick resolution.
Digital measurement of time introduces a measurement uncertainty of ± 1 tick because the digital counter advances in discrete steps, while time is continuously advancing. This uncertainty is called a quantization error. For typical time-interval measurements, this effect can often be ignored because the quantizing error is much smaller than the time interval being measured.

However, if the period being measured is small and approaches the resolution of the timer, you will need to consider this quantizing error. The size of the error introduced is that of one clock period.
The following two diagrams illustrate the impact of the ± 1 tick uncertainty by using a timer with a resolution of 1 time unit.

Calculation:
Once we have the frequency:
Tick Interval = 1 / Frequency; Elapsed Time = Ticks * Tick Interval
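The two formulas can be written out directly. This is a minimal sketch that uses double for readability, so the precision caveats from the earlier section on conversions still apply for very large tick counts. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Tick Interval = 1 / Frequency, in seconds. */
static double tick_interval_seconds(uint64_t frequency)
{
    assert(frequency > 0);
    return 1.0 / (double)frequency;
}

/* Elapsed Time = Ticks * Tick Interval, in seconds. */
static double elapsed_seconds(uint64_t ticks, uint64_t frequency)
{
    return (double)ticks * tick_interval_seconds(frequency);
}
```

For a 10 MHz counter the tick interval is 100 nanoseconds, and 5,000,000 ticks correspond to about half a second.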
Precision
It takes time to read the tick counter from software, and this access time can reduce the precision of the time measurement. This is because the minimum interval time (the smallest time interval that can be measured) is the larger of the resolution and the access time.
Precision = MAX [Resolution, AccessTime]
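As a quick illustration of this rule, the sketch below derives the resolution from a counter frequency and takes the larger of that and a given access time. The function name precision_ns and the nanosecond units are assumptions for the example; the access time would have to be measured on your own machine.

```c
#include <assert.h>
#include <stdint.h>

/* Precision = MAX[Resolution, AccessTime], everything in nanoseconds.
   The resolution is one tick interval; integer division truncates it
   when the frequency does not divide 1e9 evenly. */
static uint64_t precision_ns(uint64_t frequency, uint64_t access_time_ns)
{
    assert(frequency > 0);
    uint64_t resolution_ns = 1000000000ull / frequency;
    return resolution_ns > access_time_ns ? resolution_ns : access_time_ns;
}
```

With a 10 MHz counter (100 ns resolution), an access time of 800 ns limits the precision to 800 ns, while an access time of 20 ns leaves it at 100 ns.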
QPCAccessTime

Understand the capabilities and limitations of QPC
QPC uses a hardware counter, and the most commonly used hardware tick generator is a crystal oscillator. The accuracy of a timer refers to the degree of conformity to a true or standard value. This depends primarily on the crystal oscillator’s ability to provide ticks at the specified frequency.
If the frequency is too high, then the clock will ‘run fast’, and measured intervals will appear longer than they really are; and if the frequency is too low, the clock will ‘run slow’, and measured intervals will appear shorter than they really are.
The crystal’s frequency of oscillation is set during the manufacturing process and is specified by the manufacturer in terms of a specified frequency plus or minus a manufacturing tolerance expressed in ‘parts per million’ (ppm), called the maximum frequency offset. For example, a crystal specified at 1,000,000 Hz with a tolerance of ± 10 ppm would be within 999,990 Hz and 1,000,010 Hz.
By substituting the phrase parts per million with microseconds per second, we can apply this frequency offset error to time-interval measurements. An oscillator with a + 10 ppm offset would have an error of 10 microseconds per second.
10 ppm -> 10 Hz per 1,000,000 Hz -> 10 / 1,000,000 -> 10 microseconds per second
For small time intervals, the frequency offset error can often be ignored. For long time intervals, even a small frequency offset can result in substantial measurement uncertainty.

By now we have reached day010. In this doc I didn’t include any of our handmade code, because I think it’s easy to write once you understand these concepts. This doc mainly refers to a Microsoft article, and I strongly recommend you refer to it when you have questions.
Thank you for reading!~