Daniel Lemire's blog


Counting cycles and instructions on ARM-based Apple systems

6 thoughts on “Counting cycles and instructions on ARM-based Apple systems”

  1. Zhongpu says:

    I have some questions about linux-perf-events.h.

    Question 1: it seems that ids is never used. Why not make it a local variable?

    for (auto config : config_vec) {
      attribs.config = config;
      fd = static_cast<int>(syscall(__NR_perf_event_open, &attribs, pid, cpu, group, flags));
      if (fd == -1) {
        report_error("perf_event_open");
      }
      ioctl(fd, PERF_EVENT_IOC_ID, &ids[i++]);
      if (group == -1) {
        group = fd;
      }
    }

    Question 2: how should I understand the comment “our actual results are in slots 1,3,5”?

    for (uint32_t i = 1; i < temp_result_vec.size(); i += 2) {
      results[i / 2] = temp_result_vec[i];
    }
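For context, the perf_event_open(2) man page documents the layout that a group read returns: with read_format set to PERF_FORMAT_GROUP | PERF_FORMAT_ID (and no time-enabled/time-running fields), the buffer holds a u64 nr followed by nr pairs of {value, id}. Assuming linux-perf-events.h opens its events that way (an assumption here, not confirmed by the snippet), slot 0 is the event count, the values sit in slots 1, 3, 5, and the ids sit in slots 2, 4, 6. The ids collected with PERF_EVENT_IOC_ID could then match each value to its event instead of relying on order. A minimal sketch, using a hypothetical helper name that is not part of the header:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper, not part of linux-perf-events.h. Assumes the group
// leader was opened with read_format = PERF_FORMAT_GROUP | PERF_FORMAT_ID
// and no PERF_FORMAT_TOTAL_TIME_* flags, so read() fills the buffer as:
//   slot 0              : nr, the number of events in the group
//   slots 1, 3, 5, ...  : counter values
//   slots 2, 4, 6, ...  : the id of the event whose value precedes it
// Values are returned in the order of `ids` (the ids collected earlier with
// ioctl(fd, PERF_EVENT_IOC_ID, ...)) rather than by position alone.
std::vector<uint64_t> extract_group_values(const std::vector<uint64_t> &buf,
                                           const std::vector<uint64_t> &ids) {
  std::vector<uint64_t> results(ids.size(), 0);
  if (buf.empty()) {
    return results;
  }
  const uint64_t nr = buf[0];
  // After slot 0, the buffer should hold nr (value, id) pairs.
  const std::size_t pairs = (buf.size() - 1) / 2;
  assert(nr <= pairs);  // debug check; release builds clamp below
  const std::size_t n = nr < pairs ? static_cast<std::size_t>(nr) : pairs;
  for (std::size_t k = 0; k < n; k++) {
    const uint64_t value = buf[1 + 2 * k];  // odd slot
    const uint64_t id = buf[2 + 2 * k];     // even slot
    for (std::size_t j = 0; j < ids.size(); j++) {
      if (ids[j] == id) {
        results[j] = value;
      }
    }
  }
  return results;
}
```

For example, the buffer {3, 100, 7, 200, 8, 300, 9} with ids {7, 8, 9} yields {100, 200, 300}, the same values the positional loop in the question copies out of slots 1, 3 and 5; asking for ids {9, 7} instead yields {300, 100}.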

    1. Pull requests invited!

  2. Quazi Irfan says:

    Why is the instruction count a double? Shouldn’t it be a whole number?

    1. Yes. Of course, you can represent an integer using a floating-point number… which is convenient if you want to compute an average, for example.
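The reply's point can be checked directly: an IEEE 754 double has a 53-bit significand, so every integer up to 2^53 (about 9.007e15) is stored exactly, which covers cycle and instruction counts by a wide margin. The sketch below uses hypothetical helper names; it converts raw counts to double and averages them, the kind of computation where a floating-point type is convenient.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// An IEEE 754 double has a 53-bit significand, so every integer from 0 to
// 2^53 converts to double and back without loss.
constexpr uint64_t kMaxExactInteger = uint64_t{1} << 53;

// Hypothetical helper: convert a raw counter value to double. The assert
// documents the precondition under which the conversion is exact.
double count_to_double(uint64_t count) {
  assert(count <= kMaxExactInteger);
  return static_cast<double>(count);
}

// Hypothetical helper: average several runs. The mean may have a fractional
// part, which is one reason to keep counts as doubles in the first place.
double mean_count(const std::vector<uint64_t> &counts) {
  if (counts.empty()) {
    return 0.0;
  }
  double sum = 0.0;
  for (uint64_t c : counts) {
    sum += count_to_double(c);  // exact while the running sum stays <= 2^53
  }
  return sum / static_cast<double>(counts.size());
}
```

2^53 + 1 is the first non-negative integer a double cannot represent, so the assert in count_to_double marks where exactness would end; a run of a few billion instructions is nowhere near that bound.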

      1. Quazi Irfan says:

        I was able to reproduce the measurement on my end using Dougall’s code. Thank you for linking it in your code.

        On x86, the rdtsc counter starts at CPU power-on and keeps increasing. That is not the case with Dougall’s code: I think the counting starts with the program, since I’ve noticed that it starts with a small number every time. Do you concur? Here[1] is the snippet I am running. An example run:

        // start, stop, stop-start
        1363160, 6005521463, 6004158303
        6005554311, 12010098583, 6004544272
        12010107912, 18013040904, 6002932992
        18013048020, 24017295657, 6004247637
        24017306678, 30023735547, 6006428869
        30023751252, 36031294826, 6007543574
        36031304510, 42037625498, 6006320988
        42037635149, 48046499216, 6008864067
        48046506980, 54050169260, 6003662280
        54050174555, 60058717182, 6008542627

        It’s mostly a copy-paste of the original code. I intend to expose the rdtsc function and call it from a Python program.

        [1] https://gist.github.com/quazi-irfan/3ee4789e9752bc8b3b958300157235a5

        1. You should make sure you understand what rdtsc outputs on modern CPUs.