Move all converters to starch-based implementations (#97)
* Switch all conversion routines to use starch.
Main user-visible changes:
* Ensure you check out submodules ("git clone --recurse-submodules")
* --version shows the CPU features and DSP implementations in use
* --wisdom allows overriding of the built-in architecture wisdom
* --dcfilter is no longer supported
* The "starch-benchmark" binary will benchmark all options on the
current machine and can produce a wisdom file to feed to
the --wisdom option
If you have a use case for --dcfilter, please get in touch and
let me know. It's an edge case and for now there's no starch/DSP
support for it, but support can be written if needed.
In almost all cases the new conversion routines are slightly or
substantially faster than the old ones. The only case that is
slower is SC16/SC16Q11 on a Pi 0, which is around 10% slower
due to switching from heavily approximated lookup tables to
higher-quality results (but SC16 is probably already out of reach
of a Pi 0).
* No need to build with SC16Q11_TABLE_BITS any more
* Add oneoff/uc8_capture_stats
(reads a UC8 capture; measures min/max/mean I and Q)
* Switch UC8 conversion to 127.4 center, 128 range.
Actual UC8 captures from an RTL2832 have mean I and Q values
of 127.4, so use that as the zero point.
This means the resulting I/Q values can reach 127.6 in magnitude
(255 - 127.4). Use 128 as the range for simplicity.
* Switch to the new UC8 zero offset in benchmarks, fix some bugs
* Fix some bugs in SC16/SC16Q11 validation, tighten the max error requirements
* Ditch UC8 approximation path, add a NEON VRSQRTE path.
* Tweak the SC16 exact path, add a new impl that uses a mix of
u32 & floats.
* SC16Q11 impl tweaks:
  * add a u32->float exact path
  * ditch the approximation path
  * add a NEON VRSQRTE path
  * add a 12-bit table path (using the full signed I/Q value, not the absolute value)
* Ditch SC16 approximation path, add a NEON VRSQRTE path
* Add oneoff/dsp_error_measurement
This runs sample input through the DSP functions that are
allowed to be inexact and dumps the results as a TSV suitable for
feeding to gnuplot to look at the actual errors.
* Update the "make clean" and "make wisdom" targets
* Update wisdom based on benchmarking
* Preserve the raw wisdom benchmark data
* Update to latest starch
* Update .gitignore for new wisdom files
* Update starch generated code
* Build starch-benchmark as part of the 'all' target
* Use wisdom from /etc/dump1090-fa/wisdom.local if present
* Package starch-benchmark and a helper script to generate local wisdom data
* Remove submodules in preparation for importing them directly
* Import cpu_features v0.6.0 from https://github.com/google/cpu_features/releases/tag/v0.6.0
* Import starch at commit a725c8491dc33a321565d451b385131e589d8490
from https://github.com/flightaware/starch
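This is a minimal C sketch of the UC8 zero-point change, converting one UC8 sample with the 127.4 center and 128 range from the commit message. The helpers `uc8_to_float` and `uc8_power` are hypothetical names used for illustration. The real starch converters are vectorized and may be structured quite differently.

```c
#include <assert.h>
#include <stdint.h>

/* Values from the commit message: RTL2832 UC8 captures have a measured
 * mean I/Q of 127.4, and 128 is used as the range for simplicity. */
#define UC8_CENTER 127.4f
#define UC8_RANGE  128.0f

/* Hypothetical helper: convert one unsigned 8-bit I or Q sample to a
 * signed float centered on the measured zero point. */
static float uc8_to_float(uint8_t v)
{
    return ((float) v - UC8_CENTER) / UC8_RANGE;
}

/* Hypothetical helper: power (I^2 + Q^2) of one UC8 I/Q pair, in the
 * same scaled units that uc8_to_float() produces. */
static float uc8_power(uint8_t i, uint8_t q)
{
    float fi = uc8_to_float(i);
    float fq = uc8_to_float(q);
    return fi * fi + fq * fq;
}
```

With these constants, 0 maps to about -0.995 and 255 maps to about 0.997. Both extremes therefore stay just inside (-1, 1).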
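The NEON VRSQRTE paths avoid a full square root when computing magnitude. This sketch assumes they follow the usual pattern:

- estimate 1/sqrt(p),
- refine the estimate,
- multiply by p, since sqrt(p) = p * 1/sqrt(p).

The sketch below is a portable scalar version of that idea. It swaps the hardware estimate for the well-known 0x5f3759df bit-trick estimate. The Q11 scaling (2048 = 1.0) is an assumption, and `approx_magnitude_sc16q11` is a hypothetical name, not the starch function.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Stand-in for the NEON VRSQRTE estimate (an assumption; the hardware
 * estimate works differently): the classic bit-level initial guess for
 * 1/sqrt(x). Only meaningful for positive, finite x. */
static float rsqrt_estimate(float x)
{
    uint32_t bits;
    float y;

    assert(x > 0.0f);
    memcpy(&bits, &x, sizeof bits);      /* reinterpret the float's bits */
    bits = 0x5f3759dfu - (bits >> 1);
    memcpy(&y, &bits, sizeof y);
    return y;
}

/* One Newton-Raphson step towards 1/sqrt(x): y' = y * (3 - x*y*y) / 2.
 * On NEON, VRSQRTS computes the (3 - a*b) / 2 factor for this step. */
static float rsqrt_refine(float x, float y)
{
    return y * (1.5f - 0.5f * x * y * y);
}

/* Hypothetical approximate magnitude of one SC16Q11 I/Q pair,
 * assuming Q11 means 2048 represents 1.0. */
static float approx_magnitude_sc16q11(int16_t i, int16_t q)
{
    float fi = i / 2048.0f;
    float fq = q / 2048.0f;
    float power = fi * fi + fq * fq;
    float y;

    if (power == 0.0f)
        return 0.0f;                     /* 1/sqrt(0) is undefined */

    y = rsqrt_refine(power, rsqrt_estimate(power));
    return power * y;                    /* sqrt(p) = p * (1/sqrt(p)) */
}
```

With the bit-trick estimate, one Newton-Raphson step keeps the relative error below roughly 0.2%. The precision of VRSQRTE, and the number of refinement steps the real implementations use, may differ. Inexact paths like this are what oneoff/dsp_error_measurement is meant to examine.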
2021-01-21 11:45:00 +00:00
# Hardware : BCM2711
# Revision : a03111
# Model : Raspberry Pi 4 Model B Rev 1.1
#
# "performance" cpufreq governor @ 1.5GHz

# generated by ./starch-benchmark -i 15 -r wisdom.local -o wisdom.local

magnitude_power_uc8 neon_vrsqrte_armv7a_neon_vfpv4 # 225494 ns/call
magnitude_power_uc8 twopass_generic # 232985 ns/call
magnitude_power_uc8 twopass_armv7a_neon_vfpv4 # 233043 ns/call
magnitude_power_uc8 lookup_generic # 312890 ns/call
magnitude_power_uc8 lookup_armv7a_neon_vfpv4 # 313395 ns/call
magnitude_power_uc8 lookup_unroll_4_armv7a_neon_vfpv4 # 351108 ns/call
magnitude_power_uc8 lookup_unroll_4_generic # 392295 ns/call

magnitude_power_uc8_aligned neon_vrsqrte_armv7a_neon_vfpv4 # 212203 ns/call
magnitude_power_uc8_aligned neon_vrsqrte_armv7a_neon_vfpv4_aligned # 212204 ns/call
magnitude_power_uc8_aligned twopass_armv7a_neon_vfpv4_aligned # 232057 ns/call
magnitude_power_uc8_aligned twopass_armv7a_neon_vfpv4 # 232072 ns/call
magnitude_power_uc8_aligned twopass_generic # 232141 ns/call
magnitude_power_uc8_aligned lookup_generic # 304510 ns/call
magnitude_power_uc8_aligned lookup_armv7a_neon_vfpv4_aligned # 304855 ns/call
magnitude_power_uc8_aligned lookup_armv7a_neon_vfpv4 # 304863 ns/call
magnitude_power_uc8_aligned lookup_unroll_4_armv7a_neon_vfpv4 # 332848 ns/call
magnitude_power_uc8_aligned lookup_unroll_4_armv7a_neon_vfpv4_aligned # 333134 ns/call
magnitude_power_uc8_aligned lookup_unroll_4_generic # 377063 ns/call

magnitude_sc16 neon_vrsqrte_armv7a_neon_vfpv4 # 685671 ns/call
magnitude_sc16 exact_u32_armv7a_neon_vfpv4 # 2471841 ns/call
magnitude_sc16 exact_float_armv7a_neon_vfpv4 # 2488725 ns/call
magnitude_sc16 exact_u32_generic # 3475780 ns/call
magnitude_sc16 exact_float_generic # 3627016 ns/call

magnitude_sc16_aligned neon_vrsqrte_armv7a_neon_vfpv4_aligned # 645434 ns/call
magnitude_sc16_aligned neon_vrsqrte_armv7a_neon_vfpv4 # 646233 ns/call
magnitude_sc16_aligned exact_u32_armv7a_neon_vfpv4 # 2464487 ns/call
magnitude_sc16_aligned exact_u32_armv7a_neon_vfpv4_aligned # 2464639 ns/call
magnitude_sc16_aligned exact_float_armv7a_neon_vfpv4_aligned # 2489450 ns/call
magnitude_sc16_aligned exact_float_armv7a_neon_vfpv4 # 2495798 ns/call
magnitude_sc16_aligned exact_u32_generic # 3473976 ns/call
magnitude_sc16_aligned exact_float_generic # 3629034 ns/call

magnitude_sc16q11 neon_vrsqrte_armv7a_neon_vfpv4 # 166102 ns/call
magnitude_sc16q11 exact_u32_armv7a_neon_vfpv4 # 615312 ns/call
magnitude_sc16q11 exact_float_armv7a_neon_vfpv4 # 822023 ns/call
magnitude_sc16q11 exact_u32_generic # 1151805 ns/call
magnitude_sc16q11 exact_float_generic # 1218908 ns/call
magnitude_sc16q11 11bit_table_armv7a_neon_vfpv4 # 1940816 ns/call
magnitude_sc16q11 12bit_table_armv7a_neon_vfpv4 # 2035932 ns/call
magnitude_sc16q11 12bit_table_generic # 2401932 ns/call
magnitude_sc16q11 11bit_table_generic # 2656593 ns/call

magnitude_sc16q11_aligned neon_vrsqrte_armv7a_neon_vfpv4 # 155218 ns/call
magnitude_sc16q11_aligned neon_vrsqrte_armv7a_neon_vfpv4_aligned # 155242 ns/call
magnitude_sc16q11_aligned exact_u32_armv7a_neon_vfpv4 # 612259 ns/call
magnitude_sc16q11_aligned exact_u32_armv7a_neon_vfpv4_aligned # 612269 ns/call
magnitude_sc16q11_aligned exact_float_armv7a_neon_vfpv4_aligned # 815733 ns/call
magnitude_sc16q11_aligned exact_float_armv7a_neon_vfpv4 # 821729 ns/call
magnitude_sc16q11_aligned exact_u32_generic # 1154414 ns/call
magnitude_sc16q11_aligned exact_float_generic # 1224252 ns/call
magnitude_sc16q11_aligned 11bit_table_armv7a_neon_vfpv4 # 1940788 ns/call
magnitude_sc16q11_aligned 12bit_table_armv7a_neon_vfpv4_aligned # 2035889 ns/call
magnitude_sc16q11_aligned 12bit_table_armv7a_neon_vfpv4 # 2036579 ns/call
magnitude_sc16q11_aligned 11bit_table_armv7a_neon_vfpv4_aligned # 2077521 ns/call
magnitude_sc16q11_aligned 12bit_table_generic # 2405119 ns/call
magnitude_sc16q11_aligned 11bit_table_generic # 2657152 ns/call

magnitude_uc8 neon_vrsqrte_armv7a_neon_vfpv4 # 188739 ns/call
magnitude_uc8 lookup_unroll_4_generic # 284930 ns/call
magnitude_uc8 lookup_armv7a_neon_vfpv4 # 291956 ns/call
magnitude_uc8 lookup_generic # 292047 ns/call
magnitude_uc8 lookup_unroll_4_armv7a_neon_vfpv4 # 298012 ns/call
magnitude_uc8 exact_armv7a_neon_vfpv4 # 921119 ns/call
magnitude_uc8 exact_generic # 1676587 ns/call

magnitude_uc8_aligned neon_vrsqrte_armv7a_neon_vfpv4 # 187202 ns/call
magnitude_uc8_aligned neon_vrsqrte_armv7a_neon_vfpv4_aligned # 187203 ns/call
magnitude_uc8_aligned lookup_unroll_4_generic # 280048 ns/call
magnitude_uc8_aligned lookup_armv7a_neon_vfpv4_aligned # 282247 ns/call
magnitude_uc8_aligned lookup_generic # 282254 ns/call
magnitude_uc8_aligned lookup_armv7a_neon_vfpv4 # 282262 ns/call
magnitude_uc8_aligned lookup_unroll_4_armv7a_neon_vfpv4_aligned # 292923 ns/call
magnitude_uc8_aligned lookup_unroll_4_armv7a_neon_vfpv4 # 292985 ns/call
magnitude_uc8_aligned exact_armv7a_neon_vfpv4 # 921141 ns/call
magnitude_uc8_aligned exact_armv7a_neon_vfpv4_aligned # 921149 ns/call
magnitude_uc8_aligned exact_generic # 1676551 ns/call

mean_power_u16 u32_armv7a_neon_vfpv4 # 45483 ns/call
mean_power_u16 neon_float_armv7a_neon_vfpv4 # 58654 ns/call
mean_power_u16 u64_armv7a_neon_vfpv4 # 79486 ns/call
mean_power_u16 float_armv7a_neon_vfpv4 # 94322 ns/call
mean_power_u16 u64_generic # 131666 ns/call
mean_power_u16 u32_generic # 132124 ns/call
mean_power_u16 float_generic # 187161 ns/call

mean_power_u16_aligned u32_armv7a_neon_vfpv4_aligned # 44929 ns/call
mean_power_u16_aligned u32_armv7a_neon_vfpv4 # 44933 ns/call
mean_power_u16_aligned neon_float_armv7a_neon_vfpv4 # 58485 ns/call
mean_power_u16_aligned neon_float_armv7a_neon_vfpv4_aligned # 58488 ns/call
mean_power_u16_aligned u64_armv7a_neon_vfpv4 # 80349 ns/call
mean_power_u16_aligned u64_armv7a_neon_vfpv4_aligned # 80669 ns/call
mean_power_u16_aligned float_armv7a_neon_vfpv4_aligned # 86325 ns/call
mean_power_u16_aligned float_armv7a_neon_vfpv4 # 86326 ns/call
mean_power_u16_aligned u64_generic # 131637 ns/call
mean_power_u16_aligned u32_generic # 132092 ns/call
mean_power_u16_aligned float_generic # 187127 ns/call

2021-07-08 10:53:02 +00:00
count_above_u16 neon_armv7a_neon_vfpv4 # 35 ns/call
count_above_u16 generic_armv7a_neon_vfpv4 # 56 ns/call
count_above_u16 generic_generic # 178 ns/call

count_above_u16_aligned neon_armv7a_neon_vfpv4_aligned # 34 ns/call
count_above_u16_aligned neon_armv7a_neon_vfpv4 # 34 ns/call
count_above_u16_aligned generic_armv7a_neon_vfpv4_aligned # 53 ns/call
count_above_u16_aligned generic_armv7a_neon_vfpv4 # 53 ns/call
count_above_u16_aligned generic_generic # 179 ns/call
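Each wisdom line above names a DSP function and an implementation, with the benchmark time as a trailing comment. For each function, implementations appear fastest first. This minimal C sketch consumes such a list. It assumes, as the ordering suggests, that the first listed implementation the running CPU supports is the one to use.

`pick_impl`, the `supported_fn` predicate and the two example predicates are hypothetical, and starch's actual wisdom loader may behave differently. A real loader would presumably check CPU features (for example via the imported cpu_features library) rather than match on names as `no_neon` does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical predicate: nonzero if an implementation can run on this CPU. */
typedef int (*supported_fn)(const char *impl);

/* Example predicates for illustration only. */
static int all_supported(const char *impl) { (void) impl; return 1; }
static int no_neon(const char *impl) { return strstr(impl, "neon") == NULL; }

/* Excerpt of the wisdom data above. */
static const char example_wisdom[] =
    "# generated by ./starch-benchmark -i 15 -r wisdom.local -o wisdom.local\n"
    "\n"
    "magnitude_uc8 neon_vrsqrte_armv7a_neon_vfpv4 # 188739 ns/call\n"
    "magnitude_uc8 lookup_unroll_4_generic # 284930 ns/call\n"
    "magnitude_uc8 lookup_armv7a_neon_vfpv4 # 291956 ns/call\n";

/* Copy into `out` the first implementation listed for `function` that
 * `supported` accepts and return 1; return 0 if there is none.
 * Lines look like "<function> <implementation> [# comment]". */
static int pick_impl(const char *wisdom, const char *function,
                     supported_fn supported, char *out, size_t outlen)
{
    const char *line = wisdom;

    assert(out != NULL && outlen > 0);
    while (*line) {
        const char *end = strchr(line, '\n');
        size_t len = end ? (size_t) (end - line) : strlen(line);
        char buf[256], fn[128], impl[128];

        if (len < sizeof buf) {            /* overlong lines are skipped */
            memcpy(buf, line, len);
            buf[len] = '\0';
            char *hash = strchr(buf, '#');
            if (hash)
                *hash = '\0';              /* drop comments */
            if (sscanf(buf, "%127s %127s", fn, impl) == 2 &&
                strcmp(fn, function) == 0 && supported(impl)) {
                snprintf(out, outlen, "%s", impl);
                return 1;
            }
        }
        line = end ? end + 1 : line + len;
    }
    return 0;
}
```

With `all_supported`, `magnitude_uc8` resolves to `neon_vrsqrte_armv7a_neon_vfpv4`. With `no_neon`, it falls through to `lookup_unroll_4_generic`, the fastest listed entry that does not need NEON.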
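The ns/call comments also suggest how such a file can be produced: time each implementation, then list them fastest first. This sketch covers only that output step and omits the timing loop. `struct impl_timing` and `emit_wisdom` are hypothetical names; this is not the starch-benchmark source.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical record of one benchmarked implementation. */
struct impl_timing {
    const char *impl;
    unsigned long ns_per_call;
};

/* qsort comparator: smaller ns/call (faster) sorts first. */
static int by_time(const void *a, const void *b)
{
    const struct impl_timing *x = a;
    const struct impl_timing *y = b;
    return (x->ns_per_call > y->ns_per_call) - (x->ns_per_call < y->ns_per_call);
}

/* Sort `t` fastest first and write one "<function> <impl> # <N> ns/call"
 * line per entry into `buf`. Returns the length the full output needs,
 * like snprintf, so a result >= buflen means the output was truncated. */
static size_t emit_wisdom(const char *function, struct impl_timing *t, size_t n,
                          char *buf, size_t buflen)
{
    size_t used = 0;

    assert(buf != NULL || buflen == 0);
    qsort(t, n, sizeof *t, by_time);
    for (size_t k = 0; k < n; ++k) {
        /* Once the buffer is full, keep counting with a NULL/0 target. */
        char *dst = used < buflen ? buf + used : NULL;
        size_t room = used < buflen ? buflen - used : 0;
        int w = snprintf(dst, room, "%s %s # %lu ns/call\n",
                         function, t[k].impl, t[k].ns_per_call);
        if (w < 0)
            break;
        used += (size_t) w;
    }
    return used;
}
```

As a usage example, feed it the mean_power_u16 timings for u32_armv7a_neon_vfpv4, u64_generic and float_generic from the data above, in any order. It emits those three lines in the same relative order they have in the file.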