I'm sorry, that second reference was actually for the 3.5ULP variant. The 1 ULP is here: https://github.com/shibatch/sleef/blob/master/src/libm/sleef...
One of the ways that the classics can be improved is not to take the analytic ideal coefficients and approximate them to the closest floating point number, but rather take those ideal coefficients as a starting point for a search of slightly better ones.
The SLEEF Vectorized Math Library [1] does this and therefore can usually provide accuracy guarantees for the whole floating point range with a polynomial order lower than theory would predict.
Its asinf function [2] is accurate to 1 ULP for all single precision floats, and is similar to the `asin_cg` from the article, with the main difference the sqrt is done on the input of the polynomial instead of the output.
[1] https://sleef.org/ [2] https://github.com/shibatch/sleef/blob/master/src/libm/sleef...