# Unit testing overhaul - new algorithm and tolerance proposal #1309
As a follow-up regarding other unit testing "issues", I would like to further address the library's default error tolerance, as well as how unit test comparison values are represented.

Inaccurate

This error compounds with the error tolerance of the library (default 0.001%). That measurement alone would move the library to almost 0.0015% error, 50% more than currently stated.

On top of this lies an inherent issue with the unit testing. Unit conversion regularly uses unsavory conversion rates, some even resulting in repeating decimals. If you were to give me two similar numbers above and ask me which one was more precise, I would select the "Current" one, because precision is sometimes perceived as conciseness.

In #1308, the unit test values are represented as equations to prevent the above concise/precise confusion from happening for maintainers. I understand it is easier to copy/paste the values into Wolfram Alpha, or similar, but if a unit test is under scrutiny, I believe it should require more time to evaluate.

My proposal around all of this, even if the comparison method isn't changed from percent error (I'm not sure it should be), is that library-wide precision needs to be increased and the approach for "acceptable" unit test values should be re-evaluated. Take, for example, the #1308 PR that modifies the unit conversion rates towards the real-world values. Based on some quick tests, round-trip conversions pass a

But to talk about the elephant in the room, many of the conversion rates are not precise but are good enough for most use cases. In those cases, I believe the tolerance of those units should be required to be overridden (change

Ultimately, you're the maintainers and I will respect your final decision on this, but I wanted to provide my feedback beforehand. Worst-case scenario, our application can create its own, similar unit testing that requires tighter tolerances.
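To make the compounding concern concrete, here is a small sketch in Python (illustrative only; the library itself is C#, and the specific rates below are hypothetical). It compares a truncated conversion rate against a higher-precision reference and stacks that error on top of the library's default 0.001% (`1e-5`) tolerance:

```python
def within_percent_error(expected, actual, max_error=1e-5):
    """Percent-error comparison, with the error expressed as a fraction
    (1e-5 == 0.001%)."""
    return abs(actual - expected) <= abs(expected) * max_error

stored_rate = 1.12404e-4          # hypothetical truncated conversion rate
true_rate = 1.124044715498552e-4  # hypothetical higher-precision reference

# The stored rate carries its own error, even though a unit test using the
# default tolerance would still pass:
rate_error = abs(stored_rate - true_rate) / true_rate
print(within_percent_error(true_rate, stored_rate))  # True: under 1e-5

# Worst case, that hidden error stacks on top of the advertised tolerance:
effective_error = rate_error + 1e-5
```

With these hypothetical numbers, `rate_error` is roughly `4e-6` (0.0004%), so the effective worst-case error is closer to 0.0014% than the advertised 0.001%, which is the shape of the compounding being described.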
A lot of good information here, but there is a lot of text to digest and ideas going in several directions. Can you help me better understand your proposal by summarizing your exact proposed changes and any open questions to discuss in a bullet list? If I read it right, there are two ideas that each require a summary of changes/discussions:
Sorry, I can get a bit carried away at times. I think you have the gist of it, though. I hope this helps clarify things, or at least makes referencing the different points easier.

Note: I'm still not convinced that one algorithm is better than the other, and the following is based on my observations and/or understandings.

### Comparison Algorithm

**Concept:** Compare the first

### Tighter Default Tolerances

Advertised as the "accuracy" of the library. Closely tied to

Note: Many conversion rate tolerances would need to be overridden to compensate for the new default tolerances.

### Unit Testing Values

This is a bit more challenging and maybe should be a completely separate discussion. Essentially, unit testing should hold the JSON values to a very high standard, so that if someone finds a "more precise" conversion rate, it may force them to question the change if the unit test then fails. I believe this is only achievable by using the highest-precision values for the unit tests.

For comparison, the top is what I propose; the bottom is what is currently recommended:

```diff
- protected override double ShortTonsForceInOneNewton => 1.12404471549816e-4;
+ protected override double ShortTonsForceInOneNewton => 1.0 / (PoundsConversionRate * 2_000.0);
```

```diff
- protected override double ShortTonsForceInOneNewton => 1.12404471549816e-4;
+ protected override double ShortTonsForceInOneNewton => 1.124044715498552414550197e-4;
```

Readability takes a hit with a double literal, as it's unclear where it came from (unless specified in a comment). Truncating the value reduces the possible precision, which should be avoided. I believe the most important question to answer here is, "What is the purpose of the round-trip conversion unit tests?" Remember,
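As a sanity check on the expression form above, the derivation can be reproduced in a few lines (Python is used only for illustration; the library itself is C#). This assumes `PoundsConversionRate` means newtons per pound-force, i.e. the exactly defined 4.4482216152605 N/lbf:

```python
# Exact by definition: 1 lb = 0.45359237 kg and standard gravity = 9.80665 m/s^2,
# so 1 lbf = 0.45359237 * 9.80665 = 4.4482216152605 N.
pounds_conversion_rate = 0.45359237 * 9.80665  # assumed meaning: N per lbf

# Expression form from the proposal: 1 short ton-force = 2000 lbf.
short_tons_force_in_one_newton = 1.0 / (pounds_conversion_rate * 2_000.0)
print(short_tons_force_in_one_newton)
```

Under that assumption, the expression reproduces the long literal (1.124044715498552…e-4) to full double precision, which is the point of the expression form: the derivation stays visible and no digits are silently truncated.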
I will try to find time to get back to this sometime soon.
**Is your feature request related to a problem? Please describe.**

I originally thought the existing method of unit testing round trips of conversion rates via percent error comparison was inefficient. However, the more time I have spent thinking about this issue and working on an alternative, the less I know what the actual issue is.

**Describe the solution you'd like**

Personally, I like the significant figures approach over the percent error approach. I'm struggling to justify one approach over the other, though.

**Describe alternatives you've considered**

Minimally, tighten the default accepted percent error from `1e-5`, ideally to `1e-9`.

**Additional context**

I was initially convinced that the percent error comparison was inadequate; I'm no longer so sure. However, I don't want my efforts to look at an alternative to go to waste, so I'm presenting my thoughts and potential solution for your review and feedback.
Edit: Two "advantages" I thought of with significant figure comparison are that the output can be displayed "lined up" to quickly identify the difference, and that the arbitrarily selected percent error would be replaced by an integer.
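As an aside on the "lined up" display idea, a shared fixed-width exponential format achieves this (illustrative Python; the values are hypothetical):

```python
expected = 1.124044715498552e-4
actual = 1.124044715512345e-4  # hypothetical round-trip result

# Printing both values in the same exponential format aligns the significant
# digits vertically, so the first digit that differs is easy to spot by eye.
for label, value in (("expected", expected), ("actual  ", actual)):
    print(f"{label}: {value:.15e}")
```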
The below algorithm needs to be provided the expected and actual values, as well as the number of significant digits to compare. The expected value's log10 is used to determine what the allowed difference can be. The difference ends up being calculated as `(power + 1) / 2`, so the `maxDelta` effectively allows `actual` to be `+/- 0.5`, relative to the significant digit's position.

The only use case that I've been able to identify as being "better" than percent error comparison is when `expected` is `0.0`. With the percent error comparison, there is no tolerance, and the actual value must be `0.0`. With the significant figures approach, n significant figures translate to `expected +/- 0.5 * 10^-n`.

Note: After further review, this is only because a special case was added to handle zero a certain way, as if the ones place was a significant digit.
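A plausible sketch of such a comparison, in Python for illustration (this is a reconstruction of the behavior described above, including the zero special case, not the exact proposed implementation):

```python
import math

def equal_to_sig_figs(expected: float, actual: float, sig_figs: int) -> bool:
    # Reconstruction of the described algorithm, not the exact proposal.
    if expected == 0.0:
        # Zero special case: treat the ones place as the leading significant
        # digit, so n significant figures allow +/- 0.5 * 10**-n.
        max_delta = 0.5 * 10.0 ** -sig_figs
    else:
        # Decimal position of the leading significant digit of `expected`.
        power = math.floor(math.log10(abs(expected)))
        # Allow +/- half a unit in the last compared significant digit.
        max_delta = 0.5 * 10.0 ** (power - sig_figs + 1)
    return abs(actual - expected) <= max_delta

print(equal_to_sig_figs(1.124044715e-4, 1.124044716e-4, 9))   # True
print(equal_to_sig_figs(1.124044715e-4, 1.124044716e-4, 11))  # False
print(equal_to_sig_figs(0.0, 4e-4, 3))                        # True: within 5e-4
```

Unlike percent error, the zero case gets a usable tolerance here: with `expected == 0.0` and 3 significant figures, anything within `±5e-4` passes.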
Something to keep in mind: the expected precision for a round-trip calculation should only be concerned with floating-point error, assuming the to/from equations are inverse operations. However, the operations are not always inverses. This translates into a tolerance that should be stricter, but must still have some leniency. Additionally, higher round-trip precision should be encouraged, as all calculations are relative to base unit conversions.
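That distinction can be sketched numerically (illustrative Python; the rates are hypothetical stand-ins). A true inverse pair round-trips to within a few ULPs, while independently rounded to/from rates leave a larger, though still small, residual error:

```python
x = 123.456  # arbitrary quantity in the base unit

# Inverse pair: multiply and divide by the exact same factor, so only
# floating-point rounding error remains.
rate = 4.4482216152605
inverse_error = abs((x * rate) / rate - x) / x

# Non-inverse pair: the forward rate is an independently truncated reciprocal
# of the backward rate, so the product of the two is not exactly 1.
forward_rate = 0.224808943       # hypothetical truncated rate (unit per base)
backward_rate = 4.4482216152605  # high-precision rate (base per unit)
non_inverse_error = abs((x * forward_rate) * backward_rate - x) / x
```

With these numbers, `inverse_error` sits at the `1e-16` scale while `non_inverse_error` is around `4e-10`: small enough to deserve a tight tolerance, but too large for a pure float-error bound.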
I was concerned about this new method being significantly slower due to the additional math being performed, so I ran some benchmarks. Even though an increase in time isn't as relevant for the testing library, there are a lot of tests to run, and this change shouldn't be impactful to the overall testing time. It appears this new method is roughly 3.5x slower than the current method. I would deem this acceptable, but that is not my call.
Note: Upon running multiple benchmarks, the results varied significantly. See below for details regarding performance benchmarking.

Edit: Upon further investigation, it turns out I overlooked the following warnings. Effectively, this means the times are so small that Benchmark.NET is unable to provide accurate results. So I ran it again, iterating a million times and with the `Debug.Assert` call removed (I don't benchmark often), and got the below results. They're effectively equivalent.