Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow for AbstractVector #29

Merged
merged 13 commits into from
Aug 24, 2016
Merged

Allow for AbstractVector #29

merged 13 commits into from
Aug 24, 2016

Conversation

KristofferC
Copy link
Owner

Allow for entering a vector of points where a point is an instance of an AbstractVector. Tests should pass except the DataFreeTree stuff.

Need to benchmark things to see that things doesnt slow down.

Help in testing / benchmarking appreciated.

Should fix: #24, #17, #13

@KristofferC
Copy link
Owner Author

CC @andyferris

@@ -1,11 +1,12 @@
__precompile__()
# __precompile__()
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

StaticArrays should now be precompiled, if that was the problem here (use 0.0.4 in the REQUIRE file above)

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, it is just commented out because it is a WIP so I dont have to wait for recompilation :)

@andyferris
Copy link
Contributor

Looks like a nice generalization, @KristofferC.

So I've managed to get it working with inrange on KDTree but there isn't really any speed advantage at the moment. One minor improvement (for the way we are doing things) is have a single point version of inrange without making another allocation...

To make a large improvement to the timings I think we will want more parts of the code, like HyperRectangle, to use SVector. I'll have a play!

ndim_tree = size(tree.data, 1)
if ndim_points != ndim_tree
function check_input{V1, V2}(tree::NNTree{V1}, points::Vector{V2})
if length(V1) != length(V2)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here and elsewhere you are allowing V to be V <: AbstractVector, and then calling length() on the type. Do you think V <: StaticVector is more appropriate?

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not really since someone might want to use a FixedSizeArray for example. I don't want to bind V too much, I rather specify an interface that V has to follow (like definining length and eltype).

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Well, that's true! (Except FixedSizeArray isn't a subtype of AbstractArray).

Copy link
Owner Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah hehe. But still :P

@KristofferC
Copy link
Owner Author

I am not surprised that there is not much speed improvement (for now I'm glad if there aren't any regression). The majority of the time is spent in fetching data from memory and I have already been careful optimizing that as much as I can. What this PR brings is more of a generalization to have the possibility to use a vector of points instead of a matrix.

The only computations that are done on the hyper rectangle is in the creation of the tree. Otherwise, only computations in the dimension that the points got split are performed

split_diff_pow = eval_pow(M, split_diff)

@andyferris
Copy link
Contributor

Right, makes sense! Thanks

do_return_inrange(idxs, ::AbstractVector) = idxs[1]
do_return_inrange(idxs, ::AbstractMatrix) = idxs
function inrange{V, T <: Number}(tree::NNTree{V}, point::AbstractVector{T}, radius::Number, sortres=false)
idxs = inrange(tree, Vector{T}[point], radius, sortres)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should be Vector{typeof(point)}[point].

Also, this one seems a little wasteful. We could copy the kernel from the above function (lines 21-29) - measured about a 10% speedup.

Copy link
Owner Author

@KristofferC KristofferC Aug 15, 2016

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yeah, there are some optimizations to be made with regards to only querying for 1 point at a time. Could also be good to provide an API where everything is preallocated, like the vector for storing indices and distances.

The easiest thing is probably to split up the knn function into an outer part and an inner part where the inner only takes a single point.

@KristofferC
Copy link
Owner Author

I updated this. It should deal with single point query better now. Feel free to try it out. :)

@KristofferC KristofferC mentioned this pull request Aug 23, 2016
@KristofferC
Copy link
Owner Author

Some updates: Benchmarking is looking great:

https://gist.github.com/KristofferC/d3666e6428a473ea93ccfea396391ab3

@KristofferC
Copy link
Owner Author

I think changing the HyperSpheres from each having their own Vector to having SVector instead did great for cache locality.

@codecov-io
Copy link

codecov-io commented Aug 24, 2016

Current coverage is 93.61% (diff: 94.41%)

Merging #29 into master will increase coverage by 9.48%

@@             master        #29   diff @@
==========================================
  Files            14         14          
  Lines           523        517     -6   
  Methods           0          0          
  Messages          0          0          
  Branches          0          0          
==========================================
+ Hits            440        484    +44   
+ Misses           83         33    -50   
  Partials          0          0          

Powered by Codecov. Last update d8e718c...7043829

@KristofferC KristofferC merged commit 815f96a into master Aug 24, 2016
@andyferris
Copy link
Contributor

OK, I had a look at those benchmark comparisons,. and it really seems to have helped BallTree which is great!

I've been using knn on KDTree which is where I saw no improvement. Oh well :)

How efficient is BallTree in general? When should I consider using it? We have 3D point clouds and I thought knowing about the geometry (3D space) would make it most efficient.

@KristofferC
Copy link
Owner Author

BallTree is not as heavily optimized as KDTree but should in theory be faster for higher dimensions. And it also works with more general metrics than Minkowski. If you run the benchmarks and run the markdown generator on the resulting file you can compare absolute values between performance of the trees. If your data is biased some way ypu should probably benchmark with that type of data.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

More generic data
3 participants