Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Allow different types for the index storage #100

Open
CarpeNecopinum opened this issue Dec 17, 2019 · 0 comments
Open

Allow different types for the index storage #100

CarpeNecopinum opened this issue Dec 17, 2019 · 0 comments

Comments

@CarpeNecopinum
Copy link

Using NearestNeighbors.jl, I found that often times I need just the positions of the points and am not interested in the actual indices and/or I'm okay with reordering all of my dataset to suit the KDTree better (but the dataset itself being too large for me being comfortable with the KDTree storing a copy of it).

I can get that by building a tree, reordering my data and assembling a new tree with the reordered data and indices collect(eachindex(xs)):

tree = KDTree(xs, storedata = false)
xs .= xs[tree.indices]
tree1 = KDTree(xs, tree.hyper_rec, collect(eachindex(xs)), tree.metric, tree.nodes, tree.tree_data, true)

That however still leaves me with a quite large (and kind of useless) array of indices.

So I altered the KDTree to accept other types for indices:

struct KDTree{V <: AbstractVector,M <: MinkowskiMetric,T,TIV} <: NNTree{V,M}
    data::Vector{V}
    hyper_rec::HyperRectangle{T}
    indices::TIV
    metric::M
    nodes::Vector{KDNode{T}}
    tree_data::TreeData
    reordered::Bool
end

For one, this allows me to replace the Vector{Int} by a UnitRange{Int} which drastically cuts memory use.

Further more, I can use something like

struct IdentityArray end
Base.getindex(::IdentityArray, index) = index

tree3 = KDTree(xs, tree.hyper_rec, IdentityArray(), tree.metric, tree.nodes, tree.tree_data, true)

to get a good bit of extra performance as well (tested with 100M Vec3f0's):

@btime inrange(tree1, p, r) # ~ 38ms, tree with `Vector{Int}` for indices
@btime inrange(tree2, p, r) # ~ 34ms, tree with `eachindex(xs)` for indices
@btime inrange(tree3, p, r) # ~ 32ms, tree with `IdentityArray()` for indices

If desired, I can cook up a proper PR for this, just wanted to test the waters on how much interest for a change like this exists.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

1 participant