
burn-import initializer tensor not added to scope #1882

Open · mosure opened this issue Jun 12, 2024 · 16 comments
Labels: bug, onnx

@mosure (Contributor) commented Jun 12, 2024

Describe the bug
Nodes with input tensors of category Initializer fail to import, because the initializer tensor names are not in scope.

To Reproduce
Import a minimal ONNX graph with an initializer tensor, e.g.

import torch.nn as nn
import torch.nn.functional as F

class Model(nn.Module):
    def __init__(self):
        super().__init__()

    def forward(self, x):
        x = F.interpolate(x, size=(2, 2), mode='bilinear', align_corners=True)
        return x
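For reference, an export along these lines should reproduce the failing graph (a sketch: the input shape, file name, and opset are illustrative):

import torch

# Trace with any 4D float input; the exporter emits a Resize node whose
# target sizes flow through a Concat, whose output can end up stored as a
# graph initializer (the _Concat_output_0 mentioned below).
torch.onnx.export(Model(), torch.randn(1, 1, 4, 4), "interpolate.onnx", opset_version=16)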


Expected behavior
the burn-import generated model contains the tensor initializers, e.g. _Concat_output_0 is defined in scope

Actual behavior
burn-import fails because _Concat_output_0 is not in scope, panicking at this line:

panic!("No variable with name {}", tensor.name);

Desktop (please complete the following information):

  • OS: Windows

Additional context
this occurs in resize, gather, mul_scalar, and sub_scalar node imports as well (with initializer tensors); a sketch of one such variant follows
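For illustration, a hypothetical mul_scalar variant in the same style as the repro scripts below (op choice, names, shapes, and file name are mine, not from a failing export): a Mul node whose scalar operand is a graph initializer.

#!/usr/bin/env python3

# Hypothetical mul_scalar repro: the scalar rhs of Mul is a graph
# initializer rather than a runtime input.

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])
scalar_initializer = numpy_helper.from_array(np.array(2.0, dtype=np.float32), name="scalar")

mul_node = helper.make_node(
    "Mul",
    name="mul_node",
    inputs=["input_tensor", "scalar"],
    outputs=["output"],
)

graph_def = helper.make_graph(
    nodes=[mul_node],
    name="MulScalarGraph",
    inputs=[input_tensor],
    outputs=[helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 4, 4])],
    initializer=[scalar_initializer],
)

onnx.save(helper.make_model(graph_def, producer_name="mul_scalar"), "mul_scalar.onnx")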

mosure changed the title from "burn-import initializer tensor's not added to scope" to "burn-import initializer tensor not added to scope" Jun 12, 2024
@antimora (Collaborator)

@skewballfox, do you think this is related to #1857?

antimora added the bug and onnx labels Jun 12, 2024
@skewballfox (Contributor)

was this encountered with the version of burn last published on crates.io, or on main?

also, I'm a bit confused: is /Concat_output_0 both a node output and an initializer?

@mosure (Contributor, Author) commented Jun 12, 2024

this is encountered on main.

here is a more minimal reproducible example which clarifies the /Concat_output_0 initializer/output ambiguity:


Python:
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/resize/resize.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the sizes tensor as an initializer
    sizes = np.array([1, 1, 2, 2], dtype=np.int64)
    sizes_initializer = numpy_helper.from_array(sizes, name="sizes")

    resize_node = helper.make_node(
        "Resize",
        name="resize_node",
        inputs=["input_tensor", "", "", "sizes"],
        outputs=["output"],
        mode="linear",
    )

    graph_def = helper.make_graph(
        nodes=[resize_node],
        name="ResizeGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 2])
        ],
        initializer=[sizes_initializer],
    )

    model_def = helper.make_model(graph_def, producer_name="resize")

    onnx.save(model_def, "resize.onnx")


if __name__ == "__main__":
    main()
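As a sanity check (an illustrative script, using only the standard onnx API), each node input of the generated file can be classified, confirming that sizes exists only as an initializer:

import onnx

model = onnx.load("resize.onnx")
graph_inputs = {i.name for i in model.graph.input}
initializers = {t.name for t in model.graph.initializer}
node_outputs = {o for n in model.graph.node for o in n.output}

for node in model.graph.node:
    for name in node.input:
        if not name:
            kind = "empty (optional)"
        elif name in initializers:
            kind = "initializer"
        elif name in graph_inputs:
            kind = "graph input"
        elif name in node_outputs:
            kind = "node output"
        else:
            kind = "dangling"
        print(f"{node.name}: {name or '<empty>'} -> {kind}")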

@mosure (Contributor, Author) commented Jun 12, 2024

the same scope import error happens with constant input tensors too, e.g.


Python:
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/resize/resize.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the sizes tensor as a constant node
    sizes = np.array([1, 1, 2, 2], dtype=np.int64)
    sizes_const_node = helper.make_node(
        "Constant",
        name="sizes_const",
        inputs=[],
        outputs=["sizes"],
        value=numpy_helper.from_array(sizes)
    )

    resize_node = helper.make_node(
        "Resize",
        name="resize_node",
        inputs=["input_tensor", "", "", "sizes"],
        outputs=["output"],
        mode="linear",
    )

    graph_def = helper.make_graph(
        nodes=[sizes_const_node, resize_node],
        name="ResizeGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 2])
        ],
        initializer=[],
    )

    model_def = helper.make_model(graph_def, producer_name="resize")

    onnx.save(model_def, "resize.onnx")


if __name__ == "__main__":
    main()

@skewballfox (Contributor)

@antimora I think this might not be related to lifting initializers or constants. Here's some information from the debug print of resize.onnx prior to any editing:

model_proto: ModelProto {
...
                    NodeProto {
                        input: [
                            "input_tensor",
                            "",
                            "",
                            "sizes",
                        ],
                        output: [
                            "output",
                        ],
                        name: "resize_node",
...

sizes is in the list of initializers.

the two fields with "" are roi and scales. The two solutions that come to mind are to create a dummy entry for empty strings, or to rework the API for the functions that might have empty strings as inputs to take a default or None input arg.
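A rough sketch of the first option, illustrated in Python over the proto (burn-import would do the equivalent on its own IR; the placeholder naming scheme here is made up):

import onnx

def fill_empty_optional_inputs(model: onnx.ModelProto) -> None:
    # Rewrite empty optional inputs (e.g. Resize's roi/scales) to unique
    # placeholder names, so downstream code can key on them and substitute
    # a default value instead of silently ignoring the empty string.
    counter = 0
    for node in model.graph.node:
        for i, name in enumerate(node.input):
            if name == "":
                node.input[i] = f"_unused_{node.name}_{counter}"
                counter += 1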

@mosure (Contributor, Author) commented Jun 12, 2024

a minimal gather import with Initializer indices also has a scope panic. gather imports correctly with Constant indices.

I don't think gather has the same empty input names as resize: https://onnx.ai/onnx/operators/onnx__Gather.html


Broken initializer gather (Python):
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/gather/gather.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the indices tensor as an initializer
    indices = np.array([0, 2], dtype=np.int64)
    indices_initializer = helper.make_tensor(
        name="indices",
        data_type=TensorProto.INT64,
        dims=indices.shape,
        vals=indices,
    )

    gather_node = helper.make_node(
        "Gather",
        name="gather_node",
        inputs=["input_tensor", "indices"],
        outputs=["output"],
        axis=2,  # Assuming we want to gather along the height dimension (axis 2)
    )

    graph_def = helper.make_graph(
        nodes=[gather_node],
        name="GatherGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 4])
        ],
        initializer=[indices_initializer],
    )

    model_def = helper.make_model(graph_def, producer_name="gather")

    onnx.save(model_def, "gather.onnx")


if __name__ == "__main__":
    main()


Working constant gather (Python):
#!/usr/bin/env python3

# used to generate model: onnx-tests/tests/gather/gather.onnx

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

def main() -> None:
    input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])

    # Define the indices tensor as a constant node
    indices = np.array([0, 2], dtype=np.int64)
    indices_const_tensor = helper.make_tensor(
        name="indices_const_tensor",
        data_type=TensorProto.INT64,
        dims=indices.shape,
        vals=indices,
    )

    indices_const_node = helper.make_node(
        "Constant",
        name="indices_const_node",
        inputs=[],
        outputs=["indices"],
        value=indices_const_tensor,
    )

    gather_node = helper.make_node(
        "Gather",
        name="gather_node",
        inputs=["input_tensor", "indices"],
        outputs=["output"],
        axis=2,  # Assuming we want to gather along the height dimension (axis 2)
    )

    graph_def = helper.make_graph(
        nodes=[indices_const_node, gather_node],
        name="GatherGraph",
        inputs=[input_tensor],
        outputs=[
            helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 4])
        ],
        initializer=[],
    )

    model_def = helper.make_model(graph_def, producer_name="gather")

    onnx.save(model_def, "gather.onnx")


if __name__ == "__main__":
    main()

I'd be curious to know more about the solution for dummy entries of empty input names: what mechanism of burn_import is causing empty names to be ignored?

@skewballfox (Contributor)

> I'd be curious to know more about the solution for dummy entries of empty input names: what mechanism of burn_import is causing empty names to be ignored?

When we initialize nodes from the proto data, all input arguments (which are simply strings, the original names of the arguments) pass through this function. All that gets stored in the inputs is the original graph inputs and the node outputs, after processing.

I was sort of wrong in my initial assumption: I assumed using an empty string for a HashMap key wouldn't work, but I just tried it on the Rust playground and it's allowed. I think we still need to check for empty string names, or some placeholder to indicate that a default value should be used.

Honestly I'm kind of stumped. I'm going to recreate the gather.onnx locally tomorrow.

@skewballfox (Contributor)

I thought the issue with the initializers might be that the original rename-input function renamed arguments to an empty string, so maybe there was some "if empty, inline tensor" check happening. But when I tried that with sizes on resize.onnx, I ran into a different error: Tensor of Kind Int with dim shape None was passed with empty name.

whatever's happening happens only with initializers: not with other generated inputs, and not when those initializers are graph inputs.
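That last observation suggests a workaround worth testing: declare the initializer as a graph input as well (ONNX allows an initializer to double as the default for a graph input). A sketch, reusing the definitions from the resize script above:

# Hypothetical variant of the earlier resize repro: "sizes" is declared both
# as an initializer and as a graph input, the case that reportedly imports.
sizes_value_info = helper.make_tensor_value_info("sizes", TensorProto.INT64, [4])

graph_def = helper.make_graph(
    nodes=[resize_node],
    name="ResizeGraph",
    inputs=[input_tensor, sizes_value_info],
    outputs=[helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 1, 2, 2])],
    initializer=[sizes_initializer],
)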

@skewballfox (Contributor)

> the same scope import error happens with constant input tensors too, e.g.
>
> ...
>
> #!/usr/bin/env python3
>
> # used to generate model: onnx-tests/tests/resize/resize.onnx
>
> import onnx
> from onnx import helper, TensorProto, numpy_helper
> import numpy as np
>
> def main() -> None:
>     input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 1, 4, 4])
>
>     # Define the sizes tensor as a constant node
>     sizes = np.array([1, 1, 2, 2], dtype=np.int64)
>     sizes_const_node = helper.make_node(
>         "Constant",
>         name="sizes_const",
>         inputs=[],
>         outputs=["sizes"],
>         value=numpy_helper.from_array(sizes)
>     )
> ...

Looking at how that constant is defined, it will be lifted. So by the time scope is checked, it will only exist as an input to Resize.

Do you guys know, off the top of your head, what is supposed to happen on the burn-graph side to input args with no parents? Are they inlined, or stored as variables?

@antimora (Collaborator)

> Do you guys know, off the top of your head, what is supposed to happen on the burn-graph side to input args with no parents? Are they inlined, or stored as variables?

@skewballfox, you must be referring to ONNX Constant. Netron does not show an individual Constant node if it's referenced by a single node. ONNX import supports constants (see code). The constant value is stored as an attribute of the model module struct. It's a Param without gradients.

@skewballfox (Contributor)

> Do you guys know, off the top of your head, what is supposed to happen on the burn-graph side to input args with no parents? Are they inlined, or stored as variables?

> @skewballfox, you must be referring to ONNX Constant. Netron does not show an individual Constant node if it's referenced by a single node. ONNX import supports constants (see code). The constant value is stored as an attribute of the model module struct. It's a Param without gradients.

This wouldn't be tied to a constant node, because that constant is lifted and the parent node is deleted.

I was more wondering about the input arguments to nodes that aren't graph inputs or node outputs: lifted constants (where the original node is deleted, like above), initializers, and generated arguments.

The reason I ask is that this seems to trigger when the argument only exists as an input, which is weird because scope is about determining ownership, and these args only exist in one place.

@mosure (Contributor, Author) commented Jun 13, 2024

adding this line to consume seems to fix the initializer import:
self.inputs.extend(self.initializers.values().cloned());

fn consume(mut self) -> (Vec<Node>, Vec<Argument>, Vec<Argument>) {


OnnxGraph::inputs doesn't capture the initializers.

this needs to contain the initializer tensor names when passed to register_input_output:

let input_names = self


edit: never mind, this results in the initializer becoming a parameter of forward:
pub fn forward(&self, input1: Tensor<B, 4>, indices: Tensor<B, 1, Int>) -> Tensor<B, 4>

the error seems to be near this pathway, i.e. node input initializers are not being managed properly by BurnGraph:

pub struct BurnGraph<PS: PrecisionSettings> {

@mosure (Contributor, Author) commented Jun 13, 2024

if I convert the initializer to a constant and disable constant lifting, I run into an error where an Int constant tensor is added to the imported model; however, all of the Initializer::init* methods only support float tensors:

pub fn init<B: Backend, const D: usize, S: Into<Shape<D>>>(

let constant7: burn::module::Param<Tensor<B, 1, Int>> = burn::nn::Initializer::Zeros
            .init([1], device)
            .set_require_grad(false);
mismatched types
expected struct `Param<Tensor<B, burn::tensor::Int, _>>`
   found struct `Param<Tensor<_, burn::tensor::Float, _>>`

@antimora (Collaborator)

> if I convert the initializer to a constant and disable constant lifting, I run into an error where an Int constant tensor is added to the imported model; however, all of the Initializer::init* methods only support float tensors:
>
> pub fn init<B: Backend, const D: usize, S: Into<Shape<D>>>(
>
> let constant7: burn::module::Param<Tensor<B, 1, Int>> = burn::nn::Initializer::Zeros
>             .init([1], device)
>             .set_require_grad(false);
> mismatched types
> expected struct `Param<Tensor<B, burn::tensor::Int, _>>`
>    found struct `Param<Tensor<_, burn::tensor::Float, _>>`

CC @nathanielsimard and @laggui. Should we make the initializer more generic, for other tensor kinds?

@mosure (Contributor, Author) commented Jun 13, 2024

here is some work towards generic initializers; it would need more careful handling:
https://github.com/mosure/burn/blob/8122e8b674fcad727c439d8c235b59fc7d7062e1/crates/burn-core/src/nn/initializer.rs#L79

@skewballfox (Contributor)

so I've been looking at the output of cargo -r -- graph_path ./out for both the broken graphs and the graphs that I know work.

  • Something interesting about the HashMap variables being checked is that it is only added to during register_use.

  • register_future_use is called only on inputs, after register_use is called on graph inputs and node outputs.

  • These functions are filtered to only look at the inputs that are tensors, probably because Rust allows implicit copying of basic numeric types.

  • This doesn't trigger on reshape, but that's because reshape doesn't store its rhs as a tensor; it's just a vector:

Reshape(
    ReshapeNode {
        input: TensorType {
            name: Ident(
                unsqueeze1_out1,
            ),
            dim: 5,
            kind: Float,
            shape: None,
        },
        output: TensorType {
            name: Ident(
                unsqueeze2_out1,
            ),
            dim: 6,
            kind: Float,
            shape: None,
        },
        shape: [
            1,
            1,
            3,
            4,
            5,
            1,
        ],
    },
)

I suspect this is going to panic any time we have an initializer or lifted constant that's typed as a tensor by burn-graph. So far, in all the situations where I know constants are lifted in an op covered by test cases, the lifted values weren't tensors by this point. If you guys can think of a test case that has a lifted tensor for an op that takes a tensor argument, please let me know; one candidate is sketched below.
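One candidate, sketched in the style of the scripts above (op choice, names, and file name are mine): a Constant tensor, eligible for lifting, feeding an op that takes a genuine tensor argument (Add).

#!/usr/bin/env python3

# Hypothetical test case: a Constant tensor (a lifting candidate) feeds an
# elementwise Add, so after lifting the Add's rhs is a lifted tensor.

import onnx
from onnx import helper, TensorProto, numpy_helper
import numpy as np

input_tensor = helper.make_tensor_value_info("input_tensor", TensorProto.FLOAT, [1, 4])

bias = np.array([[1.0, 2.0, 3.0, 4.0]], dtype=np.float32)
bias_const_node = helper.make_node(
    "Constant",
    name="bias_const",
    inputs=[],
    outputs=["bias"],
    value=numpy_helper.from_array(bias),
)

add_node = helper.make_node(
    "Add",
    name="add_node",
    inputs=["input_tensor", "bias"],
    outputs=["output"],
)

graph_def = helper.make_graph(
    nodes=[bias_const_node, add_node],
    name="AddConstGraph",
    inputs=[input_tensor],
    outputs=[helper.make_tensor_value_info("output", TensorProto.FLOAT, [1, 4])],
    initializer=[],
)

onnx.save(helper.make_model(graph_def, producer_name="add_const"), "add_const.onnx")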
