[BUG] crash when attempting to use MAC mps when wrapping PyTorch #3092

mytechnotalent · 2024-06-21T11:14:02Z

Bug description

When running train.mojo, we get the following crash.

Please submit a bug report to https://github.com/modularml/mojo/issues and include the crash backtrace along with all the relevant source codes.
Stack dump:
0.      Program arguments: mojo train.mojo
Stack dump without symbol names (ensure you have llvm-symbolizer in your PATH or set the environment var `LLVM_SYMBOLIZER_PATH` to point to it):
0  mojo                     0x0000000104448c24 llvm_strlcpy + 51480
1  mojo                     0x0000000104446f10 llvm_strlcpy + 44036
2  mojo                     0x00000001044492c4 llvm_strlcpy + 53176
3  libsystem_platform.dylib 0x000000018fc5f584 _sigtramp + 56
4  libtorch_python.dylib    0x00000001119f2d20 has_torch_function_attr(_object*) + 52
5  libtorch_python.dylib    0x00000001119a323c torch::is_tensor_and_append_overloaded(_object*, std::__1::vector<_object*, std::__1::allocator<_object*>>*) + 92
6  libtorch_python.dylib    0x00000001119a3c18 torch::FunctionParameter::check(_object*, std::__1::vector<_object*, std::__1::allocator<_object*>>&, int, long long*) + 560
7  libtorch_python.dylib    0x00000001119a5688 torch::FunctionSignature::parse(_object*, _object*, _object*, _object**, std::__1::vector<_object*, std::__1::allocator<_object*>>&, bool) + 536
8  libtorch_python.dylib    0x00000001119a67dc torch::PythonArgParser::raw_parse(_object*, _object*, _object*, _object**) + 108
9  libtorch_python.dylib    0x0000000111430bdc torch::autograd::THPVariable_linear(_object*, _object*, _object*) + 116
10 Python                   0x000000010f52ea0c cfunction_call + 72
11 Python                   0x000000010f4bfe18 _PyObject_MakeTpCall + 128
12 Python                   0x000000010f604ff8 _PyEval_EvalFrameDefault + 47004
13 Python                   0x000000010f4c3e70 method_vectorcall + 180
14 Python                   0x000000010f606d24 _PyEval_EvalFrameDefault + 54472
15 Python                   0x000000010f4c3e70 method_vectorcall + 180
16 Python                   0x000000010f606d24 _PyEval_EvalFrameDefault + 54472
17 Python                   0x000000010f4bfb9c _PyObject_FastCallDictTstate + 96
18 Python                   0x000000010f55aeac slot_tp_call + 208
19 Python                   0x000000010f4bfe18 _PyObject_MakeTpCall + 128
20 Python                   0x000000010f604ff8 _PyEval_EvalFrameDefault + 47004
21 Python                   0x000000010f4c3e70 method_vectorcall + 180
22 Python                   0x000000010f606d24 _PyEval_EvalFrameDefault + 54472
23 Python                   0x000000010f4c3e70 method_vectorcall + 180
24 Python                   0x000000010f606d24 _PyEval_EvalFrameDefault + 54472
25 Python                   0x000000010f4bfb9c _PyObject_FastCallDictTstate + 96
26 Python                   0x000000010f55aeac slot_tp_call + 208
27 Python                   0x000000010f4c0d64 _PyObject_Call + 164
28 Python                   0x000000030009858c _PyObject_Call + 8333916364
29 mojo                     0x00000001047dd530 __jit_debug_register_code + 1041480
30 mojo                     0x00000001043a956c
31 mojo                     0x00000001043a8f60
32 mojo                     0x0000000104391960
33 dyld                     0x000000018f8a60e0 start + 2360
mojo crashed!
Please file a bug report.
[73148:1793171:20240621,070907.221789:WARNING process_memory_mac.cc:93] mach_vm_read(0x107750000, 0x8000): (os/kern) invalid address (1)
[73148:1793171:20240621,070907.222027:WARNING process_memory_mac.cc:93] mach_vm_read(0x107750000, 0x8000): (os/kern) invalid address (1)
[73148:1793171:20240621,070907.222151:WARNING process_memory_mac.cc:93] mach_vm_read(0x107750000, 0x8000): (os/kern) invalid address (1)
[73148:1793171:20240621,070907.222275:WARNING process_memory_mac.cc:93] mach_vm_read(0x107750000, 0x8000): (os/kern) invalid address (1)
[73148:1793171:20240621,070907.222396:WARNING process_memory_mac.cc:93] mach_vm_read(0x107750000, 0x8000): (os/kern) invalid address (1)
[73148:1793171:20240621,070907.222517:WARNING process_memory_mac.cc:93] mach_vm_read(0x107750000, 0x8000): (os/kern) invalid address (1)
[73148:1793171:20240621,070907.222634:WARNING process_memory_mac.cc:93] mach_vm_read(0x107750000, 0x8000): (os/kern) invalid address (1)
[73148:1793171:20240621,070907.285115:WARNING in_range_cast.h:38] value -97304226 out of range
[73148:1793171:20240621,070907.288890:WARNING crash_report_exception_handler.cc:257] UniversalExceptionRaise: (os/kern) failure (5)
zsh: bus error  mojo train.mojo
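
One way to read the dump above: frame 3 (`_sigtramp`) is the signal trampoline, so the next frame listed (frame 4, `has_torch_function_attr` in `libtorch_python.dylib`) is where the fault was raised, reached through `THPVariable_linear` while PyTorch was parsing the arguments of a linear layer call. The following is a minimal sketch in Python that extracts that frame, assuming the stack-dump layout shown above. The helper names `parse_frames` and `faulting_frame` are hypothetical and not part of Mojo or PyTorch.

```python
import re

# Matches frame lines such as:
#   4  libtorch_python.dylib    0x00000001119f2d20 has_torch_function_attr(_object*) + 52
FRAME_RE = re.compile(r"^(\d+)\s+(\S+)\s+(0x[0-9a-fA-F]+)\s*(.*)$")

# A few frames copied from the dump in this report.
SAMPLE_DUMP = """
2  mojo                     0x00000001044492c4 llvm_strlcpy + 53176
3  libsystem_platform.dylib 0x000000018fc5f584 _sigtramp + 56
4  libtorch_python.dylib    0x00000001119f2d20 has_torch_function_attr(_object*) + 52
9  libtorch_python.dylib    0x0000000111430bdc torch::autograd::THPVariable_linear(_object*, _object*, _object*) + 116
"""


def parse_frames(dump: str):
    """Return (index, module, address, symbol) tuples for every frame line."""
    frames = []
    for line in dump.splitlines():
        match = FRAME_RE.match(line.strip())
        if match:
            index, module, address, symbol = match.groups()
            frames.append((int(index), module, address, symbol))
    return frames


def faulting_frame(dump: str):
    """Return the frame listed right after the `_sigtramp` frame, or None."""
    frames = parse_frames(dump)
    for pos, frame in enumerate(frames):
        if frame[3].startswith("_sigtramp") and pos + 1 < len(frames):
            return frames[pos + 1]
    return None


frame = faulting_frame(SAMPLE_DUMP)
print(frame[1], frame[3])
```

Lines that are not frames (for example `0.      Program arguments: ...`) do not match the pattern and are skipped.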

net.mojo

from python import Python

struct Net:
    """
    Simple neural network for classification.

    Attributes:
        model: Sequential model containing layers of the network.
        device: Device to run the model on (e.g., 'mps' or 'cpu').
    """
    var model: PythonObject
    var device: PythonObject

    fn __init__(inout self):
        """
        Initializes the neural network layers.
        """
        try:
            var torch = Python.import_module("torch")
            var nn = torch.nn
            if torch.backends.mps.is_built():
                self.device = torch.device("mps")
            else:
                self.device = torch.device("cpu")
            self.model = nn.Sequential(
                nn.Linear(2, 5),
                nn.ReLU(),
                nn.Linear(5, 5),
                nn.ReLU(),
                nn.Linear(5, 2)
            ).to(self.device)
        except e:
            print("Error importing PyTorch:", e)
            self.model = None
            self.device = None

    fn __copyinit__(inout self, other: Net):
        """
        Initializes a copy of Net from another instance.

        Args:
            other: Another instance of Net.
        """
        self.model = other.model
        self.device = other.device

    fn forward(self, x: PythonObject) raises -> PythonObject:
        """
        Defines the forward pass of the network.

        Args:
            x: Input tensor.

        Returns:
            Output tensor after passing through the network.
        """
        try:
            if x is None:
                raise ("Input tensor is None")

            var torch = Python.import_module("torch")
            if not torch.is_tensor(x):
                raise ("Input is not a valid tensor")

            var x_tensor = x.to(self.device) if x.device != self.device else x

            if x_tensor is None:
                raise ("Failed to move tensor to the correct device")

            return self.model(x_tensor)
        except e:
            raise Error("Error during forward pass: " + str(e))

    fn backward(self, loss: PythonObject) raises:
        """
        Performs backward pass and updates gradients.

        Args:
            loss: Loss tensor calculated during forward pass.
        """
        try:
            loss.backward()
        except e:
            raise Error("Error during backward pass: " + str(e))

    fn predict_probabilities(self, x: PythonObject) raises -> PythonObject:
        """
        Calculates class probabilities using softmax after forward pass.

        Args:
            x: Input tensor.

        Returns:
            Probability distribution over classes.
        """
        try:
            var torch = Python.import_module("torch")
            var F = torch.nn.functional
            if x is None:
                raise ("Input tensor is None")

            var x_tensor = torch.tensor(x, dtype=torch.float32).unsqueeze(0).to(self.device)

            if x_tensor is None:
                raise ("Failed to create tensor from input")

            var logits = self.model(x_tensor)
            var probabilities = F.softmax(logits, dim=1)
            return probabilities
        except e:
            raise Error("Error calculating probabilities: " + str(e))

    fn predict_number(self, x: PythonObject) raises -> Int:
        """
        Predicts the class label using the trained model.

        Args:
            x: Input tensor.

        Returns:
            Predicted class label.
        """
        try:
            var torch = Python.import_module("torch")
            if x is None:
                raise ("Input tensor is None")

            var x_tensor = torch.tensor(x, dtype=torch.float32).unsqueeze(0).to(self.device)

            if x_tensor is None:
                raise ("Failed to create tensor from input")

            var logits = self.model(x_tensor)
            var prediction = logits.argmax(dim=1).item()  # Get the index of the max probability
            return prediction
        except e:
            raise Error("Error during prediction: " + str(e))
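
A note on the device selection in `Net.__init__`: `torch.backends.mps.is_built()` only reports whether the installed PyTorch was compiled with MPS support, while `torch.backends.mps.is_available()` also checks that the current machine can use it. Whether this has anything to do with the crash is not established here. The sketch below, in Python, shows the stricter check. The helper `pick_device_name` is hypothetical and takes the backend object as a parameter so it can be exercised with a stand-in when PyTorch is not installed.

```python
from types import SimpleNamespace


def pick_device_name(mps_backend) -> str:
    """Return "mps" only when the backend is both compiled in and usable."""
    if mps_backend.is_built() and mps_backend.is_available():
        return "mps"
    return "cpu"


# Stand-in for torch.backends.mps: compiled in, but not usable at runtime.
fake_backend = SimpleNamespace(is_built=lambda: True, is_available=lambda: False)
print(pick_device_name(fake_backend))  # prints "cpu"
```

With PyTorch installed, the same helper would be called as `torch.device(pick_device_name(torch.backends.mps))`.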

train.mojo

from python import Python
from mojo.net import Net

fn main() raises:
    try:
        var torch = Python.import_module("torch")
        var sklearn = Python.import_module("sklearn.model_selection")
        var train_test_split = sklearn.train_test_split
        var optim = torch.optim
        var model = Net()
        var seed_value = 42
        torch.manual_seed(seed_value) 
        var input_data = torch.randn(64, 2)  # example input tensor with batch size 64 and input size 2
        var target_data = torch.randint(0, 2, (64,))  # example target tensor with batch size 64 and 2 classes
        var split_result = train_test_split(input_data, target_data, test_size=0.2, random_state=seed_value)
        var train_inputs = split_result[0]
        var test_inputs = split_result[1]
        var train_targets = split_result[2]
        var test_targets = split_result[3]
        var criterion = torch.nn.CrossEntropyLoss()
        var optimizer = optim.Adam(model.model.parameters(), lr=0.01)
        # training loop
        var num_epochs = 100
        for epoch in range(num_epochs):
            model.model.train()  # set the model to training mode
            optimizer.zero_grad()  # zero the gradients
            var output = model.forward(train_inputs)  # forward pass
            var loss = criterion(output, train_targets)  # calculate the loss
            model.backward(loss)  # backward pass
            optimizer.step()  # update weights
            print('epoch, loss:', epoch + 1, num_epochs, loss.item())
        torch.save(model.model.state_dict(), "model.pth")
        # evaluate the model on test data
        model.model.eval()  # set the model to evaluation mode
        var test_output = model.forward(test_inputs)  # forward pass on test data
        var test_loss = criterion(test_output, test_targets)  # calculate test loss
        print('test loss:', test_loss.item())

    except e:
        print("error during execution:", e)
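
To help narrow down whether the crash comes from the Mojo/Python interop layer or from PyTorch itself, one training step of the same network can be run in plain Python. This is a minimal sketch and not part of the original report: the function name `train_step_loss` is hypothetical, the scikit-learn split is left out, and, unlike `train.mojo`, the targets are moved to the same device as the model output before the loss is computed.

```python
try:
    import torch
except ImportError:  # PyTorch is required to actually run the step
    torch = None


def train_step_loss(device_name: str = "cpu") -> float:
    """Run one forward/backward/optimizer step and return the loss value."""
    device = torch.device(device_name)
    torch.manual_seed(42)
    # Same layer sizes as Net in net.mojo.
    model = torch.nn.Sequential(
        torch.nn.Linear(2, 5),
        torch.nn.ReLU(),
        torch.nn.Linear(5, 5),
        torch.nn.ReLU(),
        torch.nn.Linear(5, 2),
    ).to(device)
    inputs = torch.randn(64, 2).to(device)
    targets = torch.randint(0, 2, (64,)).to(device)
    criterion = torch.nn.CrossEntropyLoss()
    optimizer = torch.optim.Adam(model.parameters(), lr=0.01)
    optimizer.zero_grad()
    loss = criterion(model(inputs), targets)
    loss.backward()
    optimizer.step()
    return loss.item()


if torch is not None:
    print("cpu loss:", train_step_loss("cpu"))
```

Calling `train_step_loss("mps")` on the affected machine and comparing it with the Mojo run would indicate whether the fault also occurs without Mojo in the picture.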

Steps to reproduce

file structure

mojonet
  mojo
    __init__.mojo
    net.mojo
  train.mojo

mojo train.mojo

test_net.mojo

from python import Python
from mojo.net import Net 
from testing import assert_true

fn test_net_init() raises:
    """
    Test the initialization of the Net class.
    """
    var net: Net = Net()
    assert_true(net.model is not None, "Model should be initialized")

fn test_net_forward() raises:
    """
    Test the forward pass of the Net class.
    """
    var net: Net = Net()
    var torch: PythonObject = Python.import_module("torch")
    var train_inputs: PythonObject = torch.tensor([0.2656, -0.0026])
    var output: PythonObject = net.forward(train_inputs)
    assert_true(output is not None, "Forward pass should produce an output")

fn test_net_backward() raises:
    """
    Test the backward pass of the Net class.
    """
    var net: Net = Net()
    var torch: PythonObject = Python.import_module("torch")
    var nn: PythonObject = torch.nn
    var criterion: PythonObject = nn.MSELoss()
    var train_inputs: PythonObject = torch.tensor([0.0, 1.0])
    var train_targets: PythonObject = torch.tensor([0.5, -0.5])
    var output: PythonObject = net.forward(train_inputs)
    var loss: PythonObject = criterion(output, train_targets)
    var backward_output: PythonObject = net.backward(loss)
    assert_true(backward_output is None, "Backward pass should not produce an error")
    
fn test_net_predict_probabilities() raises:
    """
    Test the predict_probabilities method of the Net class.
    """
    var net: Net = Net()
    var probabilities: PythonObject = net.predict_probabilities([0.5, -0.5])
    assert_true(probabilities is not None, "Predict probabilities should produce an output")

fn test_net_predict_number() raises:
    """
    Test the predict_number method of the Net class.
    """
    var net: Net = Net()
    var prediction: Int = net.predict_number([0.5, -0.5])
    assert_true(0 <= prediction < 2, "Prediction should be between 0 and 1")

fn main() raises:
    try:
        test_net_init()
        test_net_forward()
        test_net_backward()
        test_net_predict_probabilities()
        test_net_predict_number()
    except e:
        print(e)

mojo test test_net.mojo
result

Testing Time: 3.840s

Total Discovered Tests: 5

Passed : 3 (60.00%)
Failed : 2 (40.00%)
Skipped: 0 (0.00%)

******************** Failure: '/Users/kevinthomas/Desktop/mojonet/test_net.mojo::test_net_backward()' ********************

execution failed

2024-06-21 07:17:30.675299-0400 mojo-repl-entry-point[73384:1797920] flock failed to lock list file (/var/folders/76/xcggvjjn2zq1z1z4l1hgnrn80000gn/C//com.apple.metal/16777235_275/functions.list): errno = 35
2024-06-21 07:17:30.675339-0400 mojo-repl-entry-point[73384:1797920] flock failed to lock list file (/var/folders/76/xcggvjjn2zq1z1z4l1hgnrn80000gn/C//com.apple.metal/16777235_275/functions1.list): errno = 35


error: Execution was interrupted, reason: EXC_BAD_ACCESS (code=2, address=0x10f287170).
The process has been left at the point where it was interrupted, use "thread return -x" to return to the state before expression evaluation.

********************

******************** Failure: '/Users/kevinthomas/Desktop/mojonet/test_net.mojo::test_net_forward()' ********************

execution failed

2024-06-21 07:17:30.666774-0400 mojo-repl-entry-point[73390:1797940] flock failed to lock list file (/var/folders/76/xcggvjjn2zq1z1z4l1hgnrn80000gn/C//com.apple.metal/16777235_275/functions.list): errno = 35


error: Execution was interrupted, reason: EXC_BAD_ACCESS (code=2, address=0x11cddf0b0).
The process has been left at the point where it was interrupted, use "thread return -x" to return to the state before expression evaluation.

********************

System information

- OS Mojo was installed on: Mac (M3)
- Mojo version: `mojo 24.4.0 (2cb57382)`
- Modular CLI version: `modular 0.8.0 (39a426b5)`


mytechnotalent added the bug and mojo-repo labels on Jun 21, 2024


mytechnotalent commented Jun 21, 2024 • edited by ematejska
