tracing regression in 0.20.0 vs 0.19.4 #5795

Open
dfellis opened this issue Jun 10, 2024 · 6 comments
Labels
type: bug Something isn't working

Comments

@dfellis

dfellis commented Jun 10, 2024

Description

Attempting to use copy_buffer_to_buffer in 0.20.0 crashes with:

thread 'main' panicked at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/resource.rs:121:17:
called `Option::unwrap()` on a `None` value

The full backtrace from one of my test runs is:

stack backtrace:
   0:     0x5597354f12d2 - std::backtrace_rs::backtrace::libunwind::trace::he4ee80166a02c846
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/../../backtrace/src/backtrace/libunwind.rs:105:5
   1:     0x5597354f12d2 - std::backtrace_rs::backtrace::trace_unsynchronized::h476faccf57e88641
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/../../backtrace/src/backtrace/mod.rs:66:5
   2:     0x5597354f12d2 - std::sys_common::backtrace::_print_fmt::h430c922a77e7a59c
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:68:5
   3:     0x5597354f12d2 - <std::sys_common::backtrace::_print::DisplayBacktrace as core::fmt::Display>::fmt::hffecb437d922f988
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:44:22
   4:     0x55973551660c - core::fmt::rt::Argument::fmt::hf3df69369399bfa9
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/fmt/rt.rs:142:9
   5:     0x55973551660c - core::fmt::write::hd9a8d7d029f9ea1a
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/fmt/mod.rs:1153:17
   6:     0x5597354ef14f - std::io::Write::write_fmt::h0e1226b2b8d973fe
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/io/mod.rs:1843:15
   7:     0x5597354f10a4 - std::sys_common::backtrace::_print::hd2df4a083f6e69b8
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:47:5
   8:     0x5597354f10a4 - std::sys_common::backtrace::print::he907f6ad7eee41cb
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:34:9
   9:     0x5597354f255b - std::panicking::default_hook::{{closure}}::h3926193b61c9ca9b
  10:     0x5597354f22b3 - std::panicking::default_hook::h25ba2457dea68e65
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:292:9
  11:     0x5597354f29fd - std::panicking::rust_panic_with_hook::h0ad14d90dcf5224f
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:779:13
  12:     0x5597354f2899 - std::panicking::begin_panic_handler::{{closure}}::h4a1838a06f542647
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:649:13
  13:     0x5597354f17a6 - std::sys_common::backtrace::__rust_end_short_backtrace::h77cc4dc3567ca904
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:171:18
  14:     0x5597354f2604 - rust_begin_unwind
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:645:5
  15:     0x5597348934e5 - core::panicking::panic_fmt::h940d4fd01a4b4fd1
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:72:14
  16:     0x5597348935a3 - core::panicking::panic::h8ddd58dc57c2dc00
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/panicking.rs:145:5
  17:     0x559734893486 - core::option::unwrap_failed::hf59153bb1e2fc334
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/option.rs:1985:5
  18:     0x559734e379a0 - core::option::Option<T>::unwrap::hdeb99919510551b3
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/option.rs:933:21
  19:     0x559734e379a0 - wgpu_core::resource::ResourceInfo<T>::id::he0c6517bd8e3f91d
                               at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/resource.rs:121:9
  20:     0x559734e38def - <wgpu_core::resource::Buffer<A> as core::ops::drop::Drop>::drop::h8fe7e4be1a0f6653
                               at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/resource.rs:404:52
  21:     0x559734de4da7 - core::ptr::drop_in_place<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>::ha1c38d8abfc5c79c
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
  22:     0x559734eaf2ff - alloc::sync::Arc<T,A>::drop_slow::h52b7243041689c9a
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/sync.rs:1804:18
  23:     0x559734eb3232 - <alloc::sync::Arc<T,A> as core::ops::drop::Drop>::drop::ha775173f5482ce52
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/alloc/src/sync.rs:2459:13
  24:     0x559734dcd4bb - core::ptr::drop_in_place<alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>>::h50a1ca1948d557f6
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
  25:     0x559734dd943f - core::ptr::drop_in_place<(wgpu_core::track::TrackerIndex,alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>)>::hcb77ccecad10dfdc
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
  26:     0x559734c80e72 - core::ptr::mut_ptr::<impl *mut T>::drop_in_place::hf563684f0ac0247e
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mut_ptr.rs:1473:18
  27:     0x559734c80e72 - hashbrown::raw::Bucket<T>::drop::h82efac29a399c8c8
                               at /rust/deps/hashbrown-0.14.3/src/raw/mod.rs:590:23
  28:     0x559734c78498 - hashbrown::raw::RawTableInner::drop_elements::hc20a52aa5ac43ff3
                               at /rust/deps/hashbrown-0.14.3/src/raw/mod.rs:2379:17
  29:     0x559734c79b80 - hashbrown::raw::RawTableInner::drop_inner_table::h322098f924a4e6c2
                               at /rust/deps/hashbrown-0.14.3/src/raw/mod.rs:2434:17
  30:     0x559734c747fa - <hashbrown::raw::RawTable<T,A> as core::ops::drop::Drop>::drop::h8c81f5c568abf00a
                               at /rust/deps/hashbrown-0.14.3/src/raw/mod.rs:3678:13
  31:     0x559734ddca5b - core::ptr::drop_in_place<hashbrown::raw::RawTable<(wgpu_core::track::TrackerIndex,alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>)>>::h848d236953d9150c
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
  32:     0x559734ddeddb - core::ptr::drop_in_place<hashbrown::map::HashMap<wgpu_core::track::TrackerIndex,alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>,core::hash::BuildHasherDefault<rustc_hash::FxHasher>>>::h268807d3d564515d
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
  33:     0x559734ddf22b - core::ptr::drop_in_place<std::collections::hash::map::HashMap<wgpu_core::track::TrackerIndex,alloc::sync::Arc<wgpu_core::resource::Buffer<wgpu_hal::vulkan::Api>>,core::hash::BuildHasherDefault<rustc_hash::FxHasher>>>::h109c75fb36db2a51
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
  34:     0x559734de8fe7 - core::ptr::drop_in_place<wgpu_core::device::life::ResourceMaps<wgpu_hal::vulkan::Api>>::h1e5585fb807c2963
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ptr/mod.rs:515:1
  35:     0x559734d671bd - wgpu_core::device::life::LifetimeTracker<A>::triage_submissions::h9b06f4999f3535f0
                               at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/life.rs:413:9
  36:     0x559734e285a8 - wgpu_core::device::resource::Device<A>::maintain::ha6ad7b20c3de2f07
                               at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/resource.rs:434:13
  37:     0x559734b8ddd8 - wgpu_core::device::queue::<impl wgpu_core::global::Global>::queue_submit::h95b283a3c900a2eb
                               at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-core-0.20.0/src/device/queue.rs:1555:39
  38:     0x559734b4c13e - <wgpu::backend::wgpu_core::ContextWgpuCore as wgpu::context::Context>::queue_submit::he97dd3b53d285061
                               at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.20.0/src/backend/wgpu_core.rs:2260:27
  39:     0x559734b57423 - <T as wgpu::context::DynContext>::queue_submit::hb2174c2d1030c8b5
                               at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.20.0/src/context.rs:3025:13
  40:     0x5597348a09e5 - wgpu::Queue::submit::hfab63398e5d30aab
                               at /home/damocles/.cargo/registry/src/index.crates.io-6f17d22bba15001f/wgpu-0.20.0/src/lib.rs:4981:27
  41:     0x5597348a4980 - alan_generated_bin::read_buffer::hbedf71215769391d
                               at /home/damocles/.config/alan/alan_generated_bin/src/main.rs:266:5
  42:     0x5597348a50b8 - alan_generated_bin::main::h1e3a29ab7fcab87a
                               at /home/damocles/.config/alan/alan_generated_bin/src/main.rs:309:20
  43:     0x559734894f6b - core::ops::function::FnOnce::call_once::h5fce3699794672b3
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:250:5
  44:     0x55973489c71e - std::sys_common::backtrace::__rust_begin_short_backtrace::h6179380494bff8d2
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/sys_common/backtrace.rs:155:18
  45:     0x5597348a1441 - std::rt::lang_start::{{closure}}::hf0241278dd9be494
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:166:18
  46:     0x5597354eb253 - core::ops::function::impls::<impl core::ops::function::FnOnce<A> for &F>::call_once::h52f5991f9ab8b369
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/core/src/ops/function.rs:284:13
  47:     0x5597354eb253 - std::panicking::try::do_call::h0ac4bee9a397a1bf
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  48:     0x5597354eb253 - std::panicking::try::hc005decaf198d0ed
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  49:     0x5597354eb253 - std::panic::catch_unwind::hb0f967d870b2a382
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  50:     0x5597354eb253 - std::rt::lang_start_internal::{{closure}}::hd140b84b0efe534b
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:48
  51:     0x5597354eb253 - std::panicking::try::do_call::h1ddfaf1d0d576c38
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:552:40
  52:     0x5597354eb253 - std::panicking::try::hdd4bdf855547659f
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panicking.rs:516:19
  53:     0x5597354eb253 - std::panic::catch_unwind::h276ba91c7706110c
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/panic.rs:146:14
  54:     0x5597354eb253 - std::rt::lang_start_internal::h103c42a9c4e95084
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:148:20
  55:     0x5597348a141a - std::rt::lang_start::hce91f7cfea2f3ec4
                               at /rustc/9b00956e56009bab2aa15d7bff10916599e3d6d6/library/std/src/rt.rs:165:17
  56:     0x5597348a536e - main
  57:     0x7f6bdeeea088 - __libc_start_call_main
  58:     0x7f6bdeeea14b - __libc_start_main_impl
  59:     0x559734893c85 - _start
  60:                0x0 - <unknown>

Somehow, something internal to wgpu doesn't have an ID. I temporarily added #[derive(Debug)] to my own structures and debug-logged the buffers I'm passing to copy_buffer_to_buffer, and they all had IDs, so I'm not sure what exactly is going on. Looking a bit higher up the stack, it seems to be related to the automatic GPU resource cleanup logic in 0.20.0, though I don't understand why it would be triggered.

Repro steps
I put a trimmed version of the code in a gist. To test it, you just need to copy the main.rs file to src/main.rs in a normal Rust project.

Expected vs observed behavior
This code (with minor modifications to remove the compilation_options field from the ComputePipelineDescriptor) compiles and runs successfully on 0.19.4, but crashes on 0.20.0.

Extra materials

I include the trace.zip it generated.

Platform
I've tested this on Fedora/x86-64 and Debian/RISC-V with the same results. Only wgpu version 0.20.0 is affected.

@dfellis
Author

dfellis commented Jun 10, 2024

I edited my local cargo cache to insert a debug log on the buffer whose release is causing the crash. The output is below (with some hand formatting for better legibility):

Buffer {
  raw: <snatchable>,
  device: Device {
    adapter: "<Adapter>",
    limits: Limits {
       max_texture_dimension_1d: 16384,
       max_texture_dimension_2d: 16384,
       max_texture_dimension_3d: 2048,
       max_texture_array_layers: 2048,
       max_bind_groups: 8,
       max_bindings_per_bind_group: 1000,
       max_dynamic_uniform_buffers_per_pipeline_layout: 16,
       max_dynamic_storage_buffers_per_pipeline_layout: 8,
       max_sampled_textures_per_shader_stage: 8388606,
       max_samplers_per_shader_stage: 8388606,
       max_storage_buffers_per_shader_stage: 8388606,
       max_storage_textures_per_shader_stage: 8388606,
       max_uniform_buffers_per_shader_stage: 8388606,
       max_uniform_buffer_binding_size: 2147483648,
       max_storage_buffer_binding_size: 2147483648,
       max_vertex_buffers: 16,
       max_buffer_size: 2147483647,
       max_vertex_attributes: 32,
       max_vertex_buffer_array_stride: 2048,
       min_uniform_buffer_offset_alignment: 32,
       min_storage_buffer_offset_alignment: 32,
       max_inter_stage_shader_components: 128,
       max_color_attachments: 8,
       max_color_attachment_bytes_per_sample: 32,
       max_compute_workgroup_storage_size: 65536,
       max_compute_invocations_per_workgroup: 1024,
       max_compute_workgroup_size_x: 1024,
       max_compute_workgroup_size_y: 1024,
       max_compute_workgroup_size_z: 1024,
       max_compute_workgroups_per_dimension: 65535,
       min_subgroup_size: 64,
       max_subgroup_size: 64,
       max_push_constant_size: 256,
       max_non_sampler_bindings: 4294967295
    },
    features: Features(DEPTH_CLIP_CONTROL | DEPTH32FLOAT_STENCIL8 | TEXTURE_COMPRESSION_BC | TIMESTAMP_QUERY | INDIRECT_FIRST_INSTANCE | SHADER_F16 | RG11B10UFLOAT_RENDERABLE | BGRA8UNORM_STORAGE | FLOAT32_FILTERABLE | TEXTURE_FORMAT_16BIT_NORM | TEXTURE_ADAPTER_SPECIFIC_FORMAT_FEATURES | PIPELINE_STATISTICS_QUERY | TIMESTAMP_QUERY_INSIDE_ENCODERS | TIMESTAMP_QUERY_INSIDE_PASSES | MAPPABLE_PRIMARY_BUFFERS | TEXTURE_BINDING_ARRAY | BUFFER_BINDING_ARRAY | STORAGE_RESOURCE_BINDING_ARRAY | SAMPLED_TEXTURE_AND_STORAGE_BUFFER_ARRAY_NON_UNIFORM_INDEXING | UNIFORM_BUFFER_AND_STORAGE_TEXTURE_ARRAY_NON_UNIFORM_INDEXING | PARTIALLY_BOUND_BINDING_ARRAY | MULTI_DRAW_INDIRECT | MULTI_DRAW_INDIRECT_COUNT | PUSH_CONSTANTS | ADDRESS_MODE_CLAMP_TO_ZERO | ADDRESS_MODE_CLAMP_TO_BORDER | POLYGON_MODE_LINE | POLYGON_MODE_POINT | CONSERVATIVE_RASTERIZATION | VERTEX_WRITABLE_STORAGE | CLEAR_TEXTURE | SPIRV_SHADER_PASSTHROUGH | MULTIVIEW | SHADER_UNUSED_VERTEX_OUTPUT | TEXTURE_FORMAT_NV12 | SHADER_F64 | SHADER_I16 | SHADER_PRIMITIVE_INDEX | DUAL_SOURCE_BLENDING | SHADER_INT64 | SUBGROUP | SUBGROUP_VERTEX | SUBGROUP_BARRIER),
    downlevel: DownlevelCapabilities {
      flags: DownlevelFlags(COMPUTE_SHADERS | FRAGMENT_WRITABLE_STORAGE | INDIRECT_EXECUTION | BASE_VERTEX | READ_ONLY_DEPTH_STENCIL | NON_POWER_OF_TWO_MIPMAPPED_TEXTURES | CUBE_ARRAY_TEXTURES | COMPARISON_SAMPLERS | INDEPENDENT_BLEND | VERTEX_STORAGE | ANISOTROPIC_FILTERING | FRAGMENT_STORAGE | MULTISAMPLED_SHADING | DEPTH_TEXTURE_AND_BUFFER_COPIES | WEBGPU_TEXTURE_FORMAT_SUPPORT | BUFFER_BINDINGS_NOT_16_BYTE_ALIGNED | UNRESTRICTED_INDEX_BUFFER | FULL_DRAW_INDEX_UINT32 | DEPTH_BIAS_CLAMP | VIEW_FORMATS | UNRESTRICTED_EXTERNAL_TEXTURE_COPIES | SURFACE_VIEW_FORMATS | NONBLOCKING_QUERY_RESOLVE | VERTEX_AND_INSTANCE_INDEX_RESPECTS_RESPECTIVE_FIRST_VALUE_IN_INDIRECT_DRAW),
      limits: DownlevelLimits,
      shader_model: Sm5
    }
  },
  usage: BufferUsages(MAP_WRITE | COPY_SRC),
  size: 16,
  initialization_status: RwLock { data: InitTracker { uninitialized_ranges: [] } },
  sync_mapped_writes: Mutex { data: None },
  info: ResourceInfo {
    id: None,
    tracker_index: TrackerIndex(1),
    tracker_indices: Some(SharedTrackerIndexAllocator { inner: Mutex { data:  } }),
    submission_index: 0,
    label: "(wgpu internal) initializing unmappable buffer"
  },
  map_state: Mutex { data: Idle },
  bind_groups: Mutex { data: [] }
}

I don't create a buffer with MAP_WRITE | COPY_SRC flags set, and the label "(wgpu internal)..." indicates this is probably something internal to the copy_buffer_to_buffer function. I still don't know how it has no ID, though.

@dfellis
Author

dfellis commented Jun 10, 2024

So only one place creates a label with that name, the device_create_buffer in wgpu_core/src/device/global.rs

Some debug logging on the args there reveals:

desc: BufferDescriptor { label: None, size: 16, usage: BufferUsages(COPY_SRC | COPY_DST | STORAGE), mapped_at_creation: true }

The described buffer to create is supposedly the buffer I'm copying from, but by this point in the trace, that buffer should already exist.

But if I slap a seemingly useless MAP_WRITE onto that buffer, avoiding whatever this temporary buffer is, the code compiles and runs on 0.20.0.

So I think that's the end of my bug report for now, as I don't understand why this temporary buffer is needed when copying from this buffer, and I don't know why it's not getting a proper ID during creation, but I do have a workaround for the time being.
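
For readers following along, here is a minimal sketch of the decision that appears to be at play, judging only from the two debug dumps above: a buffer requested with mapped_at_creation: true but without MAP_WRITE seems to get a temporary MAP_WRITE | COPY_SRC staging buffer. The function name staging_usage and the bit values are hypothetical, and this is a std-only model, not wgpu-core's actual code.

```rust
// Hypothetical stand-ins for wgpu's BufferUsages bits; the values are illustrative only.
const MAP_WRITE: u32 = 1 << 0;
const COPY_SRC: u32 = 1 << 1;
const COPY_DST: u32 = 1 << 2;
const STORAGE: u32 = 1 << 3;

/// Returns the usage flags of the temporary staging buffer that would be needed,
/// or None when the requested buffer can be mapped directly.
/// Assumption: this mirrors the behavior inferred from the logs, nothing more.
fn staging_usage(usage: u32, mapped_at_creation: bool) -> Option<u32> {
    if mapped_at_creation && (usage & MAP_WRITE) == 0 {
        Some(MAP_WRITE | COPY_SRC)
    } else {
        None
    }
}

fn main() {
    // The descriptor from the log: COPY_SRC | COPY_DST | STORAGE, mapped at creation.
    let requested = COPY_SRC | COPY_DST | STORAGE;
    println!("staging needed: {}", staging_usage(requested, true).is_some());
    // Adding MAP_WRITE (the workaround) avoids the staging buffer in this model.
    println!("staging needed: {}", staging_usage(requested | MAP_WRITE, true).is_some());
}
```

In this model, the workaround of adding MAP_WRITE simply sidesteps the branch that creates the internal buffer, which would be consistent with the crash disappearing.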

@Wumpf added the "type: bug" (Something isn't working) label on Jun 10, 2024
@ErichDonGubler
Member

@dfellis: Just the context I'm aware of: We're in the middle of transitioning backend resources to being tracked only by Arc, rather than ID. There are, unfortunately, some places where we are still tracking by ID. When code that only keeps track of Arcs attempts to use APIs that use IDs, then the code has no choice but to panic, since we're definitely doing something we Shouldn't Do™.

I believe that the solution here is to progress in our migration of resource tracking code that uses Arcs instead of IDs.
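
As a rough illustration of the failure mode described above, the sketch below uses a simplified, hypothetical ResourceInfo (not wgpu-core's real type) whose ID is optional. An accessor that unwraps the ID panics for a resource tracked only by Arc, while a non-panicking accessor lets the caller decide what a missing ID means. The names try_id and describe are assumptions made for this example.

```rust
use std::sync::Arc;

// Hypothetical, simplified stand-in for a tracked resource; not wgpu-core's real types.
#[derive(Debug)]
struct ResourceInfo {
    id: Option<u32>, // None when the resource is tracked only through its Arc
    label: String,
}

impl ResourceInfo {
    /// Panics when no ID was assigned, like the unwrap seen in the backtrace.
    fn id(&self) -> u32 {
        self.id.unwrap()
    }

    /// Non-panicking variant: the caller handles the missing-ID case.
    fn try_id(&self) -> Option<u32> {
        self.id
    }
}

/// Formats a resource for logging without assuming an ID exists.
fn describe(info: &Arc<ResourceInfo>) -> String {
    match info.try_id() {
        Some(id) => format!("Id({id})"),
        None => format!("<no id: {}>", info.label),
    }
}

fn main() {
    let user = Arc::new(ResourceInfo { id: Some(1), label: "user buffer".into() });
    let internal = Arc::new(ResourceInfo { id: None, label: "internal staging buffer".into() });
    assert_eq!(user.id(), 1);
    println!("{}", describe(&user));
    println!("{}", describe(&internal));
    // Calling internal.id() here would panic, because its id is None.
}
```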

CC @teoxoy, @jimblandy.

@dfellis
Author

dfellis commented Jun 10, 2024

@ErichDonGubler understood. Do you know what the timeline is on that conversion?

I've realized that my hack to work around this won't cut it because it fails for the OpenGL backend since MAP_WRITE is only allowed to be paired with COPY_SRC. That it's even working at all on the Vulkan backend is probably itself a bug?
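
To make the pairing rule concrete, here is a small std-only sketch of the check as I understand it: MAP_WRITE may only be combined with COPY_SRC. The bit values, the function name map_write_usage_ok, and the relaxed parameter are all hypothetical. The relaxed flag models a device that lifts the restriction; the MAPPABLE_PRIMARY_BUFFERS feature in the debug dump earlier in this thread may be what does that on Vulkan, but that is a guess.

```rust
// Illustrative bit values; wgpu's real BufferUsages constants are not reproduced here.
const MAP_WRITE: u32 = 1 << 0;
const COPY_SRC: u32 = 1 << 1;
const COPY_DST: u32 = 1 << 2;
const STORAGE: u32 = 1 << 3;

/// Returns true when `usage` respects the rule that MAP_WRITE may only be
/// combined with COPY_SRC. `relaxed` models a device that lifts the rule.
fn map_write_usage_ok(usage: u32, relaxed: bool) -> bool {
    if relaxed || (usage & MAP_WRITE) == 0 {
        return true;
    }
    // No bits other than MAP_WRITE and COPY_SRC may be set.
    (usage & !(MAP_WRITE | COPY_SRC)) == 0
}

fn main() {
    let workaround = MAP_WRITE | COPY_SRC | COPY_DST | STORAGE;
    println!("strict device accepts workaround: {}", map_write_usage_ok(workaround, false));
    println!("relaxed device accepts workaround: {}", map_write_usage_ok(workaround, true));
    println!("staging-style buffer is fine: {}", map_write_usage_ok(MAP_WRITE | COPY_SRC, false));
}
```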

And with that, I probably have to hold off on upgrading until copy_buffer_to_buffer works without this failure, or I find a cross-backend workaround and leave a big TODO to try and move back to the normal API.

@ErichDonGubler
Member

@dfellis: We don't currently have one, but if this conversion is blocking or regressing user code, there's a good chance we can justify prioritizing it!

I'll let others comment on further context here, since I don't have it. 😅

@dfellis
Author

dfellis commented Jun 10, 2024

@ErichDonGubler got it! But in the meantime, I have finally realized what's actually causing the crash in copy_buffer_to_buffer and it's the tracing itself.

I turned on tracing when I couldn't get my code working on the RISC-V single board computer I bought to specifically try and catch bugs in my code from platform assumptions, and then started getting errors. (Hooray, purchase justified ;) )

In the meantime I figured out that the issue was that the Vulkan driver on this SBC doesn't implement everything wgpu needs, so I added logic to scan all of the adapters and pick the first one for which is_webgpu_compliant is true. But I did that on a new branch off of my main, which had wgpu on 0.19.4 without tracing on, while the branch I was debugging on was 0.20.0 with tracing turned on.
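
The adapter scan mentioned above can be sketched roughly as follows. This is a std-only model: AdapterSummary and pick_adapter are hypothetical names, and the webgpu_compliant field stands in for whatever the real check returns (with wgpu, presumably the result of calling is_webgpu_compliant() on the adapter's downlevel capabilities).

```rust
// Hypothetical summary of an adapter; in real code this would be filled in from wgpu.
#[derive(Debug, Clone, PartialEq)]
struct AdapterSummary {
    name: String,
    webgpu_compliant: bool,
}

/// Returns the first adapter that reports WebGPU compliance, if any.
fn pick_adapter(adapters: &[AdapterSummary]) -> Option<&AdapterSummary> {
    adapters.iter().find(|a| a.webgpu_compliant)
}

fn main() {
    let adapters = vec![
        AdapterSummary { name: "incomplete driver".into(), webgpu_compliant: false },
        AdapterSummary { name: "software fallback".into(), webgpu_compliant: true },
    ];
    match pick_adapter(&adapters) {
        Some(a) => println!("using adapter: {}", a.name),
        None => println!("no WebGPU-compliant adapter found"),
    }
}
```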

With the apparent fix for 0.20.0 being to slap MAP_WRITE onto a buffer that it shouldn't be on, I started prepping that for actual merging by turning off tracing, and tests continued to pass on my x86-64 machines. Then I tried to run it on the RISC-V machine and got a validation error saying that I'm configuring the buffer incorrectly.

Okay, I agree, so let's try and figure out how to replicate whatever copy_buffer_to_buffer is doing internally with a temporary MAP_WRITE buffer. I created some extra temporary buffers and tried to insert them into the command queue, and got more errors saying I was doing things incorrectly when I tried to write into the MAP_WRITE buffer so I could then use it to write out to another buffer. Then I just reverted all of the changes in that file and re-ran the failing test so I could get the stacktrace on my machine, and it just worked.

Tested it on the RISC-V SBC and it also worked there: the difference is just removing features = ["trace"] in the Cargo.toml file.

So now I would say my real bug report is that the trace feature is broken by this migration to Arc, because it looks like the trace output requires IDs? (See snippet from the trace below) And this breakage in trace then produces a super misleading rabbit hole to spend a couple of days on.

Submit(2, [
    CopyBufferToBuffer(
        src: Id(0, 1, Vulkan),
        src_offset: 0,
        dst: Id(1, 1, Vulkan),
        dst_offset: 0,
        size: 16,
    ),
]),

@dfellis changed the title from "copy_buffer_to_buffer regression in 0.20.0 vs 0.19.4" to "tracing regression in 0.20.0 vs 0.19.4" on Jun 10, 2024

3 participants