std buffer overflows from improper usage of wtf8ToWtf16Le
#20288
Comments
I suspect there are currently quite a few places where the standard library just assumes that |
I will try to go with the Edit: Wrong unicode API. Will need to create a variant of |
That would be one way to go, but I think it can be simpler. It might be sufficient to just check if (wtf8_path.len > std.fs.max_path_bytes) since Lines 56 to 60 in 82a934b
Definitely check my assumptions on that, though. The intention is for |
After reading this, I was convinced this would be safe as well. However, this test program ends up failing after
const std = @import("std");
pub fn main() !void {
    // U+10FFFF is the highest code point; it takes 4 bytes in UTF-8 and 4 bytes (a surrogate pair) in UTF-16LE
    const @"U+10FFFF": [4]u8 = .{ 244, 143, 191, 191 };
    const input = @"U+10FFFF" ** std.fs.max_path_bytes; // just a long run of bytes
    var buffer: [std.os.windows.PATH_MAX_WIDE]u16 = undefined;
    var i: usize = 0;
    while (true) : (i += 1) {
        // std.log.info("i={}", .{i});
        if (i > std.fs.max_path_bytes) return error.NameTooLong;
        _ = std.unicode.utf8ToUtf16Le(&buffer, input[0..i]) catch {};
    }
}
|
Yeah, I was thinking about it wrong. EDIT: ...and I was still thinking about it wrong here wrong math
EDIT: The correct math now that I finally have a grasp on things:
|
That's fine, I still have an idea of how we can skip counting for most inputs beforehand: there is a maximum number of WTF-8 bytes below which we know the conversion cannot overflow. |
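The bound hinted at here can be written out as a short derivation. This is a sketch reconstructed from the sequence-length ratios quoted elsewhere in this thread (1:1, 2:1, 3:1, 4:2), not the commenter's own math. The symbols a, b, c, d, n, and u are introduced only for illustration, and the Windows constants used in the note below (PATH_MAX_WIDE = 32767 and max_path_bytes = 3 * PATH_MAX_WIDE + 1 = 98302) are assumptions about std at the time of this issue.

```latex
% a, b, c, d: number of 1-, 2-, 3-, and 4-byte WTF-8 sequences in the input
\begin{align*}
n &= a + 2b + 3c + 4d && \text{(WTF-8 bytes)} \\
u &= a + b + c + 2d   && \text{(WTF-16 code units)} \\
n - u &= b + 2c + 2d \ge 0 \quad\Longrightarrow\quad u \le n
\end{align*}
```

So an input of at most buffer.len bytes can never overflow, and only longer inputs need an exact count. Conversely, a check against max_path_bytes alone is not enough: under the assumed constants, 16384 four-byte sequences take 65536 bytes (well below 98302) but need 32768 code units, one more than a 32767-unit buffer holds.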
Had the same thought. Hopefully I got this right this time: EDIT: I got it wrong again. See comment below |
Here's what I'm thinking, added to
// TODO: better name
// TODO: unsure if taking a slice for the wtf16 len makes sense, the only reason I think it might
// is that it makes the length unit (code units rather than bytes) obvious
pub fn wtf8ToWtf16LeBufCheck(wtf16le: []const u16, wtf8: []const u8) !bool {
    if (wtf16le.len >= minWtf16CodeUnitsForWtf8Bytes(wtf8.len)) return true;
    return wtf16le.len >= (try calcWtf16LeLen(wtf8));
}
pub fn minWtf16CodeUnitsForWtf8Bytes(wtf8_bytes: usize) usize {
    // A WTF-8 input of N bytes needs at most N WTF-16 code units: the worst
    // case is 1:1 for one-byte WTF-8 sequences, since all other WTF-8 sequence
    // lengths have a bytes-to-code-units ratio of 2:1, 3:1, or 4:2
    return wtf8_bytes;
}
pub fn calcWtf16LeLen(wtf8: []const u8) !usize {
    // calcUtf16LeLen but for WTF-16
}
so then usage would be something like:
var wtf16_dir_path: [windows.PATH_MAX_WIDE]u16 = undefined;
const buf_big_enough = try std.unicode.wtf8ToWtf16LeBufCheck(&wtf16_dir_path, dir_path);
if (!buf_big_enough) {
    return error.NameTooLong;
}
// We now know it is safe to do this conversion, i.e. `len` is guaranteed to be <= wtf16_dir_path.len
// Note that we do *not* know that `dir_path` is valid WTF-8, though, since we don't always need to
// call calcWtf16LeLen in wtf8ToWtf16LeBufCheck
const len = try std.unicode.wtf8ToWtf16Le(wtf16_dir_path[0..], dir_path);
EDIT: I had the implementation of |
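For illustration, here is a minimal self-contained sketch of the counting step that the stubbed-out calcWtf16LeLen leaves open, together with the fast-path check. The names countWtf16CodeUnits and fitsInBuffer are hypothetical and not part of std, and unlike the proposal above this counter assumes well-formed WTF-8 and performs no validation, so it returns a plain usize rather than an error union.

```zig
const std = @import("std");

/// Hypothetical helper, not part of std: counts the WTF-16 code units that
/// `wtf8` would occupy by inspecting lead bytes only. Assumes the input is
/// well-formed WTF-8 and does not validate it.
fn countWtf16CodeUnits(wtf8: []const u8) usize {
    var units: usize = 0;
    for (wtf8) |byte| {
        // Continuation bytes (0b10xxxxxx) add nothing on their own.
        if ((byte & 0xC0) == 0x80) continue;
        // A four-byte sequence becomes a surrogate pair; every other
        // sequence length becomes a single code unit.
        const n: usize = if (byte >= 0xF0) 2 else 1;
        units += n;
    }
    return units;
}

/// Hypothetical variant of the buffer check: returns true when a buffer of
/// `buf_len` code units is large enough to hold the converted `wtf8`.
fn fitsInBuffer(buf_len: usize, wtf8: []const u8) bool {
    // Fast path: the code unit count can never exceed the byte count.
    if (buf_len >= wtf8.len) return true;
    // Slow path: count exactly.
    return buf_len >= countWtf16CodeUnits(wtf8);
}

test "buffer check on four-byte sequences" {
    // Three four-byte sequences: 12 bytes, 6 code units.
    const input = "\u{10FFFF}" ** 3;
    try std.testing.expect(fitsInBuffer(12, input)); // fast path
    try std.testing.expect(fitsInBuffer(6, input)); // slow path, exact fit
    try std.testing.expect(!fitsInBuffer(5, input)); // would overflow
}
```

The fast path means short inputs, which are the common case for paths, are never scanned twice; only inputs longer than the buffer pay for the exact count.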
I would prefer renaming the current |
@Trevor-Strong I agree, the current design can easily lead to footguns as shown by this issue. However, I didn't include any changes related to that in my PR to avoid bikeshedding. I advise creating a new issue. |
From wtf8ToWtf16Le's documentation:
/// Assumes there is enough space for the output.
In the following std functions we do not guarantee this when we call wtf8ToWtf16Le:
chdir
chdirZ
Dir.symLink
We could either make the function itself return an error in some cases or create a wtf8CountCodepoints and make sure that this value is less than fs.max_path_bytes for output buffers.