-
Notifications
You must be signed in to change notification settings - Fork 557
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
cloud_storage: add remote path provider #20149
Conversation
Introduces the remote_label struct, which will be plugged into the topic properties in a later commit. This label includes a cluster UUID, meant to indicate the cluster UUID of the cluster that originally created the topic. In the future, this label struct can be extended with a user-provided tag, but for now it will serve as an effective means to avoid collisions between objects from different clusters.
With the naming scheme changing, adds some utilities for generating object paths or parts of paths (useful for listing objects) with both the old ("prefixed") and new ("labeled") naming scheme.
Adds a new remote_path_provider class that will be the centralized location for determining paths of remote objects. This commit introduces the interface for topic manifests, to be used by topic recovery, topic manifest uploads, etc.
Adds a filename to the partition manifest. This will be useful in allowing an external caller to decide the manifest path while still retaining the metadata referred in the spillover manifest path.
Similar to the topic manifest paths, this adds some utilities for partition manifest paths, introducing the old "prefixed" names and the new "labeled" names.
Adds partition manifest path generation to the remote path provider. Note, in various places, we try to get the path of a partition manifest but pass in a spillover manifest. This works today by having spillover overriding the path generation method. This commit continues this, by making the filename of the manifest overridable.
Similar to manifest paths, this adds some utilities for segment paths, introducing the old "prefixed" names and the new "labeled" names.
Adds segment path generation to the remote path provider, and some tests as examples of the new paths.
eacdf6b
to
3ac6e49
Compare
Is this an over simplification, or am I just being pedantic (or wrong!)? Like, the hash-prefixing scheme AIUI is to help spread out load in an object storage system, so its a component of naming that would be orthogonal to naming which partitions objects by cluster?
If we have I seem to recall that S3 is much better / completely better at not needing a randomized prefix, but is that true for other object storage systems? |
It's true, the hash prefixing just doesn't serve as an effective means to avoid collisions, but that doesn't mean we necessarily should get rid of them.
Good callout. Looking at others, GCS is much more concrete in their guidance: https://cloud.google.com/storage/docs/request-rate#randomness_after_a_common_prefix_is_effective_under_the_prefix. They suggest the
For ABS, it's a bit less clear about whether their partitioning guidance requires randomization at the root or if a subdirectory is sufficient...
I'm unsure whether it's critical for the prefix to actually be near the front, or whether this is mainly to ensure good distribution within a broad key range. Regardless, I'm thinking to update the names (at least for segments, and maybe for partition manifests) to use the hashes in some fashion. IMO Topic manifests OTOH are uploaded and downloaded infrequently, and I think there's a benefit in having their naming scheme be pretty flat and simple. It's becomes very easy for topic recovery to look for topic manifest candidates to restore from. |
ss::sstring labeled_topic_manifest_prefix( | ||
const remote_label& label, const model::topic_namespace& topic) { | ||
return fmt::format( | ||
"{}/{}", labeled_topic_manifest_root(topic), label.cluster_uuid()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
i probably missed something in an RFC, but I'm a bit surprised that the cluster uuid is positioned so far to the right in the naming. for example, wouldn't that mean that we could not efficiently ListObjects for the topics in a given cluster, since the cluster_uuid isn't a prefix of the namespace/topic component of the key?
maybe that isn't an operation we care about being efficient, just seemed like we could have eaisly have a situation with 10's of 1000's of topics if we can have any number of clusters sharing a bucket.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea this was a late-arriving thought that came in after the initial discussions about naming scheme.
What makes topic manifests special is that we don't really care about listing the topics in cluster, but rather we care about finding a topic manifest if one exists (e.g. in the case of topic recovery, we have a topic in mind that we want to restore, and so we must find its topic manifest and we don't necessarily know about the desired cluster UUID associated with it). Since listing is generally limited to using prefixes instead of regex patterns, <ns>/<tp>/
is an efficient list prefix to find some <ns>/<tp>/<cluster uuid>
.
just seemed like we could have eaisly have a situation with 10's of 1000's of topics
This is good to keep in mind. In principle the topic names should also act as potential prefixes, in which case this is only problematic if many clusters create many of the same topic. OTOH, topic names are not evenly distributed, and it's not well documented the degree to which a narrow distribution will suffer and for how long before auto-sharding will kick in for most of these storage backends.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
makes sense!
constexpr uint32_t bitmask = 0xF0000000; | ||
auto path = fmt::format("{}/{}", topic.ns(), topic.tp()); | ||
uint32_t hash = bitmask & xxhash_32(path.data(), path.size()); | ||
return fmt::format("{:08x}/meta/{}/{}", hash, topic.ns(), topic.tp()); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
presumably this is just copied from the old code?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yea this is the same as the current path generation in
redpanda/src/v/cloud_storage/topic_manifest.cc
Lines 564 to 577 in 4cdf78d
remote_manifest_path topic_manifest::get_topic_manifest_path( | |
model::ns ns, model::topic topic, manifest_format format) { | |
// The path is <prefix>/meta/<ns>/<topic>/topic_manifest.json or | |
// topic_manifest.bin depending on format | |
constexpr uint32_t bitmask = 0xF0000000; | |
auto path = fmt::format("{}/{}", ns(), topic()); | |
uint32_t hash = bitmask & xxhash_32(path.data(), path.size()); | |
// use format to decide if the path is json or bin | |
return remote_manifest_path(fmt::format( | |
"{:08x}/meta/{}/topic_manifest.{}", | |
hash, | |
path, | |
format == manifest_format::json ? "json" : "bin")); | |
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
LGTM!
|
||
#include <seastar/core/sstring.hh> | ||
|
||
namespace cloud_storage { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it worth leaving a comment explaining/clarifying the distinction between the "labeled" (new) vs "prefixed" (old) style in the code here and elsewhere for topic
/partition
/segment
path utils?
I can imagine that most people won't know what is legacy and what is new from these function names.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This is a great point. Will do this in a follow-up PR.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yeh, I'd be into this as well. I was just kinda guessing about the distinction.
Adds some comments describing labeled and prefixed paths. This is review follow-up from redpanda-data#20149
Adds some comments describing labeled and prefixed paths. This is review follow-up from redpanda-data#20149
Redpanda's current hash-prefixing naming scheme for objects leaves us open to collisions when multiple clusters point at the same bucket. We intend on moving away from this scheme in favor of one that includes the cluster UUID.
To this end, this PR adds various utility methods for generating paths for topic manifests, partition manifests, and remote segments. It also introduces a new
remote_path_provider
to be the centralized place where paths are created.A later PR will plumb the path provider from partitions to the underlying archivers, purgers, anomaly detectors, etc.
Backports Required
Release Notes