-
Notifications
You must be signed in to change notification settings - Fork 249
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Implement support for metadata associated with read-write transactions #1914
base: main
Are you sure you want to change the base?
Conversation
CLA Assistant Lite bot All contributors have signed the CLA ✍️ ✅ |
I have read the CLA Document and I hereby sign the CLA |
recheck |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a few questions and suggestions, I think the approach makes sense to me. The main concerns are how does this affect write-throughput, but at first glance it shouldn't introduce any new hotspots.
A potential issue here is the overlap with the concept of tuple-level metadata, but we've certainly heard more about this use case than tuple-level metadata. I also do wonder if folks will eventually ask for a transactions API to read these back, but that's an easy thing to do if the time comes.
Finally: I understand these are just the Datastore API changes, but we'd have to make sure there are limits put in place so that folks don't abuse that new field.
internal/datastore/crdb/crdb.go
Outdated
// If metadata is to be attached, write that row now. | ||
if config.Metadata != nil { | ||
randomKey := uuid.NewString() | ||
expiresAt := time.Now().Add(25 * time.Hour) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why 25? The migration has the TTL set to @daily
.
The issue I see here is that the metadata TTL is not aligned with the relationship GC Window.
- if longer than the GC window, we store rows in the DB that cannot be retrieved.
- if shorter than the GC window, clients may ask for a revision and fail to obtain the metadata because it had expired earlier.
IMO we should set expiresAt
to the value of the server GC window parameter. The risk is the user changes the value to a value distinct from the default, and the table TTL configuration drifts. This is a better scenario IMO than hard coding because at least a human operator can go and modify the table directly: not ideal, but at least there is an avenue to fix it. Additionally, we could log on startup when the configured GC window and the TTL on the table have drifted, and prompt the user to intervene manually.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Updated to gcWindow
+ 1 minute
internal/datastore/crdb/crdb.go
Outdated
@@ -310,6 +312,23 @@ func (cds *crdbDatastore) ReadWriteTx( | |||
Executor: pgxcommon.NewPGXExecutor(querier), | |||
} | |||
|
|||
// If metadata is to be attached, write that row now. | |||
if config.Metadata != nil { | |||
randomKey := uuid.NewString() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I suspect we should use the native UUID type in CRDB, which is likely more space-efficient than a VARCHAR, and have the database generate it. See https://www.cockroachlabs.com/docs/stable/uuid
const ( | ||
addTransactionMetadataTableQuery = ` | ||
CREATE TABLE transaction_metadata ( | ||
key VARCHAR PRIMARY KEY, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I'd suggest using the UUID type, and set the default value to generate a UUID. That way we don't have to send the UUID over the wire on each insert.
@@ -99,8 +99,8 @@ func getLastRevision(tableTransaction string) sq.SelectBuilder { | |||
return sb.Select("MAX(id)").From(tableTransaction).Limit(1) | |||
} | |||
|
|||
func getRevisionRange(tableTransaction string) sq.SelectBuilder { | |||
return sb.Select("MIN(id)", "MAX(id)").From(tableTransaction) | |||
func loadRevisionRange(tableTransaction string) sq.SelectBuilder { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
why did the SQL change here?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It wasn't being used so I just repurposed the name
ctx, span := tracer.Start(ctx, "createNewTransaction") | ||
defer span.End() | ||
|
||
cterr := tx.QueryRow(ctx, createTxn).Scan(&newXID, &newSnapshot) | ||
if metadata == nil { | ||
metadata = map[string]any{} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
let's create a global var for this so we don't allocate (I'm actually unsure if this allocates, but just in case)
|
||
const ( | ||
addTransactionMetadataTable = `CREATE TABLE transaction_metadata ( | ||
transaction_tag STRING(MAX) NOT NULL, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if the tag is a UUID, use STRING(36)
as the documentation suggests: https://cloud.google.com/spanner/docs/primary-key-default-value#universally_unique_identifier_uuid
created_at TIMESTAMP DEFAULT (CURRENT_TIMESTAMP()), | ||
metadata JSON | ||
) PRIMARY KEY (transaction_tag), | ||
ROW DELETION POLICY (OLDER_THAN(created_at, INTERVAL 2 DAY)) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same feedback as with CRDB around TTL configuration. Does interval 2 day
means "GC every second day"?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It means after 2 days
func (sd *spannerDatastore) ReadWriteTx(ctx context.Context, fn datastore.TxUserFunc, opts ...options.RWTOptionsOption) (datastore.Revision, error) { | ||
config := options.NewRWTOptionsWithOptions(opts...) | ||
|
||
ctx, span := tracer.Start(ctx, "ReadWriteTx") | ||
defer span.End() | ||
|
||
transactionTag := fmt.Sprintf("spicedb-%s", uuid.NewString()) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
concatenate instead of fmt
. Also why prepending spicedb-
, seems unnecessary waste and UUID should guarantee they are unique, and we already correlate them with the table.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Because we might end up using transaction tags of other forms, so I want a prefix
@@ -252,15 +285,15 @@ func (sd *spannerDatastore) ReadWriteTx(ctx context.Context, fn datastore.TxUser | |||
} | |||
|
|||
return nil | |||
}) | |||
}, spanner.TransactionOptions{TransactionTag: transactionTag}) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any specific limitations around tagging a transaction? e.g. number of them? How long do they stay around in the DB? Do we have an understanding of how it affects write throughput?
Should we check if config.Metadata
is present outside of the RTW not to tag transactions unless requested by the customer? I assume tagging is not free, and not every Spanner user may be using transaction metadata feature.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It shouldn't have any impact, since I believe Spanner just sets it to empty by default
internal/datastore/common/changes.go
Outdated
|
||
record := ch.recordForRevision(rev) | ||
if len(record.metadata) > 0 { | ||
log.Ctx(ctx).Warn().Msg("metadata already set for revision") |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
superfluous log entry as we are going to log the error anyway
This will allow callers of APIs such as WriteRelationships and DeleteRelationships to assign metadata to the transaction that will be mirrored back out in the Watch API, to provide a means for correlating updates
cb0d605
to
86836c8
Compare
Updated |
This will allow callers of APIs such as WriteRelationships and DeleteRelationships to assign metadata to the transaction that will be mirrored back out in the Watch API, to provide a means for correlating updates
First part of #966
Each datastore has a different means of correlating between the transaction and the associated change(s) coming out of the watch API:
metadata
column was added to the transactions tablemetadata
column was added to the transactions table