Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[engine/import] Guess references to properties between dependant resources during import #16234

Open
wants to merge 4 commits into
base: master
Choose a base branch
from

Conversation

Zaid-Ajaj
Copy link
Contributor

Description

Following up and extending #16208 such that now we guess references to properties between dependant resources when running an import based on their literal data retrieved from providers. See unit test added that showcases inferred dependencies when referencing deeply nested data from maps and arrays

Checklist

  • I have run make tidy to update any new dependencies
  • I have run make lint to verify my code passes the lint check
    • I have formatted my code using gofumpt
  • I have added tests that prove my fix is effective or that my feature works
  • I have run make changelog and committed the changelog/pending/<file> documenting my change
  • Yes, there are changes in this PR that warrants bumping the Pulumi Cloud API version

@Zaid-Ajaj Zaid-Ajaj requested a review from a team as a code owner May 20, 2024 20:31
@pulumi-bot
Copy link
Contributor

pulumi-bot commented May 20, 2024

Changelog

[uncommitted] (2024-06-25)

Features

  • [engine] Guess references to (deeply nested) properties between dependant resources during import
    #16234

}

for _, pathedValue := range pathedValues {
if occurrences[pathedValue.Value] > 1 {
if pathedValue.Identity && occurrences[pathedValue.Value] > 1 {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is this just being checked for Identity values? Isn't this a problem for any value? If two resources both have an output "hello" and another resource has an input "hello" you can't decided which of the two resources to draw from.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So the reason for this because if you have for example these pathed values computed

resourceGroup.id="group123"
resourceA.resourceGroup = "group123"
resourceB.resourceGroup = "group123"

Then I wanted to maintain the path resourceGroup.id even if that value is duplicated in paths resourceA.resourceGroup and resourceB.resourceGroup.

However, I think when building the pathed values, I should only be using state.Outputs instead of state.Inputs and only replace literal inputs with paths of resource outputs. Even then, resourceA.resourceGroup and resourceB.resourceGroup will occur both on state.Inputs and state.Outputs AFAIK

Would be great if I can actually test this with the CLI but getting weird behaviour with pulumi import

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can actually do a second pass where we remove duplicate paths without taking identities into account and then merge non-duped identities and non-duped values 🤔

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I can actually do a second pass where we remove duplicate paths without taking identities into account and then merge non-duped identities and non-duped values 🤔

Implemented ☝️

Also found an edge case of "overguessing" that could result in a circular reference:

const bucket = new aws.s3.Bucket("my-bucket", {
    website: {
        indexDocument: "index.html",
    },
});

const bucketObject = new aws.s3.BucketObject("index.html", {
    bucket: bucket.id
});

Implemented a fallback such that if we encounter this error, we retry generating the program without guessing references.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has been resolved with the ancestorTypes feature where you can specify that BucketObject can guess values from outputs of Bucket but not the other way around ✅

@Zaid-Ajaj Zaid-Ajaj force-pushed the zaid/guess-deep-property-references-import branch from c5e143b to 85930f7 Compare May 24, 2024 21:00
@Zaid-Ajaj Zaid-Ajaj requested a review from Frassle May 24, 2024 21:03
pkg/importer/language.go Outdated Show resolved Hide resolved
@Zaid-Ajaj Zaid-Ajaj force-pushed the zaid/guess-deep-property-references-import branch from 2b710a5 to b89b7e1 Compare May 29, 2024 14:49
@justinvp justinvp mentioned this pull request Jun 3, 2024
9 tasks
Copy link
Member

@justinvp justinvp left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some initial comments

// const bucketObject = new aws.s3.BucketObject("index.html", {
// bucket: bucket.id
// });
// fallback to the old code path where we don't guess references
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is there a test case that covers this?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I will add one!

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Done ✅

return pathedLiteralValues
}

if property.IsString() {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about non-string values, such as IsNumber()?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Also, what about more complex values, like arrays and objects that are deeply equal? e.g. a resource has an output that's an array of strings and another resource has the same array of strings as an input?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

What about non-string values, such as IsNumber()?

I didn't account for numeric values, nor booleans because I am afraid of overguessing dependencies that have nothing to do with each other. For example instanceSize: 2 on a resource A and retryCount: 2 on a resource B would infer a dependency that is not necessarily what users wanted. Felt like starting with strings would make the most sense since AFAIK all IDs are strings and references between resources use a lot of them (i.e. resourceGroup.name in azure-native, vpc.id in aws, etc.)

Also, what about more complex values, like arrays and objects that are deeply equal? e.g. a resource has an output that's an array of strings and another resource has the same array of strings as an input?

First objects: for these we do not need to check for deep equality because their elements will be references. For example, consider the data here:

resourceA {
  // outputs
  config {
      first = "A"
      second = "B"
  }
}

resourceB {
  // inputs
  config {
      first = "A"
      second = "B"
  }
}

We could infer that resourceB.config = resourceA.config but chances are these are not the same type in the SDK so this might not work. That said, this is not needed in the first place because the current logic would infer dependency by values:

resourceB {
  config {
      first = resourceA.config.first
      second = resourceB.config.second
  }
}

This version is better handled by program-gen implementations.

As for arrays, I am not sure. The current logic would index each element where it is referenced:

resourceA {
  // outputs
  data = ["first", "second"]
}

resourceB {
  // inputs
  data = ["first", "second"]
}

Inferred:

resourceB {
  // inputs
  data = [resourceA.data[0], resourceA.data[1]]
}

so it will work even though it looks a bit ugly

@@ -506,6 +506,133 @@ func TestGenerateHCL2DefinitionsDoesNotMakeSelfReferences(t *testing.T) {
assert.Equal(t, expectedCode, hcl2Text.String(), "Generated HCL2 code does not match expected code")
}

func TestGenerateHCL2DefinitionsReplacesDeeplyNestedReferencesToLiterals(t *testing.T) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we have some test cases where there are multiple resources with the same output value, and therefore shouldn't be used as an input reference anywhere?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TestGenerateHCL2DefinitionsWithAmbiguousReferencesMaintainsLiteralValue asserts just that but with IDs which are treated like outputs.

@Zaid-Ajaj Zaid-Ajaj force-pushed the zaid/guess-deep-property-references-import branch from 2a915b4 to 866654c Compare June 5, 2024 15:05
@Zaid-Ajaj Zaid-Ajaj requested a review from justinvp June 5, 2024 15:06
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

4 participants