Define input, output, intermediate data nodes #1363

Open
1 of 6 tasks
FlorianJacta opened this issue Jun 5, 2024 · 12 comments
Labels
Core Related to Taipy Core 💬 Discussion Requires some discussion and decision ✨New feature 🟧 Priority: High Must be addressed as soon 🔒 Staff only Can only be assigned to the Taipy R&D team
Milestone

Comments

@FlorianJacta
Member

FlorianJacta commented Jun 5, 2024

Description

The goal of this issue is to discuss what input, output, and intermediate data nodes mean.

Solution Proposed

To my mind, the concepts of input, output, and intermediate data nodes are relative to the DAG.

In my opinion, <data node>.is_input, for example, doesn't have a meaning by itself.

This concept should be attached to the objects representing a DAG:

  • Config (even if the config does not represent a DAG directly, it can represent multiple scenario configs that create a DAG)
  • Scenario / Scenario Config
  • Sequence

In other terms:

  • Config.inputs should return the list of inputs corresponding to the DAG created by all the Scenario Configs

  • Config.outputs should return the list of outputs corresponding to the DAG created by all the Scenario Configs

  • <Scenario>.inputs should return the list of inputs corresponding to the DAG created by THE Scenario (Config)

  • <Scenario>.outputs should return the list of outputs corresponding to the DAG created by THE Scenario (Config)

  • <Sequence>.inputs should return the list of inputs corresponding to the DAG created by THE Sequence

  • <Sequence>.outputs should return the list of outputs corresponding to the DAG created by THE Sequence

I think the inputs/outputs of interest for the Data Node Selector are the ones relative to the whole Config.

Impact of Solution

No response

Additional Context

No response

Acceptance Criteria

  • Ensure new code is unit tested, and check code coverage is at least 90%.
  • Create related issue in taipy-doc for documentation and Release Notes.
  • Check if a new demo could be provided based on this, or if legacy demos could benefit from it.
  • Ensure any change is well documented.

Code of Conduct

  • I have checked the existing issues.
  • I am willing to work on this issue (optional)
@FlorianJacta FlorianJacta added Core Related to Taipy Core 🟧 Priority: High Must be addressed as soon ✨New feature 💬 Discussion Requires some discussion and decision labels Jun 5, 2024
@jrobinAV jrobinAV added the 🔒 Staff only Can only be assigned to the Taipy R&D team label Jun 5, 2024
@trgiangdo trgiangdo added this to the Community 3.2 milestone Jun 18, 2024
@trgiangdo
Member

trgiangdo commented Jun 21, 2024

For the global Config:

  • Config.inputs includes data nodes that are input of a task, but are not output of any tasks
  • Config.outputs includes data nodes that are output of a task, but are not input of any tasks

For <Scenario>.inputs, <Scenario>.outputs, <Sequence>.inputs, and <Sequence>.outputs, we already have similar APIs in the Submittable class. We can expose those if needed.
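The rule above for the global Config can be sketched in plain Python. This is only an illustration of the set logic, not Taipy's implementation: the `dag_inputs_outputs` helper, the `TaskMap` shape, and the task and data node names are all hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical stand-in for task configs: task id -> (input ids, output ids).
TaskMap = Dict[str, Tuple[List[str], List[str]]]


def dag_inputs_outputs(tasks: TaskMap) -> Tuple[Set[str], Set[str], Set[str]]:
    """Classify the data nodes of a DAG described by its tasks."""
    consumed = {dn for ins, _ in tasks.values() for dn in ins}
    produced = {dn for _, outs in tasks.values() for dn in outs}
    inputs = consumed - produced         # read by a task, written by none
    outputs = produced - consumed        # written by a task, read by none
    intermediates = consumed & produced  # both written and read
    return inputs, outputs, intermediates


# Made-up example DAG: raw -> clean -> cleaned -> train -> model, with params.
tasks = {
    "clean": (["raw"], ["cleaned"]),
    "train": (["cleaned", "params"], ["model"]),
}
inputs, outputs, intermediates = dag_inputs_outputs(tasks)
print(sorted(inputs), sorted(outputs), sorted(intermediates))
```

The same function can be applied to any subset of tasks, which is what makes the result depend on the context (Config, scenario, or sequence) being considered.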

@FlorianJacta
Member Author

This seems right to me!

@jrobinAV
Member

jrobinAV commented Jun 21, 2024

The APIs you mentioned @trgiangdo are contextual, meaning they are Config or Submittable APIs. So, we can interpret the API as follows:

  • Config.inputs: From the Config standpoint, here are all the input data node configs.
  • Config.outputs: From the Config standpoint, here are all the output data node configs.
  • my_scenario.inputs: From the my_scenario standpoint, here are all the input data nodes.
  • my_scenario.my_sequence.outputs: From the my_sequence standpoint, here are all the output data nodes.
  • etc.

The question is slightly different, though. It concerns the default context when there is no explicit one. How can we answer the question, "Is this data node an input?" independently from any context? @FlorianJacta proposes using Config as the default context, but I am not sure it is intuitive enough. Moreover, the question has been raised in the data node selector filter, which is exposed to the end user. There is a high probability the end user does not know anything about the config DAGs.

Let's take a complex example.

from datetime import datetime
from taipy import Config, Core, Frequency, Scope, create_scenario


def identity(value):
    return value


d1 = Config.configure_data_node("d1", scope=Scope.GLOBAL)
d2 = Config.configure_data_node("d2", scope=Scope.CYCLE)
d3 = Config.configure_data_node("d3", scope=Scope.SCENARIO)
d4 = Config.configure_data_node("d4", scope=Scope.SCENARIO)

t1 = Config.configure_task("t1", function=identity, input=[d1], output=[d2])
t2 = Config.configure_task("t2", function=identity, input=[d2], output=[d3])

t3 = Config.configure_task("t3", function=identity, input=[d1, d2, d3], output=[d4])

s1 = Config.configure_scenario("s1", task_configs=[t1, t2],
                               sequences={"seq1": [t1], "seq2": [t2]},
                               frequency=Frequency.DAILY)
s2 = Config.configure_scenario("s2", task_configs=[t3], frequency=Frequency.DAILY)

Core().run()
scenario_1 = create_scenario(s1, datetime(2021, 1, 1))
scenario_2 = create_scenario(s1, datetime(2021, 1, 2))
scenario_3 = create_scenario(s2, datetime(2021, 1, 1))
scenario_4 = create_scenario(s2, datetime(2021, 1, 2))

The piece of code instantiates the following data nodes:
One global scoped dn: d1
Two cycle scoped dns: scenario_1.d2, scenario_2.d2
Six scenario scoped dns: scenario_1.d3, scenario_2.d3, scenario_3.d3, scenario_4.d3, scenario_3.d4, scenario_4.d4
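The counts above can be reproduced with a simplified model of scope sharing, assuming a GLOBAL data node is shared by everything, a CYCLE data node is shared by all scenarios of the same day (the frequency is DAILY), and a SCENARIO data node belongs to a single scenario. The `owner` helper and the dictionaries are hypothetical and do not reflect how Taipy actually instantiates entities.

```python
from datetime import date

# Scope of each data node config, as declared in the example.
SCOPES = {"d1": "GLOBAL", "d2": "CYCLE", "d3": "SCENARIO", "d4": "SCENARIO"}

# Data node configs reachable from each scenario config through its tasks.
SCENARIO_CONFIGS = {"s1": ["d1", "d2", "d3"], "s2": ["d1", "d2", "d3", "d4"]}

# Scenario entity -> (scenario config id, creation date).
SCENARIOS = {
    "scenario_1": ("s1", date(2021, 1, 1)),
    "scenario_2": ("s1", date(2021, 1, 2)),
    "scenario_3": ("s2", date(2021, 1, 1)),
    "scenario_4": ("s2", date(2021, 1, 2)),
}


def owner(scope, scenario, day):
    """Return what a data node entity is attached to under the assumed rules."""
    if scope == "GLOBAL":
        return None      # one entity for the whole application
    if scope == "CYCLE":
        return day       # one entity per daily cycle
    return scenario      # one entity per scenario


# Each distinct (config id, owner) pair is one data node entity.
entities = set()
for scenario, (config_id, day) in SCENARIOS.items():
    for dn in SCENARIO_CONFIGS[config_id]:
        entities.add((dn, owner(SCOPES[dn], scenario, day)))

counts = {}
for dn, _ in entities:
    counts[SCOPES[dn]] = counts.get(SCOPES[dn], 0) + 1
print(counts)
```

Under these assumptions, scenario_3 reuses the cycle-scoped d2 created for scenario_1 because both belong to the same day.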

What are the inputs, the outputs, and the intermediate data nodes? As an end-user, I really don't know what I am expecting as an answer.

@FlorianJacta
Member Author

I need clarification on what is confusing about this. Why is the definition above not the expected definition?

@jrobinAV
Member

As an end user, listing all input data nodes is not self-explanatory. I need to understand the whole config well, with all the scenario configs, all the sequences, etc., to know what I am going to get.

Let's imagine I have a role that only allows me to view scenarios from the second scenario config, s2. So, I expect to get [d1, scenario_1.d2, scenario_2.d2, scenario_3.d3, scenario_4.d3] as a result when asking for inputs. Your proposal will only return [d1].

@trgiangdo
Member

Do we have a role system that can explicitly set the access role of a user to some specific scenarios?
I did not know that.

Anyway, from the example that you gave:
Config.inputs = [d1]
Config.outputs = [d4]
When we call Config..., the list will be a list of data node configurations.

For the scenario entities:
scenario_1.inputs = [scenario_1.d1]
scenario_1.outputs = [scenario_1.d3]
scenario_1.seq1.inputs = [scenario_1.d1]
scenario_1.seq1.outputs = [scenario_1.d2]
scenario_1.seq2.inputs = [scenario_1.d2]
scenario_1.seq2.outputs = [scenario_1.d3]
scenario_2 is the same as scenario_1

scenario_3.inputs = [scenario_3.d1, scenario_3.d2, scenario_3.d3]
scenario_3.outputs = [scenario_3.d4]
scenario_4 is the same as scenario_3

I think the scope of the data node doesn't affect the outcome of these APIs.
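The lists above can be checked with a small plain-Python sketch that applies the "consumed but not produced" and "produced but not consumed" rule to the tasks of each context. It works on config ids only and ignores entities and scopes; the `TASKS` and `CONTEXTS` dictionaries and the `inputs_outputs` helper are hypothetical, not Taipy APIs.

```python
# Task id -> (input data node ids, output data node ids), from the example.
TASKS = {
    "t1": (["d1"], ["d2"]),
    "t2": (["d2"], ["d3"]),
    "t3": (["d1", "d2", "d3"], ["d4"]),
}

# Each context is just the list of tasks that make up its DAG.
CONTEXTS = {
    "Config": ["t1", "t2", "t3"],
    "s1": ["t1", "t2"],
    "s1.seq1": ["t1"],
    "s1.seq2": ["t2"],
    "s2": ["t3"],
}


def inputs_outputs(task_ids):
    """Return (inputs, outputs) of the DAG formed by the given tasks."""
    consumed = {dn for t in task_ids for dn in TASKS[t][0]}
    produced = {dn for t in task_ids for dn in TASKS[t][1]}
    return sorted(consumed - produced), sorted(produced - consumed)


results = {name: inputs_outputs(ids) for name, ids in CONTEXTS.items()}
for name, (ins, outs) in results.items():
    print(f"{name}: inputs={ins} outputs={outs}")
```

Note that d2 and d3 are inputs of s2 but not of the whole Config, which is the discrepancy discussed in the rest of the thread.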

@jrobinAV
Member

@trgiangdo I was not specifically talking about Taipy enterprise roles. My example was confusing. Let me rephrase the sentence.
'Let's imagine I have a user interface on which I only view scenarios from the second scenario config, s2.'

What would be the result of tp.get_inputs(), without any explicit context?
Or in other words, what would be the result of scenario_1.d2.is_input()?
In such a use case, I expect the following answers:
tp.get_inputs() == [d1, scenario_1.d2, scenario_2.d2, scenario_3.d3, scenario_4.d3]
scenario_1.d2.is_input() == True
Both will be false with Florian's proposal.

@FlorianJacta
Member Author

FlorianJacta commented Jun 21, 2024

In my opinion, <data node>.is_input, for example, doesn't have a meaning by itself.

This is what I wrote in the issue.

tp.get_inputs() doesn't mean anything to me

A Data Node is input/output depending on the context.
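To illustrate why a context-free is_input is ambiguous, here is a minimal sketch that computes the role of d2 from the earlier example in several contexts. The `role` function and the dictionaries are hypothetical and only model the config-level DAGs; they are not part of Taipy.

```python
from typing import Dict, List, Tuple

# Task id -> (input data node ids, output data node ids), from the example.
TASKS: Dict[str, Tuple[List[str], List[str]]] = {
    "t1": (["d1"], ["d2"]),
    "t2": (["d2"], ["d3"]),
    "t3": (["d1", "d2", "d3"], ["d4"]),
}

CONTEXTS: Dict[str, List[str]] = {
    "Config": ["t1", "t2", "t3"],
    "s1": ["t1", "t2"],
    "s1.seq1": ["t1"],
    "s1.seq2": ["t2"],
    "s2": ["t3"],
}


def role(data_node: str, task_ids: List[str]) -> str:
    """Role of a data node relative to the DAG formed by the given tasks."""
    consumed = any(data_node in TASKS[t][0] for t in task_ids)
    produced = any(data_node in TASKS[t][1] for t in task_ids)
    if consumed and produced:
        return "intermediate"
    if consumed:
        return "input"
    if produced:
        return "output"
    return "unused"


roles = {name: role("d2", ids) for name, ids in CONTEXTS.items()}
for name, value in roles.items():
    print(f"d2 in {name}: {value}")
```

The same data node is an input, an output, or an intermediate node depending on which DAG is asked, so the question only has an answer once a context is given.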

@trgiangdo
Member

trgiangdo commented Jun 21, 2024

I don't think tp.get_inputs() or <DataNode>.is_input() are possible at all.

Florian and I agree on the six APIs, I think: Config.inputs, Config.outputs, <Scenario>.inputs, <Scenario>.outputs, <Sequence>.inputs, and <Sequence>.outputs.

As for scenario_1.d2.is_input() == True, that is correct, right? We are looking at the data node in a scenario context. But I don't see how we can implement it, because the data node would also need to know which scenario is calling it, so .is_input() is not possible and makes no sense.

@jrobinAV
Member

Are you saying I should read your description more carefully? 🤣
If so, I believe you are right...

I misunderstood your proposal. Sorry.

@trgiangdo
Member

So do we agree on the requirements now?

@jrobinAV
Member

After a better reading, I now understand the proposal. I am okay with the concepts exposed in the Taipy core package. But I believe it does not answer the issue, in particular on the sentence from the description that is, in the end, the root motivation of the issue:

"I think the inputs/outputs of interest for the Data Node Selector are relative to the whole Config."

I strongly believe we don't want to expose the config inputs and outputs in the data node selector.
The config is a developer concept, not an end-user concept. The end-user will not easily understand the input and output data nodes.
What is needed in the data node selector is another concept that sometimes (mostly in demos) overlaps with the developer input-output data node concept. My understanding is that the end-user wants to access two kinds of data nodes quickly:

  • The ones to eventually edit so he/she can recompute the scenario and propagate the changes to other data nodes. These data nodes don't match the config inputs, even if they have an overlap with the developer's inputs.
  • The ones to visualize and analyze to understand or validate a solution. These data nodes don't match the config outputs, even if they have an overlap with the developer's outputs.
