Define input, output, intermediate data nodes #1363

FlorianJacta · 2024-06-05T12:34:09Z

Description

The goal of this issue is to discuss what input, output, and intermediate data nodes mean.

Solution Proposed

To my mind, the concepts of input, output, and intermediate data nodes are relative to a DAG.

In my opinion, <data node>.is_input, for example, has no meaning by itself.

This concept should be attached to the objects representing a DAG:

Config (even though the config does not represent a DAG directly, it can hold multiple scenario configs that together create a DAG)
Scenario / Scenario Config
Sequence

In other terms:

Config.inputs should return the list of inputs corresponding to the DAG created by all the Scenario Configs
Config.outputs should return the list of outputs corresponding to the DAG created by all the Scenario Configs
<Scenario>.inputs should return the list of inputs corresponding to the DAG created by THE Scenario (Config)
<Scenario>.outputs should return the list of outputs corresponding to the DAG created by THE Scenario (Config)
<Sequence>.inputs should return the list of inputs corresponding to the DAG created by THE Sequence
<Sequence>.outputs should return the list of outputs corresponding to the DAG created by THE Sequence

I think the inputs/outputs of interest for the Data Node Selector are the ones relative to the whole Config.

Impact of Solution

No response

Additional Context

No response

Acceptance Criteria

Ensure new code is unit tested, and check code coverage is at least 90%.
Create related issue in taipy-doc for documentation and Release Notes.
Check if a new demo could be provided based on this, or if legacy demos could benefit from it.
Ensure any change is well documented.

Code of Conduct

I have checked the existing issues.
I am willing to work on this issue (optional)

trgiangdo · 2024-06-21T02:44:27Z

For the global Config:

Config.inputs includes data nodes that are input of a task, but are not output of any tasks
Config.outputs includes data nodes that are output of a task, but are not input of any tasks
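
The two rules above can be sketched in plain Python. This is only an illustration, not Taipy code: the `classify` helper, the task dictionary layout, and the data node names are all hypothetical, and treating "intermediate" as "both produced and consumed" is an assumption that the thread does not state explicitly.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical layout: task name -> (input data node names, output data node names).
TaskMap = Dict[str, Tuple[List[str], List[str]]]


def classify(tasks: TaskMap) -> Dict[str, Set[str]]:
    """Split the data nodes of a DAG into inputs, outputs, and intermediates."""
    consumed: Set[str] = set()  # data nodes read by at least one task
    produced: Set[str] = set()  # data nodes written by at least one task
    for inputs, outputs in tasks.values():
        consumed.update(inputs)
        produced.update(outputs)
    return {
        # input of a task, but not output of any task
        "inputs": consumed - produced,
        # output of a task, but not input of any task
        "outputs": produced - consumed,
        # assumption: intermediate means both produced and consumed
        "intermediates": consumed & produced,
    }


# Hypothetical two-task pipeline: raw -> cleaned -> model
tasks = {
    "clean": (["raw"], ["cleaned"]),
    "train": (["cleaned"], ["model"]),
}
result = classify(tasks)
print(result)
```

The classification depends only on the set of tasks passed in, which is why the same data node can land in a different bucket when the set of tasks (the context) changes.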

For <Scenario>.inputs, <Scenario>.outputs, <Sequence>.inputs, and <Sequence>.outputs, we already have similar APIs in the Submittable class. We can expose those if needed.

FlorianJacta · 2024-06-21T07:23:01Z

This seems right to me!

jrobinAV · 2024-06-21T09:25:00Z

The APIs you mentioned @trgiangdo are contextual, meaning they are Config or Submittable APIs. So, we can interpret the API as follows:

Config.inputs: From the Config standpoint, here are all the input data node configs.
Config.outputs: From the Config standpoint, here are all the output data node configs.
my_scenario.inputs: From the my_scenario standpoint, here are all the input data nodes.
my_scenario.my_sequence.outputs: From the my_sequence standpoint, here are all the output data nodes.
etc.

The question is slightly different, though. It concerns the default context when there is no explicit one. How can we answer the question, "Is this data node an input?" independently from any context? @FlorianJacta proposes using Config as the default context, but I am not sure it is intuitive enough. Moreover, the question has been raised in the data node selector filter, which is exposed to the end user. There is a high probability the end user does not know anything about the config DAGs.

Let's take a complex example.

from datetime import datetime
from taipy import Config, Core, Frequency, Scope, create_scenario


def identity(value):
    return value


d1 = Config.configure_data_node("d1", scope=Scope.GLOBAL)
d2 = Config.configure_data_node("d2", scope=Scope.CYCLE)
d3 = Config.configure_data_node("d3", scope=Scope.SCENARIO)
d4 = Config.configure_data_node("d4", scope=Scope.SCENARIO)

t1 = Config.configure_task("t1", function=identity, input=[d1], output=[d2])
t2 = Config.configure_task("t2", function=identity, input=[d2], output=[d3])

t3 = Config.configure_task("t3", function=identity, input=[d1, d2, d3], output=[d4])

s1 = Config.configure_scenario("s1", task_configs=[t1, t2],
                               sequences={"seq1": [t1], "seq2": [t2]},
                               frequency=Frequency.DAILY)
s2 = Config.configure_scenario("s2", task_configs=[t3], frequency=Frequency.DAILY)

Core().run()
scenario_1 = create_scenario(s1, datetime(2021, 1, 1))
scenario_2 = create_scenario(s1, datetime(2021, 1, 2))
scenario_3 = create_scenario(s2, datetime(2021, 1, 1))
scenario_4 = create_scenario(s2, datetime(2021, 1, 2))

The piece of code instantiates the following data nodes:
One global scoped dn: d1
Two cycle scoped dns: scenario_1.d2, scenario_2.d2
Six scenario scoped dns: scenario_1.d3, scenario_2.d3, scenario_3.d3, scenario_4.d3, scenario_3.d4, scenario_4.d4

What are the inputs, the outputs, and the intermediate data nodes? As an end user, I really don't know what answer to expect.

FlorianJacta · 2024-06-21T09:56:54Z

I need clarification on what is confusing about this. Why is the definition above not the expected definition?

jrobinAV · 2024-06-21T10:30:32Z

As an end user, listing all input data nodes is not self-explanatory. I need a good understanding of the whole config, with all the scenario configs, all the sequences, etc., to understand what I am going to get.

Let's imagine I have a role that only allows me to view scenarios from the second scenario config, s2. In that case, I expect to get [d1, scenario_1.d2, scenario_2.d2, scenario_3.d3, scenario_4.d3] as a result when asking for inputs. Your proposal will only return [d1].

trgiangdo · 2024-06-21T10:51:29Z

Do we have a role system that can explicitly set the access role of a user to some specific scenarios?
I did not know that.

Anyway, for the example that you gave:
Config.inputs = [d1]
Config.outputs = [d4]
When we call Config..., the list will be a list of data node configurations.

For the scenario entities:
scenario_1.inputs = [scenario_1.d1]
scenario_1.outputs = [scenario_1.d3]
scenario_1.seq_1.inputs = [scenario_1.d1]
scenario_1.seq_1.outputs = [scenario_1.d2]
scenario_1.seq_2.inputs = [scenario_1.d2]
scenario_1.seq_2.outputs = [scenario_1.d3]
scenario_2 is the same as scenario_1

scenario_3.inputs = [scenario_3.d1, scenario_3.d2, scenario_3.d3]
scenario_3.outputs = [scenario_3.d4]
scenario_4 is the same as scenario_3

I think the scope of the data node doesn't affect the outcome of these APIs.
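
As a sanity check, the lists above can be reproduced from the task definitions of the example alone. The sketch below is plain Python working on config ids, not on Taipy entities; `TASKS`, `CONTEXTS`, and `boundary` are hypothetical names introduced only for this illustration.

```python
from typing import Dict, List, Tuple

# Tasks from the example: task id -> (input data node ids, output data node ids).
TASKS: Dict[str, Tuple[List[str], List[str]]] = {
    "t1": (["d1"], ["d2"]),
    "t2": (["d2"], ["d3"]),
    "t3": (["d1", "d2", "d3"], ["d4"]),
}

# Each context is the list of tasks that makes up its DAG.
CONTEXTS: Dict[str, List[str]] = {
    "Config": ["t1", "t2", "t3"],
    "s1": ["t1", "t2"],
    "s1.seq1": ["t1"],
    "s1.seq2": ["t2"],
    "s2": ["t3"],
}


def boundary(task_names: List[str]) -> Tuple[List[str], List[str]]:
    """Return (inputs, outputs) of the DAG formed by the given tasks."""
    consumed, produced = set(), set()
    for name in task_names:
        inputs, outputs = TASKS[name]
        consumed.update(inputs)
        produced.update(outputs)
    # inputs are consumed but never produced; outputs are produced but never consumed
    return sorted(consumed - produced), sorted(produced - consumed)


summary = {ctx: boundary(names) for ctx, names in CONTEXTS.items()}
for ctx, (inputs, outputs) in summary.items():
    print(f"{ctx}: inputs={inputs} outputs={outputs}")
```

Since the computation only looks at which task reads or writes which data node, the scope never enters the calculation in this sketch.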

jrobinAV · 2024-06-21T11:51:22Z

@trgiangdo I was not specifically talking about Taipy enterprise roles. My example was confusing. Let me rephrase the sentence.
'Let's imagine I have a user interface on which I only view scenarios from the second scenario config, s2.'

What would be the result of tp.get_inputs(), without any explicit context?
Or in other words, what would be the result of scenario_1.d2.is_input()?
In such a use case, I am expecting the following answer:
tp.get_inputs() == [d1, scenario_1.d2, scenario_2.d2, scenario_3.d3, scenario_4.d3]
scenario_1.d2.is_input() == True
Both will be false with Florian's proposal.

FlorianJacta · 2024-06-21T11:55:04Z

In my opinion, <data node>.is_input, for example, has no meaning by itself.

This is what I wrote in the issue.

tp.get_inputs() doesn't mean anything to me

A Data Node is input/output depending on the context.

trgiangdo · 2024-06-21T11:57:56Z

I don't think tp.get_inputs() or <DataNode>.is_input() are possible at all.

Florian and I agree on the six APIs, I think: Config.inputs, Config.outputs, <Scenario>.inputs, <Scenario>.outputs, <Sequence>.inputs, and <Sequence>.outputs.

As for scenario_1.d2.is_input() == True, that is correct, right? We are looking at the data node in a scenario context. But I don't see how we could implement it, because the data node would also need to know which scenario is calling it, so .is_input() on its own is not possible and makes no sense.
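
To make the dependency on context concrete, here is a minimal sketch of an `is_input` check that takes the context as an explicit argument. It is plain Python on the config ids of the example, and the function name and signature are hypothetical, not an existing or proposed Taipy API.

```python
from typing import List, Tuple

Task = Tuple[List[str], List[str]]  # (input data node ids, output data node ids)


def is_input(data_node: str, context_tasks: List[Task]) -> bool:
    """True if the data node is read, but never written, by the tasks of the context."""
    consumed = any(data_node in inputs for inputs, _ in context_tasks)
    produced = any(data_node in outputs for _, outputs in context_tasks)
    return consumed and not produced


# Tasks from the example above
t1: Task = (["d1"], ["d2"])
t2: Task = (["d2"], ["d3"])
t3: Task = (["d1", "d2", "d3"], ["d4"])

s1_context = [t1, t2]          # scenario config s1
s2_context = [t3]              # scenario config s2
config_context = [t1, t2, t3]  # the whole Config

answers = {
    "s1": is_input("d2", s1_context),
    "s2": is_input("d2", s2_context),
    "Config": is_input("d2", config_context),
}
print(answers)
```

The same data node d2 gets a different answer depending on which tasks are passed in, which is the reason a check without any context argument has nothing to compute against.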

jrobinAV · 2024-06-21T11:58:05Z

Are you saying I should read your description more carefully? 🤣
If so, I believe you are right...

I misunderstood your proposal. Sorry.

trgiangdo · 2024-06-24T03:44:28Z

So do we agree on the requirements now?

jrobinAV · 2024-06-24T08:12:26Z

After a better reading, I now understand the proposal. I am okay with the concepts exposed in the Taipy core package. But I believe it does not resolve the issue, in particular the sentence from the description that is, in the end, its root motivation:

"I think the inputs/outputs of interest for the Data Node Selector are relative to the whole Config."

I strongly believe we don't want to expose the config inputs and outputs in the data node selector.
The config is a developer concept, not an end-user concept. The end-user will not easily understand the input and output data nodes.
What is needed in the data node selector is another concept that sometimes (mostly in demos) overlaps with the developer input-output data node concept. My understanding is that the end-user wants to access two kinds of data nodes quickly:

The ones to eventually edit so that he/she can recompute the scenario and propagate the changes to other data nodes. These data nodes don't match the config inputs, even if they overlap with the developer's inputs.
The ones to visualize and analyze to understand or validate a solution. These data nodes don't match the config outputs, even if they overlap with the developer's outputs.

FlorianJacta added the Core (Related to Taipy Core), 🟧 Priority: High (Must be addressed as soon), ✨New feature, and 💬 Discussion (Requires some discussion and decision) labels Jun 5, 2024

jrobinAV added the 🔒 Staff only (Can only be assigned to the Taipy R&D team) label Jun 5, 2024

trgiangdo added this to the Community 3.2 milestone Jun 18, 2024
