Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Proposal: Allow custom metrics to be computed based on tasks length #177

Open
horseunnamed opened this issue May 14, 2020 · 6 comments
Open

Comments

@horseunnamed
Copy link

Hello. I want to create custom build metric that computes total duration for some subset of executed tasks.

In current master branch I found abstract class ExecutedTasksMetric and it looks like appropriate choice for such use cases. But in provider for ExecutedTasksMetric we can only access ExecutedTasksInfo that doesn't have info about tasks duration.

It would be nice to also have access to list of TaskLength in provider for this metric. For example by adding this information to ExecutedTasksInfo or ExecutedGradleTaskInfo.

@cdsap
Copy link
Owner

cdsap commented May 15, 2020

Hi @horseunnamed , thank you very much for the proposal.
It's very interesting the request, I'm thinking how to implement it without breaking the recent modularity we are applying to the Metrics components.

In the case ExecutedTasksInfo, it's special case and comes from the implementation of the new measuring cache hits, developed by @MyDogTom. In the PR(#169) it includes the description:

BuildCacheOperationListener listens for two events: ExecuteTaskBuildOperationType (task was executed) and BuildCacheRemoteLoadBuildOperationType (attempt to download remote cache). Summarised information is exposed as ExecutedTasksInfo entity. This information is used by TalaiotPublisherImpl in order to attach cache usage summary to ExecutionReport.Environment and to every task.

That's why we are doing the aggregation of two entities.

So I'm not sure if we want to include data from TaskLength there. But at the same time, you're bringing a new concept to be considered in the metrics for the builds.

@MyDogTom @mokkun any thoughts?

Meanwhile, you can use a custom publisher with filtering options and there manage the grouped tasks to do the aggregation.

@horseunnamed
Copy link
Author

Thanks for your reply!

I've thought about implementing my aggregation with custom publisher, but problem is that I want to also send this information to InfluxDB and I'm currently using InfluxPublisher.
The only option I came up with is to register my custom publisher before InfluxPublisher and mutate instance of ExectuionReport before it comes to InfluxPublisher. But I haven't tried it yet and it looks like hacky solution :)

@MyDogTom
Copy link
Collaborator

@cdsap I think it would be beneficial to allow people perform some extra actions before publishing.
Actually, while changing TalaiotPublisherImpl I was thinking that TalaiotPublisherImpl mixes two responsibilities: prepare/modify data and publishing data. Separate metric could be responsible for the tasks data preparation.

Coming back to @horseunnamed request.
I'm hesitating if metrics is a right mechanism for that. I think responsibility of a Metric should be produce raw data and shouldn't be mixed with transforming existing data. I propose to have a separate "transfromation" mechanism which is executed when all data is measured, but before publishing the report.

@horseunnamed There is another alternative approach which shifts computation from plugin level to InfluxDB level. Talaiot allows you to generate buildId for every build. Like that you can know which task belongs to which execution. Publish initial report to a retention policy with a very short duration, since buildId has a verify high cardinality. Then use continuous query to compute required data and drop unnecessary fields. See Downsample and retain data.

@horseunnamed
Copy link
Author

horseunnamed commented May 19, 2020

@MyDogTom, thanks for your suggestion! It's an interesting idea. And it seems like I can accomplish this by using HybridPublisher to configure publishing of tasks and build metrics with different retention policies for InfluxDB.

But actually it looks to me like too complicated way, because I don't really want to transmit and store (even very temporarily) detail information about all tasks on large multi-module project. Especially giving that all information about tasks duration is already available to aggregation before publishing.

It would be really nice to have some separate "transfromation" mechanism as you proposed!
So we can avoid overhead of publishing very large set for tasks data in such cases.

More on practical aspect of such aggregation: I want to compute duration and ratio (to overall build time) of tasks related to tests and kapt. This kind of aggregations seems helpful for build time analysis.

@cdsap
Copy link
Owner

cdsap commented May 20, 2020

Actually, while changing TalaiotPublisherImpl I was thinking that TalaiotPublisherImpl mixes two responsibilities: prepare/modify data and publishing data. Separate metric could be responsible for the tasks data preparation.

@MyDogTom Totally agree with these statements. We need to detach the responsibility of the preparation of the data.

I really like the concept of "Transformation", new entity that cover uses cases like the one explained by @horseunnamed, given an input(default List of TaskLenght) we apply the transformation defined. And the output will be notified to the publishers.

We can notice that currently we are doing some kind of transformations, for example using TaskFilterProcessor, but again happening in the TalaiotPublisherImpl

My next question, it's about the Publisher, the main contract with the publishers is coupled with ExecutionReport, maybe it's time to compose with the metadata defined by the Transformer?

@MyDogTom
Copy link
Collaborator

MyDogTom commented May 20, 2020

I propose to approach this task in two steps.

Step 1. Implement Transformation mechanism where transformer is limited to existing ExecutionReport data structure. Basically it can only modify existing fields like cacheRatio, tasks or customProperties. It means that transformer cannot produce data with several rows but has to flatten data into customProperties.

Step 2. Add possibility to define custom data structures that will be recognized by publishers. For example, instead of publishing information about each tasks, one could just publish information about every module. In this case information about each module could be a separate row.

Step 1 seems to be pretty straightforward.
Step 2 seems to be a bit trickier so we could discuss and collect ideas while working on step 1.

@cdsap If you agree, then I would love to prepare a draft for the step 1 (approximately during this week).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

3 participants