
[backends] How to keep a track of APIs #642

Open
animeshk08 opened this issue Apr 6, 2020 · 9 comments

@animeshk08
Contributor

animeshk08 commented Apr 6, 2020

This issue is opened because GrimoireLab does not have a way to keep track of changes to the APIs used to fetch data from the various data sources. This is not a big problem at the moment, since most of the APIs are quite stable and changes do not happen very often. However, some APIs are still in their beta phase (e.g. Gitter) and may evolve in the future.

Some discussion about this took place here: #636

EDIT: Another reason for opening this issue is that many APIs may add fields which could help us get more data from their data sources. However, this would not come to light unless the documentation is checked regularly.

@animeshk08
Contributor Author

animeshk08 commented Apr 6, 2020

After giving it a bit of thought, there are two possible ways:

  • Create a file with links to the documentation of the APIs used by a backend and the endpoints that are used.

  • Create a file with links to the documentation of the APIs used by a backend and the endpoints that are used. Also, have a changelog section in the file, which is updated whenever a change in the API is noticed or whenever such a change is included in Perceval.

NOTE: I am in favour of the second option.
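To make the second option more concrete, here is a sketch of what such a per-backend file could look like. The layout, file name, and entries below are purely illustrative, not an agreed convention:

```markdown
# Gitter backend — API reference

## Documentation and endpoints
- Rooms: <link to the rooms resource docs>
- Messages: <link to the messages resource docs>

## Changelog
- 2020-04-06: initial version of this file
- <date>: <description of the API change and the Perceval PR that adopted it>
```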

@animeshk08
Contributor Author

animeshk08 commented Apr 6, 2020

I also explored a third alternative for this issue. It would help automate a few tasks; however, a few associated problems make it a less attractive approach in my opinion.

I would still like to describe the solution, as some reviews may make it feasible.

  • Create a script with methods to make simple requests (without pagination and rate headers) to the required endpoints of each backend.
  • The response is compared to a sample response that we store.
  • The comparison can be made in two ways:
  1. We can create sample user accounts and store only the field names of the response, then check that the same fields are present. This approach does not require storing full responses in a file; however, only changes in the fields (not in the values or headers) will be recorded.

  2. We can create sample user accounts and store the full response in a file. The stored response can then be compared to the actual response returned by the backend server (the Python library difflib can be used for this purpose). Headers are not covered in this approach either.

  • If a difference is found in the response, a prompt can be generated to create an entry in the changelog file.
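The first comparison option above could be sketched as follows. The field names and sample data are hypothetical; in practice the live response would come from an HTTP request to the backend's endpoint:

```python
# Approach 1 sketch: compare the field names of a live API response
# against a stored set of expected fields, without storing full responses.

def collect_fields(obj, prefix=""):
    """Recursively collect dotted field paths from a JSON-like object."""
    fields = set()
    if isinstance(obj, dict):
        for key, value in obj.items():
            path = f"{prefix}.{key}" if prefix else key
            fields.add(path)
            fields.update(collect_fields(value, path))
    elif isinstance(obj, list):
        for item in obj:
            fields.update(collect_fields(item, prefix))
    return fields

def diff_fields(expected, response):
    """Return (added, removed) field paths relative to the expected set."""
    actual = collect_fields(response)
    return sorted(actual - expected), sorted(expected - actual)

# Hypothetical stored sample for an issue-like endpoint
expected = {"id", "title", "user", "user.login"}
live = {"id": 1, "title": "x", "user": {"login": "a", "node_id": "n"}}
added, removed = diff_fields(expected, live)
```

A new field in the live response shows up in `added`, which is the kind of change the changelog entry would record; changes to values or headers are invisible to this check, as noted above.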

Problem: the approaches above involve maintaining these sample user accounts. If we keep the credentials open, it creates obvious problems. If the credentials are not open, the script can only be used by certain project maintainers, which defeats the point of it being open source. Additionally, there is a chance of the user tokens expiring.
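The second comparison option (difflib against a stored sample) could look like the sketch below. The file names are hypothetical, and the credentials problem above applies to however the live response is obtained:

```python
# Approach 2 sketch: store a full sample response on disk and diff the
# live response against it with difflib from the standard library.
import difflib
import json

def diff_responses(sample, live):
    """Return unified-diff lines between two JSON documents."""
    sample_text = json.dumps(sample, indent=2, sort_keys=True).splitlines()
    live_text = json.dumps(live, indent=2, sort_keys=True).splitlines()
    return list(difflib.unified_diff(sample_text, live_text,
                                     fromfile="sample.json",
                                     tofile="live.json",
                                     lineterm=""))

sample = {"id": 1, "state": "open"}
live = {"id": 1, "state": "open", "locked": False}
changes = diff_responses(sample, live)
```

Serializing with `sort_keys=True` keeps the diff stable across responses whose key order differs; any non-empty diff would trigger the prompt to add a changelog entry.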

@animeshk08
Contributor Author

Any comments from the community are much appreciated as all the ideas are still in an early stage :)

@imnitishng
Contributor

This seems to be a great idea. API changes are not that frequent, but having a changelog will be helpful in the long run.
Personally, I feel that automating the process (the third option you mentioned) might not be implementable, keeping the credentials problem in mind.
In the end, I would also support the second solution to the problem. Contributors can build these API docs files together for every backend Perceval has.

@animeshk08
Contributor Author

Thank you for your comments @imnitishng. Let's wait for others to reply. Maybe we can collaborate if this idea goes forward :)

@valeriocos
Member

Sorry @animeshk08 I'm late on this discussion. I'll try to answer tomorrow

@valeriocos
Member

valeriocos commented Apr 7, 2020

Hi @animeshk08 ,

The first 2 ideas look good; I understand they are based on a (semi-)manual approach to updating the file and changelog. I have some questions:

  • Change detection: how are changes detected? A manual inspection of the APIs every month or something (a crawler) that looks for specific words (e.g., deprecat*) and generates a report?
  • Other approaches: have you checked previous works on this problem? We could look for works about api change detection, api change deprecation or detect api changes from documentation in https://scholar.google.it and then use their references to discover more related works. WDYT?
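The crawler idea mentioned above could start as something as simple as a keyword scan over a fetched changelog or docs page. The keyword list and the sample page below are illustrative assumptions, and fetching the page itself is left out:

```python
# Minimal sketch of a deprecation crawler: scan already-fetched page text
# for deprecation-style keywords (e.g. "deprecat*") and report matching lines.
import re

DEPRECATION_PATTERN = re.compile(
    r"\b(deprecat\w*|sunset|removed|breaking change)\b",
    re.IGNORECASE)

def scan_for_deprecations(page_text):
    """Return the lines of the page that mention deprecation keywords."""
    return [line.strip() for line in page_text.splitlines()
            if DEPRECATION_PATTERN.search(line)]

# Hypothetical changelog page for some backend's API
page = """Changelog
v2.1: the /rooms endpoint is deprecated in favour of /conversations
v2.0: added pagination headers"""
hits = scan_for_deprecations(page)
```

The matching lines could feed a periodic report; as discussed later in the thread, this only works for APIs whose release docs are maintained and use recognizable wording.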

The 3rd idea looks interesting, I understand that it's based on a (semi)automatic approach. I have some questions:

  • Endpoint deprecation: what happens when an endpoint gets deprecated in favour of another one? The deprecated endpoint will keep working for some time, and we won't detect the change until the endpoint is removed.
  • Other deprecations: how can we track resources that aren't endpoints? For instance, the way of generating a token in Slack will change in 1 month (https://api.slack.com/legacy/custom-integrations/legacy-tokens); something similar also happened for Meetup. The endpoints will keep working; however, the old tokens may not work anymore.

@animeshk08
Contributor Author

Hi @valeriocos. Sorry for the delayed reply. Thank you for raising such good questions.

Change detection: how are changes detected? A manual inspection of the APIs every month or something (a crawler) that looks for specific words (e.g., deprecat*) and generates a report?

The current approach involves manually visiting the API docs at regular intervals. The idea of a crawler seems interesting, though I think it may have some problems: many APIs do not have good release docs, and the release docs may not contain the clear keywords that would be needed. It would be effective if we were focusing on a small number of backends; however, the backends in GrimoireLab keep increasing. Let me look at the release docs of the APIs; if a good number of them are consistent, we can use this to complement the changelog file.

Other approaches: have you checked previous works on this problem? We could look for works about API change detection, API change deprecation or detect API changes from documentation in https://scholar.google.it and then use their references to discover more related works. WDYT?

That's a great idea. I have not yet looked at previous works. Let me do some digging and get back to you; we will surely find some related work.

Endpoint deprecation: what happens when an endpoint gets deprecated in favor of another one? The deprecated endpoint will keep working for some time, and we won't detect the change until the endpoint is removed.

The current approach does not solve this problem. I was unaware of such changes. Let me think if there is some workaround for this. Also in your experience how often does this happen?

Other deprecations: how can we track resources that aren't endpoints? For instance, the way of generating a token in Slack will change in 1 month (https://api.slack.com/legacy/custom-integrations/legacy-tokens); something similar also happened for Meetup. The endpoints will keep working; however, the old tokens may not work anymore.

As I have mentioned, header-related changes are not yet covered in this approach. If there is a change in the auth format, the token will not be accepted by the endpoint, which should raise an error (correct me if I am wrong: does the endpoint continue to support the old token even after the change is released?). We can handle this error separately. The point of the script is to help catch these errors, though I think the server logs are already doing that.
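Handling that error separately could be as simple as treating an authentication failure on a previously working request as a changelog signal. The helper and endpoint below are hypothetical; the status codes are the standard HTTP ones for auth failures:

```python
# Sketch: flag an HTTP 401/403 on a known-good endpoint as a possible
# token/auth-format change worth a changelog entry.
AUTH_ERROR_CODES = {401, 403}

def flag_auth_change(status_code, endpoint):
    """Return a changelog note if the status suggests an auth change, else None."""
    if status_code in AUTH_ERROR_CODES:
        return (f"{endpoint}: received HTTP {status_code}; "
                f"the token format may have changed")
    return None

note = flag_auth_change(401, "/api/v1/messages")
```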

I would like to add that the script is meant to catch changes that are missed in the docs (either because the doc is not maintained or because the change has been overlooked). Hence, looking at the API docs once in a while and subscribing to the release mails/newsletters remains an effective solution for this issue.

Thank you for all your questions. I understand there are a few issues with the current approaches. I will look for related work and get back to you with a more solid proposal for this idea :)

@valeriocos
Member

Thank you @animeshk08

The current approach does not solve this problem. I was unaware of such changes. Let me think if there is some workaround for this. Also in your experience how often does this happen?

It doesn't happen often. I remember only 2 cases in the last year:
