
API ability to get "leading" events for query period #53

Open · jantman opened this issue Jan 17, 2019 · 14 comments

Comments

@jantman commented Jan 17, 2019

First I'd like to express again my thanks for such a wonderful and helpful project!

I've been using ActivityWatch for about two weeks now, and have also implemented a few very custom watchers for my own use cases. I'm currently running off of the latest tag, v0.8.0b7.

One thing I've noticed that is a bit annoying, though, is how the Events and Query APIs (and the corresponding UI) handle long events that begin before the query period: they get cut off. For example, if I was AFK for a long time until 30 minutes ago, and I query for events in the last hour (or pull up the Timeline view in the UI for the last hour), I only see the "not-afk" event that started 30 minutes ago... and blank space leading up to it.

It seems to me like it would be really helpful if the events and query API endpoints had an option to return a "leading" event, if there's an event that started before the start of the query period but ended during the query period. I could also see this being represented in the timeline view with some sort of special icon or left border, indicating that the event began prior to the currently-viewed period.
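For concreteness, the "leading event" condition being proposed could be sketched like this (pure illustration, all names hypothetical; the event shape mirrors parsed ActivityWatch event JSON):

```python
from datetime import datetime, timedelta, timezone

def leading_events(events, query_start, query_end):
    """Return events that began before query_start but ended inside the
    query window -- the events the API currently drops."""
    leading = []
    for ev in events:
        start = ev["timestamp"]
        end = start + ev["duration"]
        if start < query_start and query_start < end <= query_end:
            leading.append(ev)
    return leading

now = datetime(2019, 1, 17, 12, 0, tzinfo=timezone.utc)
events = [
    # AFK event that started 3 hours ago and ended 30 minutes ago
    {"timestamp": now - timedelta(hours=3), "duration": timedelta(hours=2, minutes=30)},
    # not-afk event fully inside the last hour
    {"timestamp": now - timedelta(minutes=30), "duration": timedelta(minutes=30)},
]
print(len(leading_events(events, now - timedelta(hours=1), now)))  # 1
```

With such a flag, the UI could render the leading event clipped at the window edge, with the special border suggested above.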

@jantman (Author) commented Jan 18, 2019

I looked around in the aw-core code a bit, and it seems like this is a bigger ask than I thought because (1) the EventModel stores a datetime start and decimal duration, and (2) SQLite doesn't have any native DateTime type, so it's not possible (or at least very complicated) to query events based on "end time" (a derived/hybrid value) in SQL.

It also seems like, from what I can tell based on some other issues and the ability to POST an event via the API with a specified ID, we can't count on event IDs being in chronological order...

@ErikBjare (Member):

We're aware of this unfortunate behavior (it's been discussed elsewhere, though I can't remember where) and there's actually a fix in the works with the new sqlite datastore @johan-bjareholt has been working on in aw-core: ActivityWatch/aw-core#57

That PR has been in the works for quite a while though, and has been kinda on ice since it needs a database migration that it's critical we don't fuck up.

The easiest way to get leading events currently is to simply over-query and then cut off the excess on the client side.
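That workaround can be sketched as follows (hypothetical names; `fetch` stands in for whatever client call actually retrieves events, and the padding amount is a judgment call):

```python
from datetime import datetime, timedelta, timezone

def query_with_leading(fetch, start, end, pad=timedelta(hours=12)):
    """Over-query by `pad`, then keep only events overlapping [start, end).

    `fetch(start, end)` stands in for the real client query; events are
    dicts with "timestamp" (datetime) and "duration" (timedelta).
    """
    kept = []
    for ev in fetch(start - pad, end):
        ev_end = ev["timestamp"] + ev["duration"]
        # Keep anything overlapping the real window, including events
        # that started before it (the "leading" ones).
        if ev_end > start and ev["timestamp"] < end:
            kept.append(ev)
    return kept

now = datetime(2019, 1, 18, tzinfo=timezone.utc)

def fake_fetch(start, end):
    # Stand-in for a real query: one event leads the window, one is stale.
    return [
        {"timestamp": now - timedelta(hours=5), "duration": timedelta(hours=4, minutes=45)},
        {"timestamp": now - timedelta(hours=20), "duration": timedelta(hours=1)},
    ]

print(len(query_with_leading(fake_fetch, now - timedelta(hours=1), now)))  # 1
```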

@jantman (Author) commented Jan 18, 2019

Hmm, ok. I tried looking for other discussions of this, but I'll admit there's a LOT of repos/issues/PRs/forum posts/etc. and I didn't do an exhaustive search.

Hmm... ok. That certainly does look interesting. When I first saw in the peewee docs (I've used SQLAlchemy before, but not peewee) that it stores times as formatted strings instead of ints (which seem much more logical to me), I wondered about that; if peewee had just used timestamps, it would enable date math like this.

On the other hand - and I know I'm a real edge case user - I did really like the idea of using a database abstraction layer, and had planned on giving MySQL a try at some point in the future.

My concern with over-querying is, by how much? If I go away on vacation for a week, I end up with a week-long AFK event...

I'd actually been thinking about a tentative plan where the AbstractStorage implementations would add a get_leading_events boolean parameter to get_events(). If it's True, after building the list of Event instances for the current query, we'd find the minimum ID in the returned events, query for the N events before that by ID, and then examine them on the Python side to see if any should be returned. I know this would just be best-effort and rely on events being committed in roughly chronological order, but I also don't think its performance should be too bad.

I know I've seen mention in other places of queueing. I'm not sure how bad it is for other people, but I currently have 49,234 events across 8 buckets, and only 149 of those events have IDs that aren't ordered identically to the start timestamps (i.e. if I build lists of all events in each bucket and sort them by both ID and timestamp, only 149 list items are in different positions between the ID- and timestamp-sorted lists). More importantly, 7 of those were from my own development mistakes sending bad timestamps while working on a custom watcher, and the rest (142) were all for the Chrome watcher which was responsible for about 25% of all events. Of those 142 out-of-order events, 102 of them were zero-duration events. Of the remaining 40, 26 were less than 10 seconds in duration.

Given that the case we are - or at least I am - concerned about is long events, I think the ordering should be less of an issue... i.e. my own concern is with events that last somewhere from multiple minutes to multiple hours, and those should be much less likely to be out of order. So if my current data is any indication, I think that examining the 4 or 5 events with IDs less than the first result from the current query should probably yield the desired result in most cases.
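The ordering comparison described above can be reproduced in a few lines (hypothetical helper; events reduced to id/timestamp pairs):

```python
def out_of_order_count(events):
    """Count positions where the ID-sorted and timestamp-sorted
    orderings of the same events disagree."""
    by_id = sorted(events, key=lambda ev: ev["id"])
    by_ts = sorted(events, key=lambda ev: ev["timestamp"])
    return sum(1 for a, b in zip(by_id, by_ts) if a["id"] != b["id"])

events = [
    {"id": 1, "timestamp": 10},
    {"id": 2, "timestamp": 30},  # committed before an earlier-starting event
    {"id": 3, "timestamp": 20},
]
print(out_of_order_count(events))  # 2
```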

Though I suppose this could also be achieved by doing all of it client-side with a separate query...
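The best-effort plan above could look roughly like this (all names hypothetical; `get_by_id_range` stands in for a storage-layer lookup by ID):

```python
from datetime import datetime, timedelta, timezone

def get_leading_events(get_by_id_range, events, query_start, lookback=5):
    """Best-effort leading-event lookup: examine a few IDs below the
    lowest returned event and keep any that span query_start.
    Relies on IDs being committed in roughly chronological order."""
    if not events:
        return []
    min_id = min(ev["id"] for ev in events)
    leading = []
    for ev in get_by_id_range(max(0, min_id - lookback), min_id):
        end = ev["timestamp"] + ev["duration"]
        if ev["timestamp"] < query_start < end:
            leading.append(ev)
    return leading

now = datetime(2019, 1, 18, 12, 0, tzinfo=timezone.utc)
store = {
    9: {"id": 9, "timestamp": now - timedelta(hours=3), "duration": timedelta(hours=2, minutes=30)},
    10: {"id": 10, "timestamp": now - timedelta(minutes=30), "duration": timedelta(minutes=30)},
}

def get_by_id_range(lo, hi):
    # Stand-in for the storage layer's fetch of events with IDs in [lo, hi)
    return [store[i] for i in range(lo, hi) if i in store]

result = get_leading_events(get_by_id_range, [store[10]], now - timedelta(hours=1))
print([ev["id"] for ev in result])  # [9]
```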

@johan-bjareholt (Member) commented Jan 18, 2019

@jantman You are overcomplicating things. The solution we used in the future sqlite storage is to save the endtime instead of the duration; with both starttime and endtime we don't need to over-query, since we now have the data to fetch exactly the right events. With starttime and endtime we also no longer need the duration (endtime - starttime = duration), so the database didn't even grow in size. We also no longer save datetimes as strings, since that doesn't really make sense as long as you don't care about timezones.
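A toy version of that layout (a simplified sketch, not the actual aw-core schema) shows why storing endtime turns the leading-event lookup into a plain, index-friendly range comparison:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE events (id INTEGER PRIMARY KEY, starttime INTEGER, endtime INTEGER)"
)
conn.executemany(
    "INSERT INTO events (starttime, endtime) VALUES (?, ?)",
    [(0, 900), (950, 1000), (1100, 1200)],  # epoch-style integer timestamps
)

# All events overlapping the window [800, 1050): this includes the event
# that started at 0 but ended inside the window -- the "leading" event.
rows = conn.execute(
    "SELECT id FROM events WHERE endtime > ? AND starttime < ?", (800, 1050)
).fetchall()
print([r[0] for r in rows])  # [1, 2]
```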

I have used multiple ORMs before, and almost every time I try to do something slightly complicated I end up looking at how the underlying database works rather than how the ORM works, which defeats the purpose of an ORM. SQLite is not that complicated either, so just using it directly makes more sense to us.

@jantman (Author) commented Jan 18, 2019

Ok, that all makes sense. I'll be looking forward to a release with the sqlite storage and migrations.

@johan-bjareholt (Member):

If you have the time, it would be appreciated if you could test it out, as things are rather stable on that branch. We need more testing of whether the migration works correctly.
You should just need to clone aw-server, check out the development branch, and change your datastore to "sqlite" in the aw-server.ini config, and it should be good to go.
aw-server-rust also has this fix, if you are interested in testing something even more bleeding-edge :)

@jantman (Author) commented Jan 18, 2019

Ok, yeah, I'd be happy to test it! I have a 3-day weekend, so I should definitely have some time for it.

Right now I'm installing from git using a clone of the main activitywatch repo, installed per the source install docs. Do you know if I should be OK to just update aw-server to development, or whether other submodules need to be updated as well?

I'll stick with the Python ;) it's by far my strongest language, and a big reason why I first checked out ActivityWatch.

@johan-bjareholt (Member):

aw-server depends heavily on aw-core, so you most likely need the newest version of that as well.

@jantman (Author) commented Jan 21, 2019

I was just about to give this a test and ran into a bit of confusion... you said I should just be able to check out aw-server on the development branch, but aw-server currently has six dev/ branches, none of which shares a name with the aw-core dev/sqlite branch that matches up with ActivityWatch/aw-core#57.

I tried checking out aw-core on the dev/sqlite branch and aw-server on the master branch, but that renders aw-server completely unusable with SQLite threading errors (i.e. API calls result in a 500 error logging sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140560724657664 and this is thread id 140560642361088.).

I'm unable to check the migrations because I can't get a single GET request to succeed.

My process for the upgrade was as follows:

  • stop aw-server, watchers, qt
  • switch my clone to https://github.com/jantman/activitywatch/tree/jantman-current
  • make build (note this threw some npm errors for "vis")
  • back up ~/.local/share/activitywatch/aw-server/peewee-sqlite.v2.db
  • sed -i 's/peewee/sqlite/' ~/.config/activitywatch/aw-server/aw-server.ini
  • run aw-server --verbose to ensure it looks right:
$ aw-server --verbose
2019-01-21 18:09:15 [INFO ]: Using storage method: sqlite  (aw_server.main:26)
2019-01-21 18:09:15 [INFO ]: Starting up...  (aw_server.main:31)
2019-01-21 18:09:15 [INFO ]: Using database file: /home/jantman/.local/share/activitywatch/aw-server/sqlite.v1.db  (aw_datastore.storages.sqlite:69)
2019-01-21 18:09:15 [INFO ]: Created new SQlite db file  (aw_datastore.storages.sqlite:81)
2019-01-21 18:09:15 [INFO ]: Migrating database from peewee v2 to sqlite v1  (aw_datastore.migration:31)
2019-01-21 18:09:15 [INFO ]: Using database file: /home/jantman/.local/share/activitywatch/aw-server/peewee-sqlite.v2.db  (aw_datastore.storages.peewee:89)
2019-01-21 18:09:15 [INFO ]: Migrating bucket aw-watcher-window_phoenix.jasonantman.com  (aw_datastore.migration:38)
2019-01-21 18:09:19 [INFO ]: Migrating bucket aw-watcher-afk_phoenix.jasonantman.com  (aw_datastore.migration:38)
2019-01-21 18:09:19 [INFO ]: Migrating bucket aw-watcher-web-chrome  (aw_datastore.migration:38)
2019-01-21 18:09:21 [INFO ]: Migrating bucket aw-watcher-web-firefox  (aw_datastore.migration:38)
2019-01-21 18:09:22 [INFO ]: Migrating bucket aw-watcher-manual-flaskweb_phoenix.jasonantman.com  (aw_datastore.migration:38)
2019-01-21 18:09:22 [INFO ]: Migrating bucket aw-watcher-hass-device  (aw_datastore.migration:38)
2019-01-21 18:09:22 [INFO ]: Migrating bucket aw-watcher-working_phoenix.jasonantman.com  (aw_datastore.migration:38)
2019-01-21 18:09:22 [INFO ]: Migrating bucket aw-watcher-hass-desk-standing  (aw_datastore.migration:38)
2019-01-21 18:09:22 [INFO ]: Migration of peewee v2 to sqlite v1 finished  (aw_datastore.migration:50)
 * Serving Flask app "aw-server" (lazy loading)
 * Environment: production
   WARNING: Do not use the development server in a production environment.
   Use a production WSGI server instead.
 * Debug mode: off
2019-01-21 18:09:22 [INFO ]:  * Running on http://0.0.0.0:5600/ (Press CTRL+C to quit)  (werkzeug:88)
2019-01-21 18:09:30 [ERROR]: Exception on /api/0/buckets/aw-watcher-web-chrome/heartbeat [POST]  (flask.app:1761)
Traceback (most recent call last):
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_server/rest.py", line 233, in post
    event = app.api.heartbeat(bucket_id, heartbeat, pulsetime)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_server/api.py", line 26, in g
    if bucket_id not in self.db.buckets():
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_datastore/datastore.py", line 48, in buckets
    return self.storage_strategy.buckets()
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_datastore/storages/sqlite.py", line 113, in buckets
    c = self.conn.cursor()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140560724657664 and this is thread id 140560642361088.
2019-01-21 18:09:30 [INFO ]: 500 (127.0.0.1): POST /api/0/buckets/aw-watcher-web-chrome/heartbeat?pulsetime=80 HTTP/1.1  (flask:25)
2019-01-21 18:09:50 [ERROR]: Exception on /api/0/buckets/aw-watcher-web-chrome/heartbeat [POST]  (flask.app:1761)
Traceback (most recent call last):
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_server/rest.py", line 233, in post
    event = app.api.heartbeat(bucket_id, heartbeat, pulsetime)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_server/api.py", line 26, in g
    if bucket_id not in self.db.buckets():
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_datastore/datastore.py", line 48, in buckets
    return self.storage_strategy.buckets()
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_datastore/storages/sqlite.py", line 113, in buckets
    c = self.conn.cursor()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 140560724657664 and this is thread id 140560642361088.
2019-01-21 18:09:50 [INFO ]: 500 (127.0.0.1): POST /api/0/buckets/aw-watcher-web-chrome/heartbeat?pulsetime=80 HTTP/1.1  (flask:25)
2019-01-21 18:12:23 [ERROR]: Exception on /api/0/buckets/ [GET]  (flask.app:1761)
Traceback (most recent call last):
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask/app.py", line 1813, in full_dispatch_request
    rv = self.dispatch_request()
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask/app.py", line 1799, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask_restplus/api.py", line 325, in wrapper
    resp = resource(*args, **kwargs)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask/views.py", line 88, in view
    return self.dispatch_request(*args, **kwargs)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/flask_restplus/resource.py", line 44, in dispatch_request
    resp = meth(*args, **kwargs)
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_server/rest.py", line 115, in get
    return app.api.get_buckets()
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_server/api.py", line 50, in get_buckets
    buckets = self.db.buckets()
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_datastore/datastore.py", line 48, in buckets
    return self.storage_strategy.buckets()
  File "/opt/activitywatch/venv/lib/python3.7/site-packages/aw_datastore/storages/sqlite.py", line 113, in buckets
    c = self.conn.cursor()
sqlite3.ProgrammingError: SQLite objects created in a thread can only be used in that same thread. The object was created in thread id 139670367213056 and this is thread id 139670322161408.
2019-01-21 18:12:23 [INFO ]: 500 (127.0.0.1): GET /api/0/buckets/ HTTP/1.1  (flask:25)

(note: yes, I'm running bound to 0.0.0.0; yes, I'm well aware of the implications of that)

@johan-bjareholt (Member) commented Jan 22, 2019

@jantman Oh right, I forgot about that; I guess that branch is currently broken then, sadly. It's the same issue as discussed in the PR here: ActivityWatch/aw-core#57 (comment)

I am rather busy the coming week or two, but after that I might have some time to fix that issue.

@ErikBjare (Member):

It should be a really easy fix though, I think we just need to set threaded=False somewhere: https://stackoverflow.com/questions/38876721/handle-flask-requests-concurrently-with-threaded-true
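The failure mode can be reproduced with plain sqlite3: by default a connection refuses to be used from any thread other than its creator, which is exactly what happens when Flask serves requests on worker threads. Running Flask with threaded=False sidesteps it; passing check_same_thread=False is the other escape hatch, at the cost of doing your own locking (illustration only, stdlib calls):

```python
import sqlite3
import threading

conn = sqlite3.connect(":memory:")  # created on the main thread

errors = []

def use_from_worker():
    try:
        conn.cursor()  # same failure mode as aw_datastore's buckets()
    except sqlite3.ProgrammingError as exc:
        errors.append(exc)

t = threading.Thread(target=use_from_worker)
t.start()
t.join()
print(len(errors))  # 1: "SQLite objects created in a thread can only be used..."

# Opting out of the check (then access must be serialized manually):
shared = sqlite3.connect(":memory:", check_same_thread=False)
results = []
t2 = threading.Thread(
    target=lambda: results.append(shared.execute("SELECT 1").fetchone()[0])
)
t2.start()
t2.join()
print(results)  # [1]
```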

@jantman (Author) commented Jan 22, 2019

There's actually an example of this in the Flask patterns docs, which shows using the flask.g object proxy for sqlite connections: http://flask.pocoo.org/docs/0.12/patterns/sqlite3/

I've got a few things on my plate in the next few days, but I'll try to work up an aw-server branch to fix this.
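The Flask pattern linked above boils down to "one connection per request/thread, torn down afterward". Outside Flask, the same idea can be sketched with a threading.local standing in for flask.g (all names here are hypothetical):

```python
import sqlite3
import threading

_local = threading.local()
DB_PATH = ":memory:"  # each thread gets its own in-memory DB here, purely for illustration

def get_db():
    """Return this thread's connection, creating it on first use
    (the role flask.g plays in the linked pattern)."""
    if not hasattr(_local, "conn"):
        _local.conn = sqlite3.connect(DB_PATH)
    return _local.conn

def close_db():
    """Teardown hook, analogous to Flask's teardown_appcontext."""
    conn = getattr(_local, "conn", None)
    if conn is not None:
        conn.close()
        del _local.conn

ok = []

def worker():
    # Safe: the connection is created on this thread, not shared across threads.
    ok.append(get_db().execute("SELECT 42").fetchone()[0])
    close_db()

t = threading.Thread(target=worker)
t.start()
t.join()
print(ok)  # [42]
```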

@ErikBjare (Member) commented Jan 23, 2019

@jantman I've seen it, but that's likely a lot more work. Since we've so strongly decoupled our database logic from Flask, there's not really a clean way to create a new database connection on each request.

The equivalent in our design would be to create a PeeweeStorage/SqliteStorage object in get_db(), but that's probably a bad idea since there's initialization logic that runs during object construction that should only run once (which could admittedly be fixed, but that's extra work).

@johan-bjareholt (Member) commented Jan 23, 2019

#53 (comment) Good point, I did a quick untested PR: #54
