The Solr schema has had a site_last_modified field for a while. The original idea was to populate it from the most recent page_last_modified value (itself taken from the Last-Modified header) across a site's pages. However, so few sites send this header, or send it in a not-so-useful way (e.g. static site generators that set the same last-modified date on every page whenever the site is regenerated, even for posts that haven't changed in years), that deriving site_last_modified from page_last_modified wasn't useful, so the field remained unused. Once #94 is implemented, though, it should be possible to set site_last_modified for the first time.
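A minimal sketch of the aggregation described above: the site-level value is the most recent page-level value across all of a site's pages, with pages that have no value skipped. It assumes each page is a plain dict with hypothetical keys `site` and `content_last_modified` (the page-level field expected from #94); the real indexer and Solr field names may differ.

```python
from datetime import datetime
from typing import Iterable, Optional


def compute_site_last_modified(
    pages: Iterable[dict], field: str = "content_last_modified"
) -> dict[str, datetime]:
    """Return the most recent `field` value per site.

    Pages without a value for `field` are ignored, so a site whose pages
    all lack the value simply does not appear in the result.
    """
    latest: dict[str, datetime] = {}
    for page in pages:
        value: Optional[datetime] = page.get(field)
        if value is None:
            continue
        site = page["site"]  # hypothetical key identifying the site
        if site not in latest or value > latest[site]:
            latest[site] = value
    return latest
```

Sites with no usable page-level value are left unset rather than given a guessed date, which matches the original reasoning for not trusting unreliable Last-Modified headers.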
Once set, it could be useful for all sorts of things, e.g. adding it to the sort options for Browse, reducing the indexing frequency for sites that haven't been updated in a long time, automatically detecting and delisting sites that look dead, and maybe even using it in scoring.
Note that if it is based solely on the content_last_modified value, it might miss other updates, e.g. changes to the title, author, description, or tags. It may be worth naming the fields content_last_change_detected, page_last_change_detected and site_last_change_detected to differentiate them from page_last_modified, which is set from the site's headers.
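One way to catch changes beyond the content is to store a fingerprint of all the fields that matter and bump page_last_change_detected whenever it differs from the previously indexed version. The sketch below is a hypothetical illustration, not the project's implementation: the field names in `TRACKED_FIELDS` and the stored `change_fingerprint` field are assumptions.

```python
import hashlib
import json
from datetime import datetime
from typing import Optional

# Hypothetical names for the fields whose changes should count as an update.
TRACKED_FIELDS = ("content", "title", "author", "description", "tags")


def fingerprint(page: dict) -> str:
    """Stable SHA-256 over the tracked fields (missing fields hash as null)."""
    subset = {name: page.get(name) for name in TRACKED_FIELDS}
    encoded = json.dumps(subset, sort_keys=True, default=str).encode("utf-8")
    return hashlib.sha256(encoded).hexdigest()


def update_change_detected(old: Optional[dict], new: dict, now: datetime) -> dict:
    """Return a copy of `new` with change_fingerprint and page_last_change_detected set.

    The previous page_last_change_detected is carried over only when the
    tracked fields are unchanged; otherwise (or on first index) it becomes `now`.
    """
    fp = fingerprint(new)
    result = dict(new)
    result["change_fingerprint"] = fp
    if (
        old is not None
        and old.get("change_fingerprint") == fp
        and old.get("page_last_change_detected") is not None
    ):
        result["page_last_change_detected"] = old["page_last_change_detected"]
    else:
        result["page_last_change_detected"] = now
    return result
```

site_last_change_detected could then be the most recent page_last_change_detected across a site's pages. Note that this sketch treats a reordered tags list as a change; sorting tags before hashing would avoid that if order is not meaningful.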