sycamore_importer: manage deleted and updated documents #156

Open
eric-anderson opened this issue Nov 9, 2023 · 0 comments
Right now the importer has no way to notice that a crawled file has been deleted, so it cannot remove that file from the index. Similarly, if a file is updated, the crawlers will detect the change and re-download the file, but the importer will then import it again without removing the old version, resulting in duplicate entries in the index.
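
One way to avoid the duplicate-entry half of this would be to index each document under a deterministic `_id` derived from its crawl path, so that re-importing an updated file overwrites the previous entry instead of adding a new one. A minimal sketch with opensearch-py, not what the importer currently does: the index name `sycamore-demo` and the `path`/`content` fields are assumptions, and a real importer that splits one file into multiple chunks would need to extend the id with a chunk ordinal.

```python
import hashlib
from pathlib import Path

from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])


def doc_id_for(path: Path) -> str:
    # Stable _id derived from the crawl path: re-importing the same
    # file overwrites its previous entry instead of adding a new one.
    return hashlib.sha256(str(path).encode("utf-8")).hexdigest()


def import_file(path: Path, index: str = "sycamore-demo") -> None:
    body = {
        "path": str(path),  # hypothetical field recording the source file
        "content": path.read_text(errors="replace"),
    }
    # index() with an explicit id acts as create-or-replace in OpenSearch,
    # so an updated file replaces its old document rather than duplicating it.
    client.index(index=index, id=doc_id_for(path), body=body)
```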

It's unclear what the right way to handle this is, but a likely invariant is that the set of files stored in the crawl_data compose volume should match the documents stored in OpenSearch, with no duplicates.
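
Deletions would then need a reconciliation pass that compares the crawl_data volume against the index and purges documents whose source file is gone. A sketch of that pass under the same assumptions as above (a `path` keyword field on each document, index name `sycamore-demo`):

```python
from pathlib import Path

from opensearchpy import OpenSearch
from opensearchpy.helpers import scan

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])


def reconcile(crawl_root: Path, index: str = "sycamore-demo") -> None:
    # Everything the crawlers currently have on disk in crawl_data.
    on_disk = {str(p) for p in crawl_root.rglob("*") if p.is_file()}

    # Every source path referenced by the index (assumes each document
    # carries its source file in a "path" keyword field).
    indexed = {
        hit["_source"]["path"]
        for hit in scan(client, index=index, query={"query": {"match_all": {}}})
    }

    # A document whose source file is gone was deleted by the crawler;
    # purge all of its entries from OpenSearch.
    for stale in indexed - on_disk:
        client.delete_by_query(
            index=index, body={"query": {"term": {"path": stale}}}
        )
```

Running a pass like this after each crawl would restore the invariant for deletions; combined with deterministic ids for updates, it would keep crawl_data and OpenSearch in sync.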
