ImageAnalyzer running VNRecognizeTextRequest in the background? #3

Open
spencer-hong opened this issue Dec 14, 2022 · 2 comments

spencer-hong commented Dec 14, 2022

Between VisionKit and Vision, there are two main ways to get text from images: ImageAnalyzer (VisionKit) and VNRecognizeTextRequest (Vision).

Based on some OCR tests, I'm seeing that the outputs from these two methods are different. Initially, I thought ImageAnalyzer was running VNRequestTextRecognitionLevel.fast because it's for Live Text, but the outputs from ImageAnalyzer are sometimes better than VNRequestTextRecognitionLevel.accurate.
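For reference, a minimal sketch of the ImageAnalyzer path, assuming iOS 16 / macOS 13 or later and a `CGImage` that is already loaded. The function name `liveTextTranscript` is made up for this example; which model ImageAnalyzer runs internally is exactly what is unknown here, so the sketch only shows how the transcript is obtained.

```swift
import CoreGraphics
import ImageIO
import VisionKit

/// Returns the full-text transcript that ImageAnalyzer produces for an image.
/// `liveTextTranscript` is a hypothetical helper name, not part of VisionKit.
@available(iOS 16.0, macOS 13.0, *)
func liveTextTranscript(for image: CGImage) async throws -> String {
    // ImageAnalyzer is not available on every device, so check first.
    guard ImageAnalyzer.isSupported else { return "" }

    let analyzer = ImageAnalyzer()
    // Request text analysis only (no machine-readable codes).
    let configuration = ImageAnalyzer.Configuration([.text])

    // Assumption: the image is stored upright, hence `.up`.
    let analysis = try await analyzer.analyze(image,
                                              orientation: .up,
                                              configuration: configuration)
    return analysis.transcript
}
```

Note that the configuration exposes no recognition-level switch comparable to `.fast` / `.accurate`, which is part of why the comparison is hard to reason about.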

VNRecognizeTextRequest does have more options, including language correction and custom words.
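A rough sketch of those options on the Vision side, assuming macOS 12 / iOS 15 or later (where `results` is typed as `[VNRecognizedTextObservation]`). The function name and its `customWords` parameter are invented for illustration.

```swift
import Foundation
import Vision

/// Runs VNRecognizeTextRequest on an image file and returns one string per
/// recognized text observation. `recognizeText` is a hypothetical helper name.
func recognizeText(at url: URL, customWords: [String] = []) throws -> [String] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate    // the alternative is .fast
    request.usesLanguageCorrection = true   // language correction
    request.customWords = customWords       // ignored when language correction is off

    // perform(_:) is synchronous; results are available once it returns.
    let handler = VNImageRequestHandler(url: url, options: [:])
    try handler.perform([request])

    let observations = request.results ?? []
    // Keep only the highest-confidence candidate for each observation.
    return observations.compactMap { $0.topCandidates(1).first?.string }
}
```

Running this with `.accurate` and again with `.fast` on the same test images is one way to compare against the ImageAnalyzer transcript case by case.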

Do you know what ImageAnalyzer is calling in the background? Is it essentially running VNRecognizeTextRequest, or is it a separate model/pipeline? This naturally raises the question of which one is better for OCR. My initial tests show similar performance in aggregate between ImageAnalyzer and VNRequestTextRecognitionLevel.accurate, but the results for individual test cases can vary widely between the two.

For documentation, and in case this is outside the scope of your expertise, I've asked the same question on the Apple Developer Forums here.

@freedmand
Owner

I'm also seeing a slight disparity between the two. The results are very close, so I do wonder whether it's just particular settings of VNRecognizeTextRequest or a whole new pipeline. In any case, my anecdotal impression is that ImageAnalyzer picks up small bits of text that VNRecognizeTextRequest sometimes misses. To make matters more confusing, the Live Text interface allows selecting individual words, and these differ from both ImageAnalyzer's full-text output and VNRecognizeTextRequest's.

I've started a thread of my own as well, to see if it's possible to get bounding boxes from ImageAnalyzer. The additional options from VNRecognizeTextRequest are nice and could be useful to expose as options in textra in the future.

Currently I'm implementing a feature to get positional text using VNRecognizeTextRequest, with the caveat that the returned positional text may differ from ImageAnalyzer's output (which may change in the future if we hear back on the threads).
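As an illustration of what positional text from Vision can look like (a sketch only, not textra's actual implementation): each observation carries a normalized bounding box with a lower-left origin, which can be converted to pixel coordinates. The `PositionedText` type and the function name are made up for this example.

```swift
import CoreGraphics
import Vision

/// Hypothetical container for a recognized string and where it sits in the image.
struct PositionedText {
    let string: String
    let rect: CGRect   // pixel coordinates, origin at the top-left
}

/// Recognizes text and returns it with pixel-space rectangles.
func positionedText(in image: CGImage) throws -> [PositionedText] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate

    let handler = VNImageRequestHandler(cgImage: image, options: [:])
    try handler.perform([request])

    let width = CGFloat(image.width)
    let height = CGFloat(image.height)

    return (request.results ?? []).compactMap { observation -> PositionedText? in
        guard let candidate = observation.topCandidates(1).first else { return nil }
        // Vision's boundingBox is normalized (0...1) with a lower-left origin,
        // so scale to pixels and flip the y-axis for a top-left origin.
        let box = observation.boundingBox
        let rect = CGRect(x: box.minX * width,
                          y: (1 - box.maxY) * height,
                          width: box.width * width,
                          height: box.height * height)
        return PositionedText(string: candidate.string, rect: rect)
    }
}
```

Because the strings come from Vision rather than ImageAnalyzer, they carry the same caveat as above: they may not match the ImageAnalyzer transcript exactly.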

aehlke commented Jan 31, 2024

@freedmand did you discover a way to get bounding boxes from ImageAnalyzer? The only possibility I can think of is to use VNRecognizeTextRequest to first get the bounding boxes of text, then crop those regions out of the image and pass each crop to ImageAnalyzer to get enhanced results within a known box of text (including attributed strings). I'm not sure that would really work, though.
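A rough, untested sketch of that two-stage idea, assuming iOS 16 / macOS 13 or later. The function name is hypothetical, and whether ImageAnalyzer gives useful results on tightly cropped regions is precisely the open question.

```swift
import CoreGraphics
import ImageIO
import Vision
import VisionKit

/// Stage 1: Vision finds text regions. Stage 2: each region is cropped and
/// handed to ImageAnalyzer. `transcriptsPerRegion` is a hypothetical name.
@available(iOS 16.0, macOS 13.0, *)
func transcriptsPerRegion(in image: CGImage) async throws -> [(rect: CGRect, transcript: String)] {
    let request = VNRecognizeTextRequest()
    request.recognitionLevel = .accurate
    try VNImageRequestHandler(cgImage: image, options: [:]).perform([request])

    let analyzer = ImageAnalyzer()
    let configuration = ImageAnalyzer.Configuration([.text])
    var output: [(rect: CGRect, transcript: String)] = []

    for observation in request.results ?? [] {
        // Normalized box -> pixel rect (still lower-left origin).
        let pixelRect = VNImageRectForNormalizedRect(observation.boundingBox,
                                                     image.width, image.height)
        // CGImage cropping uses a top-left origin, so flip the y-axis.
        let cropRect = CGRect(x: pixelRect.minX,
                              y: CGFloat(image.height) - pixelRect.maxY,
                              width: pixelRect.width,
                              height: pixelRect.height).integral
        guard let crop = image.cropping(to: cropRect) else { continue }

        // Assumption: the source image is upright, hence `.up`.
        let analysis = try await analyzer.analyze(crop,
                                                  orientation: .up,
                                                  configuration: configuration)
        output.append((rect: cropRect, transcript: analysis.transcript))
    }
    return output
}
```

One thing to watch for if trying this: text that Vision misses entirely never gets a box, so this approach cannot recover the small bits of text that only ImageAnalyzer finds.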
