feat: Integrated Vectorization - adding OCR skill #1021

Merged: 4 commits, Jun 20, 2024

@komalg1 (Collaborator) commented Jun 4, 2024

Purpose

Does this introduce a breaking change?

  • Yes
  • No

How to Test

  • Get the code
git clone [repo-address]
cd [repo-name]
git checkout [branch-name]
npm install

What to Check

Verify that the following are valid

  • When Integrated Vectorization is enabled, a user should be able to upload images that contain text and then search for those images by that text
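
For reviewers unfamiliar with how OCR fits into an Integrated Vectorization skillset, the following is a minimal sketch of the kind of skill definitions involved, written as a plain REST-style payload. It is not the code from this PR: the function name `build_ocr_skillset`, the skillset name, and the `merged_content` target name are hypothetical, and the sketch assumes the indexer is configured with `imageAction` set to `generateNormalizedImages` so that `/document/normalized_images/*` exists.

```python
import json


def build_ocr_skillset(name: str) -> dict:
    """Return an illustrative skillset payload with an OCR and a merge skill.

    Hypothetical helper for illustration only; not the implementation in this PR.
    """
    # OCR skill: extracts text from each normalized image of the document.
    ocr_skill = {
        "@odata.type": "#Microsoft.Skills.Vision.OcrSkill",
        "context": "/document/normalized_images/*",
        "inputs": [{"name": "image", "source": "/document/normalized_images/*"}],
        "outputs": [{"name": "text", "targetName": "text"}],
    }
    # Merge skill: inserts the OCR text into the document content at the
    # offsets where the images appeared. "merged_content" is an assumed name.
    merge_skill = {
        "@odata.type": "#Microsoft.Skills.Text.MergeSkill",
        "context": "/document",
        "insertPreTag": " ",
        "insertPostTag": " ",
        "inputs": [
            {"name": "text", "source": "/document/content"},
            {"name": "itemsToInsert", "source": "/document/normalized_images/*/text"},
            {"name": "offsets", "source": "/document/normalized_images/*/contentOffset"},
        ],
        "outputs": [{"name": "mergedText", "targetName": "merged_content"}],
    }
    return {"name": name, "skills": [ocr_skill, merge_skill]}


if __name__ == "__main__":
    # Print the payload so it can be compared against a deployed skillset.
    print(json.dumps(build_ocr_skillset("example-skillset"), indent=2))
```

In a setup like this, the merged text would then typically feed the chunking and embedding skills, which is what would make text inside uploaded images searchable.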

github-actions bot commented Jun 4, 2024

Coverage

Coverage Report
File | Stmts | Miss | Cover | Missing
code
   app.py | 14 | 14 | 0% | 5–8, 10–11, 14, 17–19, 22, 24, 26–27
   create_app.py | 137 | 1 | 99% | 334
code/backend
   Admin.py | 23 | 23 | 0% | 5–9, 11, 13–14, 17, 20–21, 23–24, 27, 35–37, 41, 44–46, 48, 50
code/backend/batch
   add_url_embeddings.py | 47 | 0 | 100% |
   batch_push_results.py | 36 | 0 | 100% |
   batch_start_processing.py | 26 | 0 | 100% |
   function_app.py | 17 | 17 | 0% | 1–8, 10, 12–14, 16, 19–22
   get_conversation_response.py | 32 | 0 | 100% |
code/backend/batch/utilities/common
   answer.py | 28 | 2 | 92% | 23, 51
   source_document.py | 60 | 5 | 91% | 44, 47, 51, 55, 128
code/backend/batch/utilities/document_chunking
   __init__.py | 7 | 0 | 100% |
   chunking_strategy.py | 15 | 1 | 93% | 25
   document_chunking_base.py | 10 | 2 | 80% | 10, 16
   fixed_size_overlap.py | 19 | 0 | 100% |
   layout.py | 19 | 0 | 100% |
   page.py | 17 | 0 | 100% |
   paragraph.py | 9 | 2 | 77% | 9, 15
   strategies.py | 15 | 3 | 80% | 15–16, 18
code/backend/batch/utilities/document_loading
   __init__.py | 15 | 1 | 93% | 16
   document_loading_base.py | 9 | 1 | 88% | 13
   layout.py | 12 | 5 | 58% | 9, 12–13, 16, 25
   read.py | 12 | 5 | 58% | 9, 12–13, 16, 25
   strategies.py | 20 | 5 | 75% | 17, 19, 22–23, 25
   web.py | 18 | 1 | 94% | 22
   word_document.py | 25 | 13 | 48% | 11–12, 22–24, 27, 30, 33–37, 45
code/backend/batch/utilities/helpers
   azure_blob_storage_client.py | 85 | 31 | 63% | 19, 23–25, 33, 57–58, 61, 78–79, 81, 85, 172–175, 179, 182, 184, 192–196, 219, 223–227, 229
   azure_computer_vision_client.py | 53 | 0 | 100% |
   azure_form_recognizer_helper.py | 81 | 70 | 13% | 11, 13, 16–17, 25, 27, 44–45, 52–55, 60–68, 73–75, 77–78, 81, 84–86, 88–90, 93, 97–98, 105–109, 111–114, 117–131, 133, 135–137, 139–140, 143, 145–147
   azure_search_helper.py | 57 | 0 | 100% |
   document_chunking_helper.py | 13 | 1 | 92% | 19
   document_loading_helper.py | 12 | 1 | 91% | 15
   env_helper.py | 143 | 6 | 95% | 246–248, 267–269
   llm_helper.py | 48 | 7 | 85% | 42–43, 52, 63–64, 75, 96
   orchestrator_helper.py | 13 | 1 | 92% | 24
code/backend/batch/utilities/helpers/config
   config_helper.py | 138 | 0 | 100% |
   conversation_flow.py | 4 | 0 | 100% |
   embedding_config.py | 12 | 1 | 91% | 27
code/backend/batch/utilities/helpers/embedders
   embedder_base.py | 5 | 1 | 80% | 7
   embedder_factory.py | 10 | 0 | 100% |
   integrated_vectorization_embedder.py | 36 | 3 | 91% | 39–41
   push_embedder.py | 73 | 0 | 100% |
code/backend/batch/utilities/integrated_vectorization
   azure_search_datasource.py | 19 | 0 | 100% |
   azure_search_index.py | 35 | 0 | 100% |
   azure_search_indexer.py | 23 | 0 | 100% |
   azure_search_skillset.py | 24 | 0 | 100% |
code/backend/batch/utilities/loggers
   conversation_logger.py | 36 | 2 | 94% | 33–34
code/backend/batch/utilities/orchestrator
   __init__.py | 11 | 0 | 100% |
   lang_chain_agent.py | 60 | 19 | 68% | 21–24, 26, 61–62, 82–85, 102–103, 106–109, 116–117
   open_ai_functions.py | 54 | 11 | 79% | 110–112, 115, 118–119, 122, 127–129, 134
   orchestration_strategy.py | 6 | 0 | 100% |
   orchestrator_base.py | 50 | 1 | 98% | 33
   prompt_flow.py | 52 | 0 | 100% |
   semantic_kernel.py | 54 | 0 | 100% |
   strategies.py | 15 | 4 | 73% | 12, 15–16, 18
code/backend/batch/utilities/parser
   __init__.py | 7 | 2 | 71% | 7, 11
   output_parser_tool.py | 39 | 0 | 100% |
   parser_base.py | 9 | 2 | 77% | 9, 19
code/backend/batch/utilities/plugins
   chat_plugin.py | 15 | 0 | 100% |
   post_answering_plugin.py | 8 | 0 | 100% |
code/backend/batch/utilities/search
   azure_search_handler.py | 64 | 2 | 96% | 25, 31
   integrated_vectorization_search_handler.py | 72 | 1 | 98% | 35
   search.py | 14 | 0 | 100% |
   search_handler_base.py | 48 | 15 | 68% | 16–18, 21–23, 28, 34, 38, 42, 46, 50, 54, 58, 62
code/backend/batch/utilities/tools
   answer_processing_base.py | 8 | 2 | 75% | 8, 12
   answering_tool_base.py | 9 | 2 | 77% | 9, 15
   content_safety_checker.py | 41 | 9 | 78% | 19, 52–54, 57–59, 66–67
   post_prompt_tool.py | 17 | 0 | 100% |
   question_answer_tool.py | 70 | 0 | 100% |
   text_processing_tool.py | 16 | 0 | 100% |
code/backend/pages
   01_Ingest_Data.py | 72 | 72 | 0% | 1–10, 12–14, 16, 24–26, 30, 33–34, 37–40, 42–45, 49–51, 54–56, 59–66, 69–71, 73, 76–79, 82, 87–89, 91–92, 94–95, 98–99, 103–105, 110–113, 120–121, 126, 132–133
   02_Explore_Data.py | 28 | 28 | 0% | 1–7, 9–10, 12, 20, 28, 31–33, 37, 40–41, 43–46, 48–51, 54–55
   03_Delete_Data.py | 41 | 41 | 0% | 1–8, 10–12, 14, 22–24, 28, 32, 40, 42–43, 45–48, 50, 52–53, 57, 61–65, 67, 70–71, 74–75, 77–79
   04_Configuration.py | 137 | 137 | 0% | 1–9, 11–12, 14, 22–24, 28, 30, 35–44, 47–48, 51–62, 64–65, 67–69, 73–74, 86–90, 93–94, 98–100, 103–104, 107–108, 111–112, 135, 137–138, 140–144, 146–149, 152–156, 163–164, 174–176, 178, 198–199, 201, 203, 209, 217, 225, 232–233, 240, 242–243, 247, 255, 261, 268, 286–290, 296–297, 316–317, 321, 323–324, 347, 381–382, 386–387, 390–391, 394–397, 399–400, 402–404, 406–409, 411–412
TOTAL | 2579 | 573 | 77% |

Tests Skipped Failures Errors Time
288 0 💤 0 ❌ 0 🔥 39.180s ⏱️

@komalg1 marked this pull request as ready for review June 4, 2024 15:30

@adamdougal (Collaborator) commented

I assume this still works fine for non-image-based data?

Do any functional tests need adding/updating for this?

@adamdougal (Collaborator) left a comment

Great work!

@komalg1 added this pull request to the merge queue Jun 20, 2024
Merged via the queue into Azure-Samples:main with commit 30440a8 Jun 20, 2024
6 checks passed
@komalg1 deleted the komal/iv-ocr branch June 20, 2024 13:27

🎉 This PR is included in version 1.7.0 🎉

The release is available on GitHub release

Your semantic-release bot 📦🚀
