
Finally... Elasticsearch and Tesseract are working properly!

by kirope 2024. 8. 1.

Tesseract (PDF OCR) + Elasticsearch (advanced search support)
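Elasticsearch's job in this setup is the search side. Here is a rough idea of what "advanced search" over the uploaded books could look like. This is a minimal sketch that builds a request body for the `library` index. It assumes documents carry the `title` and `content` fields that appear in the results below. The `build_search_body` helper, the `title^2` boost and the highlight settings are my own illustrative choices, not part of the project.

```python
import json


def build_search_body(text: str, size: int = 10) -> dict:
    """Build a hypothetical search request body for the `library` index.

    Assumes each document has `title` and `content` fields; the boost and
    highlight settings are illustrative, not the project's actual config.
    """
    return {
        "size": size,
        "query": {
            "multi_match": {
                "query": text,
                # A match in the file name counts twice as much as one in the OCR text.
                "fields": ["title^2", "content"],
            }
        },
        # Ask Elasticsearch for short snippets around each match in the OCR text.
        "highlight": {"fields": {"content": {"fragment_size": 150}}},
    }


if __name__ == "__main__":
    body = build_search_body("letting go")
    print(json.dumps(body, indent=2))
    # With the elasticsearch-py 8.x client this could be sent as:
    #   es.search(index="library", query=body["query"],
    #             highlight=body["highlight"], size=body["size"])
```

`multi_match` searches several fields with one query string, which fits a library where a hit can come from either the file name or the OCR'd text.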

 

## Results (partial)

```json
{
  "took": 15,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 2,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "library",
        "_id": "UhSbCJEB_1TVpDRMDzOb",
        "_score": 1,
        "_source": {
          "title": "pdf24_images_merged.pdf",
          "content": ""
        }
      },
      {
        "_index": "library",
        "_id": "UxSOCZEB_1TVpDRM6zM7",
        "_score": 1,
        "_ignored": [
          "content.keyword"
        ],
        "_source": {
          "title": "pdf24_images_merged.pdf",
          "content": "LETTING GO THE PATHWAY OF SURRENDER David R. Hawkins, M.D., Ph.D. ALSO BY DAVID R. HAWKINS, M.D., PH.D. Dissolving the Ego, Realizing the Self Along the Path to Enlightenment Healing and Recovery Reality, Spirituality, and Modern Man Discovery of the Presence of God: Devotional Nonduality Transcending the Levels of Consciousness: The Stairway to Enlightenment Truth vs. Falsehood: How to Tell the Difference I: Reality and Subjectivity The Eye of the I: From Which Nothing Is Hidden Power vs. Force: The Hidden Determinants of Human Behavior Dialogues on Consciousness and Spirituality Qualitative and Quantitative Analysis and Calibration of the Levels of Human Consciousness Orthomolecular Psychiatry (with Linus Pauling) Please visit: Hay House USA: www.hayhouse.com® Hay House Australia: www.hayhouse.com.au Hay House UK: www-hayhouse.co.uk Hay House India: www.hayhouse.co.in ..."
        }
      }
    ]
  }
}
```
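Two things stand out in this response. First, the same file appears twice, and the first copy has an empty `content`. That copy was presumably indexed from a run where OCR produced no text. Second, the other copy lists `content.keyword` under `_ignored`. With Elasticsearch's default dynamic mapping, a string field is mapped as `text` plus a `keyword` subfield with `ignore_above: 256`. A long OCR string is therefore still searchable through `content`, but it is skipped for `content.keyword`.

A small helper makes these cases easy to spot. The hypothetical `summarize_hits` below is plain Python over the response dict. The sample data is copied from the response above, with the OCR text shortened.

```python
def summarize_hits(response: dict) -> list:
    """Condense an Elasticsearch search response into one row per hit."""
    rows = []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        content = source.get("content", "")
        rows.append({
            "id": hit["_id"],
            "title": source.get("title"),
            "chars": len(content),
            # True when no OCR text was stored for this document.
            "empty": not content.strip(),
            # Fields Elasticsearch skipped while indexing, e.g. content.keyword.
            "ignored": hit.get("_ignored", []),
        })
    return rows


# Sample copied from the response above; the OCR text is shortened.
sample = {
    "hits": {
        "hits": [
            {"_id": "UhSbCJEB_1TVpDRMDzOb",
             "_source": {"title": "pdf24_images_merged.pdf", "content": ""}},
            {"_id": "UxSOCZEB_1TVpDRM6zM7",
             "_ignored": ["content.keyword"],
             "_source": {"title": "pdf24_images_merged.pdf",
                         "content": "LETTING GO THE PATHWAY OF SURRENDER"}},
        ]
    }
}

if __name__ == "__main__":
    for row in summarize_hits(sample):
        print(row)
```

The `content.keyword` subfield may not be needed for book text. If so, one option is to create the `library` index with an explicit mapping that declares `content` as a plain `text` field, which would avoid the `_ignored` entry. Whether that suits the project is an assumption on my part.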

## Console logs

### Elasticsearch

```
[2024-08-01T01:23:00,010][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] triggering scheduled [ML] maintenance tasks
[2024-08-01T01:23:00,074][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Deleting expired data
[2024-08-01T01:23:00,175][INFO ][o.e.x.m.j.r.UnusedStatsRemover] [node-1] Successfully deleted [0] unused stats documents
[2024-08-01T01:23:00,178][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Completed deletion of expired ML data
[2024-08-01T01:23:00,181][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask
```

 

### Tesseract (Flask)

```
INFO:werkzeug:127.0.0.1 - - [01/Aug/2024 01:09:51] "GET / HTTP/1.1" 200 -
DEBUG:root:Upload request received
INFO:root:Saving file to uploads\pdf24_images_merged.pdf
INFO:root:Extracting text from PDF
INFO:root:Indexing to Elasticsearch
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:9200
DEBUG:urllib3.connectionpool:http://localhost:9200 "POST /library/_doc HTTP/11" 201 0
INFO:elastic_transport.transport:POST http://localhost:9200/library/_doc [status:201 duration:2.079s]
INFO:root:File uploaded and indexed successfully
INFO:werkzeug:127.0.0.1 - - [01/Aug/2024 01:10:47] "POST /upload HTTP/1.1" 302 -
INFO:werkzeug:127.0.0.1 - - [01/Aug/2024 01:10:47] "GET / HTTP/1.1" 200 -
```
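The Flask log traces the upload flow step by step:

- receive the request
- save the file under `uploads`
- extract the text
- index it into Elasticsearch
- redirect (302) back to `/`

The actual handler isn't shown. Below is a minimal sketch of that flow in plain Python that reuses the same log messages. `handle_upload` and its parameters are hypothetical. OCR and indexing are passed in as functions, so the sketch runs without Tesseract or a running cluster. In the real app they might wrap `pdf2image.convert_from_path` + `pytesseract.image_to_string`, and the Python client's `es.index(index="library", document=doc)`. That client call, without an `id`, sends the `POST /library/_doc` seen in the log.

```python
import logging
import tempfile
from pathlib import Path
from typing import Callable, Optional

log = logging.getLogger(__name__)


def handle_upload(
    filename: str,
    data: bytes,
    upload_dir: Path,
    extract_text: Callable[[Path], str],
    index_document: Callable[[dict], str],
) -> Optional[str]:
    """Save an uploaded PDF, OCR it and index the text; return the document id."""
    log.debug("Upload request received")
    # Keep only the base name so the file is written inside upload_dir
    # (a Flask app would typically use werkzeug.utils.secure_filename).
    path = upload_dir / Path(filename).name
    log.info("Saving file to %s", path)
    upload_dir.mkdir(parents=True, exist_ok=True)
    path.write_bytes(data)

    log.info("Extracting text from PDF")
    text = extract_text(path)
    if not text.strip():
        # Hypothetical guard: do not index documents with no OCR text.
        log.warning("No text extracted from %s; not indexing", path.name)
        return None

    log.info("Indexing to Elasticsearch")
    doc_id = index_document({"title": path.name, "content": text})
    log.info("File uploaded and indexed successfully")
    return doc_id


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)

    # Stand-ins for the real steps. In the Flask app these might be, e.g.:
    #   extract_text = lambda p: " ".join(
    #       pytesseract.image_to_string(img) for img in convert_from_path(str(p)))
    #   index_document = lambda doc: es.index(index="library", document=doc)["_id"]
    def fake_ocr(path: Path) -> str:
        return "LETTING GO THE PATHWAY OF SURRENDER"

    def fake_index(doc: dict) -> str:
        return "fake-id-1"

    with tempfile.TemporaryDirectory() as tmp:
        print(handle_upload("pdf24_images_merged.pdf", b"%PDF-1.4",
                            Path(tmp) / "uploads", fake_ocr, fake_index))
```

The empty-text guard is my own addition, aimed at the empty `content` hit in the results. There is also a way to stop re-uploads of the same PDF from creating duplicate documents. Pass a stable `id` to `es.index`, for example a hash of the file bytes.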

 

But so far I've only built the input and storage stages, so a lot more features still need to be added...
