tesseract (PDF OCR) + elasticsearch (advanced search support)
## Results (partial)
    {
      "took": 15,
      "timed_out": false,
      "_shards": {
        "total": 1,
        "successful": 1,
        "skipped": 0,
        "failed": 0
      },
      "hits": {
        "total": {
          "value": 2,
          "relation": "eq"
        },
        "max_score": 1,
        "hits": [
          {
            "_index": "library",
            "_id": "UhSbCJEB_1TVpDRMDzOb",
            "_score": 1,
            "_source": {
              "title": "pdf24_images_merged.pdf",
              "content": ""
            }
          },
          {
            "_index": "library",
            "_id": "UxSOCZEB_1TVpDRM6zM7",
            "_score": 1,
            "_ignored": [
              "content.keyword"
            ],
            "_source": {
              "title": "pdf24_images_merged.pdf",
              "content": "LETTING GO THE PATHWAY OF SURRENDER David R. Hawkins, M.D., Ph.D. ALSO BY DAVID R. HAWKINS, M.D., PH.D. Dissolving the Ego, Realizing the Self Along the Path to Enlightenment Healing and Recovery Reality, Spirituality, and Modern Man Discovery of the Presence of God: Devotional Nonduality Transcending the Levels of Consciousness: The Stairway to Enlightenment Truth vs. Falsehood: How to Tell the Difference I: Reality and Subjectivity The Eye of the I: From Which Nothing Is Hidden Power vs. Force: The Hidden Determinants of Human Behavior Dialogues on Consciousness and Spirituality Qualitative and Quantitative Analysis and Calibration of the Levels of Human Consciousness Orthomolecular Psychiatry (with Linus Pauling) Please visit: Hay House USA: www.hayhouse.com® Hay House Australia: www.hayhouse.com.au Hay House UK: www-hayhouse.co.uk Hay House India: www.hayhouse.co.in ..."
            }
          }
        ]
      }
    }
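
Two details in this response are worth checking whenever a new PDF is indexed. The first hit was stored with an empty `content`, and the second hit lists `content.keyword` under `_ignored`. Elasticsearch fills `_ignored` when a field value was skipped at index time; with default dynamic mapping this usually means the string was longer than the 256-character `ignore_above` limit on the `keyword` subfield, while the analyzed `content` text field itself is still searchable. The following is a minimal sketch for spotting both cases, assuming the response has been parsed into a Python dict. `summarize_hits` is a hypothetical helper, not part of this project, and the sample data is shortened from the output above.

```python
def summarize_hits(response):
    """Return one summary dict per hit in an Elasticsearch search response."""
    summaries = []
    for hit in response["hits"]["hits"]:
        source = hit.get("_source", {})
        content = source.get("content", "")
        summaries.append({
            "id": hit["_id"],
            "title": source.get("title"),
            "content_chars": len(content),
            # True when the stored document has no usable text
            "empty": not content.strip(),
            # Fields Elasticsearch skipped at index time (absent if none)
            "ignored_fields": hit.get("_ignored", []),
        })
    return summaries


# Shortened copy of the response above (content truncated for illustration).
sample = {
    "hits": {
        "hits": [
            {
                "_index": "library",
                "_id": "UhSbCJEB_1TVpDRMDzOb",
                "_source": {"title": "pdf24_images_merged.pdf", "content": ""},
            },
            {
                "_index": "library",
                "_id": "UxSOCZEB_1TVpDRM6zM7",
                "_ignored": ["content.keyword"],
                "_source": {
                    "title": "pdf24_images_merged.pdf",
                    "content": "LETTING GO THE PATHWAY OF SURRENDER",
                },
            },
        ]
    }
}

for row in summarize_hits(sample):
    print(row)
```

A hit flagged as `empty` is a candidate for re-running the upload or deleting the document; a non-empty `ignored_fields` list only affects exact-match and aggregation use of the listed subfield.
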
## Console logs
### elasticsearch
[2024-08-01T01:23:00,010][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] triggering scheduled [ML] maintenance tasks
[2024-08-01T01:23:00,074][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Deleting expired data
[2024-08-01T01:23:00,175][INFO ][o.e.x.m.j.r.UnusedStatsRemover] [node-1] Successfully deleted [0] unused stats documents
[2024-08-01T01:23:00,178][INFO ][o.e.x.m.a.TransportDeleteExpiredDataAction] [node-1] Completed deletion of expired ML data
[2024-08-01T01:23:00,181][INFO ][o.e.x.m.MlDailyMaintenanceService] [node-1] Successfully completed [ML] maintenance task: triggerDeleteExpiredDataTask
### tesseract (Flask)
INFO:werkzeug:127.0.0.1 - - [01/Aug/2024 01:09:51] "GET / HTTP/1.1" 200 -
DEBUG:root:Upload request received
INFO:root:Saving file to uploads\pdf24_images_merged.pdf
INFO:root:Extracting text from PDF
INFO:root:Indexing to Elasticsearch
DEBUG:urllib3.connectionpool:Starting new HTTP connection (1): localhost:9200
DEBUG:urllib3.connectionpool:http://localhost:9200 "POST /library/_doc HTTP/11" 201 0
INFO:elastic_transport.transport:POST http://localhost:9200/library/_doc [status:201 duration:2.079s]
INFO:root:File uploaded and indexed successfully
INFO:werkzeug:127.0.0.1 - - [01/Aug/2024 01:10:47] "POST /upload HTTP/1.1" 302 -
INFO:werkzeug:127.0.0.1 - - [01/Aug/2024 01:10:47] "GET / HTTP/1.1" 200 -
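
The Flask log shows four steps: save the upload, extract text from the PDF, index the text into the `library` index on `localhost:9200`, and redirect back to `/`. The steps after saving can be sketched roughly as below. This is only a sketch under assumptions, not the project's actual code: the use of `pdf2image` and `pytesseract` for OCR is assumed (the log does not name the libraries), and all function names are hypothetical. The `elasticsearch` client is used because the log shows `elastic_transport`. Third-party imports are placed inside the functions so the pure text-handling part can be tried without them installed.

```python
import os

ES_URL = "http://localhost:9200"  # host and port as seen in the log
INDEX_NAME = "library"            # index name as seen in the log


def extract_page_texts(pdf_path):
    """OCR every page of a PDF. Assumes pdf2image (needs poppler) and pytesseract."""
    from pdf2image import convert_from_path
    import pytesseract

    pages = convert_from_path(pdf_path)  # one PIL image per page
    return [pytesseract.image_to_string(page) for page in pages]


def build_document(filename, page_texts):
    """Collapse whitespace and join page texts into one document body."""
    cleaned = [" ".join(text.split()) for text in page_texts if text.strip()]
    return {"title": filename, "content": " ".join(cleaned)}


def index_document(doc):
    """Send the document to Elasticsearch (POST /library/_doc)."""
    from elasticsearch import Elasticsearch

    client = Elasticsearch(ES_URL)
    return client.index(index=INDEX_NAME, document=doc)


def handle_upload(pdf_path):
    """Run the extract and index steps for a file that was already saved."""
    texts = extract_page_texts(pdf_path)
    doc = build_document(os.path.basename(pdf_path), texts)
    return index_document(doc)
```

Refusing to index when `build_document` returns an empty `content` would avoid documents like the empty first hit in the results section.
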
However, only the input and storage stages are in place so far, so many more features still need to be added.
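
One possible next step is a full-text query over the OCR text. The sketch below only builds the request body for a `match` query on `content` with highlighting. `build_content_query` is a hypothetical name, and the choices of `operator`, `size`, and returning only `title` are illustrative assumptions rather than decisions made in this project.

```python
import json


def build_content_query(text, size=10):
    """Build a search body that matches all given terms in the OCR text."""
    return {
        "size": size,
        "_source": ["title"],  # skip the long content field in the response
        "query": {
            "match": {
                "content": {"query": text, "operator": "and"}
            }
        },
        # Ask Elasticsearch for matching snippets from content
        "highlight": {"fields": {"content": {}}},
    }


print(json.dumps(build_content_query("pathway surrender"), indent=2))
```

The resulting dict can be sent as the JSON body of `POST /library/_search`.
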