Alfresco discussion and collaboration. Stick around a few hours after asking a question.
Official support for Enterprise subscribers: support.alfresco.com.
Join in the conversation by getting an IRC client and connecting to #alfresco at Freenode. Or you can use the IRC web chat.
More information about the channel is in the wiki.
More help is available in this list of resources.
2020-01-09 08:17:06 GMT <angelborroy> News for the Community
2020-01-09 08:17:35 GMT <angelborroy> Eddie May has joined Alfresco as Community Manager replacing Kristen
2020-01-09 10:14:20 GMT <alfresco-discord> <kumar> Hi all, I have a question: in share config, can we use the form id="" under the <config evaluator="aspect" condition="ms:metadata">
2020-01-09 10:14:58 GMT <alfresco-discord> <kumar> <forms> <form id="doclib-common-consumer-dashboard">
2020-01-09 10:18:07 GMT <alfresco-discord> <kumar> we are using the same aspect under different site presets, so we want the order to be different for each site
2020-01-09 10:19:47 GMT <alfresco-discord> <kumar> when I tried the above, the properties did not show up in those sites, so I gather this will not work, but I want to confirm: is this the correct way or not?
2020-01-09 10:37:08 GMT <alfresco-discord> <monica> @kumar if you want to override this, then add the replace="true" attribute to the config, like <config evaluator="aspect" condition="ms:metadata" replace="true">
2020-01-09 11:03:46 GMT <AFaust> Currently at a customer and preparing for a web session with Abbyy by trying out potential alternative (cloud-based) OCR services. Does anyone here have any good suggestions about OCR services (should include zonal OCR / configurable or trainable extraction + classification capabilities) that I/we should look at?
2020-01-09 11:03:59 GMT <AFaust> Currently playing around with a trial of docparser
2020-01-09 11:05:59 GMT <alfresco-discord> <yreg> There is tesseract, the obvious option
2020-01-09 11:06:06 GMT <alfresco-discord> <yreg> Textract from AWS
2020-01-09 11:06:32 GMT <alfresco-discord> <yreg> and in my experience, for Arabic locale, Abbyy wins by far both for recognition and for rendering
2020-01-09 11:07:21 GMT <alfresco-discord> <yreg> haven't tried other locales with it
2020-01-09 11:08:38 GMT <AFaust> tesseract is too low-level
2020-01-09 11:09:20 GMT <AFaust> ...that's why I mentioned zonal OCR + configurability / training, which you'd have to build custom with tesseract
2020-01-09 11:09:51 GMT <AFaust> Right - forgot about Textract, which would be the first time to try this...
2020-01-09 11:11:05 GMT <alfresco-discord> <yreg> I had to do some advanced tesseract manipulation during summer
2020-01-09 11:11:21 GMT <alfresco-discord> <yreg> and it wasn't that bad
2020-01-09 11:11:57 GMT <alfresco-discord> <yreg> check this project out, it's awesome, and has quite extensive documentation: https://github.com/jbarlow83/OCRmyPDF/
2020-01-09 11:11:58 GMT <alfbot> Title:GitHub - jbarlow83/OCRmyPDF: OCRmyPDF adds an OCR text layer to scanned PDF files, allowing them to be searched (at github.com)
2020-01-09 11:25:46 GMT <alfresco-discord> <binduwavell> Axel, Abbyy does a great job. Depending on what you're doing, Nuance may outperform it.
2020-01-09 13:44:42 GMT <angelborroy> @AFaust did you try Ephesoft?
2020-01-09 13:44:50 GMT <angelborroy> I worked with it some years ago
2020-01-09 13:45:08 GMT <angelborroy> Not a bad product, but oriented toward scanning huge volumes of documents
2020-01-09 14:09:48 GMT <AFaust> Based on my past experience with Ephesoft, this doesn't really fit what the customer wants to do. Ideally, Alfresco sends new documents to a service to automatically extract data without (much) user interaction and separate data storage (in a mailroom solution like Ephesoft)
2020-01-09 14:10:54 GMT <AFaust> That's why docparser (and textract) would match quite well. Abbyy I am not so sure about yet because of abysmal public information, but we have a web session tomorrow to address questions....
2020-01-09 14:11:38 GMT <AFaust> I / we also don't like the Windows-reliance of Abbyy products, and the legacy licensing model...
2020-01-09 14:12:33 GMT <alfresco-discord> <yreg> although I only used the desktop client of Abbyy, I think they listed a server component on their site as well
2020-01-09 14:14:03 GMT <alfresco-discord> <yreg> but indeed either good ol' tesseract (and believe me, it's not that hard to manage it, and continuously enhance models as well) or Textract (if you are looking for a managed solution with support) are better suited for the use case
2020-01-09 14:18:26 GMT <AFaust> Oh boy, of course AWS is extremely American-centric: "Amazon Textract can detect Latin-script characters from the standard English alphabet and ASCII symbols."
2020-01-09 14:20:08 GMT <AFaust> Already saw the kind of problems docparser has with German Umlauts, and am quite curious to see Abbyy, if we can get a no-up-front setup-hassle trial / test.
2020-01-09 14:21:13 GMT <AFaust> It still amazes me that OCR products have the same kind of problems they had 10 years ago (when I last took a more intensive look at them), despite all the AI and ML stickers marketing puts on these products nowadays...
2020-01-09 14:35:24 GMT <alfresco-discord> <yreg> For textract, don't take anything for granted, try it out first
2020-01-09 14:35:56 GMT <alfresco-discord> <yreg> It could actually support those weird characters given a valid hint for the language
2020-01-09 14:43:10 GMT <AFaust> Sure, trial will definitely occur before making a choice. It was just an input to manage the expectations of the customer...
2020-01-09 14:49:05 GMT <alfresco-discord> <dgradecak> AFaust: Abbyy works well with Croatian characters, so I guess umlauts should be fine too
2020-01-09 14:49:15 GMT <alfresco-discord> <dgradecak> ČĆŠ (if you can read that)
2020-01-09 14:59:00 GMT <alfresco-discord> <binduwavell> Axel, Nuance does have a Linux server version.
2020-01-09 15:03:32 GMT <AFaust> Can you give me a link to Nuance, because what I found via Google did not feel like an OCR focused product suite...
2020-01-09 15:04:18 GMT <AFaust> You mean OmniPage, right? The (cached) URLs I got in the search result always redirected to the more generic company web page...
2020-01-09 15:05:45 GMT <AFaust> Now I found a search result which redirected me to Kofax despite clearly showing a nuance.com URL in Google...
2020-01-09 15:09:43 GMT <AFaust> Ah, OmniPage was sold to Kofax in 2019, so that's part of the confusion and redirects, so looks like Nuance no longer offers the OCR product
2020-01-09 19:31:28 GMT <hi-ko> AFaust: My experience is: if money doesn't count, Kofax and Abbyy have the best results in recognition quality, tooling and workflow. Another very good (if not better) way is to split OCR and zonal recognition.
2020-01-09 19:33:40 GMT <hi-ko> we have good experience in chaining plain OCR (I prefer Abbyy, which also has a very cheap Linux CLI package), then PDF-based extraction in pdfmdx, which is like Kofax but for PDF
2020-01-09 19:36:36 GMT <hi-ko> OmniPage is cheap but has very bad recognition.
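Editor's note: the split hi-ko describes (plain OCR first, zonal recognition as a separate step) can be illustrated with a minimal sketch. It assumes the OCR step has already produced words with bounding boxes, for example the word boxes Tesseract can emit as hOCR or TSV. The names `Word`, `Zone` and `extract_zones`, the zone coordinates, and the sample data are all hypothetical and chosen only for illustration; this is not how any of the products mentioned in the log work internally.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Word:
    """One recognised word plus its bounding box (hypothetical OCR output)."""
    text: str
    x: float
    y: float
    w: float
    h: float


@dataclass(frozen=True)
class Zone:
    """A named rectangular page region from which one field is read."""
    name: str
    x: float
    y: float
    w: float
    h: float

    def contains(self, word: Word) -> bool:
        # A word belongs to the zone if its centre point lies inside it.
        cx = word.x + word.w / 2
        cy = word.y + word.h / 2
        return (self.x <= cx <= self.x + self.w
                and self.y <= cy <= self.y + self.h)


def extract_zones(words, zones):
    """Return {zone name: text of the words inside that zone}."""
    fields = {}
    for zone in zones:
        hits = [w for w in words if zone.contains(w)]
        # Naive reading order: top to bottom, then left to right.
        # Real OCR output needs line grouping that tolerates small y jitter.
        hits.sort(key=lambda w: (w.y, w.x))
        fields[zone.name] = " ".join(w.text for w in hits)
    return fields


# Invented sample data standing in for the output of the plain OCR step.
words = [
    Word("Rechnung", 50, 40, 120, 20),
    Word("Nr.", 400, 40, 30, 20),
    Word("2020-0042", 440, 40, 90, 20),
    Word("Müller", 50, 120, 70, 20),
    Word("GmbH", 130, 120, 60, 20),
]
zones = [
    Zone("invoice_number", 430, 30, 150, 40),
    Zone("supplier", 40, 110, 300, 40),
]

fields = extract_zones(words, zones)
print(fields)
```

Because the zone step only sees text and coordinates, the OCR engine can be swapped without touching the extraction rules, which is the main appeal of keeping the two stages separate.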
The other logs are at http://esplins.org/hash_alfresco