2017-12-18 06:09:43 GMT <ankita> hi everyone

2017-12-18 06:10:15 GMT <ankita> I am following https://addons.alfresco.com/addons/capturesco-alfresco-scanning-tool

2017-12-18 06:10:17 GMT <alfbot> Title: Capturesco - Alfresco scanning tool | Alfresco Add-ons - Alfresco Customizations (at addons.alfresco.com)

2017-12-18 06:11:20 GMT <ankita> and trying to integrate the Capturesco open source tool, which captures documents during scanning and uploads them directly to Alfresco

2017-12-18 06:12:46 GMT <ankita> so what I have done is run the Capturesco 2.0.exe file

2017-12-18 06:13:23 GMT <ankita> On running this file, it asks for a username and password

2017-12-18 06:14:47 GMT <ankita> so I have entered the same user name and password that I use when running Alfresco

2017-12-18 06:15:16 GMT <ankita> but it still reports that the login credentials are wrong

2017-12-18 06:15:31 GMT <ankita> I am not able to figure this out

2017-12-18 06:15:48 GMT <ankita> any help would be greatly appreciated

2017-12-18 06:33:29 GMT <divya> I am following the link Capturesco - Alfresco scanning tool | Alfresco Add-ons - Alfresco Customizations to integrate Capturesco into Alfresco; it captures documents during scanning and uploads them directly to Alfresco. I have started Alfresco, and when I run the file Capturesco 2.0.exe directly from the bin folder in capturesco2\TWAINCapture\bin\Release, it asks for the user name and password. So, I have entered th

2017-12-18 06:33:40 GMT <divya> I have entered while installing Alfresco, but it is still saying "Invalid Username/password"

2017-12-18 06:33:57 GMT <divya> I am not able to figure out why this is so.. any idea regarding this would be greatly appreciated

2017-12-18 14:19:12 GMT <Tichodroma> Do you know how to disable the Alfresco Solr Suggester?

2017-12-18 14:20:12 GMT <CptLuxx> ahh

2017-12-18 14:20:15 GMT <CptLuxx> easy i had this last week

2017-12-18 14:20:22 GMT <mrks_js1> i did it for solr1, there are some how tos (in the doc i believe)

2017-12-18 14:20:24 GMT <CptLuxx> olr4/workspace-SpacesStore/conf

2017-12-18 14:20:29 GMT <CptLuxx> solr4/workspace-SpacesStore/conf

2017-12-18 14:20:34 GMT <CptLuxx> and change the config

2017-12-18 14:20:42 GMT <CptLuxx> Changed solr.suggester.enabled from true to false

2017-12-18 14:20:59 GMT <Tichodroma> let me try that ...

2017-12-18 14:28:07 GMT <yreg> Tichodroma, I think even if you do that, data would still be stored for suggestions, if I am not mistaken, we had to edit schema.xml to squeeze even more performance

2017-12-18 14:29:02 GMT <CptLuxx> wait what

2017-12-18 14:29:10 GMT <CptLuxx> i did that last week and it worked hmpf

2017-12-18 14:30:59 GMT <Tichodroma> actually I am not sure what the problem is that I am facing

2017-12-18 14:31:35 GMT <Tichodroma> so I am trying some settings

2017-12-18 14:32:15 GMT <Tichodroma> if Solr errors with: Namespace prefix cm is not mapped to a namespace URI

2017-12-18 14:32:56 GMT <yreg> that's probably a problem with the model

2017-12-18 14:33:10 GMT <Tichodroma> I guess the model XML is not available for Solr, alf_data/solr/model is empty

2017-12-18 14:33:24 GMT <Tichodroma> What could prevent Solr from fetching the model XMLs from Alfresco?

2017-12-18 14:33:50 GMT <angelborroy> No errors at SOLR log?

2017-12-18 14:34:26 GMT <Tichodroma> not while starting, the first error is the missing NS when I perform the search

2017-12-18 14:34:56 GMT <angelborroy> SOLR should send a request to gather the model

2017-12-18 14:35:04 GMT <Tichodroma> so I thought

2017-12-18 14:35:16 GMT <angelborroy> https://docs.alfresco.com/5.2/concepts/solr-overview.html

2017-12-18 14:35:18 GMT <alfbot> Title: Solr overview | Alfresco Documentation (at docs.alfresco.com)

2017-12-18 14:35:23 GMT <angelborroy> it includes “model” in the URL

2017-12-18 14:35:36 GMT <angelborroy> you can try to remove indexes and to start again

2017-12-18 14:36:03 GMT <angelborroy> and some request from SOLR to Alfresco should be registered both in SOLR log and Apache / Tomcat access log

2017-12-18 14:36:03 GMT <Tichodroma> bad idea, the index is HUUUUUGE and building is $$$ expensive (loading from The Cloud)

2017-12-18 14:36:11 GMT <angelborroy> wow

2017-12-18 14:36:15 GMT <angelborroy> so the problem is big

2017-12-18 14:36:31 GMT <angelborroy> you can try the URL by yourself

2017-12-18 14:36:38 GMT <angelborroy> to detect if it’s producing some problem

2017-12-18 14:40:00 GMT <angelborroy> btw I had also a weird problem last week

2017-12-18 14:40:05 GMT <angelborroy> I’m still studying it

2017-12-18 14:40:22 GMT <angelborroy> 100% CPU consumption by PDFBox

2017-12-18 14:40:35 GMT <angelborroy> When trying to index content from a PDF containing images

2017-12-18 14:41:03 GMT <angelborroy> Anyone heard about this problem before?

2017-12-18 14:48:40 GMT <yreg> Tichodroma, did you have a recent change in the model ?

2017-12-18 14:48:47 GMT <yreg> (alfresco side ?)

2017-12-18 14:54:10 GMT <Tichodroma> none

2017-12-18 14:54:52 GMT <Tichodroma> the system is very large and having low level hardware/storage problems

2017-12-18 15:01:52 GMT <fwu> hi all!

2017-12-18 15:13:14 GMT <fwu> ppl, In a FTL I would like to reference my javascript from an outside js. Is this possible? If, yes, how?

2017-12-18 15:15:03 GMT <angelborroy> fwu All functions and variables are global in JavaScript

2017-12-18 15:29:37 GMT <fwu> angelborroy, I know that, but I would like to make specific js files for each ftl. So I would like to make an include, import, or something like that. As I do with js files inside bpmn files.

2017-12-18 15:30:08 GMT <fwu> right now, I need to compile the ftl into a jar file every time I make a change

2017-12-18 15:30:10 GMT <angelborroy> fwu probably I don’t understand your question

2017-12-18 15:30:23 GMT <angelborroy> you can make a JS import in FTL

2017-12-18 15:46:32 GMT <fwu> I tried but it seems it doesn't work

2017-12-18 15:47:00 GMT <fwu> I tried this: <@script src=

2017-12-18 15:47:05 GMT <fwu> maybe this is wrong

2017-12-18 15:47:56 GMT <angelborroy> it’s fine

2017-12-18 15:48:03 GMT <angelborroy> fwu what is the problem?

2017-12-18 15:48:56 GMT <fwu> I can't see the GET for the js in Fiddler, so it is not running

2017-12-18 15:49:25 GMT <angelborroy> ok, so your JS is extenal to Share, right?

2017-12-18 15:49:33 GMT <angelborroy> extenal > external

2017-12-18 15:51:17 GMT <fwu> ah! that may be the problem.

2017-12-18 15:51:22 GMT <angelborroy> yes, it is

2017-12-18 15:51:45 GMT <fwu> I want internal, but im trying external... I should try as I need

2017-12-18 15:52:04 GMT <fwu> I will try that, thank you!

2017-12-18 15:52:34 GMT <mbui> Just switched from eclipse to intellij for development. Anyone know if there's any plugins/other IDE's that supports/helps developing in javascript? In particular back-end javascript code.

2017-12-18 16:02:06 GMT <yreg> mbui, is there such a thing for eclipse ?

2017-12-18 16:02:55 GMT <angelborroy> some colleagues of mine are using this https://www.jetbrains.com/webstorm/features/

2017-12-18 16:02:56 GMT <alfbot> Title: Features - WebStorm (at www.jetbrains.com)

2017-12-18 16:03:04 GMT <Tichodroma> mbui: what is special about Rhino/Nashorn code? What kind of support for JS do you need?

2017-12-18 16:03:09 GMT <angelborroy> but it’s not IDEA, it’s a different environment

2017-12-18 16:06:10 GMT <mbui> Well, basically i'm just wondering if there's some plugins/IDE's that supports the import statements or the "for each" statements.

2017-12-18 16:06:51 GMT <Tichodroma> the import is special to Alfresco

2017-12-18 18:42:00 GMT <angelborroy> AFaust sorry to bother you…

2017-12-18 18:42:17 GMT <angelborroy> … had you any CPU 100% consumption due to PDF content extraction?

2017-12-18 18:42:41 GMT <angelborroy> It’s the first time I’ve seen that, and the customer said that it should be something “usual”

2017-12-18 18:43:03 GMT <angelborroy> I’ve had to port part of this method https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java#L170

2017-12-18 18:43:04 GMT <alfbot> Title: tika/PDF2XHTML.java at master · apache/tika · GitHub (at github.com)

2017-12-18 18:43:07 GMT <angelborroy> to Tika 1.6

2017-12-18 18:43:14 GMT <angelborroy> In order to solve the issue

2017-12-18 18:47:44 GMT <AFaust> angelborroy: Well, you are surely not asking IF I ever had such an issue, because then the answer would be yes.

2017-12-18 18:48:05 GMT <AFaust> More likely you are asking if I had the problem on a specific version of Alfresco with a specific constellation of PDF document, right?

2017-12-18 18:48:39 GMT <angelborroy> for me it’s with PDFs produced by CamScanner

2017-12-18 18:48:47 GMT <angelborroy> in both CE 5.1 & 5.2

2017-12-18 18:49:13 GMT <AFaust> I have had such situations at customers when the PDF was a bit too large / complex and the system too tight on memory, so I would get 100% due to GC overhead

2017-12-18 18:49:49 GMT <angelborroy> thanks

2017-12-18 18:49:54 GMT <angelborroy> this is not my scenario

2017-12-18 18:50:20 GMT <angelborroy> It’s a semi-infinite loop when Tika is extracting images

2017-12-18 18:50:58 GMT <angelborroy> In fact, the loop is infinite but it’s not detected by the JVM and in the end gets 100% of CPU

2017-12-18 18:51:20 GMT <angelborroy> it’s weird, I’d expect this to have happened to someone before

2017-12-18 18:51:32 GMT <angelborroy> But I cannot find any reference

2017-12-18 18:51:39 GMT <angelborroy> Thanks again AFaust

2017-12-18 18:52:56 GMT <AFaust> So it occurs on TIKA 1.6 and you patched that. What was the original lines where it did occur?

2017-12-18 18:53:23 GMT <angelborroy> let me find it

2017-12-18 18:53:23 GMT <angelborroy> Well, you are surely not asking IF I ever had such an issue, because then the asnwer would be yes.

2017-12-18 18:53:26 GMT <angelborroy> sorry

2017-12-18 18:53:32 GMT <angelborroy> https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java#L170

2017-12-18 18:53:33 GMT <alfbot> Title: tika/PDF2XHTML.java at master · apache/tika · GitHub (at github.com)

2017-12-18 18:53:38 GMT <angelborroy> This is the new implementation of the method

2017-12-18 18:53:55 GMT <AFaust> I mean, you ported something back to TIKA 1.6, which you certainly did not do blindly...

2017-12-18 18:54:06 GMT <angelborroy> yep

2017-12-18 18:54:12 GMT <AFaust> So the entire method you say...

2017-12-18 18:54:25 GMT <angelborroy> nope

2017-12-18 18:54:28 GMT <angelborroy> This is the original

2017-12-18 18:54:29 GMT <angelborroy> https://github.com/apache/tika/blob/1.6/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java#L295

2017-12-18 18:54:30 GMT <alfbot> Title: tika/PDF2XHTML.java at 1.6 · apache/tika · GitHub (at github.com)

2017-12-18 18:54:51 GMT <angelborroy> So I only took the “COSBase” detection

2017-12-18 18:55:04 GMT <AFaust> Ah - I can see how the method was wayyy shorter in 1.6

2017-12-18 18:55:08 GMT <angelborroy> Just to stop the recursion at https://github.com/apache/tika/blob/master/tika-parsers/src/main/java/org/apache/tika/parser/pdf/PDF2XHTML.java#L199

2017-12-18 18:55:09 GMT <alfbot> Title: tika/PDF2XHTML.java at master · apache/tika · GitHub (at github.com)

2017-12-18 18:56:40 GMT <AFaust> Ah - I see.

2017-12-18 18:56:42 GMT <angelborroy> I need to produce a PDF without personal data to open an issue in Alfresco (as Tika has solved this)

2017-12-18 18:57:05 GMT <AFaust> Hmm - if this would cause an infinite recursion I would expect some sort of StackOverflow at some point

2017-12-18 18:57:18 GMT <angelborroy> Yep

2017-12-18 18:57:31 GMT <angelborroy> It’s infinite but it’s not exactly detected by JVM

2017-12-18 18:57:56 GMT <AFaust> Unless - due to the recursion and for-loop, I can see how you could run into a memory issue before you run into the StackOverflow one.

2017-12-18 18:58:26 GMT <angelborroy> let me reproduce the problem (one sec)

2017-12-18 18:59:13 GMT <AFaust> It should be quite easy to reproduce this issue if the customer / you are able to identify the PDF being processed while CPU spikes to 100%

2017-12-18 18:59:33 GMT <angelborroy> yep, the customer is having the issue all the time

2017-12-18 18:59:45 GMT <angelborroy> and I have a document to produce it at will

2017-12-18 19:00:21 GMT <AFaust> Is it a document you could share (i.e. non-sensitive)?

2017-12-18 19:00:30 GMT <angelborroy> nope, this is the problem

2017-12-18 19:01:10 GMT <AFaust> Ok, I see that CamScanner has a BASIC (free) account. Let's see if I can reproduce this by scanning something with that app

2017-12-18 19:01:20 GMT <angelborroy> that would be nice

2017-12-18 19:03:17 GMT <AFaust> Since I already have my local 5.2 up to do some more testing and feedback on the trivial ALF-21963 I can do a quick test of this as well

2017-12-18 19:04:41 GMT <angelborroy> Are you using Android?

2017-12-18 19:04:48 GMT <angelborroy> I’m doing the same operation with iOS

2017-12-18 19:06:14 GMT <AFaust> Yes, Android

2017-12-18 19:06:51 GMT <angelborroy> in my case it was “multiple"

2017-12-18 19:08:53 GMT <angelborroy> I cannot get the same effect with iOS

2017-12-18 19:08:57 GMT <angelborroy> It works as expected

2017-12-18 19:10:39 GMT <AFaust> Grml - I hate it when you can't easily download files from such apps via USB, but have to go via Google Drive... such BS

2017-12-18 19:10:54 GMT <angelborroy> In Apple we have AirDrop

2017-12-18 19:11:05 GMT <angelborroy> to send the file directly from the phone to the computer

2017-12-18 19:11:59 GMT <angelborroy> I’m going to try using the OCR feature

2017-12-18 19:13:16 GMT <AFaust> Do you always get the bogus title + description value "þÿ"?

2017-12-18 19:14:04 GMT <AFaust> But no 100% CPU with a simple test scan here...

2017-12-18 19:14:16 GMT <angelborroy> The same here

2017-12-18 19:14:25 GMT <angelborroy> title and description are the name of the file

2017-12-18 19:14:39 GMT <angelborroy> yep, it works

2017-12-18 19:20:13 GMT <angelborroy> AFaust I’ve anonymised the file

2017-12-18 19:20:26 GMT <angelborroy> AFaust are you interested in testing the “virus”?

2017-12-18 19:22:49 GMT <AFaust> Can do - just got it

2017-12-18 19:24:14 GMT <AFaust> Ah yes, the CPU fan is picking up the pace...

2017-12-18 19:24:46 GMT <angelborroy> It’s an infinite loop

2017-12-18 19:24:54 GMT <angelborroy> It will not end

2017-12-18 19:25:14 GMT <angelborroy> But I don’t know what this document has

2017-12-18 19:25:21 GMT <angelborroy> Why is it special?

2017-12-18 19:25:49 GMT <angelborroy> Applying the patch, it goes flui

2017-12-18 19:25:51 GMT <angelborroy> fluid

2017-12-18 19:25:52 GMT <AFaust> 11 levels of recursion currently...

2017-12-18 19:26:17 GMT <angelborroy> that is: an infinite recursion

2017-12-18 19:26:54 GMT <AFaust> Now 12 levels - so it will (at some point very long in the future) run into the StackOverflow

2017-12-18 19:27:06 GMT <angelborroy> probably

2017-12-18 19:27:20 GMT <AFaust> I'm getting a memory dump and will look at that

2017-12-18 19:27:22 GMT <angelborroy> but it waits a lot of time

2017-12-18 19:27:45 GMT <angelborroy> you’ll see the “extractImages” method from PDF2XHTML

2017-12-18 19:28:51 GMT <AFaust> Yes, that is what I meant with X-levels of recursion

2017-12-18 19:29:55 GMT <angelborroy> I’ll raise an issue at Alfresco (I have to expand those white boxes), but probably it will be discarded

2017-12-18 19:30:51 GMT <AFaust> "probably" => very likely

2017-12-18 19:30:57 GMT <angelborroy> yep

2017-12-18 19:31:04 GMT <angelborroy> I must use “likely”, I know

2017-12-18 19:32:33 GMT <angelborroy> Anyway you can confirm that you’ve never seen this behaviour, right?

2017-12-18 19:33:41 GMT <AFaust> So far, I had not seen this particular behaviour, correct.

2017-12-18 19:33:57 GMT <angelborroy> thanks, this is what I guessed

2017-12-18 19:34:53 GMT <AFaust> And the bug is really with the TIKA code - PDXObjectForm.getResources() creates a copy of the PDResources which contains the very same data for the next recursion step

2017-12-18 19:35:56 GMT <angelborroy> yep

2017-12-18 19:36:10 GMT <angelborroy> this is why I applied that “COS” patch
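
The guard being discussed (stop descending once the same underlying object has been seen) can be illustrated with a minimal Python sketch. This is not the Tika code: the `Resource` class, the `extract_images` function, and the sample graph are hypothetical stand-ins for the PDF resource dictionaries and form XObjects mentioned above, and the `seen` set plays the role of the "COS" detection.

```python
class Resource:
    """Hypothetical stand-in for a PDF resource dictionary (not a PDFBox class)."""

    def __init__(self, name, images=()):
        self.name = name
        self.images = list(images)
        self.children = []  # nested forms, which may point back at a parent


def extract_images(resource, seen=None):
    """Collect image names, visiting each underlying object at most once."""
    if seen is None:
        seen = set()
    # Guard: track the identity of the underlying object, so a copy that wraps
    # the very same data cannot trigger another round of recursion.
    if id(resource) in seen:
        return []
    seen.add(id(resource))

    found = list(resource.images)
    for child in resource.children:
        found.extend(extract_images(child, seen))
    return found


# A form whose resources refer back to the page resources: a cycle.
page = Resource("page", images=["img1"])
form = Resource("form", images=["img2"])
page.children.append(form)
form.children.append(page)

result = extract_images(page)
print(result)  # ['img1', 'img2']
```

Without the `seen` check, the call on `page` would recurse between `page` and `form` until the interpreter's recursion limit is hit.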

2017-12-18 19:36:26 GMT <angelborroy> but it looks like this is not a very common case

2017-12-18 19:36:27 GMT <AFaust> And the memory overhead of the recursion is so low that you will likely not get a memory error.

2017-12-18 19:36:38 GMT <AFaust> The only thing I don't understand is why it takes so long to recurse

2017-12-18 19:36:52 GMT <angelborroy> the same

2017-12-18 19:40:47 GMT <AFaust> One operation in PDResources appears to be extremely inefficient

2017-12-18 19:40:54 GMT <AFaust> PDResources.reverseMap

2017-12-18 19:42:38 GMT <AFaust> This is being called every time the duplicated PDResources.getXObjects() is being called, so for every recursion step

2017-12-18 19:43:41 GMT <AFaust> This has nearly 98% of the CPU time

2017-12-18 19:43:58 GMT <angelborroy> my point is, why this only happens with that kind of document?

2017-12-18 19:44:16 GMT <AFaust> And most of it is due to a HashMap.put - that does not make much sense

2017-12-18 19:44:24 GMT <AFaust> Well, it depends on the PDF document structure

2017-12-18 19:44:51 GMT <AFaust> You need to have this PDXObjectForm in the first place

2017-12-18 19:44:56 GMT <angelborroy> Alfresco is patching Tika

2017-12-18 19:45:13 GMT <angelborroy> so probably my modification could be added to that patch

2017-12-18 19:45:16 GMT <AFaust> I've looked in the patched TIKA source the whole time...

2017-12-18 19:45:24 GMT <AFaust> Yes, it could / should

2017-12-18 19:45:54 GMT <AFaust> And they have also patched PDFBox, which is where the inefficient reverseMap operation lives

2017-12-18 19:46:05 GMT <angelborroy> yep

2017-12-18 19:46:05 GMT <AFaust> Haven't found the Alfresco source for that patch though

2017-12-18 19:46:12 GMT <angelborroy> the same

2017-12-18 19:46:23 GMT <angelborroy> probably it’s in an inner SVN branch

2017-12-18 19:47:22 GMT <AFaust> Well, there is a sources ZIP in artifacts server - problem is that Maven only deals with a sources.jar, so again, boohoo Alfresco for bad source attachment management

2017-12-18 19:47:54 GMT <AFaust> Whoooot? And the sources.zip is 78 MiB in size? What the heck...?

2017-12-18 19:52:20 GMT <AFaust> Oh - if you don't stop it, it will escalate into multiple threads running the same file. Probably due to timeout on the SOLR side...

2017-12-18 19:53:27 GMT <angelborroy> Yep

2017-12-18 19:53:40 GMT <angelborroy> This is what was happening in the customer environment

2017-12-18 19:54:14 GMT <angelborroy> And in the end, the service was discontinued due to CPU consumption

2017-12-18 19:54:16 GMT <AFaust> Did you by any chance check any of the JIRA issues for 6.0 to see if TIKA is going to be updated?

2017-12-18 19:54:38 GMT <angelborroy> nope

2017-12-18 19:55:15 GMT * AFaust bashes his head against the table in frustration about Alfresco Nexus / artifact management

2017-12-18 19:55:26 GMT <AFaust> angelborroy: Guess what is contained in the sources.zip

2017-12-18 19:56:44 GMT <AFaust> No guess? Well, then I'm just going to tell....

2017-12-18 19:57:56 GMT <AFaust> Someone ZIPed the entire source code project on disk, and did so, after running "mvn package" (or something more elaborate), so all the (sub-)projects contain a "target" directory with all the Maven output and temporary data, including extracted files, WARs, JARs...

2017-12-18 19:59:14 GMT <angelborroy> heh

2017-12-18 19:59:33 GMT <angelborroy> 78 MiB

2017-12-18 19:59:48 GMT <AFaust> Unpacked it is actually ~190 MiB

2017-12-18 19:59:54 GMT <angelborroy> wow

2017-12-18 20:02:57 GMT <AFaust> Ok, the reverseMap() operation actually looks fine, so it must have something to do with some of the hashCode/equals implementations in the PDFBox value classes. But this is where I'll stop checking this.... Still have to write up an offer for some work that may keep me busy about a quarter of next year...
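
The log does not confirm the hashCode/equals suspicion, so the following is only an illustration of the general effect: when every key reports the same hash, each map insertion has to compare against the keys already stored, and the total work grows roughly quadratically. A minimal Python sketch, where the `Key` class and `count_comparisons` function are hypothetical and unrelated to the PDFBox classes:

```python
class Key:
    """Hypothetical value class; constant_hash simulates a degenerate hash."""

    eq_calls = 0  # equality comparisons triggered by dict insertions

    def __init__(self, value, constant_hash):
        self.value = value
        self.constant_hash = constant_hash

    def __hash__(self):
        # Degenerate case: every key lands in the same collision chain.
        return 0 if self.constant_hash else hash(self.value)

    def __eq__(self, other):
        Key.eq_calls += 1
        return isinstance(other, Key) and self.value == other.value


def count_comparisons(n, constant_hash):
    """Insert n distinct keys into a dict and report how many __eq__ calls ran."""
    Key.eq_calls = 0
    table = {}
    for i in range(n):
        table[Key(i, constant_hash)] = i
    return Key.eq_calls


good = count_comparisons(500, constant_hash=False)
bad = count_comparisons(500, constant_hash=True)
print(good, bad)
```

With well-distributed hashes the comparison count stays small, while the constant-hash run performs far more comparisons for the same number of insertions.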

2017-12-18 20:03:55 GMT <angelborroy> thanks AFaust

2017-12-18 20:04:02 GMT <angelborroy> you’re my hero ;-)

2017-12-18 20:11:04 GMT <angelborroy> AFaust https://issues.alfresco.com/jira/browse/ALF-21970

2017-12-18 22:00:05 GMT <brian-int> hi, when configuring Alfresco v201711(6.0.0) on CentOS7, via the wizard, should one change the default (loopback) port to the public IP address of the machine?

2017-12-18 22:00:31 GMT <brian-int> s/wizard/text wizard/

2017-12-18 22:01:50 GMT <brian-int> the text mode installer instructions on the alfresco site don't say much about this option; the GUI version says a bit more but isn't clear: http://docs.alfresco.com/community/tasks/simpleinstall-community-lin.html

2017-12-18 22:01:52 GMT <alfbot> Title: Installing Alfresco Community Edition on Linux | Alfresco Documentation (at docs.alfresco.com)

2017-12-18 22:02:28 GMT <brian-int> also, does it take DNS names, ex: hostname.domain.com? or do I have to enter in an IP?

2017-12-18 23:43:48 GMT <AFaust> brian-int: I never use any of the installers, so I don't know how these inputs are mapped. You need to configure an externally addressable DNS name for that value or any emails generated by the server will include incorrect links / URLs. The same goes for ports, specifically the HTTP/SSL ports

