Daily Log for #alfresco IRC Channel

Alfresco discussion and collaboration. Stick around a few hours after asking a question.

Official support for Enterprise subscribers: support.alfresco.com.

Joining the Channel:

Join in the conversation by getting an IRC client and connecting to #alfresco at Freenode. Our you can use the IRC web chat.

More information about the channel is in the wiki.

Getting Help

More help is available in this list of resources.

Daily Log for #alfresco

2019-11-07 10:31:57 GMT <alfresco-discord> <MorganP> @Loftux: CIFS has been removed in 6.1

2019-11-07 10:34:56 GMT <alfresco-discord> <Loftux> @MorganP Excellent! I thought it was, not just which version. The issue I faced yesterday is I think due to kerberos was set to authenticate FTP, then it goes into that code. But I created a cifs keytab anyway.

2019-11-07 13:57:16 GMT <AFaust> angelborroy: I need input from a search expert... I am currently wondering what influences the number of low-level filter queries (e.g. as seen in SOLR filterCache stats) for a specific Repo -> SOLR search request.

2019-11-07 13:59:02 GMT <AFaust> I have built a SOLR event listener which pre-warms the filterCache by executing some queries with filter queries (low-level / internally via QParser.getParser(...).getQuery() + SolrIndexSearcher), and have used the same filterQueries as input as Alfresco sends in the HTTP request to SOLR, but the number of filter queries in the cache does not match what gets generated for the actual request.

2019-11-07 14:00:35 GMT <AFaust> I have already switched my listener implementation to "fake" the JSON body, so that AlfrescoFTSQParser can use that as context. And it has already helped significantly, but the counts are still not close to equal

2019-11-07 14:01:04 GMT <angelborroy> I don’t know the answer

2019-11-07 14:01:14 GMT <angelborroy> But let me see if I can find something relevant

2019-11-07 14:01:21 GMT <AFaust> E.g. without the fake JSON body, I would pre-warm 48 filter queries, and then the request would result in an additional ~67. With the fake body, the request only adds ~19 filter queries, which I don't know where they originate from.

2019-11-07 14:02:26 GMT <AFaust> Subtracting the authority + tenant filter queries, that is 17 filter queries for which I don't know the origin. And when the query runs for the first time, those queries cause the query to take ~50 seconds instead of ~2 seconds (my goal)

2019-11-07 14:02:57 GMT <AFaust> Luckily 50 seconds is already better then the previous time of 300 seconds...

2019-11-07 14:03:05 GMT <angelborroy> I guess this cache is handled by Repository, right?

2019-11-07 14:03:44 GMT <angelborroy> Do you have solr suggerter enabled?

2019-11-07 14:03:46 GMT <AFaust> No - talking solely about SOLR caches in this case

2019-11-07 14:03:56 GMT <angelborroy> 50 seconds is a really high value

2019-11-07 14:04:04 GMT <AFaust> Suggester is disabled because that would kill disk space

2019-11-07 14:04:34 GMT <AFaust> ^^ this is the big test system I have mentioned in the last weeks

2019-11-07 14:05:11 GMT <angelborroy> How many documents have that repo?

2019-11-07 14:07:19 GMT <angelborroy> Are you sure that suggester is completely disabled

2019-11-07 14:07:46 GMT <angelborroy> I’ve seen that before and I remember that I had to do something in the code to remove completely solr suggester

2019-11-07 14:08:01 GMT <angelborroy> As it was taking lot of time in the search request

2019-11-07 14:08:55 GMT <angelborroy> You are using “solr.suggester.enabled=false”, right?

2019-11-07 14:10:37 GMT <AFaust> ~100 million documents

2019-11-07 14:10:38 GMT <alfbot> AFaust: Error: "100" is not a valid command.

2019-11-07 14:11:02 GMT <angelborroy> Indexing also content or only metadata?

2019-11-07 14:11:46 GMT <AFaust> Yes, solr.suggester.enabled is false. The search request does not contain anything with regards to a request for suggestions (must be actively specified, by default only used from Share faceted search)

2019-11-07 14:12:48 GMT <AFaust> Only metadata, no content (and search / filter queries only go against targeted fields, and mostly with = expressions, so not even fuzzy stuff)

2019-11-07 14:13:50 GMT <angelborroy> I’m reading a report having 330M nodes

2019-11-07 14:13:56 GMT <AFaust> HTTP request as logged by SolrQueryHTTPClient on Repository side only specifies 50 filter queries, but after query is complete (completely fresh SOLR start), the cache contains 67 filter queries...

2019-11-07 14:14:01 GMT <angelborroy> And times are around 5 seconds

2019-11-07 14:14:27 GMT <AFaust> Once the filter queries are all cached, my request is only ~2 seconds.

2019-11-07 14:15:00 GMT <angelborroy> So your problem is with the filter caching, right?

2019-11-07 14:15:02 GMT <AFaust> Even other requests with identical filter queries are only around ~2 seconds + a couple of seconds for first time execution overhead (generally < 5 seconds)

2019-11-07 14:15:42 GMT <angelborroy> (as you said before :))

2019-11-07 14:15:50 GMT <AFaust> My problem is with finding out what those 17 extra filter queries are and where they come from, so I can include them in my cache pre-warming.

2019-11-07 14:16:11 GMT <AFaust> ... and maybe other queries as well, that might also get automagically added...

2019-11-07 14:17:57 GMT <angelborroy> I guess is caching every search having an “fq! parameter

2019-11-07 14:18:12 GMT <angelborroy> I mean “fq”

2019-11-07 14:19:14 GMT <AFaust> This system is isolated, there are no other queries apart from mine (I am logging SolrQueryHTTPClient on DEBUG and would see if there were)

2019-11-07 14:19:22 GMT <AFaust> And my query only has 50 fq, not 67...

2019-11-07 14:19:57 GMT <AFaust> So that is why I am confused about the 17 difference...

2019-11-07 14:20:47 GMT <AFaust> Hmm... wait a second. Could it be that the facet component is internally using filter queries to count matching elements?

2019-11-07 14:21:04 GMT <AFaust> Let me check - that could explain (some of) the difference, since there are some facet queries contained in the query

2019-11-07 14:21:12 GMT <angelborroy> right

2019-11-07 14:21:20 GMT <angelborroy> fc are also added to the cache

2019-11-07 14:22:11 GMT <AFaust> It's just not logically obvious, because they are "not" filter queries. It's just that they are used as if they were in SOLR internally...

2019-11-07 14:25:56 GMT <AFaust> ok - so facet queries look like they might account for 15 of the 17 filter queries. Adding to my pre-warming and doing another test cycle to see if that really is the case.

2019-11-07 14:33:56 GMT <AFaust> angelborroy: I just realised (even before having the results) that those facet queries can not be relevant for the filter cache count.

2019-11-07 14:34:17 GMT <angelborroy> why?

2019-11-07 14:34:22 GMT <AFaust> a) some facet queries are just not cacheable, e.g. the cm:modified range queries using the relative NOW term

2019-11-07 14:34:57 GMT <AFaust> b) the remaining facet queries are for cm:content.size, which just does not exist in our case

2019-11-07 14:35:49 GMT <AFaust> And after the restart, I again have only 48 entries in my filter cache - the added facet queries in my pre-warming code have been run, but not been added to the cache by SOLR

2019-11-07 14:41:59 GMT <AFaust> Too bad SOLR does not have any meaningful logging that could be disabled to track what queries are actually added to the cache

2019-11-07 14:42:09 GMT <AFaust> disabled => enabeld

2019-11-07 16:19:17 GMT <AFaust> angelborroy: Ah, it seems I forgot one step in decoding the log output from SolrQueryHTTPClient and my prewarming actually failed for the added facet queries

2019-11-07 16:20:16 GMT <angelborroy> Are you trying again?

2019-11-07 17:02:42 GMT <AFaust> Yes - and the filter cache actually now shows 64 cache queries.

2019-11-07 17:03:33 GMT <AFaust> Not counting the tenant + authorities queries (which I did not prewarm), this only makes for a difference of one filter query now...

2019-11-07 17:04:41 GMT <AFaust> Checking if those added cached filter queries have an impact on performance and if the "real" query generates additional queries in the cache...

2019-11-07 17:06:36 GMT <AFaust> Ok - only 3 more queries added - 2 of those are for tenants + authorities, as expect.

2019-11-07 17:07:15 GMT <AFaust> First query time is still only 50s, which means those facet query queries did not have any impact and were not where the cost originally came from.

2019-11-07 17:07:43 GMT <AFaust> I bet, the cost now is mostly associated with sorting by cm:name, which has to perform an initial load of the field into the field value cache.

2019-11-07 20:40:33 GMT <mmccarthy> Any idea why OOTB Alfresco is saying Share Services is not installed when it is listed as installed by alfresco-mmt?

2019-11-07 21:06:26 GMT <AFaust> mmccarthy: Check you share-config-custom.xml for a correct configuration of the alfresco-api Remote endpoint - it is used to check what modules are installed, e.g. validate Share Services is available.

2019-11-07 21:07:33 GMT <mmccarthy> thanks AFaust I will give that a try

End of Daily Log

The other logs are at http://esplins.org/hash_alfresco