Daily Log for #alfresco IRC Channel

Alfresco discussion and collaboration. Stick around a few hours after asking a question.

Official support for Enterprise subscribers: support.alfresco.com.

Joining the Channel:

Join in the conversation by getting an IRC client and connecting to #alfresco on Freenode. Or you can use the IRC web chat.

More information about the channel is in the wiki.

Getting Help

More help is available in this list of resources.

Daily Log for #alfresco

2019-11-14 09:03:43 GMT <hi-ko> good morning! Q for solr6 in prod: we see more OOM exceptions than on solr4 (Xmx16g). solr6 has its default oom_solr.sh. What is the best way to automatically (re)start solr in Alfresco CE once the oom_solr.sh script has killed Solr?

2019-11-14 09:04:08 GMT <angelborroy> What is “oom_solr.sh”?

2019-11-14 09:05:53 GMT <angelborroy> I mean, why aren’t you using the default “solr” program?

2019-11-14 09:10:13 GMT <hi-ko> angelborroy: I had some network issues to freenode and didn't get your first answer. could you please resend?

2019-11-14 09:10:59 GMT <hi-ko> btw logging on http://chat.alfresco.com is not working anymore?

2019-11-14 09:11:00 GMT <alfbot> Title:Logs for #alfresco IRC channel (at chat.alfresco.com)

2019-11-14 09:23:26 GMT <angelborroy> Just thinking on why to use oom_solr.sh instead of solr program

2019-11-14 09:26:11 GMT <hi-ko> what do you mean by solr program? oom_solr.sh is registered by default.

2019-11-14 09:34:24 GMT <angelborroy> https://github.com/Alfresco/SearchServices/blob/master/search-services/packaging/src/docker/Dockerfile#L50

2019-11-14 09:34:25 GMT <alfbot> Title:SearchServices/Dockerfile at master · Alfresco/SearchServices · GitHub (at github.com)

2019-11-14 09:38:24 GMT <hi-ko> sure but /bin/solr is just a script which starts a jvm with -XX:OnOutOfMemoryError=bin/oom_solr.sh
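
For context, bin/oom_solr.sh is a tiny helper shipped with Solr; paraphrased from memory (not the verbatim script), it does roughly the following:

    #!/bin/bash
    # bin/solr passes the Solr port and log directory when it registers this
    # script via -XX:OnOutOfMemoryError.
    SOLR_PORT=$1
    SOLR_LOGS_DIR=$2
    NOW=$(date +"%F_%H_%M_%S")
    # Find the JVM bound to that port and kill it hard, leaving a marker log behind.
    SOLR_PID=$(ps auxww | grep start.jar | grep "jetty.port=$SOLR_PORT" | grep -v grep | awk '{print $2}')
    kill -9 "$SOLR_PID"
    echo "Killed Solr process $SOLR_PID after an OutOfMemoryError" \
        > "$SOLR_LOGS_DIR/solr_oom_killer-$SOLR_PORT-$NOW.log"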

2019-11-14 09:40:43 GMT <angelborroy> Right

2019-11-14 09:40:53 GMT <angelborroy> Again, what do you mean by re-starting?

2019-11-14 09:41:06 GMT <angelborroy> You can use watchdog or similar, right?

2019-11-14 09:42:20 GMT <hi-ko> there is no default / best practice as far as I can see. One has to write their own watchdog script, right?

2019-11-14 09:44:21 GMT <hi-ko> Since we don't run in docker we are lucky and can use systemd watchdog. I'll try this first.

2019-11-14 09:50:42 GMT <angelborroy> Ok, let me know how it goes

2019-11-14 10:34:51 GMT <hi-ko> angelborroy: I've got a working systemd setup and will write that down in a gist, but I'd rather address the root cause: GC running on openjdk11 causes the OOM. Are there any recommendations for -XX settings for openjdk11?
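
A minimal sketch of the systemd-based restart described above, assuming a standard Alfresco Search Services layout (the unit name, user and install path are assumptions, not details from the log):

    [Unit]
    Description=Alfresco Search Services (Solr 6)
    After=network.target

    [Service]
    User=solr
    # Install path and solr.in.sh location are assumptions; adjust to the real layout.
    Environment=SOLR_INCLUDE=/opt/alfresco-search-services/solr.in.sh
    # -f keeps Solr in the foreground so systemd tracks the JVM directly.
    ExecStart=/opt/alfresco-search-services/solr/bin/solr start -f
    # When oom_solr.sh kill -9s the JVM, the unit exits non-zero and systemd restarts it.
    Restart=on-failure
    RestartSec=30

    [Install]
    WantedBy=multi-user.target

Once such a unit is enabled, no separate watchdog script is needed; systemd re-launches Solr after every OOM kill.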

2019-11-14 10:36:03 GMT <hi-ko> AFaust: do you have experience with openjdk11 and solr6? what is your best practice GC setup?

2019-11-14 10:37:08 GMT <angelborroy> -XX:+UseConcMarkSweepGC -XX:+UseParNewGC

2019-11-14 10:37:12 GMT <AFaust> Honestly, with SOLR6, I have so far kept GC settings (apart from Xmx/Xms) on the SOLR default. Though I am usually quite fond of exclusively using G1GC instead of whatever legacy default is configured.

2019-11-14 10:38:29 GMT <hi-ko> that was also my intention, but now I have to fight with GC causing OOM (Xmx of 16g)

2019-11-14 10:38:30 GMT <AFaust> I don't yet have any relevant number of systems on OpenJDK 11 in production, so experience with that particular version is not extensive enough to have any "new" insights.

2019-11-14 10:40:13 GMT <AFaust> I assume you have already analysed the memory usage patterns in your specific SOLR instance, e.g. understand why certain amounts of memory are being used / exhausted?

2019-11-14 10:41:05 GMT <AFaust> Fighting with SOLR GC is typically only the last step for me, and I have rarely had to do it so far... at least for SOLR 6 on JDK 8...

2019-11-14 10:42:53 GMT <AFaust> Most of my SOLR (regardless of version) OOMs typically were the result of either issues with data modelling or extremely excessive / aggressive use cases in queries

2019-11-14 10:44:34 GMT <hi-ko_> AFaust: I haven't analysed the cause of the memory consumption so far. It's just the latest Search Services install with jdk11 and the same memory as when running solr4. I will check/compare the cache config and will test jdk8

2019-11-14 10:45:30 GMT <AFaust> How many (indexable) nodes in workspace://SpacesStore?

2019-11-14 10:45:42 GMT <hi-ko_> ~50 mio

2019-11-14 10:45:42 GMT <alfbot> hi-ko_: Error: "50" is not a valid command.

2019-11-14 10:45:50 GMT <hi-ko_> 50 mio

2019-11-14 10:46:04 GMT <AFaust> That is one of the key factors to keep in mind when evaluating cache config.

2019-11-14 10:47:41 GMT <hi-ko_> ASS solrcore config looks quite ugly since it is generated in a strange order without comments

2019-11-14 10:48:23 GMT <AFaust> ok - so for each query / filter cache entry, you have to calculate the memory cost of an int[] array that has slightly more than ~50 million entries...

2019-11-14 10:49:46 GMT <AFaust> if I am not mistaken, this should be slightly less than half a gig per cache entry

2019-11-14 10:49:59 GMT <AFaust> BTW - this has not changed between SOLR 4 and 6 at all

2019-11-14 10:50:50 GMT <hi-ko_> that's a good hint. this means if we already had a good working cache config in solr4 we can test it as a starting point

2019-11-14 10:50:58 GMT <AFaust> You can find the (commented) solrcore.properties in the rerank template

2019-11-14 10:52:47 GMT <hi-ko_> why does the CE use the rerank template mechanism at all? looks somehow overcomplicated

2019-11-14 10:52:59 GMT <AFaust> Default filter cache allows for up to 256 entries, but we also have 128 allowed entries for authority, readers, owner, denied caches, and 256 allowed entries for path cache
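
To put numbers on that: with ~50 million indexable nodes, a single cached filter/query entry that stores a value per document costs roughly 50,000,000 x 4-8 bytes, i.e. somewhere between ~200 MB and ~400 MB, so even a few dozen resident entries can consume most of a 16 GB heap. A sketch of how those caps could be lowered via solrcore.properties (the placeholder names follow the rerank template, but verify them against your solrconfig.xml; the values are illustrative, not a recommendation):

    # Shrink the caches that hold per-document structures for a ~50M node index.
    solr.filterCache.size=64
    solr.queryResultCache.size=64
    solr.authorityCache.size=64
    solr.pathCache.size=64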

2019-11-14 10:54:21 GMT <AFaust> I guess because even CE supports sharding - and as long as you do not use sharding, most if not all of rerank template config behaves the same as the previous non-rerank templates.

2019-11-14 10:55:50 GMT <hi-ko_> solr6 supports sharding. so if someone knows how it works it should also work with CE I guess ...

2019-11-14 10:58:07 GMT <AFaust> Alfresco officially supports + documents sharding with SOLR 4 and 6 since ~5.1, for both CE and EE - no special knowledge necessary

2019-11-14 11:03:51 GMT <hi-ko_> sharding only makes sense if you exceed hardware limits, but this is not our issue on virtual hardware for now

2019-11-14 11:17:58 GMT <angelborroy> Sharding is not supported in CE

2019-11-14 11:20:42 GMT <AFaust> angelborroy: Since when? Manual sharding was always documented with no mention of "hey, this does not work on CE" - I have followed that documentation and successfully set up sharding in a couple of test systems, without any issues

2019-11-14 11:21:00 GMT <AFaust> "Dynamic" shard registration is not supported in CE - that is true.

2019-11-14 11:21:42 GMT <angelborroy> Some of the code is living in alfresco-enterprise-repository

2019-11-14 11:21:50 GMT <AFaust> Yes - the dynamic sharding one...

2019-11-14 11:22:56 GMT <AFaust> Checking the documentation right now - looks like between 6.0 and 6.1, most/all sharding documentation has been removed from ACS sections (including EE), and moved somewhere else

2019-11-14 11:23:18 GMT <AFaust> It now lives in ASS sections

2019-11-14 11:23:27 GMT <angelborroy> Right

2019-11-14 11:25:31 GMT <AFaust> Interestingly, the ASS config sections have the "enterprise" fragment in their URL, even though there is no Enterprise-specific ASS release - apart from the Insight Engine distribution, which has its own documentation sections...

2019-11-14 11:26:21 GMT <angelborroy> I was reviewing that

2019-11-14 11:26:27 GMT <AFaust> But there is no language in the documentation which makes it clear that it is supposed to only apply to Enterprise, and if it did, the question would be "where is the documentation for community ASS?"

2019-11-14 11:26:46 GMT <angelborroy> Differences are living in alfresco-enterprise-repository and alfresco-enterprise-remote-api

2019-11-14 11:27:15 GMT <angelborroy> https://docs.alfresco.com/search-community/concepts/search-home.html

2019-11-14 11:27:17 GMT <alfbot> Title:Alfresco Search Services Community Edition 1.4 | Alfresco Documentation (at docs.alfresco.com)

2019-11-14 11:27:27 GMT <hi-ko_> don't argue for CE and sharding: I don't see the benefit yet and we need to set up solr in a way that fits in 16 GB of memory

2019-11-14 11:27:47 GMT <AFaust> Ah - but via Google Search I found the CE equivalent, and it documents sharding as supported via the "manual sharding" method: https://docs.alfresco.com/search-community/concepts/solr-shard-config.html

2019-11-14 11:27:48 GMT <alfbot> Title:Setting up Solr sharding | Alfresco Documentation (at docs.alfresco.com)

2019-11-14 11:28:04 GMT <hi-ko_> I found the same ;-)
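
For reference, the manual setup in that page boils down to creating each shard's core through the Alfresco core admin handler on every Solr node, roughly along these lines (host, counts and shard IDs are illustrative and from memory; check the exact parameters against the documentation above):

    curl "http://solrhost1:8983/solr/admin/cores?action=newCore&storeRef=workspace://SpacesStore&numShards=4&numNodes=2&nodeInstance=1&template=rerank&coreName=alfresco&shardIds=0,1"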

2019-11-14 11:28:48 GMT <angelborroy> Sharding is better in terms of performance but worse in terms of storage

2019-11-14 11:29:19 GMT <angelborroy> “Enterprise” is adding a Web Console based on the ShardRegistry

2019-11-14 11:29:22 GMT <AFaust> With storage being about the cheapest component in infrastructure, that would be a trivial trade-off

2019-11-14 11:29:54 GMT <hi-ko_> anyway: it's like repo clustering: if you don't want to waste resources it's always smarter to avoid it

2019-11-14 11:29:57 GMT <AFaust> Yeah, but honestly, that web console + single Java class registry implementation is not really a complex piece of engineering.

2019-11-14 11:31:08 GMT <AFaust> It should be about the same amount of effort to re-implement that in CE as it was for the persistent runtime subsystem configuration done in OOTBee Support Tools (JMX-like)

2019-11-14 11:31:21 GMT <angelborroy> You’re right

2019-11-14 11:31:26 GMT <angelborroy> DocRouter is on the CE part

2019-11-14 11:31:27 GMT <angelborroy> https://github.com/Alfresco/SearchServices/blob/1.4.0/search-services/alfresco-search/src/main/java/org/alfresco/solr/tracker/DocRouter.java

2019-11-14 11:31:27 GMT <hi-ko_> why is sharding better in terms of performance given you're running on the same SAN controller?

2019-11-14 11:31:28 GMT <alfbot> Title:SearchServices/DocRouter.java at 1.4.0 · Alfresco/SearchServices · GitHub (at github.com)

2019-11-14 11:31:54 GMT <angelborroy> Searching time is shorter

2019-11-14 11:32:00 GMT <angelborroy> And you’re searching in parallel

2019-11-14 11:32:02 GMT <AFaust> hi-ko_: it's basically a matter of increased concurrency

2019-11-14 11:32:48 GMT <hi-ko_> hm - so the bottleneck is CPU, not IO, since even in sharded mode you need to read everything

2019-11-14 11:33:29 GMT <AFaust> Most of the query cost (especially in high volume systems) is the IO cost of reading index fields from disk, if they don't happen to be cached yet. And given how huge index fields can be, most of the not-always-used ones are likely not cached

2019-11-14 11:34:29 GMT <hi-ko_> so why should sharding speed things up then, given that the index is always bigger than the available RAM?

2019-11-14 11:34:39 GMT <AFaust> sharding -> smaller index sizes (shorter sequential IO load) + concurrency -> noticeable improvement

2019-11-14 11:35:46 GMT <AFaust> also - sharding + separate hosts / machines with same memory -> more of index effectively cached, while keeping JVM GC overhead low

2019-11-14 11:36:01 GMT <hi-ko_> I will test this, but since common SAN environments run on the same controller it's a zero-sum game

2019-11-14 11:38:06 GMT <AFaust> Also sharding: index updates don't have to invalidate all of your index caches - only those of affected shards

2019-11-14 11:38:13 GMT <hi-ko_> all the numbers I've seen so far made the benefit visible only if you have separate hardware for the shard OS _and_ storage

2019-11-14 11:38:44 GMT <hi-ko_> this is indeed a valid point!

2019-11-14 11:39:06 GMT <hi-ko_> I remember your reindex nightmare ...

2019-11-14 11:39:46 GMT <AFaust> For that test system with 100m+ nodes and hundreds of potentially reused filter queries, I am currently looking into DB_ID_RANGE sharding to deal with this

2019-11-14 11:40:23 GMT <AFaust> angelborroy: It would be cool if you could look into making sharded index tracking more efficient as well with your new approaches for indexing

2019-11-14 11:40:53 GMT <angelborroy> We are working on that

2019-11-14 11:40:54 GMT <hi-ko_> DB_ID_RANGE sharding doesn't help in terms of search performance, right?

2019-11-14 11:41:08 GMT <angelborroy> Currently, DB_ID_RANGE doesn’t help with performance

2019-11-14 11:41:36 GMT <AFaust> I don't understand why DB_ID_RANGE takes so long to even get to the relevant nodes, e.g. I have a bit over 100m nodes, and some shards have been set up to track ID 100m-110m, 110m-120m, and so on - but they are still cycling through the <100m node transactions after more than a day of tracking

2019-11-14 11:41:36 GMT <hi-ko_> same as DB partitioning not matching the search terms

2019-11-14 11:41:57 GMT <angelborroy> We are working on starting the indexation of each shard from the first transaction containing the first DB_ID of a shard

2019-11-14 11:42:51 GMT <AFaust> I also do not understand yet why SOLR SUMMARY reports that there are ~100k nodes indexed when none of them should have matched the range yet, and index size is literally still under a meg

2019-11-14 11:42:56 GMT <angelborroy> Currently every Shard is indexing ACL and so on from the beginning
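
For reference, the routing method is selected per shard in its solrcore.properties; a DB_ID_RANGE shard covering IDs 100m-110m would look roughly like this (property names as documented for Search Services; the range is just an example):

    # Route by database ID: this instance only indexes nodes whose DB ID is in range.
    shard.method=DB_ID_RANGE
    shard.range=100000000-110000000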

2019-11-14 11:43:04 GMT <alfresco-discord> <dgradecak> angelborroy: are there any changes happening on the alfresco side for the solr rest api?

2019-11-14 11:43:22 GMT <angelborroy> No

2019-11-14 11:43:43 GMT <hi-ko_> dgradecak: good news for you ;-)

2019-11-14 11:44:10 GMT <alfresco-discord> <dgradecak> finally something this week 😄

2019-11-14 11:45:39 GMT <AFaust> hi-ko_: As long as you don't count search performance improvements due to concurrency, better cacheability, less impact from GC (stop-the-world) and potential IO parallelisation (assuming separate physical disks for indexes), then no, DB_ID_RANGE does not improve search performance by itself

2019-11-14 11:46:45 GMT <angelborroy> Right

2019-11-14 11:47:00 GMT <hi-ko_> I don't see concurrency, better cacheability if running on the same hardware pool

2019-11-14 11:48:15 GMT <hi-ko_> not the same, but in sum something similar may apply for GC, given that we will never spend 500 GB on RAM

2019-11-14 11:50:17 GMT <hi-ko_> so it may be easy to test. I will compare sharding with no sharding for reindexing ...

2019-11-14 11:58:32 GMT *** jelly-home is now known as jelly

2019-11-14 12:34:03 GMT <hi-ko_> AFaust: is your G1 GC experience from openjdk? solr guys state: "Do not, under any circumstances, run Lucene with the G1 garbage collector."

2019-11-14 12:35:03 GMT <hi-ko_> for now I'll run with CMS as suggested by angelborroy
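
For reference, both GC choices discussed here can be passed to Solr through solr.in.sh; a sketch assuming a 16 GB heap (the exact flag values are assumptions, not settings taken from the log):

    # CMS, as suggested by angelborroy. Note that -XX:+UseParNewGC was removed in
    # JDK 10, so on OpenJDK 11 only the (deprecated) CMS flag still parses:
    GC_TUNE="-XX:+UseConcMarkSweepGC"

    # Alternative: G1, which AFaust and yreg report working well for them:
    # GC_TUNE="-XX:+UseG1GC -XX:MaxGCPauseMillis=250"

    SOLR_JAVA_MEM="-Xms16g -Xmx16g"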

2019-11-14 13:40:34 GMT <AFaust> hi-ko_: Never encountered any issues with G1GC (apart from dumb misconfiguration mistakes)

2019-11-14 13:40:44 GMT <AFaust> ~later tell hi-ko_: Never encountered any issues with G1GC (apart from dumb misconfiguration mistakes)

2019-11-14 13:40:44 GMT <alfbot> AFaust: The operation succeeded.

2019-11-14 13:41:39 GMT <AFaust> ~later tell hi-ko_: I know that Lucene guys stated this with regards to G1 garbage collector - but in a recent check with SOLR documentation, I have not read this statement, which I believe was made years ago and may no longer be valid

2019-11-14 13:41:39 GMT <alfbot> AFaust: The operation succeeded.

2019-11-14 13:45:11 GMT <alfresco-discord> <yreg> I share that impression

2019-11-14 13:45:56 GMT <alfresco-discord> <yreg> had far better performance with G1GC, and once had someone from the #solr channel on freenode tell me that the recommendation was old and quite outdated

2019-11-14 16:18:04 GMT <alfresco-discord> <LuisColorado> Agreed, G1GC works quite well and avoids "stop the world" pauses. The trade-off that I found is that it uses more memory.

2019-11-14 16:19:01 GMT <alfresco-discord> <LuisColorado> A customer switched from CMS to G1GC and started to get OutOfMemory errors. I think they added about 20% more memory to solve the issue.

2019-11-14 16:21:37 GMT *** mmccarthy1 is now known as mmccarthy

2019-11-14 16:23:55 GMT <alfresco-discord> <LuisColorado> The main point of DB_ID_RANGE is just to be able to add extra servers in the future, without having to perform a full reindex, right? Otherwise, if using other sharding methods, you would have to set up all the servers upfront.

2019-11-14 16:29:35 GMT <AFaust> That's the main point as advertised, yes.

2019-11-14 16:31:34 GMT <AFaust> In my case it is also intended to limit the impact of cache invalidation and query re-execution after index updates. The use case is more of an archive system, where nodes with lower DB ID are far less likely to ever be updated. So the shards for older DB IDs can be larger (e.g. have 25m nodes) because for them, the cost of cache invalidation + query re-execution is not an issue since there won't be as many (or any) updates.

2019-11-14 16:32:14 GMT <AFaust> Then there are shards for the most recent DB IDs which are smaller, e.g. <= 10m nodes, so if any update / new node is indexed, the cost of cache invalidation + query re-execution is far lower

2019-11-14 16:32:36 GMT <AFaust> (smaller index => faster to re-calculate query results + re-load index fields into memory)
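
Put concretely, the layout described above might look something like this (ranges are purely illustrative):

    # Old, rarely-updated nodes: large shards, since their caches are almost never invalidated.
    shard 0: shard.range=0-25000000
    shard 1: shard.range=25000000-50000000
    shard 2: shard.range=50000000-75000000
    shard 3: shard.range=75000000-100000000
    # Recent, frequently-updated nodes: small shards, so cache invalidation and
    # query re-execution only ever touch a ~10m-node index.
    shard 4: shard.range=100000000-110000000
    shard 5: shard.range=110000000-120000000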

End of Daily Log

The other logs are at http://esplins.org/hash_alfresco