Daily Log for #alfresco IRC Channel

Alfresco discussion and collaboration. Stick around a few hours after asking a question.

Official support for Enterprise subscribers: support.alfresco.com.

Joining the Channel:

Join in the conversation by getting an IRC client and connecting to #alfresco at Freenode. Or you can use the IRC web chat.

More information about the channel is in the wiki.

Getting Help

More help is available in this list of resources.

Daily Log for #alfresco

2019-11-18 11:35:49 GMT <hi-ko> angelborroy , Afaust: I solved the solr6 oom issues by switching back from openjdk11 to openjdk8 and using UseConcMarkSweepGC. Now reindexing runs without any error but ~ 10-20% slower than 5.2/solr4

2019-11-18 11:39:09 GMT <hi-ko> using UseConcMarkSweepGC on openjdk11 didn't have the same effect which means before switching to openjdk11 we need to figure out how to avoid OOM caused by GC (G1GC)

2019-11-18 12:46:33 GMT <AFaust> hi-ko: interesting - though I guess you might be close to / just below the OOM occurrence threshold / load factor with JDK 8 + CMS GC, and inherent differences in G1 could have just brought it over the edge

2019-11-18 12:50:27 GMT <AFaust> angelborroy: Do you see any chance for Alfresco to adopt multi stage Docker builds to keep their images leaner? I just realised that 1/4th of image size for ACS Repository (checked in current 6.2 states) is only due to chmod shenanigans and could easily be avoided by doing it in another stage and just copying over the finished webapps folder contents

2019-11-18 12:51:11 GMT <angelborroy> You can always try a PR

2019-11-18 12:51:43 GMT <angelborroy> I’ll try that it will be accepted by someone

2019-11-18 12:52:15 GMT <AFaust> Right... just like https://github.com/Alfresco/acs-community-packaging/pull/201...

2019-11-18 12:52:17 GMT <alfbot> Title:Sign in to GitHub · GitHub (at github.com)

2019-11-18 12:53:03 GMT <angelborroy> June 2019 :(

2019-11-18 12:53:22 GMT <hi-ko> AFaust: I think CMS on openjdk8 is quite faster then G1. G1 takes to long before my heap is filled up.

2019-11-18 12:53:37 GMT <hi-ko> s/to/too

2019-11-18 12:56:16 GMT <angelborroy> btw, I’ve created a draft for solr monitoring

2019-11-18 12:56:17 GMT <angelborroy> https://github.com/aborroy/alfresco-solr-monitoring

2019-11-18 12:56:18 GMT <alfbot> Title:GitHub - aborroy/alfresco-solr-monitoring: Monitoring Alfresco SOLR with solr-exporter, Prometheus and Grafana (at github.com)

2019-11-18 12:56:27 GMT <angelborroy> I’ll write a blog post later this week

2019-11-18 12:56:35 GMT <angelborroy> Any feedback will be appreciated

2019-11-18 13:38:11 GMT <AFaust> ~later tell angelborroy: For some reason, I expected DB_ID_RANGE to keep a stable / unaffected index when creating new nodes outside of the shards configured range, but somehow, I still get one or two dozens of index version increments when creating another ~100k documents

2019-11-18 13:38:11 GMT <alfbot> AFaust: The operation succeeded.

2019-11-18 14:11:04 GMT <alfbot> angelborroy: Sent 32 minutes ago: <AFaust> For some reason, I expected DB_ID_RANGE to keep a stable / unaffected index when creating new nodes outside of the shards configured range, but somehow, I still get one or two dozens of index version increments when creating another ~100k documents

2019-11-18 14:25:59 GMT <angelborroy> Yep, it’s because it’s upgrading transaction, acl and other stuff

2019-11-18 14:26:10 GMT <angelborroy> We are working on that

2019-11-18 14:37:43 GMT <angelborroy> https://hub.alfresco.com/t5/alfresco-content-services-hub/deconstructing-solr-indexes/ba-p/293493

2019-11-18 14:37:44 GMT <alfbot> Title:Deconstructing SOLR Indexes - Alfresco Hub (at hub.alfresco.com)

2019-11-18 16:39:45 GMT <AFaust> angelborroy: out of curiosity, since you might be more familiar with ActiveMQ than I am, is Alfresco ever removing / deleting events/messages from the queue?

2019-11-18 16:40:00 GMT <AFaust> I mean, I expect that the Transform Service Router will do that, but what about the regular events?

2019-11-18 16:40:10 GMT <angelborroy> afaik messages are removed when consumed, right?

2019-11-18 16:40:37 GMT <AFaust> Yes, if they are actively "consumed" - but there is also something like queue browsing.

2019-11-18 16:40:56 GMT <angelborroy> I guess we are not using that “browsing”

2019-11-18 16:41:28 GMT <AFaust> Also, with regards to events for out-of-process behaviours, the event should probably not be (terminally) consumed by any of the out-of-process implementations, because who knows who else needs to process the same event later on?

2019-11-18 16:42:32 GMT <angelborroy> This is why I proposed an approach based in Kafka

2019-11-18 16:42:36 GMT <angelborroy> But it was discarded

2019-11-18 16:42:44 GMT <AFaust> Consider e.g. potential SOLR tracking based on events: An event is put on the queue, SOLR is currently not running, my out-of-process behaviour consumes the event, SOLR comes back online later and does not have an event anymore to track.

2019-11-18 16:42:55 GMT <angelborroy> After that, I lose interest in the topic

2019-11-18 16:43:07 GMT <AFaust> I can imagine...

2019-11-18 16:43:14 GMT <angelborroy> Yes, for that Kafka is the right approach

2019-11-18 16:43:25 GMT <angelborroy> But, again, I stepped back after my first attempt

2019-11-18 16:43:46 GMT <angelborroy> Anyway SOLR is not using queues by now

2019-11-18 16:43:59 GMT <angelborroy> We are planning that for next month / next year

2019-11-18 16:44:04 GMT <AFaust> I mean, I have never considered the JMS based approach to behaviours as realistic / usable myself, but the more I think of it, the more broken it appears from the ground up...

2019-11-18 16:44:59 GMT <angelborroy> Completely agree

2019-11-18 16:45:12 GMT <AFaust> Luckily, Alfresco has never properly documented on how this can / should be used at all, so I guess about nobody (or only an extremely limited number of people) use it at all...

2019-11-18 16:45:39 GMT <angelborroy> I don’t think there is a real use case of ActiveMQ in prod

2019-11-18 16:45:42 GMT <angelborroy> But I can be wrong

2019-11-18 16:46:28 GMT <AFaust> Just seems crazy, that a main "new" component in Alfresco 6 is rarely used, and even only used internally for something relevant starting with Enterprise 6.1 transform service

2019-11-18 16:47:32 GMT <AFaust> I am working on updating some of my training slides, and adding some "history" backstory to transform service, and was stumbling again over Active MQ and how I generally ignore it

2019-11-18 16:47:54 GMT <AFaust> Well... likely going to put some of my passive-aggressive commentary in the slide...

2019-11-18 16:47:54 GMT <angelborroy> I can talk some day with Dave Caruana

2019-11-18 16:48:12 GMT <angelborroy> He’s working on putting all this stuff together up in the right way

2019-11-18 16:48:29 GMT <angelborroy> But not sure on how is progressing with that

2019-11-18 16:51:53 GMT <AFaust> Do you know what numbered attempt the current Desktop Sync is?

2019-11-18 16:52:07 GMT <AFaust> I mean, as in "completely redone implementation"... 3rd or 4th?

2019-11-18 16:52:38 GMT <angelborroy> Not really

2019-11-18 16:52:47 GMT <AFaust> I am thinking 3rd, but am actually not sure... might have missed one of those botched attempts

2019-11-18 16:53:28 GMT <angelborroy> Alfresco Desktop Sync version 1.3, which works with any version of Alfresco since 5.1.4

2019-11-18 16:53:31 GMT <angelborroy> This is the latest

2019-11-18 16:54:17 GMT <angelborroy> https://docs.alfresco.com/desktopsync/references/whats-new.html

2019-11-18 16:54:18 GMT <alfbot> Title:What's new in Desktop Sync | Alfresco Documentation (at docs.alfresco.com)

2019-11-18 16:55:01 GMT <AFaust> Well... just because the product name hasn't changed and the version number has been increased, does not mean it has been the same product ever since...

2019-11-18 16:55:21 GMT <angelborroy> You’re right

2019-11-18 16:55:26 GMT <AFaust> I am currently looking at https://docs.alfresco.com/syncservice/concepts/desktop-sync-overview.html

2019-11-18 16:55:27 GMT <alfbot> Title:Desktop Sync overview | Alfresco Documentation (at docs.alfresco.com)

2019-11-18 16:55:42 GMT <AFaust> Which is already something different for the "newer" versions since 6.0/6.1 or so

2019-11-18 16:56:10 GMT <angelborroy> Yes, this is the “cloudification” of Desktop Sync

2019-11-18 16:56:49 GMT <AFaust> Right - I count that as separate, so AFAIK Desktop Sync is 2nd or 3rd attempt, Sync Service is 3rd or 4th.

2019-11-18 16:57:24 GMT <AFaust> I just don't remember if there have been one or two other attempts (not necessarily production ready - also counting abandoned / cancelled development announced at partner events / conferences) before that

2019-11-18 16:57:49 GMT <angelborroy> I think there were 2

2019-11-18 16:58:00 GMT <angelborroy> But not sure if I can find references for them

2019-11-18 16:58:22 GMT <angelborroy> Anyway you can always use CmisSync :D

2019-11-18 16:58:55 GMT <AFaust> So there definitely was a CMIS-based Java client / agent which I have tried once.

2019-11-18 16:59:04 GMT <AFaust> Nah - I am not planning on using any of this.

2019-11-18 16:59:21 GMT <AFaust> Just want to be correct in my passive aggressive slide commentary...

2019-11-18 16:59:41 GMT <angelborroy> You’re great :)

2019-11-18 17:19:35 GMT <hi-ko> angelborroy: messaging is just an async communication but not the solution for all these use cases like solr indexing, sync, reporting. same does apply for kafka

2019-11-18 17:20:11 GMT <angelborroy> I know, but at least Kafka provides persistence for the messages

2019-11-18 17:20:54 GMT <hi-ko> you'll need a concept of a system change number. you could implement this in kafka by custom db schema but it would never be performant by using kafka out of the box

2019-11-18 17:21:03 GMT <hi-ko> by reading the message logs

2019-11-18 17:23:59 GMT <hi-ko> at the end of the day you only need 2 tables with a system change number all processes could rely on (index, sync, reporting): node_facts, node_changes

2019-11-18 17:27:12 GMT <hi-ko> the use cases don't want to replay the messages stored by kafka but only to get the relevant (last) changes since last seen SCN

2019-11-18 17:35:22 GMT <angelborroy> That can be an option for millions of nodes…

2019-11-18 17:35:33 GMT <angelborroy> … but databases still don’t support billions of documents

2019-11-18 17:39:50 GMT <hi-ko> why not??

2019-11-18 17:40:40 GMT <hi-ko> it is used many years for dwh scenarios

2019-11-18 17:44:00 GMT <hi-ko> the default kafka approach stores low level messages without knowledge about the message and you have to reread / interpret all the messages

2019-11-18 17:46:39 GMT <hi-ko> the db approach (which is also recommended by the kafka team) reads the low level message to store the data in a higher level schema (e.g. db) optimized for the use cases / client requests.

2019-11-18 17:49:14 GMT <hi-ko> don't repeat the lessons learned seen from the IoT use cases - try to learn from them instead ...
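
Editor's note: the two-table idea hi-ko describes above (node_facts and node_changes, keyed by a system change number) can be illustrated with a minimal sketch. Only the two table names come from the log. The column names, the use of SQLite, and the helper functions open_store, record_change and changes_since are assumptions made up for this illustration, not anything Alfresco ships. The point it shows is that a consumer (index, sync, reporting) asks only for the latest change per node since the last SCN it has seen, instead of replaying every message.

```python
import sqlite3


def open_store(path=":memory:"):
    """Create the two hypothetical tables; column names are assumptions."""
    conn = sqlite3.connect(path)
    conn.executescript(
        """
        CREATE TABLE IF NOT EXISTS node_facts (
            node_id  INTEGER PRIMARY KEY,
            name     TEXT NOT NULL,
            last_scn INTEGER NOT NULL
        );
        CREATE TABLE IF NOT EXISTS node_changes (
            scn         INTEGER PRIMARY KEY AUTOINCREMENT,
            node_id     INTEGER NOT NULL,
            change_type TEXT NOT NULL
        );
        """
    )
    return conn


def record_change(conn, node_id, name, change_type):
    """Append a change row and update the current node state in one transaction.

    The autoincrement key of node_changes serves as the system change number.
    """
    with conn:
        cur = conn.execute(
            "INSERT INTO node_changes (node_id, change_type) VALUES (?, ?)",
            (node_id, change_type),
        )
        scn = cur.lastrowid
        if change_type == "delete":
            conn.execute("DELETE FROM node_facts WHERE node_id = ?", (node_id,))
        else:
            conn.execute(
                "INSERT OR REPLACE INTO node_facts (node_id, name, last_scn) "
                "VALUES (?, ?, ?)",
                (node_id, name, scn),
            )
    return scn


def changes_since(conn, last_seen_scn):
    """Return (node_id, scn, change_type) for the latest change of each node
    with an SCN greater than last_seen_scn, ordered by SCN."""
    return conn.execute(
        """
        SELECT c.node_id, c.scn, c.change_type
        FROM node_changes AS c
        JOIN (
            SELECT node_id, MAX(scn) AS latest
            FROM node_changes
            WHERE scn > ?
            GROUP BY node_id
        ) AS m ON c.node_id = m.node_id AND c.scn = m.latest
        ORDER BY c.scn
        """,
        (last_seen_scn,),
    ).fetchall()
```

A consumer would store the highest SCN it has processed and pass it to changes_since on its next poll. A node that changed several times in between shows up once, with its most recent change.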

End of Daily Log

The other logs are at http://esplins.org/hash_alfresco