
Dockerized Apache Storm

To build a jar and submit it to Storm:

make enter
cd wordcount
mvn clean compile package
exit

make enter_nimbus
cd /usr/src/app/wordcount/target
storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -r -R /topology.yaml

The backend and nimbus containers both mount the local directory, so both can access the same files. The sample wordcount project is Azure-Samples/hdinsight-python-storm-wordcount with storm.version updated in pom.xml. The project is set up to build a full jar that you can submit to Storm.
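Since the only change to the sample is the storm.version property in pom.xml, it can help to confirm what value the build will use before packaging. The following is a minimal sketch in Python using only the standard library. The function name read_storm_version and the trimmed-down SAMPLE_POM are hypothetical and exist only for illustration; a real pom.xml contains much more.

```python
import xml.etree.ElementTree as ET

# Default XML namespace declared by Maven POM files.
POM_NS = "http://maven.apache.org/POM/4.0.0"


def read_storm_version(pom_text):
    """Return the text of the first <storm.version> element, or None if absent."""
    root = ET.fromstring(pom_text)
    # ElementTree exposes namespaced tags in "{namespace}tag" form.
    for node in root.iter("{%s}storm.version" % POM_NS):
        return (node.text or "").strip() or None
    return None


# Hypothetical, minimal pom.xml content (an assumption, not the real file).
SAMPLE_POM = """<project xmlns="http://maven.apache.org/POM/4.0.0">
  <properties>
    <storm.version>2.3.0</storm.version>
  </properties>
</project>"""

if __name__ == "__main__":
    print(read_storm_version(SAMPLE_POM))
```

To check the real file, read wordcount/pom.xml into a string and pass it to the function.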

Experimenting with streamparse

I also experimented with creating a streamparse example. The tricky part is that streamparse does not generate a jar containing the topology definition, because it submits the topology information via Thrift. That would require setting up inter-container communication, so I decided to use the approach above to generate a jar instead. If you want to use streamparse, the notes below may help you get around some of the issues I ran into.

How did I get here?

I was setting up a sample project with Apache Storm using streamparse. The wordcount project was created with sparse quickstart wordcount, but it required some modifications to get running.

https://github.com/Parsely/streamparse/issues/479

I forked the repo and made a quick update so that I could permanently run in local mode for now.

Updating dependencies

  :dependencies  [[org.apache.storm/storm-core "2.3.0"]
                  [org.apache.storm/flux-core "2.3.0"]]

I updated these versions to match my local Storm version.
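A quick way to verify that the dependency versions line up with the local install is to pull them out of project.clj and compare. This is a rough sketch in Python, not part of streamparse. The helper names storm_dependency_versions and mismatched are hypothetical, and the regular expression assumes the simple [group/artifact "version"] form shown above. The local version is passed in as a string (for example, as reported by the storm version command).

```python
import re

# Matches entries such as [org.apache.storm/storm-core "2.3.0"].
STORM_DEP = re.compile(r'\[org\.apache\.storm/([\w.-]+)\s+"([^"]+)"')


def storm_dependency_versions(project_clj_text):
    """Map each org.apache.storm artifact name to its declared version."""
    return dict(STORM_DEP.findall(project_clj_text))


def mismatched(project_clj_text, local_version):
    """Return the artifacts whose declared version differs from local_version."""
    versions = storm_dependency_versions(project_clj_text)
    return {name: v for name, v in versions.items() if v != local_version}


# The dependency vector from the snippet above.
SAMPLE = '''
  :dependencies  [[org.apache.storm/storm-core "2.3.0"]
                  [org.apache.storm/flux-core "2.3.0"]]
'''

if __name__ == "__main__":
    print(storm_dependency_versions(SAMPLE))
    print(mismatched(SAMPLE, "2.3.0"))
```

An empty result from mismatched means every Storm dependency in project.clj matches the version you passed in.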

https://github.com/Parsely/streamparse/issues/472

(require 'cemerick.pomegranate.aether)
(cemerick.pomegranate.aether/register-wagon-factory!
 "http" #(org.apache.maven.wagon.providers.http.HttpWagon.))

Adding the snippet above to my project.clj seemed to fix this issue.