# Dockerized Apache Storm

To build the jar and submit it to Storm:

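```
# Build the jar inside the container (`make enter`), then return to the host
make enter
cd wordcount
mvn clean package   # the `package` phase also runs `compile`
exit

# Submit the topology from the nimbus container
make enter_nimbus
cd /usr/src/app/wordcount/target
storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -r -R /topology.yaml
```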
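The `storm jar` step can also be scripted. The Python sketch below assembles the same argument list. `flux_submit_command` is a hypothetical helper that is not part of Storm or this repo, and the flag notes in its docstring follow the Flux command-line documentation.

```python
import shlex

# Flux's entry point, exactly as passed to `storm jar` in the README.
FLUX_MAIN = "org.apache.storm.flux.Flux"


def flux_submit_command(jar, topology="/topology.yaml", remote=True, resource=True):
    """Return the argv list for submitting a Flux topology with `storm jar`.

    remote=True   -> "-r": deploy to the cluster ("-l" runs it locally instead)
    resource=True -> "-R": read `topology` from the jar's classpath,
                     not from the filesystem of the machine running the command
    """
    cmd = ["storm", "jar", jar, FLUX_MAIN]
    cmd.append("-r" if remote else "-l")
    if resource:
        cmd.append("-R")
    cmd.append(topology)
    return cmd


if __name__ == "__main__":
    # Prints the same command line the README runs in the nimbus container.
    print(shlex.join(flux_submit_command("WordCount-1.0-SNAPSHOT.jar")))
```

Run from `/usr/src/app/wordcount/target` inside the nimbus container, the returned list could be passed straight to `subprocess.run(cmd, check=True)`. Keeping the command as a list avoids shell quoting issues.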

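The backend and nimbus containers both mount the local directory, so files
written in one are visible in the other. In the `storm jar` command, `-r` tells
Flux to deploy the topology to the cluster, and `-R` makes it read
`/topology.yaml` as a classpath resource from inside the jar rather than from
the filesystem. The sample wordcount project is simply
[Azure-Samples/hdinsight-python-storm-wordcount][1] with `storm.version`
updated in pom.xml. The project is set up to produce a full jar that you can
submit to Storm.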
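The jar is built against whatever `storm.version` the pom declares, so it can help to check that value before submitting. Below is a minimal sketch using Python's standard `xml.etree.ElementTree`. It assumes the property sits under `<properties>`, which is the usual Maven convention, in the standard POM namespace. `read_storm_version` is a hypothetical helper.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Default namespace that Maven POMs declare on <project>.
POM_NS = {"m": "http://maven.apache.org/POM/4.0.0"}


def read_storm_version(pom_xml: str) -> Optional[str]:
    """Return the <properties><storm.version> value from pom.xml text, or None."""
    root = ET.fromstring(pom_xml)
    node = root.find("m:properties/m:storm.version", POM_NS)
    if node is None:
        # Fall back to a POM written without the Maven namespace.
        node = root.find("properties/storm.version")
    if node is None or node.text is None:
        return None
    return node.text.strip()
```

For example, `read_storm_version(pathlib.Path("wordcount/pom.xml").read_text())` would print the version the jar is compiled against. The path assumes the `wordcount` directory used in the build commands.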

## Experimenting with streamparse

I also experimented with creating a streamparse example, but the tricky part is
that streamparse won't generate a jar containing the topology definition,
because it submits the topology information via Thrift. That would have meant
setting up inter-container communication, so I decided to use the solution
above to generate a jar instead. If you want to use streamparse, the notes
below may help you get around some issues I ran into.

## How did I get here?

I started by setting up a sample Apache Storm project with streamparse. The
`wordcount` project was created with `sparse quickstart wordcount`, but it
required some modifications to run.

## https://github.com/Parsely/streamparse/issues/479

I forked the repo and made a quick change so that I could always run in local
mode for now.

## Updating dependencies

```
  :dependencies  [[org.apache.storm/storm-core "2.3.0"]
                  [org.apache.storm/flux-core "2.3.0"]]
```
I updated these versions in `project.clj` to match my local Storm version.
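Bumping these versions can also be scripted. The Python sketch below uses a hypothetical `pin_storm_version` helper that rewrites the version string of every `org.apache.storm/*` entry in `project.clj` text. It relies on a regular expression rather than a Clojure reader, so it assumes dependencies are written as `[group/artifact "version"]` vectors like the ones above.

```python
import re

# Matches vectors like [org.apache.storm/storm-core "1.0.0"]; group 1 is
# everything up to the opening quote and group 2 is the closing quote, so
# only the version string between them is replaced.
STORM_DEP = re.compile(r'(\[org\.apache\.storm/[\w.-]+\s+")[^"]*(")')


def pin_storm_version(project_clj: str, version: str) -> str:
    """Return `project_clj` with every org.apache.storm/* dependency set to `version`."""
    # A function replacement keeps `version` literal, so it is never
    # parsed as a backreference.
    return STORM_DEP.sub(lambda m: m.group(1) + version + m.group(2), project_clj)
```

For example, `pin_storm_version(text, "2.3.0")` produces the dependency vector shown above. The target value would be whatever `storm version` reports for the local install. Non-Storm dependencies are left untouched.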

## https://github.com/Parsely/streamparse/issues/472

```
(require 'cemerick.pomegranate.aether)
(cemerick.pomegranate.aether/register-wagon-factory!
 "http" #(org.apache.maven.wagon.providers.http.HttpWagon.))
```

Adding the snippet above to my `project.clj` seemed to fix this issue. It
registers Maven's `HttpWagon` as the transport for plain `http` repository
URLs.

[1]: https://github.com/Azure-Samples/hdinsight-python-storm-wordcount