Diffstat (limited to 'wordcount/README.md')
-rw-r--r-- wordcount/README.md 88
1 files changed, 88 insertions, 0 deletions
diff --git a/wordcount/README.md b/wordcount/README.md
new file mode 100644
index 0000000..3cd7765
--- /dev/null
+++ b/wordcount/README.md
@@ -0,0 +1,88 @@
+---
+services: hdinsight
+platforms: java,python
+author: blackmist
+---
+# hdinsight-python-storm-wordcount
+
+This example shows how to use Python components in an Apache Storm topology on HDInsight.
+
+The topology is defined in YAML using the Flux framework. The components (spout and bolts) that process the data are written in Python.
+
+This example has been tested with HDInsight 3.6 (Storm 1.1.0).
+
+## Prerequisites
+
+* Python 2.7 or higher
+
+* Java JDK 1.8 or higher
+
+* Maven
+
+* (Optional) A local Storm development environment. This is only needed if you want to run the topology locally. For more information, see [Setting up a development environment](http://storm.apache.org/releases/1.0.1/Setting-up-development-environment.html).
+
+## How it works
+
+* `/resources/topology.yaml` - defines what components are in the topology and how data flows between them.
+
+* `/multilang/resources` - contains the Python components.
+
+* `/pom.xml` - dependencies and how to build the project.
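+
+As a rough illustration of the data flow that `topology.yaml` wires together, the following sketch reproduces the split-and-count logic in plain Python. It is not the code from `/multilang/resources`: it leaves out the Storm multilang plumbing entirely, and the function names `split_sentence` and `count_words` are hypothetical.
+
+```python
+from collections import Counter
+
+
+def split_sentence(sentence):
+    """Mimic the splitter bolt: one output word per whitespace-separated token."""
+    return sentence.split()
+
+
+def count_words(sentences):
+    """Mimic the counter bolt: keep a running count for every word seen."""
+    counts = Counter()
+    for sentence in sentences:
+        for word in split_sentence(sentence):
+            counts[word] += 1
+    return counts
+
+
+if __name__ == "__main__":
+    # Sample sentence taken from the console output shown in this README.
+    totals = count_words(["the cow jumped over the moon"] * 2)
+    for word, count in sorted(totals.items()):
+        print("{0}: {1}".format(word, count))
+```
+
+In the real topology each of these steps runs as a separate component, and Storm passes the tuples between them.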
+
+## Build the project
+
+From the root of the project, use the following command:
+
+```bash
+mvn clean compile package
+```
+
+This command creates a `target/WordCount-1.0-SNAPSHOT.jar` file.
+
+## Run the topology locally
+
+To run the topology locally, use the following command from the root of the project:
+
+```bash
+storm jar target/WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -l -R /topology.yaml
+```
+
+Once the topology starts, it emits information to the local console similar to the following text:
+
+```
+24302 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon
+24302 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting the
+24302 [Thread-28] INFO o.a.s.t.ShellBolt - ShellLog pid:2437, name:counter-bolt Emitting years:160
+24302 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=the, count=599}
+24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=seven, count=302}
+24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=dwarfs, count=143}
+24303 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon
+24303 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting cow
+^C24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=four, count=160}
+```
+
+Press Ctrl+C to stop the topology.
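+
+The `LogInfoBolt` lines in the console output carry the running count for each word. To check those counts, you could capture the output and parse it with a short script. The following is a minimal sketch that assumes the exact line format shown in the sample output; `parse_counts` and `LOG_PATTERN` are hypothetical names, not part of this project.
+
+```python
+import re
+
+# Matches the "{word=the, count=599}" payload logged by LogInfoBolt.
+LOG_PATTERN = re.compile(r"LogInfoBolt - \{word=([^,}]+), count=(\d+)\}")
+
+
+def parse_counts(lines):
+    """Return the most recent count seen for each word."""
+    latest = {}
+    for line in lines:
+        match = LOG_PATTERN.search(line)
+        if match:
+            latest[match.group(1)] = int(match.group(2))
+    return latest
+
+
+if __name__ == "__main__":
+    # Two lines copied from the sample output; only the first one matches.
+    sample = [
+        "24302 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=the, count=599}",
+        "24302 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting the",
+    ]
+    print(parse_counts(sample))
+```
+
+Lines from the spout and the other bolts are ignored because they do not match the pattern.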
+
+## Run the topology on HDInsight
+
+1. Use the following command to copy the `WordCount-1.0-SNAPSHOT.jar` file to your Storm on HDInsight cluster:
+
+ ```bash
+ scp target/WordCount-1.0-SNAPSHOT.jar sshuser@mycluster-ssh.azurehdinsight.net:
+ ```
+
+ Replace `sshuser` with the SSH user for your cluster. Replace `mycluster` with the cluster name.
+
+2. Once the file has been uploaded, connect to the cluster using SSH and use the following command to start the topology on the cluster:
+
+ ```bash
+ storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -r -R /topology.yaml
+ ```
+
+3. You can use the Storm UI to view the topology on the cluster. The Storm UI is located at https://mycluster.azurehdinsight.net/stormui. Replace `mycluster` with your cluster name.
+
+Once started, a Storm topology runs until it is stopped (killed). To stop the topology, either run the `storm kill TOPOLOGYNAME` command from the command line (an SSH session to a Linux cluster), or open the Storm UI, select the topology, and then select the __Kill__ button.
+
+## Project code of conduct
+
+This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.