From e3d2ffca585660f0c088ab8323bfe78a86ba75e3 Mon Sep 17 00:00:00 2001
From: Cody Hiar
Date: Mon, 25 Oct 2021 14:34:37 -0600
Subject: Save working version of deploying jars

---
 wordcount/README.md | 88 ++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 88 insertions(+)
 create mode 100644 wordcount/README.md
(limited to 'wordcount/README.md')

diff --git a/wordcount/README.md b/wordcount/README.md
new file mode 100644
index 0000000..3cd7765
--- /dev/null
+++ b/wordcount/README.md
@@ -0,0 +1,88 @@
+---
+services: hdinsight
+platforms: java,python
+author: blackmist
+---
+# hdinsight-python-storm-wordcount
+
+How to use Python components in an Apache Storm topology on HDInsight
+
+This topology uses the Flux framework to define a Storm topology using YAML. The components (spout and bolts) that process the data are written in Python.
+
+This example has been tested with HDInsight 3.6 (Storm 1.1.0).
+
+## Prerequisites
+
+* Python 2.7 or higher
+
+* Java JDK 1.8 or higher
+
+* Maven
+
+* (Optional) A local Storm development environment. This is only needed if you want to run the topology locally. For more information, see [Setting up a development environment](http://storm.apache.org/releases/1.0.1/Setting-up-development-environment.html).
+
+## How it works
+
+* `/resources/topology.yaml` - defines what components are in the topology and how data flows between them.
+
+* `/multilang/resources` - contains the Python components.
+
+* `/pom.xml` - dependencies and how to build the project.
+
+## Build the project
+
+From the root of the project, use the following command:
+
+```bash
+mvn clean compile package
+```
+
+This command creates a `target/WordCount-1.0-SNAPSHOT.jar` file.
+
+## Run the topology locally
+
+To run the topology locally, use the following command from the root of the project:
+
+```bash
+storm jar target/WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -l -R /topology.yaml
+```
+
+Once the topology starts, it emits information to the local console similar to the following text:
+
+```
+24302 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon
+24302 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting the
+24302 [Thread-28] INFO o.a.s.t.ShellBolt - ShellLog pid:2437, name:counter-bolt Emitting years:160
+24302 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=the, count=599}
+24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=seven, count=302}
+24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=dwarfs, count=143}
+24303 [Thread-25-sentence-spout-executor[4 4]] INFO o.a.s.s.ShellSpout - ShellLog pid:2436, name:sentence-spout Emiting the cow jumped over the moon
+24303 [Thread-30] INFO o.a.s.t.ShellBolt - ShellLog pid:2438, name:splitter-bolt Emitting cow
+^C24303 [Thread-17-log-executor[3 3]] INFO o.a.s.f.w.b.LogInfoBolt - {word=four, count=160}
+```
+
+Use Ctrl+C to stop the topology.
+
+## Run the topology on HDInsight
+
+1. Use the following command to copy the `WordCount-1.0-SNAPSHOT.jar` file to your Storm on HDInsight cluster:
+
+    ```bash
+    scp target/WordCount-1.0-SNAPSHOT.jar sshuser@mycluster-ssh.azurehdinsight.net:
+    ```
+
+    Replace `sshuser` with the SSH user for your cluster, and replace `mycluster` with the cluster name. The trailing `:` is required; without it, `scp` makes a local copy instead of uploading the file.
+
+2. Once the file has been uploaded, connect to the cluster using SSH and use the following command to start the topology on the cluster:
+
+    ```bash
+    storm jar WordCount-1.0-SNAPSHOT.jar org.apache.storm.flux.Flux -r -R /topology.yaml
+    ```
+
+3. You can use the Storm UI to view the topology on the cluster. The Storm UI is located at https://mycluster.azurehdinsight.net/stormui. Replace `mycluster` with your cluster name.
+
+Once started, a Storm topology runs until it is stopped (killed). To stop the topology, either run the `storm kill TOPOLOGYNAME` command from the command line (in an SSH session to the cluster), or open the Storm UI, select the topology, and then select the __Kill__ button.
+
+## Project code of conduct
+
+This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/). For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/) or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.
-- 
cgit v1.2.3