# hillview

**Repository Path**: mirrors_vmware/hillview

## Basic Information

- **Project Name**: hillview
- **Description**: Big data spreadsheet
- **Primary Language**: Unknown
- **License**: Apache-2.0
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2020-08-19
- **Last Updated**: 2026-04-04

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

*This project was archived in August 2022. The code should run fine, but no more updates are planned.*

![Hillview project logo](hillview-logo.png)

Hillview: a big data spreadsheet. Hillview is a cloud-based service for interactively visualizing large datasets. The Hillview user interface runs in a browser.

Contents:

* [1. Documentation](#1-documentation)
* [2. Local installation](#2-installing-and-running-hillview-on-a-local-machine)
* [3. Cluster installation](#3-deploying-the-hillview-service-on-a-cluster)
* [4. Developing Hillview](#4-developing-hillview)

# 1. Documentation

There is a [Hillview user manual](docs/userManual.md).

A [short video](https://1drv.ms/v/s!AlywK8G1COQ_jeRQatBqla3tvgk4FQ) shows the system in action in real-time.

You can [try a demo](http://ec2-18-217-136-170.us-east-2.compute.amazonaws.com:8080/) of the system running on 15 small Amazon machines. (The demo will stop working eventually.)

A [paper](https://arxiv.org/abs/1907.04827) describes the system in some detail. It is an extended version of the following publication: Mihai Budiu, Parikshit Gopalan, Lalith Suresh, Udi Wieder, Han Kruiger, and Marcos K. Aguilera, Hillview: A trillion-cell spreadsheet for big data, in PVLDB 2019, 12(11).

Documentation for the [internal APIs](docs/hillview-apis.pdf).

Experimental use of Hillview using [differential privacy](privacy.md).

# 2. Installing and running Hillview on a local machine

## 2.1 Linux or MacOS

### 2.1.1 Installing on Linux or MacOS

This will install pre-built binaries.

* Install Java 8.
  At this point newer versions of Java will *not* work.
* Clone this GitHub repository.
* Run the script `bin/install.sh`.

### 2.1.2 Running on Ubuntu or MacOS machines

All the following scripts are in the `bin` folder.

```
$ cd bin
```

* Start the back-end service, which performs all the data processing:

```
$ ./backend-start.sh &
```

* Start the web server (the default port of the web server is 8080; to change it, edit the setting in `apache-tomcat-9.0.4/conf/server.xml`):

```
$ ./frontend-start.sh
```

* Start a web browser and open http://localhost:8080
* When you are done, stop the two services by killing the `frontend-start.sh` and `backend-start.sh` jobs.

## 2.2 Windows

### 2.2.1 Installing on Windows

* Download and install Java 8.
* Choose a directory for installing Hillview.
* Enable execution of PowerShell scripts; this can be done, for example, by running the following command in PowerShell as an administrator: `Set-ExecutionPolicy unrestricted`
* Download the [installation script](bin/install-hillview.ps1) into the chosen directory.
* Run the installation script using Windows PowerShell:

```
> install-hillview.ps1
```

### 2.2.2 Running on Windows

All Windows scripts are in the `bin` folder:

```
> cd bin
```

* Start the Hillview processes:

```
> hillview-start.bat
```

* If needed, give the application permission to communicate through the Windows firewall.
* To stop Hillview:

```
> hillview-stop.bat
```

# 3. Deploying the Hillview service on a cluster

Hillview uses `ssh` to deploy code on the cluster. Prior to deployment you must set up `ssh` on the cluster to use password-less access to the cluster machines, as described here: https://www.ssh.com/ssh/copy-id. You must also install Java on all machines in the cluster. Each machine in the cluster must allow connections on the network ports described in the [configuration file](#31-service-configuration).
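Once the services are running, one quick way to confirm that a machine accepts connections on a given port is to attempt a TCP connection. The following is a minimal Python sketch and not part of Hillview: the helper names `port_is_open` and `unreachable_hosts` are made up for illustration, and any host names passed to them are your own.

```python
import socket


def port_is_open(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def unreachable_hosts(hosts, port, timeout=2.0):
    """Return the subset of hosts that do not accept connections on port."""
    return [host for host in hosts if not port_is_open(host, port, timeout)]
```

For example, `unreachable_hosts(["worker1.name", "worker2.name"], 3569)` would list the workers that cannot be reached on port 3569, the worker port used in the sample configuration. A failed check does not say why the connection failed; it may be a firewall rule or simply a service that has not been started.
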
*Please note that Hillview allows the client application arbitrary access to files on the worker nodes, with the privileges of the user specified in the configuration file.*

## 3.1 Service configuration

The configuration of the Hillview service is described in a JSON file (enhanced with comments); two sample files are `bin/config.json` and `bin/config-local.json`. The file `config-local.json` treats the local machine as a one-machine cluster.

```
// This file is a JSON file that defines the configuration for a
// Hillview deployment.
{
  // Name of machine hosting the web server
  "webserver": "web.server.name",
  // Names of the machines hosting the workers; the web
  // server machine can also act as a worker
  "aggregators": [
    // The "aggregators" level is optional; if it is
    // missing, the configuration should contain just an array of workers
    {
      "name": "aggregator1.name",
      "workers": [
        "worker1.name",
        "worker2.name"
      ]
    },
    {
      "name": "aggregator2.name",
      "workers": [
        "worker3.name",
        "worker4.name"
      ]
    }
  ],
  // Network port where the workers listen for requests
  "worker_port": 3569,
  // Network port where aggregators listen for requests
  "aggregator_port": 3570,
  // Java heap size for Hillview workers
  "default_heap_size": "25G",
  // User account for running the Hillview service, default is current user
  "user": "hillview",
  // Folder where the hillview service is installed on remote machines
  "service_folder": "/home/hillview",
  // Version of Apache Tomcat to deploy
  "tomcat_version": "9.0.4",
  // Tomcat installation folder name
  "tomcat": "apache-tomcat-9.0.4",
  // If true delete old log files, default is false
  "cleanup": false,
  // This can be used to override the default_heap_size for specific machines.
  "workers_heapsize": {
    "worker1.name": "20G"
  }
}
```

## 3.2 Deployment scripts

First install Hillview locally:

```
$ bin/install.sh
```

Edit the `config.json` file as described above.

All deployment scripts are written in Python and are in the `bin` folder.
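Because the configuration file is JSON with `//` comments, Python's standard `json` module will not parse it directly. The sketch below shows one way such a file could be read; it is an illustration only and not necessarily how Hillview's own scripts do it. The helper names `strip_line_comments`, `load_config`, and `heap_size_for` are hypothetical; the field names `default_heap_size` and `workers_heapsize` come from the sample file in section 3.1.

```python
import json


def strip_line_comments(text: str) -> str:
    """Remove // comments that appear outside JSON string literals."""
    out = []
    in_string = False
    escaped = False
    i = 0
    while i < len(text):
        ch = text[i]
        if in_string:
            out.append(ch)
            if escaped:
                escaped = False
            elif ch == "\\":
                escaped = True
            elif ch == '"':
                in_string = False
            i += 1
        elif ch == '"':
            in_string = True
            out.append(ch)
            i += 1
        elif text.startswith("//", i):
            # Skip to the end of the line; the newline itself is kept.
            newline = text.find("\n", i)
            if newline == -1:
                break
            i = newline
        else:
            out.append(ch)
            i += 1
    return "".join(out)


def load_config(text: str) -> dict:
    """Parse commented JSON text into a dictionary."""
    return json.loads(strip_line_comments(text))


def heap_size_for(config: dict, worker: str) -> str:
    """Per-worker heap size, falling back to default_heap_size."""
    return config.get("workers_heapsize", {}).get(worker, config["default_heap_size"])
```

With the sample file from section 3.1, `heap_size_for(config, "worker1.name")` would give `"20G"` and any other worker would get the default `"25G"`. Tracking whether the scanner is inside a string matters because a value such as a URL may itself contain `//`.
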
```
$ cd bin
```

Install the software on all cluster machines:

```
$ ./deploy.py config.json
```

Start the Hillview services:

```
$ ./start.py config.json
```

To connect to the service, open `http://:8080` in your web browser.

Stop the services:

```
$ ./stop.py config.json
```

Query the status of the services:

```
$ ./status.py config.json
```

## 3.3. Data management

We provide some crude data management scripts and tools for clusters. They are described [here](bin/README.md).

# 4. Developing Hillview

We only provide development instructions for Linux or MacOS, but there is no reason Hillview could not be developed on Windows.

## 4.1. Software Dependencies

* Back-end: Ubuntu Linux > 16 or MacOS. On MacOS you first need to install [Homebrew](https://brew.sh/). One way to do that is to run

  ```
  $ /bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
  ```

* Java 8, Maven build system, various Java libraries (Maven will manage the libraries)
* Front-end: TypeScript, webpack, Tomcat app server, node.js; some JavaScript libraries: d3, pako, and rx-js
* Cloud service management: Python 3
* Once you have Java, everything else is installed by scripts.

### 4.1.1 Installing Java

We use Java 8; newer versions will *not* work. First, download a JDK (for Linux x64 or MacOS) from here: http://www.oracle.com/technetwork/java/javase/downloads/jdk8-downloads-2133151.html

Note: it is not enough to have a Java VM installed; you need a JDK. Make sure to download the tarball version of the JDK.

For Linux: unpack the JDK, and set your `JAVA_HOME` environment variable to point to the unpacked folder (e.g., /jdk/jdk1.8.0_101). To set your `JAVA_HOME` environment variable, add the following to your `~/.bashrc` or `~/.zshrc`.

```
$ export JAVA_HOME=""
```

(For MacOS you do not need to set up `JAVA_HOME`.)

### 4.1.2. Installing other software needed

The following shell script will install the other required dependencies for building and testing.
```
$ cd bin
$ ./install-dependencies.sh
```

For old versions of Ubuntu this may fail, so you may have to install the required dependencies manually.

#### 4.1.2.1 Optional Impala Java libraries

If you want to access the [Impala](https://impala.apache.org/) database, you will need to download and install the Impala JDBC connector libraries from [Cloudera](https://www.cloudera.com/documentation/other/connectors.html). (These are not free software, so they are not available in Java Maven repositories.) You should install these in your local Maven repository, e.g. in the `~/.m2/com/cloudera/impala` folder. You may also need to adjust the version of the libraries in the file `platform/pom.xml`.

## 4.2. Building Hillview

* Build the software for the first time:

```
$ cd bin
$ ./rebuild.sh -a
$ ./demo-data-cleaner.sh
```

Subsequent builds can just run

```
$ bin/rebuild.sh
```

Hillview is currently split into two separate Maven projects. One can also build the two projects separately, as follows:

* platform: pure Java, includes the entire back-end. This produces a JAR file `platform/target/hillview-jar-with-dependencies.jar`. This part can be built with:

```
$ cd platform
$ mvn clean install
$ cd ..
```

* web: the web server, web client and web services; this project links to the result produced by the `platform` project. This produces a WAR (web archive) file `web/target/web-1.0-SNAPSHOT.war`. This part can be built with:

```
$ cd web
$ mvn package
$ cd ..
```

## 4.3. Contributing code

You will need to sign a CLA (Contributor License Agreement) to contribute code to Hillview under an Apache-2 license. This is standard practice.

## 4.4. Setup IntelliJ IDEA

Download and install IntelliJ IDEA: https://www.jetbrains.com/idea/. TypeScript support for the web project requires the (paid) Ultimate version of IntelliJ.
First run Maven to generate the Java code for gRPC:

```
$ cd platform
$ mvn install
```

Create an empty project in the hillview folder, and then import three modules (from File/Project structure/Modules, add three modules: web/pom.xml, platform/pom.xml, and the root folder hillview itself).

## 4.5. Setup VS Code

Download and install Visual Studio Code: https://code.visualstudio.com/download. Here is a step-by-step guide to add the necessary extensions, run Maven commands, and attach a debugger:

1. Install these extensions and then restart VS Code.
   - `Java Extension Pack`: installs 6 important Java extensions at once.
   - `JavaScript and TypeScript Nightly`: enables JavaScript and TypeScript IntelliSense.
   - `Language Support for Java(TM) by Red Hat redhat.java`: recognizes projects with a Maven or Gradle build in the directory hierarchy.
   - `Maven for Java`: provides a project explorer and shortcuts to execute Maven commands.
2. Select `Add workspace folder...` on the Welcome page, then choose the `hillview/platform/` directory. The platform module should be displayed in the `Explorer` view.
3. Add the `web` module to the workspace by clicking `File`->`Add Folder to Workspace...` and then choosing the `hillview/web/` directory.
4. Save the workspace by clicking `File`->`Save Workspace As...` and store it in your personal folder outside the `hillview/` root directory.
5. To execute Maven commands, click `MAVEN PROJECTS` in the `Explorer` view. There are two Maven folders, corresponding to the `web` and `platform` modules; click those folders to expand them and display the Maven pom files. The Maven commands are displayed when you right-click the pom files.
6. Finally, to attach a debugger:
   - Bring up the `Run` view by selecting the `Run` icon in the `Activity Bar` on the left side of VS Code.
   - From the `Run` view, click `create a launch.json file`; you will see the `platform` and `web` modules listed.
     We will create two `launch.json` files, one for the `platform` module and the other for the `web` module.
   - When configuring the `launch.json` for the `platform` module, select the `Java` option; for the `web` module, select the `Chrome (preview)` option. Then delete the auto-generated `configurations` and specify the correct configuration to attach the debugger. The important fields are `url`, `hostname`, `port`, and `request`. For details, see [VS Code Debugging#launch-configuration](https://code.visualstudio.com/docs/editor/debugging#_launch-configurations) and [VS Code#Java-Debugging](https://code.visualstudio.com/docs/java/java-debugging#_attach).

## 4.6 Debugging

Debugging on a single machine can be done as follows:

- You can start the back-end service under the debugger by starting the HillviewBackend binary with the command-line argument `127.0.0.1:3569`.
- You can debug the front-end service by attaching to the Java process created by Tomcat. The `frontend-start.sh` script has a line that sets up the environment variables to enable this.

## 4.7. Running the tests

* The unit tests are run by building with Maven or by running `bin/rebuild.sh -t`.
* The UI tests are run by starting Hillview on a local machine and then clicking the "Test/Run" menu button.

## 4.8. Using git to contribute

Fork the repository using the "fork" button on GitHub, by following these instructions: https://help.github.com/articles/fork-a-repo/

Here is a step-by-step guide to submitting contributions:

1. Create a new branch for each fix; give it a nice suggestive name:
   - `git branch yourBranchName`
   - `git checkout yourBranchName`
2. `git add `
3. `git commit -m "Description of commit"`
4. `git fetch upstream`
5. `git rebase upstream/master`
6. Resolve conflicts, if any (the rebase won't complete if you don't; as you find conflicts you will need to `git add` the files you have merged, and then you may need to use `git rebase --continue` or `git rebase --skip`).
7. 
   Test and analyze the merged version.
8. `git push -f origin yourBranchName`
9. Create a pull request to merge your new branch into master (using the web UI).
10. Delete your branch after the merge has been done: `git branch -D yourBranchName`

## 4.9. Guidance in writing code

* Use the IntelliJ code inspection feature (Analyze/Inspect code).
* The pseudorandom generator is implemented in the class [Randomness.java](platform/src/main/java/org/hillview/utils/Randomness.java) and uses Mersenne Twister. Do not use the Java `Random` class.
* By default all pointers are assumed to be non-null; use the `@Nullable` annotation (from `javax.annotation`) for all pointers which can be null. Use `Converters.checkNull` to cast a `@Nullable` pointer to a non-null pointer.
* Some code executes on multiple machines or in multiple threads. In particular, all classes that derive from `IMap` or `ISketch` should be immutable.
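
The immutability guideline can be illustrated with a small sketch. It is written in Python for brevity rather than in Hillview's Java, and `CountSummary` and `merge_all` are hypothetical names, not Hillview classes or an `ISketch` implementation. The point is only that combining two summaries returns a new object instead of modifying either input, so partial results can be shared between threads without locking.

```python
from dataclasses import dataclass
from functools import reduce
from typing import Tuple


@dataclass(frozen=True)
class CountSummary:
    """Hypothetical immutable summary holding per-bucket counts."""
    counts: Tuple[int, ...]

    def add(self, other: "CountSummary") -> "CountSummary":
        """Combine two summaries into a new one; neither input is changed."""
        if len(self.counts) != len(other.counts):
            raise ValueError("summaries must have the same number of buckets")
        return CountSummary(tuple(a + b for a, b in zip(self.counts, other.counts)))


def merge_all(parts):
    """Fold a non-empty sequence of partial summaries into a single result."""
    return reduce(CountSummary.add, parts)
```

Here `frozen=True` makes attribute assignment raise an error, and storing the counts in a tuple keeps the contents themselves from being modified in place. In Java the same effect is usually obtained with `final` fields and no mutating methods.
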