# hashmark

**Repository Path**: gavinkou/hashmark

## Basic Information

- **Project Name**: hashmark
- **Description**: MySQL time-series database and PHP library for data point insertion and analytic queries forked from https://github.com/codeactual/hashmark
- **Primary Language**: PHP
- **License**: MIT
- **Default Branch**: master
- **Homepage**: None
- **GVP Project**: No

## Statistics

- **Stars**: 0
- **Forks**: 0
- **Created**: 2021-05-17
- **Last Updated**: 2022-05-28

## Categories & Tags

**Categories**: Uncategorized

**Tags**: None

## README

# hashmark

hashmark is a MySQL [time-series](http://en.wikipedia.org/wiki/Time_series) database and PHP library for data point insertion and analytic queries.

## Features

* Numeric and string data types.
* PHP client library for collecting data points in preexisting apps.
* Custom scripts for analysis and periodic data point collection.
* SQL macros allowing queries to reference intermediate results from prior statements.
* Configurable date-based partitioning.
* Cache and database adapters provided by bundled Zend Framework 1.x components.
* High unit test coverage.

## Analytics

### Support

* MySQL aggregate functions: `AVG`, `SUM`, `COUNT`, `MAX`, `MIN`, `STDDEV_POP`, `STDDEV_SAMP`, `VAR_POP`, `VAR_SAMP`
* MySQL aggregate functions eligible for `DISTINCT` selection: `AVG`, `SUM`, `COUNT`, `MAX`, `MIN`
* Time intervals for aggregates: hour, day, week, month, year
* MySQL time functions for aggregates of recurrence groups (e.g. "1st of the month"): `HOUR`, `DAYOFMONTH`, `DAYOFYEAR`, `MONTH`

### Methods

[multiQuery](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L205)($scalarId, $start, $end, $stmts)

> Perform multiple queries using macros to reference prior intermediate result sets. Internally supports many of the functions below.

[values](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L351)($scalarId, $limit, $start, $end)

> Return samples within a date range.

[valuesAtInterval](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L379)($scalarId, $limit, $start, $end, $interval)

> Return the most recent sample from each interval within a date range.

[valuesAgg](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L415)($scalarId, $start, $end, $aggFunc, $distinct)

> E.g. return **"average value between date X and Y"** or **"volume of distinct values between date X and Y."**

[valuesAggAtInterval](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L453)($scalarId, $start, $end, $interval, $aggFunc, $distinct)

> Similar to `valuesAgg` except that results are grouped into a given interval, e.g. **"average weekly value between date X and Y."**

[valuesNestedAggAtInterval](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L499)($scalarId, $start, $end, $interval, $aggFuncOuter, $distinctOuter, $aggFuncInner, $distinctInner)

> Aggregate values returned by `valuesAggAtInterval`, e.g. **"average weekly high between date X and Y."**

[valuesAggAtRecurrence](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L557)($scalarId, $start, $end, $recurFunc, $aggFunc, $distinct)

> E.g. **"peak value in the 8-9am hour between date X and Y."**

[changes](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L601)($scalarId, $limit, $start, $end)

> Return from a date range each sample's date, value, and change in value from the prior sample.

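
To make the row shape described for `changes` concrete, here is a minimal plain-PHP sketch of the same calculation over an in-memory list. It is not Hashmark's implementation, which runs in SQL. The function name `sketchChanges` and the array keys `date`, `value`, and `change` are hypothetical, and how Hashmark represents the first sample's change is not stated here, so the sketch uses `null`.

``` php
<?php
/**
 * Hypothetical helper, not part of Hashmark: pair each sample with its
 * difference from the prior sample. The first sample has no prior value,
 * so its change is null (an assumption of this sketch).
 *
 * @param array $samples List of array('date' => string, 'value' => int|float), oldest first.
 * @return array The same rows with an added 'change' key.
 */
function sketchChanges(array $samples)
{
    $rows = array();
    $prior = null;
    foreach ($samples as $sample) {
        $change = ($prior === null) ? null : $sample['value'] - $prior;
        $rows[] = array(
            'date'   => $sample['date'],
            'value'  => $sample['value'],
            'change' => $change,
        );
        $prior = $sample['value'];
    }
    return $rows;
}

$rows = sketchChanges(array(
    array('date' => '2011-10-01 08:00:00', 'value' => 10),
    array('date' => '2011-10-02 08:00:00', 'value' => 14),
    array('date' => '2011-10-03 08:00:00', 'value' => 9),
));

// Print one line per sample: date, value, change.
foreach ($rows as $row) {
    echo $row['date'], ' ', $row['value'], ' ', var_export($row['change'], true), "\n";
}
```

The `changes*` methods listed after this point aggregate or regroup rows of this general shape.
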
[changesAtInterval](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L628)($scalarId, $limit, $start, $end, $interval)

> Similar to `changes` except that `valuesAtInterval` provides the source data, e.g. **"weekly value and its change (week-over-week) between date X and Y."**

[changesAgg](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L674)($scalarId, $start, $end, $aggFunc, $distinct)

> E.g. **"peak value change between date X and Y."**

[changesAggAtInterval](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L712)($scalarId, $start, $end, $interval, $aggFunc, $distinct)

> Similar to `changesAgg` except that `changes` provides the source data, e.g. **"weekly peak value change (week-over-week) between date X and Y."**

[changesNestedAggAtInterval](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L766)($scalarId, $start, $end, $interval, $aggFuncOuter, $distinctOuter, $aggFuncInner, $distinctInner)

> Aggregate values returned by `changesAggAtInterval`, e.g. **"average of weekly peak value changes (week-over-week) between date X and Y."**

[changesAggAtRecurrence](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L831)($scalarId, $start, $end, $recurFunc, $aggFunc, $distinct)

> E.g. **"peak value change on Black Friday between year X and year Y."**

[frequency](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L874)($scalarId, $limit, $start, $end, $descOrder)

> Return unique values and their frequency between date X and Y.

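
As a rough illustration of what a `frequency` result could look like, the plain-PHP sketch below counts distinct values in an in-memory list. It is not how Hashmark computes the result (that happens in SQL). `sketchFrequency` is a hypothetical name, and the sketch assumes that `$descOrder` sorts by frequency, which the description above does not confirm.

``` php
<?php
/**
 * Hypothetical helper, not part of Hashmark: count how often each distinct
 * value occurs and return the first $limit entries.
 *
 * @param array $values    Sample values as strings, e.g. '1.50'.
 * @param int   $limit     Maximum number of distinct values to return.
 * @param bool  $descOrder Assumed meaning: most frequent first when true.
 * @return array Map of value => number of occurrences.
 */
function sketchFrequency(array $values, $limit, $descOrder = true)
{
    $counts = array_count_values($values); // value => occurrences
    if ($descOrder) {
        arsort($counts); // highest count first, keys preserved
    } else {
        asort($counts);  // lowest count first, keys preserved
    }
    return array_slice($counts, 0, $limit, true);
}

$top = sketchFrequency(
    array('1.50', '2.00', '1.50', '3.25', '1.50', '2.00'),
    2
);
print_r($top);
```

Values are passed as strings because `array_count_values` only counts strings and integers.
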
[moving](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L903)($scalarId, $limit, $start, $end, $aggFunc, $distinct)

> Return from a date range each sample's date, value, and the aggregate value at sample-time. E.g. **"values and their moving averages between date X and Y."**

[movingAtInterval](https://github.com/codeactual/hashmark/blob/90bcc5083d2c326b167392b8fd8427e36803fc92/Analyst/BasicDecimal.php#L946)($scalarId, $limit, $start, $end, $interval, $aggFunc, $distinct)

> Similar to `valuesAtInterval` except that `moving` provides the data source, e.g. **"the last value and its moving average from each week between date X and Y."**

## Example Code

### Quick Background

Main database tables:

* [scalars](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L181): Metadata and current value of a named string or number, e.g. "featureX:optOut".
* [samples_decimal](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L147): Historical values of numeric data points in `scalars`.
* [samples_string](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L131): Historical values of string data points in `scalars`.

### Client

[Hashmark_Client](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Client.php#L23) supplies methods for updating a current value (in `scalars`) and adding a historical sample (in `samples_decimal` or `samples_string`).

* [incr](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Client.php#L126)($name, $amount = 1, $newSample = false)
* [decr](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Client.php#L219)($name, $amount = 1, $newSample = false)
* [set](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Client.php#L42)($name, $amount, $newSample = false)
* [get](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Client.php#L90)($name)

``` php
<?php
// $client: Hashmark::getModule('Client', '', $db); $userOptedOut: hypothetical app flag
// (the original condition was lost from this excerpt).
if ($userOptedOut) {
    $client->incr('featureX:optOut', 1, true);
}
```

To let drop-in client calls work without prior setup, e.g. when "featureX:optOut" above does not yet exist, call `$client->createScalarIfNotExists(true)`.

### Agent

Each agent script is a class that implements the small [Hashmark_Agent](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Agent.php#L21) interface. The [Agent/StockPrice.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Agent/StockPrice.php#L23) demo fetches AAPL's price from Google Finance and creates a historical data point.

[Cron/runAgents.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Cron/runAgents.php) normally runs each agent on a configured schedule, but a manual run might look like:

``` php
<?php
// $agent: an instance of the StockPrice agent; $db: a Zend_Db adapter.
// (The setup lines were lost from this excerpt.)
$price = $agent->run($scalarId);
$partition = Hashmark::getModule('Partition', '', $db);
$partition->createSample($scalarId, $price, time());
```

### Create a Scalar

``` php
<?php
// $core: the Hashmark_Core module; $scalarFields: an array of scalar column values.
// (Their setup lines were lost from this excerpt.)
$scalarId = $core->createScalar($scalarFields);
$savedScalarFields = $core->getScalarById($scalarId);
$savedScalarFields = $core->getScalarByName('featureX:optOut');
```

### Create a Category

``` php
<?php
$categoryId = $core->createCategory('Feature Trackers');
if (!$core->scalarHasCategory($scalarId, $categoryId)) {
    $core->addScalarCategory($scalarId, $categoryId);
}
```

### Create a Milestone

``` php
<?php
// $releaseCategoryId: ID of an existing category.
$milestoneId = $core->createMilestone('featureX initial release');
$core->setMilestoneCategory($milestoneId, $releaseCategoryId);
```

### Query

``` php
<?php
// $analyst: a Hashmark_Analyst_BasicDecimal instance; $limit is 10 here.
// (The setup lines were lost from this excerpt.)

// Returns first 10 samples: their dates, values, and the running SUM at each sample time
$analyst->moving($scalarId, $limit, $sampleDateMin, $sampleDateMax, 'SUM');

// Now only distinct values affect aggregates
$analyst->moving($scalarId, $limit, $sampleDateMin, $sampleDateMax, 'SUM', true);

// Returns first 10 samples: their dates and values
$analyst->values($scalarId, $limit, $sampleDateMin, $sampleDateMax);

// Returns first 10 samples: their dates, values, and difference from prior sample
$analyst->changes($scalarId, $limit, $sampleDateMin, $sampleDateMax);
```

## Requirements

Most recently tested with PHP 5.4.0beta1, PHPUnit 3.6.0RC4, and MySQL 5.5.16.

* PHP 5.2+
* MySQL 5.1+
* PDO or MySQL Improved
* apc, xcache or [memcache](http://pecl.php.net/package/memcache)

For tests:

* PHPUnit 3+
* [bcmath](http://php.net/manual/en/book.bc.php)

## Installation

1. `CREATE DATABASE hashmark DEFAULT CHARACTER SET utf8 DEFAULT COLLATE utf8_unicode_ci;`
2. Import [Sql/Schema/hashmark.sql](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L115)
3. Optionally repeat steps 1 and 2 for a separate unit test DB.

### Database Configuration

Hashmark uses Zend Framework's database component. Refer to the ZF [guide](http://framework.zend.com/manual/1.11/en/zend.db.adapter.html) for option values. Example:

``` php
<?php
// The left-hand side was lost from this excerpt; $profile is a placeholder name,
// and the 'adapter' key is assumed from the Zend_Db factory convention.
$profile = array(
    'adapter' => 'Mysqli',
    'params' => array(
        'host' => '127.0.0.1',
        'port' => 5516,
        'dbname' => 'hashmark_test',
        'username' => 'msandbox',
        'password' => 'msandbox'
    )
);
```

[Config/Hashmark-dist.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Config/Hashmark-dist.php) only includes a database config profile for cron scripts and unit tests. Normally the client app will supply its own connection instance. For example:

``` php
<?php
// $db: the app's own Zend_Db adapter (its setup was lost from this excerpt).
$this->hashmark = Hashmark::getModule('Client', '', $db);
// ...
$this->hashmark->incr('featureX:optOut', 1, true);
```

### Cache Configuration

Hashmark also uses Zend Framework's cache component. Refer to the ZF [guide](http://framework.zend.com/manual/1.11/en/zend.cache.backends.html) for option values.

Using Memcache as an example, you might update `$config['Cache']` in [Config/Hashmark-dist.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Config/Hashmark-dist.php#L23):

``` php
$config['Cache'] = array(
    'backEndName' => 'Memcached',
    'frontEndOpts' => array(),
    'backEndOpts' => array(
        'servers' => array(
            array('host' => 'localhost', 'port' => 11211)
        )
    )
);
```

### Other Configuration

See [Config/Hashmark-dist.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Config/Hashmark-dist.php) comments.

### Verify

```
$ php -f Test/Install.php
pass: Connected to DB with 'cron' profile in Config/DbHelper.php
pass: Found all Hashmark tables with 'cron' profile in Config/DbHelper.php
pass: Connected to DB with 'unittest' profile in Config/DbHelper.php
pass: Found all Hashmark tables with 'unittest' profile in Config/DbHelper.php
pass: Loaded Hashmark_BcMath module.
pass: Loaded Hashmark_Cache module.
pass: Loaded Hashmark_Client module.
pass: Loaded Hashmark_Core module.
pass: Loaded Hashmark_DbHelper module.
pass: Loaded Hashmark_Partition module.
pass: Loaded Hashmark_Agent_YahooWeather module.
pass: Loaded Hashmark_Test_FakeModuleType module.
pass: Built samples_1234_20111000 partition name with 'm' setting in Config/Partition.php.
```

## Schema

* [agents](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L22): Available [Agent](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Agent/) classes.
* [agents_scalars](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L37): Agents' schedules and last-run metadata.
* [categories](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L63): Groups to support front-end browsing, searches, visualization, etc.
* [categories_milestones](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L79): For example, to link category "ShoppingCart" with milestone "site release 2.1.2".
* [categories_scalars](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L97): For example, to link category "ShoppingCart" with data point "featureX:optOut".
* [milestones](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L115): Events to correlate with scalar histories, e.g. to visualize "featureX:optOut" changes across site releases that tweak "featureX".
* [samples_analyst_temp](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L163): When Hashmark creates temporary tables to hold intermediate aggregates, it copies this table's definition.
* [samples_decimal](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L147) and [samples_string](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L131): Identical except for one column. Hashmark copies their definitions when creating new partitions. `id` auto-increment values are seeded from the associated scalar's `sample_count` column.
* [scalars](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Sql/Schema/hashmark.sql#L181): Holds columns that define each data point's type (string or decimal), current value, and other metadata.

## File Layout

### Naming Convention

Zend Framework's style is followed closely. Parent classes, some abstract, live in the root directory. Child classes live in directories named after their parents. Class names predictably indicate ancestors, e.g. `Hashmark_Analyst_BasicDecimal`, and file names mirror the class name's last part, e.g.
`Analyst/BasicDecimal.php`.

```
Analyst/
    BasicDecimal.php
Analyst.php
...
Agent/
    YahooWeather.php
    ...
Agent.php
...
```

### Classes

* [Analyst.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Analyst.php): Abstract base. For example, implementation [BasicDecimal.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Analyst/BasicDecimal.php) performs list and statistical queries.
* [Cache.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Cache.php): Zend_Cache wrapper that adds namespaces.
* [Client.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Client.php): Input API for client apps to update scalars and add historical data points.
* [Core.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Core.php): Internal API to manage scalars, categories, milestones, etc.
* [DbHelper.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/DbHelper.php): Abstract base for Zend_Db adapter wrappers.
* [Hashmark.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Hashmark.php): Defines the `getModule()` factory.
* [Module.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Module.php): Abstract base for classes produced by factory [Hashmark::getModule()](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Hashmark.php#L76).
* [Partition.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Partition.php): Management and querying of MyISAM and MRG_MyISAM tables holding scalars' historical values.
* [Agent.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Agent.php): Interface relied upon by [Cron/runAgents.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Cron/runAgents.php).
* [Util.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Util.php): Static/stateless helper class with methods like `randomSha1()`.

### Tests

Most test-related files live under `Test/`, but a few like `Config/Test.php` live outside so cases can cover code relying on naming conventions.

### Sql/Analyst/

Contains SQL templates. For example, [Sql/Analyst/BasicDecimal.php](https://github.com/codeactual/hashmark/tree/07c4dc972b180418d62bee49ee382d88cf07dc8f/Sql/Analyst/BasicDecimal.php) templates allow [Analyst/BasicDecimal.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Analyst/BasicDecimal.php) to reuse and combine statements as intermediate results toward final aggregates.

## Cron Scripts

* [gcMergeTables.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Cron/gcMergeTables.php): Drops merge tables based on hard limits defined in [Config/Cron.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Config/Hashmark-dist.php#L27).
* [gcUnitTestTables.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Cron/gcUnitTestTables.php): Drops test-created tables and runs `FLUSH TABLES`.
* [runAgents.php](https://github.com/codeactual/hashmark/blob/b24734f75552189b82611cd927e745ebe70ef4b8/Cron/runAgents.php): Finds and runs all agent scripts due for execution based on their configured frequency.

## Tests

### Running

**First**: run `php -f Test/Analyst/BasicDecimal/Tool/writeProviderData.php`, which writes `Test/Analyst/BasicDecimal/Data/provider.php`.
The `BasicDecimal` suite relies on `bcmath` and a series of generators in `Test/Analyst/BasicDecimal/Tool/` to calculate a comprehensive set of expected test results.

* Run suites for all modules: `phpunit [--group name] Test/AllTests.php`
* Run a specific module's suite: `phpunit [--group name] Test/[module]/AllTests.php`
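
As an illustration of why a test suite might lean on `bcmath` for expected values, the sketch below computes an expected average over decimal strings without floating-point rounding. It is not taken from the generators in `Test/Analyst/BasicDecimal/Tool/`. The helper name `sketchExpectedAvg` and the scale of 4 are assumptions made for this example.

``` php
<?php
/**
 * Hypothetical helper, not from the Hashmark test tools: compute an expected
 * AVG over decimal strings with bcmath (arbitrary-precision string math).
 *
 * @param array $values Decimal strings, e.g. '0.10'.
 * @param int   $scale  Digits kept after the decimal point (assumed value).
 * @return string|null  Average as a decimal string, or null for an empty list.
 */
function sketchExpectedAvg(array $values, $scale = 4)
{
    if (count($values) === 0) {
        return null; // avoid dividing by zero
    }
    $sum = '0';
    foreach ($values as $value) {
        $sum = bcadd($sum, $value, $scale);
    }
    return bcdiv($sum, (string) count($values), $scale);
}

$expected = sketchExpectedAvg(array('0.10', '0.20', '0.30'));
echo $expected, "\n";
```

A test case could then compare a query result against `$expected` as strings, which avoids tolerance checks on floats.
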