salesforce/TransmogrifAI

TransmogrifAI (pronounced trăns-mŏgˈrə-fī) is an AutoML library for building modular, reusable, strongly typed machine learning workflows on Apache Spark with minimal hand-tuning.

9 Releases
0.7.0 (Latest)
nicodv · June 11, 2020

📋 Changes

  • Fix flaky `ModelInsight` tests [#407](https://github.com/salesforce/TransmogrifAI/pull/407)
  • Remove logging of tokens of text fields [#420](https://github.com/salesforce/TransmogrifAI/pull/420), [#438](https://github.com/salesforce/TransmogrifAI/pull/438), [#447](https://github.com/salesforce/TransmogrifAI/pull/447), [#474](https://github.com/salesforce/TransmogrifAI/pull/474)
  • Add validation prepare call before model selection when no DAG is passed [#424](https://github.com/salesforce/TransmogrifAI/pull/424), [#429](https://github.com/salesforce/TransmogrifAI/pull/429)
  • Fix `Days.daysBetween` int overflow [#471](https://github.com/salesforce/TransmogrifAI/pull/471)
  • Downsample the number of training samples to `maxTrainingSample` for regression [#413](https://github.com/salesforce/TransmogrifAI/pull/413) and multi-class classification [#414](https://github.com/salesforce/TransmogrifAI/pull/414)
  • Refactor `InsightLOCOTest` [#412](https://github.com/salesforce/TransmogrifAI/pull/412)
  • Enable more loss types for `OpLinearRegression` [#421](https://github.com/salesforce/TransmogrifAI/pull/421)
  • Add property-based tests for regression model selection [#427](https://github.com/salesforce/TransmogrifAI/pull/427)
  • + 23 more
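
The `Days.daysBetween` entry above concerns an integer overflow in a day-difference calculation. The release notes do not describe the fix, so the following is only a generic sketch of how this class of bug can arise with plain JDK arithmetic. The class and method names (`DayDiffSketch`, `daysNarrowed`, `daysWide`) are hypothetical and are not TransmogrifAI or Joda-Time code.

```java
// Hypothetical illustration; not taken from TransmogrifAI or Joda-Time.
class DayDiffSketch {
    static final long MILLIS_PER_DAY = 24L * 60 * 60 * 1000; // 86,400,000

    // Buggy variant: the millisecond difference is narrowed to int before
    // dividing, so any span longer than about 24.8 days wraps silently.
    static int daysNarrowed(long startMillis, long endMillis) {
        int diffMillis = (int) (endMillis - startMillis);
        return diffMillis / (int) MILLIS_PER_DAY;
    }

    // Safer variant: stay in long arithmetic and narrow only through an
    // explicit range check (Math.toIntExact throws ArithmeticException).
    static int daysWide(long startMillis, long endMillis) {
        long days = (endMillis - startMillis) / MILLIS_PER_DAY;
        return Math.toIntExact(days);
    }

    public static void main(String[] args) {
        long thirtyDays = 30 * MILLIS_PER_DAY;
        System.out.println("narrowed: " + daysNarrowed(0L, thirtyDays));
        System.out.println("wide:     " + daysWide(0L, thirtyDays));
    }
}
```

Keeping intermediate values in `long` and narrowing only at the end makes an out-of-range result fail loudly instead of producing a plausible-looking wrong number.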
0.6.1
gerashegalov · September 12, 2019

📋 Changes

  • Ensure correct metrics despite model failures on some CV folds [#404](https://github.com/salesforce/TransmogrifAI/pull/404)
  • Fix flaky `ModelInsight` tests [#395](https://github.com/salesforce/TransmogrifAI/pull/395)
  • Avoid creating `SparseVector`s for LOCO [#377](https://github.com/salesforce/TransmogrifAI/pull/377)
  • Model combiner [#385](https://github.com/salesforce/TransmogrifAI/pull/399)
  • Added new sample for HousingPrices [#365](https://github.com/salesforce/TransmogrifAI/pull/365)
  • Test to verify that custom metrics appear in model insight metrics [#387](https://github.com/salesforce/TransmogrifAI/pull/387)
  • Add `FeatureDistribution` to `SerializationFormat`s [#383](https://github.com/salesforce/TransmogrifAI/pull/383)
  • Add metadata to `OpStandardScaler` to allow for descaling [#378](https://github.com/salesforce/TransmogrifAI/pull/378)
  • + 6 more
0.6.0
michaelweilsalesforce · July 12, 2019

📋 Changes

  • Quick Fix Alias Type Names [#346](https://github.com/salesforce/TransmogrifAI/pull/346)
  • Forecast Evaluator - fixes SMAPE, adds MASE and Seasonal Error metrics [#342](https://github.com/salesforce/TransmogrifAI/pull/342)
  • Aggregate LOCOs of DateToUnitCircleTransformer [#349](https://github.com/salesforce/TransmogrifAI/pull/349)
  • Convert lambda functions into concrete classes to allow compatibility with Scala 2.12 [#357](https://github.com/salesforce/TransmogrifAI/pull/357)
  • Replace mapValues with immutable Map where applicable [#363](https://github.com/salesforce/TransmogrifAI/pull/363)
  • Aggregate spark metrics during run time instead of post processing by default [#358](https://github.com/salesforce/TransmogrifAI/pull/358)
  • Allow customizing serialization for FeatureGenerator extract function [#352](https://github.com/salesforce/TransmogrifAI/pull/352)
  • Update helloworld examples to be simple [#351](https://github.com/salesforce/TransmogrifAI/pull/351)
  • + 24 more
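
The Forecast Evaluator entry above names SMAPE, MASE and seasonal error. As a rough sketch of what these metrics measure, the following uses one common set of textbook definitions. The exact formulas in TransmogrifAI's evaluator may differ (SMAPE in particular has several variants), the seasonal error is computed on the same series purely for brevity, and the class and method names are hypothetical.

```java
// Hypothetical sketch using common textbook definitions; TransmogrifAI's
// own evaluator may define these metrics differently.
class ForecastMetricsSketch {

    // SMAPE = (2 / n) * sum(|y - f| / (|y| + |f|)), ranging from 0 to 2.
    // Terms where both values are zero contribute 0 (an assumption).
    static double smape(double[] actual, double[] forecast) {
        double sum = 0.0;
        for (int i = 0; i < actual.length; i++) {
            double denom = Math.abs(actual[i]) + Math.abs(forecast[i]);
            if (denom > 0.0) {
                sum += Math.abs(actual[i] - forecast[i]) / denom;
            }
        }
        return 2.0 * sum / actual.length;
    }

    // Seasonal error: mean absolute difference between each observation
    // and the one a full season (period m) earlier.
    static double seasonalError(double[] actual, int m) {
        double sum = 0.0;
        for (int t = m; t < actual.length; t++) {
            sum += Math.abs(actual[t] - actual[t - m]);
        }
        return sum / (actual.length - m);
    }

    // MASE: mean absolute forecast error scaled by the seasonal error.
    static double mase(double[] actual, double[] forecast, int m) {
        double absErr = 0.0;
        for (int i = 0; i < actual.length; i++) {
            absErr += Math.abs(actual[i] - forecast[i]);
        }
        return (absErr / actual.length) / seasonalError(actual, m);
    }

    public static void main(String[] args) {
        double[] actual = {10, 12, 14, 16};
        double[] forecast = {11, 11, 15, 15};
        System.out.println("SMAPE: " + smape(actual, forecast));
        System.out.println("Seasonal error (m=1): " + seasonalError(actual, 1));
        System.out.println("MASE (m=1): " + mase(actual, forecast, 1));
    }
}
```

Under these definitions, a MASE below 1 means the forecast's average absolute error is smaller than that of the seasonal-naive baseline.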
0.5.3
Jauntbox · May 8, 2019

📋 Changes

  • Threshold metrics calculation fix when unseen labels are present [#293](https://github.com/salesforce/TransmogrifAI/pull/293)
  • DataCutter-related fixes for multiclass [#263](https://github.com/salesforce/TransmogrifAI/pull/263)
  • Fixed `onSetInput` so that it is always called with new input [#280](https://github.com/salesforce/TransmogrifAI/pull/280)
  • Improved test SmartTextMapVectorizerTest [#296](https://github.com/salesforce/TransmogrifAI/pull/296)
  • Add check to raw feature filter for removing all features [#303](https://github.com/salesforce/TransmogrifAI/pull/303)
  • Spec-ifying ngram similarity tests [#299](https://github.com/salesforce/TransmogrifAI/pull/299)
  • Add random test feature generator to generate datasets with features of *all* types [#298](https://github.com/salesforce/TransmogrifAI/pull/298)
  • Spec-ifying NGramTest [#297](https://github.com/salesforce/TransmogrifAI/pull/297)
  • + 13 more
0.5.2
tovbinm · April 11, 2019

📋 Changes

  • Fixed local scoring with multipicklist features [#243](https://github.com/salesforce/TransmogrifAI/pull/243)
  • Fixed error messages in `DataCutter` and `DataBalancer` [#256](https://github.com/salesforce/TransmogrifAI/pull/256)
  • Fixed bug in model selector fit method [#251](https://github.com/salesforce/TransmogrifAI/pull/251)
  • Fixed some Transmogrifier defaults to be modifiable / exposed [#232](https://github.com/salesforce/TransmogrifAI/pull/232)
  • Fixed bug in `OpXGBoostClassificationModel` [#229](https://github.com/salesforce/TransmogrifAI/pull/229)
  • Minor fixes / cleanup on notebooks, Helloworld examples, and developer guide [#226](https://github.com/salesforce/TransmogrifAI/pull/226), [#230](https://github.com/salesforce/TransmogrifAI/pull/230), [#240](https://github.com/salesforce/TransmogrifAI/pull/240), [#259](https://github.com/salesforce/TransmogrifAI/pull/259)
  • Added transformer classes for common math operations [#255](https://github.com/salesforce/TransmogrifAI/pull/255), [#257](https://github.com/salesforce/TransmogrifAI/pull/257)
  • Added string transformers for substring search and valid email [#265](https://github.com/salesforce/TransmogrifAI/pull/265)
  • + 12 more
0.5.1
Jauntbox · February 9, 2019

📋 Changes

  • Fix indices in LOCO for record-level insights and add more robust tests [#216](https://github.com/salesforce/TransmogrifAI/pull/216)
  • Fix sorting in Prediction type for multiclass classification and add stronger tests [#213](https://github.com/salesforce/TransmogrifAI/pull/213)
  • Fix code generation bug with underscores in names [#208](https://github.com/salesforce/TransmogrifAI/pull/208)
  • Correct some syntax/compilation errors in Titanic Binary Classification Docs Example [#202](https://github.com/salesforce/TransmogrifAI/pull/202)
  • Make some tests a little less flaky [#221](https://github.com/salesforce/TransmogrifAI/pull/221)
  • Integrate helloworld project with Travis CI [#210](https://github.com/salesforce/TransmogrifAI/pull/210), [#212](https://github.com/salesforce/TransmogrifAI/pull/212)
  • Use ParamGridBuilder in model selector grids to allow modifications [#206](https://github.com/salesforce/TransmogrifAI/pull/206)
  • Use class.getName & update splitter meta parsing [#204](https://github.com/salesforce/TransmogrifAI/pull/204)
  • + 6 more
0.5.0
tovbinm · November 22, 2018

📋 Changes

  • XGBoost classification & regression models - EXPERIMENTAL [#44](https://github.com/salesforce/TransmogrifAI/pull/44)
  • Add default param grid for xgboost [#175](https://github.com/salesforce/TransmogrifAI/pull/175)
  • Fix ModelInsights for xgboost [#170](https://github.com/salesforce/TransmogrifAI/pull/170)
  • Added Parquet reader [#169](https://github.com/salesforce/TransmogrifAI/pull/169)
  • Added aggregate & conditional readers for Parquet [#172](https://github.com/salesforce/TransmogrifAI/pull/172)
  • Evaluators check for empty data [#178](https://github.com/salesforce/TransmogrifAI/pull/178)
  • Refactored splitter tests [#176](https://github.com/salesforce/TransmogrifAI/pull/176)
  • Return scoring feature distributions from RawFeatureFilter [#171](https://github.com/salesforce/TransmogrifAI/pull/171)
  • + 19 more
0.4.0
tovbinm · September 23, 2018

📋 Changes

  • Allow specifying the formula used to compute the text feature bin size for `RawFeatureFilter` (see the `RawFeatureFilter.textBinsFormula` argument) [#99](https://github.com/salesforce/TransmogrifAI/pull/99)
  • Fixed metadata on `Geolocation` and `GeolocationMap` so that they keep the name of the column in `descriptorValue` [#100](https://github.com/salesforce/TransmogrifAI/pull/100)
  • Local scoring (aka Sparkless) using Aardpfark. This enables loading and scoring models locally, without a Spark context, using the Aardpfark (PFA for Spark) and Hadrian libraries instead. Scoring this way is orders of magnitude faster than with Spark. [#41](https://github.com/salesforce/TransmogrifAI/pull/41)
  • Add distributions calculated in `RawFeatureFilter` to `ModelInsights` [#103](https://github.com/salesforce/TransmogrifAI/pull/103)
  • Added binary sequence transformer & estimator: `BinarySequenceTransformer` and `BinarySequenceEstimator`, plus the associated base traits [#84](https://github.com/salesforce/TransmogrifAI/pull/84)
  • Added `StringIndexerHandleInvalid.Keep` option to `OpStringIndexer` (same as in the underlying Spark estimator) [#93](https://github.com/salesforce/TransmogrifAI/pull/93)
  • Allow numbers and underscores in feature names [#92](https://github.com/salesforce/TransmogrifAI/pull/92)
  • Stable key order for map vectorizers [#88](https://github.com/salesforce/TransmogrifAI/pull/88)
  • + 23 more
0.3.4
tovbinm · August 22, 2018

📋 Changes

  • Added featureLabelCorrOnly parameter in SanityChecker to only compute correlations between features and label (defaults to false)
  • Added ignoreHashCorrelations parameter in SanityChecker that ignores correlations from hashed text features (defaults to false)
  • Parallelize OP cross validation and set default validation parallelism to 8
  • Added warmup in concurrent checks
  • Replace deprecated 'forceSharedHashSpace' param with HashingStrategy
  • Added explicit annotations for all classes with generic collections that use JsonUtils
  • Added .transmogrify shortcut for arrays of features
  • Removed referencing UID from a case object
  • + 10 more