cortexlabs/cortex
Production infrastructure for machine learning at scale
v0.42.1
📋 Changes
- Add support for a new set of EC2 instance types, including the `c6` and `g5` families https://github.com/cortexlabs/cortex/issues/2414 ([RobertLucian](https://github.com/RobertLucian))
- Fix a cosmetic issue where the VPC CNI logging functionality triggered warning logs when running the `cortex` CLI https://github.com/cortexlabs/cortex/pull/2443 ([RobertLucian](https://github.com/RobertLucian))
- Update Cortex dependency versions (eksctl, EKS to 1.22, AWS IAM, Python, etc.) https://github.com/cortexlabs/cortex/issues/2414 ([RobertLucian](https://github.com/RobertLucian), [deliahu](https://github.com/deliahu))
v0.42.0
📋 Changes
- Add support for the Classic Load Balancer for APIs; the Network Load Balancer remains the default ([docs](https://docs.cortex.dev/clusters/management/create#cluster.yaml)) https://github.com/cortexlabs/cortex/pull/2413 https://github.com/cortexlabs/cortex/issues/2414 ([RobertLucian](https://github.com/RobertLucian))
- Fix Async API HTTP/TCP probes when probing the empty root path (`/`) https://github.com/cortexlabs/cortex/pull/2407 ([RobertLucian](https://github.com/RobertLucian))
- Fix nil pointer exception in the `cortex cluster export` command https://github.com/cortexlabs/cortex/pull/2415 https://github.com/cortexlabs/cortex/issues/2414 ([RobertLucian](https://github.com/RobertLucian))
- Ensure that user-specified environment variables are ordered deterministically in the Kubernetes deployment spec https://github.com/cortexlabs/cortex/pull/2411 ([deliahu](https://github.com/deliahu))
- Ensure that the batch on-job-complete request contains a valid JSON body https://github.com/cortexlabs/cortex/pull/2409 ([RobertLucian](https://github.com/RobertLucian))
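Why the environment variable ordering matters: in a Kubernetes pod spec a container's `env` is a list. Emitting the same variables in a different order therefore changes the pod template and can trigger an unnecessary rollout. If the variables are held in a map whose iteration order is not fixed (as with Go maps), sorting by name is a simple remedy. Below is a minimal Python sketch of that idea. `to_env_list` and the variable names are hypothetical and do not come from the Cortex codebase.

```python
from typing import Dict, List


def to_env_list(env: Dict[str, str]) -> List[Dict[str, str]]:
    """Turn a mapping of user-specified environment variables into the
    list-of-{name, value} form used in a Kubernetes container spec,
    sorted by name so identical input always yields an identical spec."""
    return [{"name": name, "value": env[name]} for name in sorted(env)]


# Same variables, different insertion order -> identical output
first = to_env_list({"MODEL_DIR": "/mnt/model", "LOG_LEVEL": "info"})
second = to_env_list({"LOG_LEVEL": "info", "MODEL_DIR": "/mnt/model"})
```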
v0.41.0
📋 Changes
- Support configurable `pre_stop` command for containers https://github.com/cortexlabs/cortex/pull/2403 ([docs](https://docs.cortex.dev/workloads/realtime/configuration)) ([deliahu](https://github.com/deliahu))
- Support m6i instance types https://github.com/cortexlabs/cortex/pull/2398 ([deliahu](https://github.com/deliahu))
- Update to Kubernetes v1.21 https://github.com/cortexlabs/cortex/pull/2398 ([deliahu](https://github.com/deliahu))
- Wait for in-flight requests to reach zero before terminating the proxy container https://github.com/cortexlabs/cortex/pull/2402 ([deliahu](https://github.com/deliahu))
- Fix `cortex get --env` command https://github.com/cortexlabs/cortex/pull/2404 ([deliahu](https://github.com/deliahu))
- Fix cluster price estimate during `cortex cluster up` for spot node groups with on-demand base capacity https://github.com/cortexlabs/cortex/pull/2406 ([RobertLucian](https://github.com/RobertLucian))
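The proxy change amounts to a drain-before-exit step. On termination, the container stops only after the number of in-flight requests reaches zero, which is typically bounded by a grace period. Below is a minimal Python sketch of that pattern, assuming a counter guarded by a condition variable. `InFlightTracker` is a hypothetical name, and this is not Cortex's proxy implementation.

```python
import threading


class InFlightTracker:
    """Counts in-flight requests so a shutdown hook can wait for them to drain."""

    def __init__(self) -> None:
        self._count = 0
        self._cond = threading.Condition()

    def start(self) -> None:
        # call when a request begins
        with self._cond:
            self._count += 1

    def finish(self) -> None:
        # call when a request completes; wake waiters once nothing is in flight
        with self._cond:
            self._count -= 1
            if self._count == 0:
                self._cond.notify_all()

    def wait_for_zero(self, timeout: float) -> bool:
        """Block until no requests are in flight; False if `timeout` seconds pass first."""
        with self._cond:
            return self._cond.wait_for(lambda: self._count == 0, timeout=timeout)


# Simulate one request that finishes 0.1 s after shutdown begins
tracker = InFlightTracker()
tracker.start()
threading.Timer(0.1, tracker.finish).start()
drained = tracker.wait_for_zero(timeout=2.0)
```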
📦 Nucleus Model Server
- We have released v0.1.0 of the [Nucleus model server](https://github.com/cortexlabs/nucleus)!
- Some of Nucleus's features include:
  - Generic Python models (PyTorch, ONNX, scikit-learn, MLflow, NumPy, pandas, etc.)
  - TensorFlow models
  - CPU and GPU support
  - Serve models directly from S3 paths
  - Configurable multiprocessing and multithreading
  - Multi-model endpoints
v0.40.0
📋 Changes
- Support concurrency for Async APIs (via the `max_concurrency` field) https://github.com/cortexlabs/cortex/pull/2376 https://github.com/cortexlabs/cortex/issues/2200 ([miguelvr](https://github.com/miguelvr))
- Add graphs for cluster-wide and per-API cost breakdowns to the cluster metrics dashboard https://github.com/cortexlabs/cortex/pull/2382 https://github.com/cortexlabs/cortex/issues/1962 ([RobertLucian](https://github.com/RobertLucian))
- Allow worker nodes containing Async APIs to scale to zero (now a shared async gateway is used, which runs on the operator node group) https://github.com/cortexlabs/cortex/pull/2380 https://github.com/cortexlabs/cortex/issues/2279 ([vishalbollu](https://github.com/vishalbollu))
- Add `cortex describe API_NAME` command for Realtime and Async APIs https://github.com/cortexlabs/cortex/pull/2368 https://github.com/cortexlabs/cortex/issues/2320 https://github.com/cortexlabs/cortex/issues/2359 ([RobertLucian](https://github.com/RobertLucian))
- Support updating the priority of an existing node group https://github.com/cortexlabs/cortex/pull/2369 https://github.com/cortexlabs/cortex/issues/2254 ([vishalbollu](https://github.com/vishalbollu))
- Improve the reporting of API statuses https://github.com/cortexlabs/cortex/pull/2368 https://github.com/cortexlabs/cortex/issues/2320 https://github.com/cortexlabs/cortex/issues/2359 ([RobertLucian](https://github.com/RobertLucian))
- Remove the default readiness probe on the target port if a custom readiness probe is specified in the API spec https://github.com/cortexlabs/cortex/pull/2379 ([RobertLucian](https://github.com/RobertLucian))
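As we read the field, `max_concurrency` bounds how many requests an Async API replica works on at the same time, which is counting-semaphore semantics. The plain-Python sketch below only illustrates those semantics. `run_with_max_concurrency` is hypothetical and says nothing about how Cortex's async gateway or workers actually enforce the limit.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def run_with_max_concurrency(tasks, max_concurrency):
    """Run zero-argument callables concurrently, but never more than
    `max_concurrency` at once. Returns (results in input order, peak concurrency)."""
    gate = threading.BoundedSemaphore(max_concurrency)
    lock = threading.Lock()
    active = 0
    peak = 0

    def guarded(task):
        nonlocal active, peak
        with gate:  # blocks while max_concurrency tasks are already running
            with lock:
                active += 1
                peak = max(peak, active)
            try:
                return task()
            finally:
                with lock:
                    active -= 1

    with ThreadPoolExecutor(max_workers=max(1, len(tasks))) as pool:
        results = list(pool.map(guarded, tasks))
    return results, peak


# Eight "requests" that each take ~50 ms, processed at most two at a time
jobs = [lambda i=i: (time.sleep(0.05), i)[1] for i in range(8)]
results, peak = run_with_max_concurrency(jobs, max_concurrency=2)
```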
v0.39.1
📋 Changes
- Remove an unnecessary cluster validation which limited the IP ranges that could be used in `api_load_balancer_cidr_white_list` and `operator_load_balancer_cidr_white_list` https://github.com/cortexlabs/cortex/pull/2363 ([RobertLucian](https://github.com/RobertLucian))
v0.39.0
📋 Changes
- Add `cortex cluster health` command to show the health of the cluster's components https://github.com/cortexlabs/cortex/pull/2313 https://github.com/cortexlabs/cortex/issues/2029 ([miguelvr](https://github.com/miguelvr))
- Forward request headers to Async APIs https://github.com/cortexlabs/cortex/pull/2329 https://github.com/cortexlabs/cortex/issues/2296 ([miguelvr](https://github.com/miguelvr))
- Add metrics dashboard for Task APIs https://github.com/cortexlabs/cortex/pull/2311 https://github.com/cortexlabs/cortex/pull/2322 ([RobertLucian](https://github.com/RobertLucian))
- Support larger clusters (up to 1000 nodes with 10000 pods) by enabling IPVS https://github.com/cortexlabs/cortex/pull/2357 https://github.com/cortexlabs/cortex/issues/1834 ([RobertLucian](https://github.com/RobertLucian))
- Automatically limit the rate at which nodes are added to avoid overloading the Kubernetes API server https://github.com/cortexlabs/cortex/pull/2331 https://github.com/cortexlabs/cortex/pull/2338 https://github.com/cortexlabs/cortex/issues/2314 ([RobertLucian](https://github.com/RobertLucian))
- Ensure cluster autoscaler availability https://github.com/cortexlabs/cortex/pull/2347 https://github.com/cortexlabs/cortex/issues/2346 ([RobertLucian](https://github.com/RobertLucian))
- Improve istiod availability at large scale https://github.com/cortexlabs/cortex/pull/2342 https://github.com/cortexlabs/cortex/issues/2332 ([RobertLucian](https://github.com/RobertLucian))
- Reduce metrics shown in `cortex get` to improve scalability and reliability of the command https://github.com/cortexlabs/cortex/pull/2333 https://github.com/cortexlabs/cortex/issues/2319 ([vishalbollu](https://github.com/vishalbollu))
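Limiting how fast nodes join a cluster is a rate-limiting problem, and a token bucket is one standard way to express it. The sketch below is a generic token bucket with hypothetical parameters (`rate`, `capacity`). It is not the mechanism Cortex or the Kubernetes cluster autoscaler actually uses. The clock is passed in explicitly so the behavior is deterministic.

```python
class TokenBucket:
    """Permit bursts of up to `capacity` actions, refilled at `rate` tokens per second.
    The caller passes the current time so behavior is deterministic."""

    def __init__(self, rate: float, capacity: int) -> None:
        self.rate = rate
        self.capacity = capacity
        self.tokens = float(capacity)
        self.last = 0.0

    def allow(self, now: float) -> bool:
        # refill for the time elapsed since the last call, never above capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False


# e.g. allow at most 3 node additions in a burst, then 1 more per second
bucket = TokenBucket(rate=1.0, capacity=3)
burst = [bucket.allow(now=0.0) for _ in range(4)]
```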
v0.38.0
📋 Changes
- Support autoscaling down to zero replicas for Realtime APIs https://github.com/cortexlabs/cortex/pull/2298 https://github.com/cortexlabs/cortex/issues/445 ([miguelvr](https://github.com/miguelvr))
- Allow `ssl_certificate_arn`, `api_load_balancer_cidr_white_list`, and `operator_load_balancer_cidr_white_list` to be updated on an existing cluster (via the `cortex cluster configure` command) https://github.com/cortexlabs/cortex/pull/2305 https://github.com/cortexlabs/cortex/issues/2107 ([vishalbollu](https://github.com/vishalbollu))
- Allow Prometheus's instance type to be configured ([docs](https://docs.cortex.dev/clusters/management/create#cluster-yaml)) https://github.com/cortexlabs/cortex/pull/2307 https://github.com/cortexlabs/cortex/issues/2285 ([RobertLucian](https://github.com/RobertLucian))
- Allow multiple Inferentia chips to be assigned to a single container https://github.com/cortexlabs/cortex/pull/2304 https://github.com/cortexlabs/cortex/issues/1123 ([deliahu](https://github.com/deliahu))
- Fix cluster autoscaler's nodegroup priority calculation https://github.com/cortexlabs/cortex/pull/2309 ([RobertLucian](https://github.com/RobertLucian))
- Various scalability improvements https://github.com/cortexlabs/cortex/pull/2307 https://github.com/cortexlabs/cortex/pull/2304 https://github.com/cortexlabs/cortex/issues/2297 https://github.com/cortexlabs/cortex/issues/2278 https://github.com/cortexlabs/cortex/issues/2285
- Allow setting a nodegroup's `max_instances` to `0` https://github.com/cortexlabs/cortex/pull/2310 ([RobertLucian](https://github.com/RobertLucian))
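The key change for scale-to-zero is that the lower clamp on the replica count may now be zero. Below is a simplified sketch of a concurrency-based replica calculation. It assumes the autoscaler targets a fixed number of in-flight requests per replica. The parameter names are hypothetical, and the real autoscaler also averages metrics over a window and applies stabilization periods. In general, scaling up from zero also requires holding the first request until a replica is ready, which this sketch does not model.

```python
import math


def desired_replicas(in_flight: float, target_per_replica: float,
                     min_replicas: int, max_replicas: int) -> int:
    """Replicas needed to keep roughly `target_per_replica` requests in
    flight on each replica, clamped to [min_replicas, max_replicas].
    With min_replicas == 0, zero traffic yields zero replicas."""
    raw = math.ceil(in_flight / target_per_replica)
    return max(min_replicas, min(max_replicas, raw))
```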
v0.37.0
📋 Changes
- Support ARM instance types https://github.com/cortexlabs/cortex/pull/2268 https://github.com/cortexlabs/cortex/issues/1528 ([RobertLucian](https://github.com/RobertLucian))
- Add `cortex cluster configure` command to add, remove, or scale nodegroups on a running cluster https://github.com/cortexlabs/cortex/pull/2246 https://github.com/cortexlabs/cortex/issues/2096 ([RobertLucian](https://github.com/RobertLucian))
- Add `cortex cluster info --print-config` command to print the current configuration of a running cluster https://github.com/cortexlabs/cortex/pull/2246 ([RobertLucian](https://github.com/RobertLucian))
- Add metrics dashboard for Async APIs https://github.com/cortexlabs/cortex/pull/2242 https://github.com/cortexlabs/cortex/issues/1958 ([miguelvr](https://github.com/miguelvr))
- Support `cortex refresh` command for Async APIs https://github.com/cortexlabs/cortex/pull/2265 https://github.com/cortexlabs/cortex/issues/2237 ([deliahu](https://github.com/deliahu))
- The `cortex cluster scale` command has been replaced by the `cortex cluster configure` command.
- Fix Async API metrics reporting for non-200 response status codes https://github.com/cortexlabs/cortex/pull/2266 ([miguelvr](https://github.com/miguelvr))
- Make batch job metrics persistence resilient to instance termination https://github.com/cortexlabs/cortex/pull/2247 https://github.com/cortexlabs/cortex/issues/2041 ([vishalbollu](https://github.com/vishalbollu))
v0.36.0
📋 Changes
- Support running arbitrary Docker containers in all workload types (Realtime, Async, Batch, Task) https://github.com/cortexlabs/cortex/pull/2173 ([RobertLucian](https://github.com/RobertLucian), [miguelvr](https://github.com/miguelvr), [vishalbollu](https://github.com/vishalbollu), [deliahu](https://github.com/deliahu), [ospillinger](https://github.com/ospillinger))
- Support autoscaling Async APIs to zero replicas https://github.com/cortexlabs/cortex/pull/2224 https://github.com/cortexlabs/cortex/issues/2199 ([RobertLucian](https://github.com/RobertLucian))
- With this release, we have generalized Cortex to exclusively support running arbitrary Docker containers for all workload types (Realtime, Async, Batch, and Task). This enables the use of any model server, programming language, etc. As a result, the API configuration has been updated: the `predictor` section has been removed, the `pod` section has been added, and the `autoscaling` parameters have been modified slightly (depending on the workload type). See the updated docs for [Realtime](https://docs.cortex.dev/workloads/realtime), [Async](https://docs.cortex.dev/workloads/async), [Batch](https://docs.cortex.dev/workloads/batch), and [Task](https://docs.cortex.dev/workloads/task). If you'd like to see examples of Dockerizing Python applications, see our [test/apis](https://github.com/cortexlabs/cortex/tree/0.36/test/apis) folder.
- The `cortex prepare-debug` command has been removed; Cortex now exclusively runs Docker containers, which can be run locally via `docker run`.
- The `cortex patch` command has been removed; its behavior is now identical to `cortex deploy`.
- The `cortex logs` command now prints a CloudWatch Insights URL with a pre-populated query, which can be executed to show logs from your workloads, since this is the recommended approach in production. If you wish to stream logs from a randomly selected pod, you can use `cortex logs --random-pod` (keep in mind that these logs will not include some system logs related to your workload).
- gRPC support has been temporarily removed; we are working on adding it back in v0.37.
- Handle exception when initializing the Python client when the default environment is not set https://github.com/cortexlabs/cortex/pull/2225 https://github.com/cortexlabs/cortex/issues/2223 ([deliahu](https://github.com/deliahu))
v0.35.0
📋 Changes
- Avoid processing HTTP requests that have been cancelled by the client https://github.com/cortexlabs/cortex/pull/2135 https://github.com/cortexlabs/cortex/issues/1453 ([vishalbollu](https://github.com/vishalbollu))
- Support GP3 volumes (and make GP3 the default volume type) https://github.com/cortexlabs/cortex/pull/2130 https://github.com/cortexlabs/cortex/issues/1843 ([RobertLucian](https://github.com/RobertLucian))
- Allow setting the shared memory (shm) size for Task APIs https://github.com/cortexlabs/cortex/pull/2132 https://github.com/cortexlabs/cortex/issues/2115 ([RobertLucian](https://github.com/RobertLucian))
- Implement automatic 7-day expiration for Async API responses https://github.com/cortexlabs/cortex/pull/2151 ([RobertLucian](https://github.com/RobertLucian))
- Add `cortex env rename` command https://github.com/cortexlabs/cortex/pull/2165 https://github.com/cortexlabs/cortex/issues/1773 ([deliahu](https://github.com/deliahu))
- The Python client methods which deploy Python classes have been separated from the `deploy()` method. Now, `deploy()` is used only to deploy project folders, and `deploy_realtime_api()`, `deploy_async_api()`, `deploy_batch_api()`, and `deploy_task_api()` are for deploying Python classes. ([docs](https://docs.cortex.dev/clients/python))
- The name of the bucket that Cortex uses for internal purposes is no longer configurable. During cluster creation, Cortex will auto-generate the bucket name (and create the bucket if it doesn't exist). During cluster deletion, the bucket will be emptied (unless the `--keep-aws-resources` flag is provided to `cortex cluster down`). Users' files should not be stored in the Cortex internal bucket.
- Fix the number of Async API replicas shown in `cortex cluster info` https://github.com/cortexlabs/cortex/pull/2140 https://github.com/cortexlabs/cortex/issues/2129 ([RobertLucian](https://github.com/RobertLucian))
v0.34.0
📋 Changes
- Support handling `GET`, `PUT`, `PATCH`, and `DELETE` HTTP requests in Realtime APIs ([docs](https://docs.cortex.dev/workloads/realtime-apis/handler#http)) https://github.com/cortexlabs/cortex/pull/2111 https://github.com/cortexlabs/cortex/issues/2063 ([RobertLucian](https://github.com/RobertLucian))
- Support running realtime API containers locally for debugging / development purposes ([docs](https://docs.cortex.dev/workloads/debugging)) https://github.com/cortexlabs/cortex/pull/2112 https://github.com/cortexlabs/cortex/issues/2077 ([vishalbollu](https://github.com/vishalbollu))
- Support multiple gRPC services / methods (which can be named arbitrarily) in a single Realtime API ([docs](https://docs.cortex.dev/workloads/realtime-apis/handler#grpc)) https://github.com/cortexlabs/cortex/pull/2111 https://github.com/cortexlabs/cortex/issues/2063 ([RobertLucian](https://github.com/RobertLucian))
- Support specifying a list of node groups on which a workload is allowed to run (see configuration docs for [Realtime](https://docs.cortex.dev/workloads/realtime-apis/configuration), [Async](https://docs.cortex.dev/workloads/async-apis/configuration), [Batch](https://docs.cortex.dev/workloads/batch-apis/configuration), or [Task](https://docs.cortex.dev/workloads/task-apis/configuration) APIs) https://github.com/cortexlabs/cortex/pull/2098 https://github.com/cortexlabs/cortex/issues/2034 ([RobertLucian](https://github.com/RobertLucian))
- Support AWS GovCloud regions https://github.com/cortexlabs/cortex/pull/2118 https://github.com/cortexlabs/cortex/issues/2103 ([vishalbollu](https://github.com/vishalbollu))
- "predictor" has been renamed to "handler" throughout the product (API configuration and Python APIs). In addition, as a result of supporting additional HTTP method verbs, `predict()` has been renamed to `handle_post()` in Realtime APIs (`handle_get()`, `handle_put()`, `handle_patch()`, and `handle_delete()` are now also supported). For consistency, `predict()` has been renamed to `handle_async()` for Async APIs, and `handle_batch()` for Batch APIs. See the examples for [Realtime](https://docs.cortex.dev/workloads/realtime-apis/example), [Async](https://docs.cortex.dev/workloads/async-apis/example), and [Batch](https://docs.cortex.dev/workloads/batch-apis/example) APIs. Task APIs have not been changed.
- Fix invalid Async workload status during processing https://github.com/cortexlabs/cortex/pull/2106 https://github.com/cortexlabs/cortex/issues/2104 ([RobertLucian](https://github.com/RobertLucian))
- Add docs for [configuring Grafana alerts](https://docs.cortex.dev/clusters/observability/alerting) ([RobertLucian](https://github.com/RobertLucian))
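Below is a minimal sketch of a Realtime API handler after the rename. It assumes, per our reading of the v0.34 docs, that the class is named `Handler` and that its constructor receives a `config` dict. The exact optional parameters each method may accept are documented in the linked examples and are not reproduced here. The greeting logic is made up for illustration. Because the handler is an ordinary Python class, its methods can be called directly for a quick local check.

```python
class Handler:
    def __init__(self, config):
        # `config` carries user-defined values from the API configuration
        self.greeting = config.get("greeting", "hello")

    def handle_post(self, payload):
        # formerly predict(); receives the parsed request body
        return {"echo": payload, "greeting": self.greeting}

    def handle_get(self):
        # one of the newly supported verbs, e.g. for a lightweight info endpoint
        return {"status": "ok"}


# Local check: call the methods directly, without deploying
handler = Handler(config={"greeting": "hi"})
post_response = handler.handle_post({"x": 1})
```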
v0.33.0
📋 Changes
- Allow specifying a CIDR range whitelist for APIs and the operator ([docs](https://docs.cortex.dev/clusters/management/create)) https://github.com/cortexlabs/cortex/pull/2071 https://github.com/cortexlabs/cortex/issues/2003 ([vishalbollu](https://github.com/vishalbollu))
- Enable CORS for async, batch, and task APIs https://github.com/cortexlabs/cortex/pull/2082 https://github.com/cortexlabs/cortex/issues/2073 ([deliahu](https://github.com/deliahu))
- The ONNX predictor type has been replaced by the Python predictor type; please use the Python predictor type instead (all ONNX models are fully supported by it)
- Fix bug affecting Async API consistency during heavy traffic https://github.com/cortexlabs/cortex/pull/2072 ([RobertLucian](https://github.com/RobertLucian))
- Fix bug affecting Async API updates https://github.com/cortexlabs/cortex/pull/2067 ([vishalbollu](https://github.com/vishalbollu))
- Rename `cortex cluster configure` command to `cortex cluster scale` https://github.com/cortexlabs/cortex/pull/2040 https://github.com/cortexlabs/cortex/issues/1972 ([RobertLucian](https://github.com/RobertLucian))
- Disable AZRebalance autoscaling group process https://github.com/cortexlabs/cortex/pull/2042 https://github.com/cortexlabs/cortex/issues/1349 ([RobertLucian](https://github.com/RobertLucian))
- Add horizontal pod autoscaler to async API gateway https://github.com/cortexlabs/cortex/pull/2079 https://github.com/cortexlabs/cortex/issues/2078 ([RobertLucian](https://github.com/RobertLucian))
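A CIDR whitelist admits a client only if its address falls within one of the listed ranges. The filtering is presumably enforced at the AWS load balancer / network layer, not in Python. The sketch below only illustrates the membership semantics, using the standard-library `ipaddress` module. `is_allowed` is a hypothetical helper, and its list argument merely mirrors the `api_load_balancer_cidr_white_list` / `operator_load_balancer_cidr_white_list` settings.

```python
import ipaddress
from typing import Iterable


def is_allowed(client_ip: str, cidr_white_list: Iterable[str]) -> bool:
    """Return True if `client_ip` falls inside any CIDR range in the list."""
    addr = ipaddress.ip_address(client_ip)
    return any(addr in ipaddress.ip_network(cidr) for cidr in cidr_white_list)
```

For example, `is_allowed("10.1.2.3", ["10.0.0.0/8"])` is true, and `["0.0.0.0/0"]` admits every IPv4 client.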
v0.32.0
📋 Changes
- Add gRPC support to realtime APIs ([docs](https://docs.cortex.dev/workloads/realtime-apis/predictors#grpc)) https://github.com/cortexlabs/cortex/pull/1997 https://github.com/cortexlabs/cortex/issues/1056 ([RobertLucian](https://github.com/RobertLucian))
- Add support for ONNX and TensorFlow predictor types in async APIs ([docs](https://docs.cortex.dev/workloads/async-apis/predictors)) https://github.com/cortexlabs/cortex/pull/1996 https://github.com/cortexlabs/cortex/issues/1980 ([miguelvr](https://github.com/miguelvr))
- Support using ECR images from other AWS accounts and regions https://github.com/cortexlabs/cortex/pull/2011 https://github.com/cortexlabs/cortex/issues/1988 ([vishalbollu](https://github.com/vishalbollu))
- GCP support has been removed so that we can focus our efforts on improving the scalability, reliability, and security for Cortex on AWS. Cortex on GCP will still be available in v0.31. If you are currently using Cortex on GCP, our team will be happy to help you migrate to AWS or work with you to find alternative solutions. Please feel free to reach out to us on [Slack](https://community.cortex.dev/) or email us at hello@cortex.dev if you're interested.
- Fix memory plots on Grafana dashboards for realtime and batch APIs https://github.com/cortexlabs/cortex/pull/2024 https://github.com/cortexlabs/cortex/pull/2014 https://github.com/cortexlabs/cortex/issues/1970 ([RobertLucian](https://github.com/RobertLucian))
- Misc docs improvements https://github.com/cortexlabs/cortex/pull/1994 ([ospillinger](https://github.com/ospillinger))
- Increase kubelet's `registryPullQPS` limit from 5 to 10 https://github.com/cortexlabs/cortex/pull/2023 https://github.com/cortexlabs/cortex/issues/1989 ([miguelvr](https://github.com/miguelvr))
- Pin the AMI version https://github.com/cortexlabs/cortex/pull/2010 https://github.com/cortexlabs/cortex/issues/1975 https://github.com/cortexlabs/cortex/issues/1615 ([vishalbollu](https://github.com/vishalbollu))
v0.31.1
📋 Changes
- Fix preemptible node pools on GCP not autoscaling https://github.com/cortexlabs/cortex/pull/1981 ([vishalbollu](https://github.com/vishalbollu))
- Fix the replica autoscaler targeting incorrect deployments on operator restart https://github.com/cortexlabs/cortex/pull/1982 ([miguelvr](https://github.com/miguelvr))
- Fix the replica autoscaler not being reinitialized for running APIs on operator restart on GCP https://github.com/cortexlabs/cortex/pull/1984 ([vishalbollu](https://github.com/vishalbollu))
v0.31.0
📋 Changes
- Add support for AsyncAPI (experimental) ([docs](https://docs.cortex.dev/workloads/introduction)) https://github.com/cortexlabs/cortex/pull/1935 https://github.com/cortexlabs/cortex/issues/1610 ([miguelvr](https://github.com/miguelvr))
- Add support for multi-instance-type clusters to AWS/GCP providers (experimental) ([aws](https://docs.cortex.dev/clusters/aws/multi-instance-type)/[gcp](https://docs.cortex.dev/clusters/gcp/multi-instance-type) docs) https://github.com/cortexlabs/cortex/pull/1951 ([RobertLucian](https://github.com/RobertLucian))
- Allow users to duplicate/mirror traffic using shadow pipelines https://github.com/cortexlabs/cortex/pull/1948 https://github.com/cortexlabs/cortex/issues/1889 ([docs](https://docs.cortex.dev/workloads/realtime-apis/traffic-splitter/configuration)) ([vishalbollu](https://github.com/vishalbollu))
- `on_demand_backup` in the cluster configuration has been removed in favor of using a cluster with a mixture of spot and on-demand nodegroups. See the multi-instance documentation for [aws](https://docs.cortex.dev/clusters/aws/multi-instance-type) and [gcp](https://docs.cortex.dev/clusters/gcp/multi-instance-type) for more details.
- Fix the Python client not respecting the `CORTEX_CLI_CONFIG_DIR` environment variable for `client-id.txt` https://github.com/cortexlabs/cortex/pull/1953 ([jackmpcollins](https://github.com/jackmpcollins))
- Prevent threads from getting stuck in the `DynamicBatcher` https://github.com/cortexlabs/cortex/pull/1915 ([cbensimon](https://github.com/cbensimon))
- Fix unexpected termination of `cortex logs` by increasing the buffer size https://github.com/cortexlabs/cortex/pull/1939 ([vishalbollu](https://github.com/vishalbollu))
- Decouple cluster deletion from EBS volume deletion in `cortex cluster down` https://github.com/cortexlabs/cortex/pull/1954 ([deliahu](https://github.com/deliahu))
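A shadow pipeline mirrors live traffic to a second API without affecting what the client sees. The sketch below assumes the mirrored request is fire-and-forget, so the shadow's response and any errors it raises are discarded. It illustrates the pattern with hypothetical names (`mirror_request`, `fake_primary`, `fake_shadow`) and is not how Cortex's traffic splitter implements it.

```python
from concurrent.futures import ThreadPoolExecutor

_shadow_pool = ThreadPoolExecutor(max_workers=4)


def mirror_request(payload, primary, shadow):
    """Send `payload` to the primary backend and return its response.
    A copy goes to the shadow backend in the background; its result and
    any exception it raises stay inside the discarded Future."""
    _shadow_pool.submit(shadow, payload)  # fire-and-forget copy
    return primary(payload)


# Example: the shadow backend fails, but the caller still gets the primary's answer
mirrored = []


def fake_primary(x):
    return x * 2


def fake_shadow(x):
    mirrored.append(x)
    raise RuntimeError("shadow errors are not propagated")


result = mirror_request(21, fake_primary, fake_shadow)
_shadow_pool.shutdown(wait=True)  # demo only: wait for the background copy
```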
v0.30.0
📋 Changes
- Record custom metrics from predictors and view them in Grafana ([docs](https://docs.cortex.dev/workloads/observability/metrics#custom-user-metrics)) https://github.com/cortexlabs/cortex/pull/1910 https://github.com/cortexlabs/cortex/issues/1897 ([miguelvr](https://github.com/miguelvr))
- Add granular pod metrics to the Grafana dashboards https://github.com/cortexlabs/cortex/pull/1905 ([RobertLucian](https://github.com/RobertLucian))
- Add node metrics to Grafana dashboards https://github.com/cortexlabs/cortex/pull/1900 ([miguelvr](https://github.com/miguelvr))
- Remove support for installing Cortex on your own Kubernetes cluster https://github.com/cortexlabs/cortex/pull/1921 ([RobertLucian](https://github.com/RobertLucian))
- Fix bug where successfully completed jobs were marked as completed with errors https://github.com/cortexlabs/cortex/pull/1913 ([vishalbollu](https://github.com/vishalbollu))
- Fix bug where batch jobs were being terminated unnecessarily https://github.com/cortexlabs/cortex/pull/1917 ([vishalbollu](https://github.com/vishalbollu))
- Prevent cluster autoscaler from reallocating job pods https://github.com/cortexlabs/cortex/pull/1919 ([vishalbollu](https://github.com/vishalbollu))
- Address AWS quota issues during `cortex cluster up`, such as insufficient NAT Gateways or EIPs https://github.com/cortexlabs/cortex/pull/1912 ([RobertLucian](https://github.com/RobertLucian))
v0.29.0
📋 Changes
- Add Grafana dashboard for APIs ([docs](https://www.docs.cortex.dev/workloads/realtime-apis/metrics)) https://github.com/cortexlabs/cortex/pull/1867 https://github.com/cortexlabs/cortex/pull/1885 https://github.com/cortexlabs/cortex/pull/1890 https://github.com/cortexlabs/cortex/pull/1887 ([miguelvr](https://github.com/miguelvr))
- Support API autoscaling in GCP clusters ([docs](https://www.docs.cortex.dev/workloads/realtime-apis/autoscaling)) https://github.com/cortexlabs/cortex/pull/1814 https://github.com/cortexlabs/cortex/pull/1879 https://github.com/cortexlabs/cortex/issues/1601 ([miguelvr](https://github.com/miguelvr))
- Support traffic splitting in GCP clusters ([docs](https://www.docs.cortex.dev/workloads/realtime-apis/traffic-splitter/example)) https://github.com/cortexlabs/cortex/pull/1892 https://github.com/cortexlabs/cortex/issues/1660 ([miguelvr](https://github.com/miguelvr))
- The default Docker images for APIs have been slimmed down to not include packages other than what Cortex requires to function. Therefore, when deploying APIs, it is now necessary to include the dependencies that your predictor needs in `requirements.txt` ([docs](https://www.docs.cortex.dev/workloads/dependencies/python-packages)) and/or `dependencies.sh` ([docs](https://www.docs.cortex.dev/workloads/dependencies/system-packages)).
- Disable dynamic batcher for TensorFlow predictor type https://github.com/cortexlabs/cortex/pull/1888 ([miguelvr](https://github.com/miguelvr))
- Support empty directory objects for models saved in S3/GCS https://github.com/cortexlabs/cortex/pull/1830 https://github.com/cortexlabs/cortex/issues/1829 ([RobertLucian](https://github.com/RobertLucian))
- Fix bug which prevented Task APIs on GCP from being cleaned up after completion https://github.com/cortexlabs/cortex/pull/1871 ([RobertLucian](https://github.com/RobertLucian))
- Add documentation for using a version of Python other than the default via `dependencies.sh` ([docs](https://www.docs.cortex.dev/workloads/dependencies/system-packages)) or custom images ([docs](https://www.docs.cortex.dev/workloads/dependencies/images)) https://github.com/cortexlabs/cortex/pull/1862 https://github.com/cortexlabs/cortex/issues/1779 ([RobertLucian](https://github.com/RobertLucian))
v0.28.0
📋 Changes
- Support installing Cortex on an existing Kubernetes cluster (on AWS or GCP) ([docs](https://docs.cortex.dev/clusters/cortex-core-on-kubernetes/install)) https://github.com/cortexlabs/cortex/pull/1837 https://github.com/cortexlabs/cortex/issues/1808 ([vishalbollu](https://github.com/vishalbollu))
- The CloudWatch dashboard has been removed as a result of our switch to Prometheus for metrics aggregation. The dashboard will be replaced with an alternative in an upcoming release.
- Fix bug which can cause requests to APIs from a Python client to time out during cluster autoscaling https://github.com/cortexlabs/cortex/pull/1841 https://github.com/cortexlabs/cortex/issues/1840 ([RobertLucian](https://github.com/RobertLucian))
- Fix bug which can cause `downscale_stabilization_period` to be disregarded during downscaling https://github.com/cortexlabs/cortex/pull/1847 https://github.com/cortexlabs/cortex/issues/1846 ([RobertLucian](https://github.com/RobertLucian))
- AWS credentials are no longer required to connect the CLI to the cluster operator. If you need to restrict access to your cluster operator, configure the operator's load balancer to be private by setting `operator_load_balancer_scheme: internal` in your [cluster configuration file](https://docs.cortex.dev/clusters/cortex-cloud-on-aws/install#configure-cortex), and set up [VPC Peering](https://docs.cortex.dev/clusters/cortex-cloud-on-aws/index/vpc-peering). We plan to support a new auth strategy in an upcoming release.
- Improve S6 error code/signal handling https://github.com/cortexlabs/cortex/pull/1825 https://github.com/cortexlabs/cortex/issues/1703 ([RobertLucian](https://github.com/RobertLucian))
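As we understand the setting, `downscale_stabilization_period` means an API is not scaled below the highest replica recommendation made during that window. Below is a simplified sketch of that rule, with times passed in explicitly. `DownscaleStabilizer` is hypothetical. It applies scale-ups immediately, ignoring any upscale stabilization, and omits the real autoscaler's other inputs.

```python
from collections import deque


class DownscaleStabilizer:
    """Never scale below the highest recommendation seen within the last
    `period` seconds; scale-ups are applied immediately."""

    def __init__(self, period: float) -> None:
        self.period = period
        self.history = deque()  # (timestamp, recommendation) pairs

    def next_replicas(self, now: float, current: int, recommendation: int) -> int:
        self.history.append((now, recommendation))
        # drop recommendations that fall outside the stabilization window
        while self.history and self.history[0][0] < now - self.period:
            self.history.popleft()
        if recommendation >= current:
            return recommendation
        return min(current, max(r for _, r in self.history))


# Traffic spikes at t=0, then drops; with a 300 s window, replicas stay at 5 until t=400
stabilizer = DownscaleStabilizer(period=300)
timeline = [
    stabilizer.next_replicas(0, 1, 5),
    stabilizer.next_replicas(60, 5, 2),
    stabilizer.next_replicas(200, 5, 1),
    stabilizer.next_replicas(400, 5, 1),
]
```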
v0.27.0
📋 Changes
- Add new API type `TaskAPI` for running arbitrary Python jobs ([docs](https://docs.cortex.dev/workloads/task/example)) https://github.com/cortexlabs/cortex/pull/1717 https://github.com/cortexlabs/cortex/issues/253 ([miguelvr](https://github.com/miguelvr), [RobertLucian](https://github.com/RobertLucian))
- Write Cortex's logs as structured logs, and allow use of Cortex's structured logger in predictors (supports adding extra fields) ([aws docs](https://docs.cortex.dev/clusters/aws/logging), [gcp docs](https://docs.cortex.dev/clusters/gcp/logging)) https://github.com/cortexlabs/cortex/pull/1778 https://github.com/cortexlabs/cortex/pull/1803 https://github.com/cortexlabs/cortex/pull/1804 https://github.com/cortexlabs/cortex/issues/1732 https://github.com/cortexlabs/cortex/issues/1563 ([vishalbollu](https://github.com/vishalbollu))
- Support preemptible instances on GCP ([docs](https://docs.cortex.dev/clusters/gcp/install)) https://github.com/cortexlabs/cortex/pull/1791 https://github.com/cortexlabs/cortex/issues/1631 ([RobertLucian](https://github.com/RobertLucian))
- Support private load balancers on GCP ([docs](https://docs.cortex.dev/clusters/gcp/install)) https://github.com/cortexlabs/cortex/pull/1786 https://github.com/cortexlabs/cortex/issues/1621 ([deliahu](https://github.com/deliahu))
- Support GCP instances with multiple GPUs ([docs](https://docs.cortex.dev/clusters/gcp/install)) https://github.com/cortexlabs/cortex/pull/1789 https://github.com/cortexlabs/cortex/issues/1784 ([deliahu](https://github.com/deliahu))
- `cortex logs` now streams logs from a single replica at random when there are multiple replicas for an API. The recommended way to analyze production logs is via a dedicated logging tool (by default, logs are sent to [CloudWatch](https://us-west-2.console.aws.amazon.com/cloudwatch/home) on AWS and [StackDriver](https://console.cloud.google.com/logs/query) on GCP)
- Misc Python client fixes https://github.com/cortexlabs/cortex/pull/1798 https://github.com/cortexlabs/cortex/pull/1782 https://github.com/cortexlabs/cortex/pull/1772 ([vishalbollu](https://github.com/vishalbollu), [RobertLucian](https://github.com/RobertLucian))
- Document the shared `/mnt` directory for TensorFlow predictors https://github.com/cortexlabs/cortex/pull/1802 https://github.com/cortexlabs/cortex/issues/1792 ([deliahu](https://github.com/deliahu))
v0.26.0
📋 Changes
- Support configuring the log level for APIs ([docs](https://docs.cortex.dev/v/0.26/workloads/realtime/configuration)) https://github.com/cortexlabs/cortex/pull/1741 https://github.com/cortexlabs/cortex/issues/1484 ([RobertLucian](https://github.com/RobertLucian))
- Support creating a cluster in an existing AWS VPC ([docs](https://docs.cortex.dev/v/0.26/clusters/aws/install)) https://github.com/cortexlabs/cortex/pull/1759 https://github.com/cortexlabs/cortex/issues/1142 ([deliahu](https://github.com/deliahu))
- Support specifying the GCP network and subnet for the Cortex cluster ([docs](https://docs.cortex.dev/v/0.26/clusters/gcp/install)) https://github.com/cortexlabs/cortex/pull/1752 https://github.com/cortexlabs/cortex/issues/1738 ([deliahu](https://github.com/deliahu))
- Support configuring shared memory size (shm) for inter-process communication ([docs](https://docs.cortex.dev/v/0.26/workloads/realtime/configuration)) https://github.com/cortexlabs/cortex/pull/1756 https://github.com/cortexlabs/cortex/issues/1638 ([vishalbollu](https://github.com/vishalbollu))
- The local provider has been removed. The best way to test your predictor implementation locally is to import it in a separate Python file and call its `__init__()` and `predict()` methods directly. The best way to test your API is to deploy it to a dev/test cluster.
- Built-in support for API Gateway has been removed. If you need an HTTPS endpoint with valid certificates, you can set up a [custom domain](https://docs.cortex.dev/v/0.26/clusters/aws/index/custom-domain) or [manually create an API Gateway](https://docs.cortex.dev/v/0.26/clusters/aws/index/https).
- Prediction monitoring has been removed. We are exploring how to build a more powerful and customizable solution for this.
- The `predict` CLI command has been removed. Use standard tools such as `curl` or Python's `requests` library to test APIs.
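The local-testing approach described above can be sketched as follows. This is a minimal, hypothetical example: the `PythonPredictor` class body, its `config` contents, and the payload shape are assumptions for illustration. In a real project you would import your class from its own module (for example, a hypothetical `from predictor import PythonPredictor`) instead of defining it inline, as done here so the snippet runs on its own.

```python
# Hypothetical predictor, inlined so this sketch runs standalone.
# In practice you would import it, e.g.: from predictor import PythonPredictor
class PythonPredictor:
    def __init__(self, config):
        # `config` is assumed to mirror the predictor config dict from the API configuration
        self.threshold = config.get("threshold", 0.5)

    def predict(self, payload):
        # Toy "model": compare a score against the configured threshold
        score = float(payload["score"])
        return {"label": "positive" if score >= self.threshold else "negative"}


def check_locally():
    # Call __init__() and predict() directly; no cluster is involved
    predictor = PythonPredictor(config={"threshold": 0.7})
    result = predictor.predict({"score": 0.9})
    print(result)
    return result


if __name__ == "__main__":
    check_locally()
```

Because the class is plain Python, the same calls can be dropped into any test runner before deploying to a dev/test cluster.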
v0.25.0
📋 Changes
- Support server-side micro batching for the Python predictor ([docs](https://www.docs.cortex.dev/v/0.25/workloads/realtime/server-side-batching)) https://github.com/cortexlabs/cortex/pull/1653 https://github.com/cortexlabs/cortex/issues/1382 ([miguelvr](https://github.com/miguelvr))
- Add timeout configuration for batch jobs ([docs](https://www.docs.cortex.dev/v/0.25/workloads/batch/endpoints)) https://github.com/cortexlabs/cortex/pull/1712 https://github.com/cortexlabs/cortex/issues/1324 ([vishalbollu](https://github.com/vishalbollu))
- Support batch retries ([docs](https://www.docs.cortex.dev/v/0.25/workloads/batch/endpoints)) https://github.com/cortexlabs/cortex/pull/1713 https://github.com/cortexlabs/cortex/issues/1540 ([lapaniku](https://github.com/lapaniku), [vishalbollu](https://github.com/vishalbollu))
- Support sending failed batches to a dead-letter queue ([docs](https://www.docs.cortex.dev/v/0.25/workloads/batch/endpoints)) https://github.com/cortexlabs/cortex/pull/1713 https://github.com/cortexlabs/cortex/issues/1541 ([lapaniku](https://github.com/lapaniku), [vishalbollu](https://github.com/vishalbollu))
- Support installing the cortex Python client in predictors https://github.com/cortexlabs/cortex/pull/1709 https://github.com/cortexlabs/cortex/issues/1670 https://github.com/cortexlabs/cortex/issues/1206 ([RobertLucian](https://github.com/RobertLucian))
- The `predictor.model_path` field of the realtime API configuration has been moved to `predictor.models.path`. In addition, for the Python predictor type, `predictor.models` has been renamed to `predictor.multi_model_reloading`. See the full [API configuration schema](https://www.docs.cortex.dev/v/0.25/workloads/realtime/configuration).
- Misc batch reliability improvements https://github.com/cortexlabs/cortex/pull/1705 https://github.com/cortexlabs/cortex/pull/1718 https://github.com/cortexlabs/cortex/pull/1729 ([vishalbollu](https://github.com/vishalbollu))
- Reorganize the [docs](https://www.docs.cortex.dev) structure https://github.com/cortexlabs/cortex/pull/1696 https://github.com/cortexlabs/cortex/pull/1701 https://github.com/cortexlabs/cortex/pull/1704 https://github.com/cortexlabs/cortex/pull/1719 https://github.com/cortexlabs/cortex/issues/1675 ([ospillinger](https://github.com/ospillinger))
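Conceptually, batch retries and a dead-letter queue combine as in the sketch below. This illustrates the general pattern only and is not Cortex's implementation: `run_with_retries`, `process_batch`, `max_retries`, and the in-memory `dead_letter` list are hypothetical stand-ins (in a real deployment, both the work queue and the dead-letter queue would be message queues).

```python
from typing import Callable, List


def run_with_retries(
    batches: List[list],
    process_batch: Callable[[list], None],
    max_retries: int = 2,
) -> List[list]:
    """Process each batch, retrying a failed batch up to `max_retries` extra times.

    Batches that still fail are collected in a dead-letter list and returned,
    so they can be inspected later instead of being silently dropped.
    """
    dead_letter: List[list] = []
    for batch in batches:
        # One initial attempt plus up to `max_retries` retries
        for attempt in range(max_retries + 1):
            try:
                process_batch(batch)
                break  # success: move on to the next batch
            except Exception:
                if attempt == max_retries:
                    dead_letter.append(batch)  # retries exhausted
    return dead_letter


if __name__ == "__main__":
    def process(batch):
        # Hypothetical worker: rejects any batch containing a negative number
        if any(x < 0 for x in batch):
            raise ValueError("bad sample")

    failed = run_with_retries([[1, 2], [3, -4], [5]], process)
    print(failed)  # [[3, -4]]
```

Transient failures succeed on a later attempt, while persistently failing batches end up in the dead-letter list rather than blocking the rest of the job.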
v0.24.1
📋 Changes
- Propagate the exit code from the predictor's initialization so that the API status is set to "error" when initialization fails https://github.com/cortexlabs/cortex/issues/1680 https://github.com/cortexlabs/cortex/pull/1691 ([RobertLucian](https://github.com/RobertLucian))
v0.24.0
📋 Changes
- Add GCP support: our initial release supports all three predictor types (Python, TensorFlow, ONNX), on CPU or GPU, with live reloading, multi-model caching, and cluster autoscaling https://github.com/cortexlabs/cortex/pull/1655 https://github.com/cortexlabs/cortex/pull/1672 https://github.com/cortexlabs/cortex/pull/1667 https://github.com/cortexlabs/cortex/issues/1661 https://github.com/cortexlabs/cortex/issues/114 https://github.com/cortexlabs/cortex/issues/1600 https://github.com/cortexlabs/cortex/issues/1602 https://github.com/cortexlabs/cortex/issues/1616 https://github.com/cortexlabs/cortex/issues/1624 ([RobertLucian](https://github.com/RobertLucian), [deliahu](https://github.com/deliahu), [vishalbollu](https://github.com/vishalbollu))
- Add the patch command to the CLI and [Python client](https://docs.cortex.dev/v/0.24/workloads/python-client), which can be used to update an API using only the API configuration (without needing to provide the predictor's Python implementation) https://github.com/cortexlabs/cortex/pull/1651 https://github.com/cortexlabs/cortex/pull/1666 https://github.com/cortexlabs/cortex/issues/1329 ([vishalbollu](https://github.com/vishalbollu))
- Support deploying predictor Python classes from the Python client https://github.com/cortexlabs/cortex/pull/1587 https://github.com/cortexlabs/cortex/issues/1617 (see the [tutorial](https://docs.cortex.dev/v/0.24/tutorials/realtime) for an example) ([vishalbollu](https://github.com/vishalbollu))
- The Python client's `deploy()` function has been renamed to `create_api()`, and some of the argument names have changed ([docs](https://docs.cortex.dev/v/0.24/workloads/python-client))
- Enable CORS for APIs accessed via API Gateway or load balancer https://github.com/cortexlabs/cortex/pull/1649 https://github.com/cortexlabs/cortex/issues/1234 ([RobertLucian](https://github.com/RobertLucian), [deliahu](https://github.com/deliahu))
- Fix local TensorFlow models when live reloading is enabled https://github.com/cortexlabs/cortex/pull/1668 https://github.com/cortexlabs/cortex/issues/1554 ([RobertLucian](https://github.com/RobertLucian))
- Prevent TensorFlow multi-model caching from attempting to download local models from S3 https://github.com/cortexlabs/cortex/pull/1669 https://github.com/cortexlabs/cortex/issues/1598 ([RobertLucian](https://github.com/RobertLucian))
- Miscellaneous docs improvements ([vishalbollu](https://github.com/vishalbollu), [ospillinger](https://github.com/ospillinger))
v0.23.0
📋 Changes
- Update Python client `deploy()` to accept a Python dictionary for API configuration (previously, only a file path was supported) ([docs](https://docs.cortex.dev/v/0.23/miscellaneous/python-client#deploy)) https://github.com/cortexlabs/cortex/pull/1587 ([vishalbollu](https://github.com/vishalbollu))
- Show API deployment history in the `cortex get API_NAME` command https://github.com/cortexlabs/cortex/pull/1544 https://github.com/cortexlabs/cortex/issues/1496 ([deliahu](https://github.com/deliahu))
- Add `cortex export API_NAME` and `cortex export API_NAME API_ID` commands to export specific and historical API deployments https://github.com/cortexlabs/cortex/pull/1544 https://github.com/cortexlabs/cortex/issues/1497 ([deliahu](https://github.com/deliahu))
- Build and push `python-predictor-gpu-slim` images for several CUDA/cuDNN combinations (`cuda10.0-cudnn7`, `cuda10.1-cudnn7`, `cuda10.1-cudnn8`, `cuda10.2-cudnn7`, `cuda10.2-cudnn8`, `cuda11.0-cudnn8`, `cuda11.1-cudnn8`) ([docs](https://docs.cortex.dev/v/0.23/advanced/system-packages#custom-docker-image)) https://github.com/cortexlabs/cortex/pull/1575 https://github.com/cortexlabs/cortex/issues/1574 ([deliahu](https://github.com/deliahu))
- Allow local deployments of public S3 models without requiring AWS credentials https://github.com/cortexlabs/cortex/pull/1589 https://github.com/cortexlabs/cortex/issues/1588 ([RobertLucian](https://github.com/RobertLucian))
- Add guide for [avoiding Docker Hub rate limits](https://docs.cortex.dev/v/0.23/guides/docker-hub-rate-limiting) https://github.com/cortexlabs/cortex/pull/1576 ([RobertLucian](https://github.com/RobertLucian), [deliahu](https://github.com/deliahu))
- Add guide for [self-hosting Cortex's Docker images](https://docs.cortex.dev/v/0.23/guides/self-hosted-images) https://github.com/cortexlabs/cortex/pull/1579 ([RobertLucian](https://github.com/RobertLucian), [deliahu](https://github.com/deliahu))
- Remove API request maximum payload size limit https://github.com/cortexlabs/cortex/pull/1583 ([deliahu](https://github.com/deliahu))
v0.22.1
📋 Changes
- Set the predictor's working directory to the root Cortex project directory https://github.com/cortexlabs/cortex/pull/1573 https://github.com/cortexlabs/cortex/issues/1572 ([deliahu](https://github.com/deliahu))
- Allow `max_instances` to be updated via `cortex cluster configure` https://github.com/cortexlabs/cortex/pull/1568 https://github.com/cortexlabs/cortex/issues/1567 ([deliahu](https://github.com/deliahu))
- Gracefully stop the serving container when a multi-processed cron throws an exception https://github.com/cortexlabs/cortex/pull/1560 https://github.com/cortexlabs/cortex/issues/1552 ([RobertLucian](https://github.com/RobertLucian))
- Demonstrate how to make API requests with various payload types (binary, form fields, etc), and show how to access them in `predict()` https://github.com/cortexlabs/cortex/pull/1566 ([docs](https://docs.cortex.dev/v/0.22/deployments/realtime-api/predictors#api-requests))
- Misc docs improvements https://github.com/cortexlabs/cortex/pull/1551 https://github.com/cortexlabs/cortex/pull/1556 c3dab4045a61703cb1db1d5f95776614252f96c0 https://github.com/cortexlabs/cortex/pull/1557 ([deliahu](https://github.com/deliahu), [RobertLucian](https://github.com/RobertLucian))
- Build and upload the Python package/CLI to a public S3 bucket https://github.com/cortexlabs/cortex/pull/1562 ([vishalbollu](https://github.com/vishalbollu))
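One way `predict()` might branch on the payload type is sketched below. The mapping used here is an assumption for illustration: JSON bodies arrive parsed as a `dict`, `text/plain` bodies as a `str`, and other binary bodies as raw `bytes`. Form-field payloads are not handled, and the predictor itself is hypothetical; check the linked docs for the exact types your Cortex version passes in.

```python
class PythonPredictor:
    """Hypothetical predictor showing one way to handle several payload types."""

    def __init__(self, config):
        self.config = config

    def predict(self, payload):
        if isinstance(payload, bytes):
            # Raw binary body (e.g. an image): report its size
            return {"kind": "binary", "size": len(payload)}
        if isinstance(payload, str):
            # Plain-text body
            return {"kind": "text", "length": len(payload)}
        if isinstance(payload, dict):
            # Parsed JSON object: report its keys in sorted order
            return {"kind": "json", "keys": sorted(payload)}
        raise TypeError(f"unsupported payload type: {type(payload).__name__}")


if __name__ == "__main__":
    predictor = PythonPredictor(config={})
    print(predictor.predict({"b": 1, "a": 2}))  # {'kind': 'json', 'keys': ['a', 'b']}
    print(predictor.predict("hello"))           # {'kind': 'text', 'length': 5}
    print(predictor.predict(b"\x89PNG"))        # {'kind': 'binary', 'size': 4}
```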
v0.22.0
📋 Changes
- Multi-model caching: serve a collection of models that is collectively bigger than what will fit in memory (via LRU cache eviction) ([docs](https://docs.cortex.dev/v/0.22/deployments/realtime-api/models#multi-model-caching)) https://github.com/cortexlabs/cortex/pull/1428 https://github.com/cortexlabs/cortex/issues/619 ([RobertLucian](https://github.com/RobertLucian))
- Live reloading: support updating models in running APIs by adding new versions to the model's S3 directory ([docs](https://docs.cortex.dev/v/0.22/deployments/realtime-api/models#live-model-reloading)) https://github.com/cortexlabs/cortex/pull/1428 https://github.com/cortexlabs/cortex/issues/1252 ([RobertLucian](https://github.com/RobertLucian))
- Inter-process fairness: distribute requests within an API replica evenly across all processes https://github.com/cortexlabs/cortex/pull/1526 https://github.com/cortexlabs/cortex/issues/839 https://github.com/cortexlabs/cortex/issues/1298 ([RobertLucian](https://github.com/RobertLucian))
- Support requests between APIs within the same cluster ([docs](https://docs.cortex.dev/v/0.22/deployments/realtime-api/predictors#chaining-apis)) https://github.com/cortexlabs/cortex/pull/1503 https://github.com/cortexlabs/cortex/issues/1241 ([deliahu](https://github.com/deliahu))
- Allow overriding the CLI install path and config directory (via `$CORTEX_INSTALL_PATH` and `$CORTEX_CLI_CONFIG_DIR`) ([docs](https://docs.cortex.dev/v/0.22/miscellaneous/cli#mac-linux-os)) https://github.com/cortexlabs/cortex/pull/1521 https://github.com/cortexlabs/cortex/issues/1222 ([deliahu](https://github.com/deliahu))
- ONNX model paths in API configuration files must now point to a directory containing a single ONNX file, rather than to the ONNX file itself. For example, `model_path: s3://cortex-examples/onnx/yolov5-youtube/yolov5s.onnx` becomes `model_path: s3://cortex-examples/onnx/yolov5-youtube`.
- The `--env/-e` flag in all `cortex cluster` commands has been renamed to `--configure-env/-e`. If this flag is not provided, `cortex cluster info` no longer configures the environment named `aws`.
- Fix intermittent failed requests during rolling updates https://github.com/cortexlabs/cortex/pull/1526 https://github.com/cortexlabs/cortex/issues/814 ([RobertLucian](https://github.com/RobertLucian))
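The least-recently-used (LRU) eviction behind multi-model caching can be illustrated with a small generic sketch. This is not Cortex's code: `ModelCache`, its `capacity` (counted in models here rather than in bytes of memory), and the `load_model` callback are hypothetical.

```python
from collections import OrderedDict
from typing import Any, Callable, List


class ModelCache:
    """Keep at most `capacity` models in memory, evicting the least recently used."""

    def __init__(self, capacity: int, load_model: Callable[[str], Any]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self.capacity = capacity
        self.load_model = load_model  # hypothetical loader, e.g. downloading from S3
        self._models: OrderedDict = OrderedDict()

    def get(self, name: str) -> Any:
        if name in self._models:
            # Cache hit: mark the model as most recently used
            self._models.move_to_end(name)
            return self._models[name]
        # Cache miss: load the model, then evict the LRU entry if over capacity
        model = self.load_model(name)
        self._models[name] = model
        if len(self._models) > self.capacity:
            self._models.popitem(last=False)
        return model

    def cached(self) -> List[str]:
        # Names currently in memory, least recently used first
        return list(self._models)


if __name__ == "__main__":
    cache = ModelCache(capacity=2, load_model=lambda name: f"<model {name}>")
    cache.get("a"); cache.get("b"); cache.get("a")  # "a" is now most recently used
    cache.get("c")                                  # evicts "b"
    print(cache.cached())  # ['a', 'c']
```

`OrderedDict.move_to_end()` and `popitem(last=False)` keep both the hit path and the eviction path O(1), which is why this structure is a common choice for LRU caches.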
v0.21.0
📋 Changes
- Add Python client: [pypi.org/project/cortex](https://pypi.org/project/cortex/) https://github.com/cortexlabs/cortex/pull/1449 https://github.com/cortexlabs/cortex/issues/684 ([vishalbollu](https://github.com/vishalbollu))
- Add support for private Docker image registries ([docs](https://docs.cortex.dev/guides/private-docker)) https://github.com/cortexlabs/cortex/pull/1460 https://github.com/cortexlabs/cortex/issues/1113 ([deliahu](https://github.com/deliahu))
- Fix minor BatchAPI bugs https://github.com/cortexlabs/cortex/pull/1471 https://github.com/cortexlabs/cortex/pull/1468 https://github.com/cortexlabs/cortex/pull/1480 https://github.com/cortexlabs/cortex/issues/1473 ([vishalbollu](https://github.com/vishalbollu), [RobertLucian](https://github.com/RobertLucian))
- Bypass instance limit check if AWS's API doesn't provide quota information (this was blocking cluster creation in `eu-north-1`) https://github.com/cortexlabs/cortex/pull/1439 https://github.com/cortexlabs/cortex/issues/1438 ([deliahu](https://github.com/deliahu))
- Add a guide for how to [install the CLI on Windows](https://docs.cortex.dev/guides/windows-cli) https://github.com/cortexlabs/cortex/pull/1476 https://github.com/cortexlabs/cortex/issues/715 ([RobertLucian](https://github.com/RobertLucian))
- Change default local port from 8888 to 8890 to avoid port conflicts with Jupyter https://github.com/cortexlabs/cortex/pull/1456 ([vishalbollu](https://github.com/vishalbollu))
- Disallow instance types that aren't supported by the Network Load Balancer (NLB) https://github.com/cortexlabs/cortex/pull/1436 https://github.com/cortexlabs/cortex/issues/1433 ([deliahu](https://github.com/deliahu))
- Add `--cluster-aws-key` and `--cluster-aws-secret` flags to `cortex cluster configure` command https://github.com/cortexlabs/cortex/pull/1404 ([deliahu](https://github.com/deliahu))
v0.20.0
📋 Changes
- Add `cortex cluster export` command to export all APIs running in a cluster ([docs](https://docs.cortex.dev/v/0.20/miscellaneous/cli#cluster-export)) https://github.com/cortexlabs/cortex/pull/1368 https://github.com/cortexlabs/cortex/issues/1255 ([vishalbollu](https://github.com/vishalbollu))
- Enable users to specify CIDR ranges for the cluster's VPC ([docs](https://docs.cortex.dev/v/0.20/cluster-management/config)) https://github.com/cortexlabs/cortex/pull/1388 ([vishalbollu](https://github.com/vishalbollu))
- Support JSON output for CLI commands (via `-o/--output json`) https://github.com/cortexlabs/cortex/pull/1365 https://github.com/cortexlabs/cortex/issues/1161 ([vishalbollu](https://github.com/vishalbollu))
- Support the NVIDIA device driver (nvidia-container-toolkit) when running locally https://github.com/cortexlabs/cortex/pull/1366 https://github.com/cortexlabs/cortex/issues/1223 ([vishalbollu](https://github.com/vishalbollu))
- The valid values for `api_gateway` in the cluster configuration file have been changed from `enabled`/`disabled` to `public`/`none` (to match the values for `networking.api_gateway` in the API configuration file).
- Support AWS tags with spaces and valid special characters https://github.com/cortexlabs/cortex/pull/1374 https://github.com/cortexlabs/cortex/pull/1355 https://github.com/cortexlabs/cortex/pull/1380 https://github.com/cortexlabs/cortex/pull/1385 https://github.com/cortexlabs/cortex/issues/1373 ([deliahu](https://github.com/deliahu))
- Fix tensor shape validation for the TensorFlow predictor https://github.com/cortexlabs/cortex/pull/1311 https://github.com/cortexlabs/cortex/issues/1310 ([RobertLucian](https://github.com/RobertLucian))
- Allow `cortex cluster *` commands to be run from within a Docker container https://github.com/cortexlabs/cortex/pull/1370 https://github.com/cortexlabs/cortex/issues/1361 https://github.com/cortexlabs/cortex/issues/1325 ([deliahu](https://github.com/deliahu))
v0.19.0
📋 Changes
- Support batch APIs ([docs](https://docs.cortex.dev/v/0.19/deployments/batch-api)) https://github.com/cortexlabs/cortex/pull/1203 https://github.com/cortexlabs/cortex/issues/523 ([vishalbollu](https://github.com/vishalbollu))
- Support traffic splitting (enables A/B testing, multi-armed bandits, etc.) ([docs](https://docs.cortex.dev/v/0.19/deployments/realtime-api/traffic-splitter)) https://github.com/cortexlabs/cortex/pull/1213 https://github.com/cortexlabs/cortex/pull/1270 https://github.com/cortexlabs/cortex/issues/1132 https://github.com/cortexlabs/cortex/issues/275 https://github.com/cortexlabs/cortex/issues/1089 ([tthebst](https://github.com/tthebst))
- Support server-side request batching for the TensorFlow Predictor ([docs](https://docs.cortex.dev/v/0.19/deployments/realtime-api/parallelism#server-side-batching)) https://github.com/cortexlabs/cortex/pull/1193 https://github.com/cortexlabs/cortex/issues/1060 ([RobertLucian](https://github.com/RobertLucian))
- Add a `post_predict()` method to the Predictor interface (runs after the response has been sent) ([docs](https://docs.cortex.dev/v/0.19/deployments/realtime-api/predictors)) https://github.com/cortexlabs/cortex/pull/1237 https://github.com/cortexlabs/cortex/issues/954 ([RobertLucian](https://github.com/RobertLucian))
- Support disabling API Gateway cluster-wide ([docs](https://docs.cortex.dev/v/0.19/cluster-management/config)) https://github.com/cortexlabs/cortex/pull/1259 https://github.com/cortexlabs/cortex/issues/1198 ([deliahu](https://github.com/deliahu))
- Support different CUDA versions for the slim Python Predictor image ([docs](https://docs.cortex.dev/v/0.19/advanced/system-packages#custom-docker-image)) https://github.com/cortexlabs/cortex/pull/1263 https://github.com/cortexlabs/cortex/issues/923 https://github.com/cortexlabs/cortex/issues/1254 ([RobertLucian](https://github.com/RobertLucian))
- Add more widgets to the CloudWatch dashboard (average in-flight requests per replica, active replicas) ([docs](https://docs.cortex.dev/v/0.19/guides/metrics)) https://github.com/cortexlabs/cortex/pull/1181 ([RobertLucian](https://github.com/RobertLucian))
- `kind` is now a required top-level field for all API configurations. Existing APIs should add `kind: RealtimeAPI`. This release adds support for `kind: BatchAPI` and `kind: TrafficSplitter`.
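Weighted traffic splitting of the kind described above can be sketched as follows. This is an illustration of weighted random routing, not Cortex's routing code: `pick_api`, the API names, and the requirement that weights be non-negative integer percentages summing to 100 are assumptions made for this example.

```python
import random
from typing import Dict, Optional


def pick_api(weights: Dict[str, int], rng: Optional[random.Random] = None) -> str:
    """Choose a backend API with probability proportional to its weight.

    Assumes non-negative integer weights that sum to 100 (percentages).
    """
    if any(w < 0 for w in weights.values()):
        raise ValueError("weights must be non-negative")
    if sum(weights.values()) != 100:
        raise ValueError("weights must sum to 100")
    if rng is None:
        rng = random.Random()
    roll = rng.randrange(100)  # uniform integer in [0, 100)
    cumulative = 0
    for name, weight in weights.items():
        cumulative += weight
        if roll < cumulative:
            return name
    raise AssertionError("unreachable when weights sum to 100")


if __name__ == "__main__":
    rng = random.Random(0)  # seeded for reproducibility
    split = {"model-a": 80, "model-b": 20}  # hypothetical A/B split
    counts = {name: 0 for name in split}
    for _ in range(10_000):
        counts[pick_api(split, rng)] += 1
    print(counts)  # approximately an 80/20 split
```

Walking a cumulative sum over the weights maps each integer roll to exactly one API, so an API with weight 0 never receives traffic.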
v0.18.1
📋 Changes
- Fix dynamic axes for ONNX models https://github.com/cortexlabs/cortex/pull/1187 https://github.com/cortexlabs/cortex/issues/1186 ([RobertLucian](https://github.com/RobertLucian))
- Fix memory node capacity calculation for multi-api configuration files https://github.com/cortexlabs/cortex/pull/1185 ([deliahu](https://github.com/deliahu))
- Check cluster-name tag when choosing load balancer for VPC Link integration https://github.com/cortexlabs/cortex/pull/1173 ([deliahu](https://github.com/deliahu))
- Add the [Troubleshooting: API request errors](https://docs.cortex.dev/troubleshooting/api-request-errors) guide ([deliahu](https://github.com/deliahu))
- Add the [Troubleshooting: TensorFlow session in predict()](https://docs.cortex.dev/troubleshooting/tf-session-in-predict) guide ([RobertLucian](https://github.com/RobertLucian))
- Delete API Gateway if `cluster up` fails https://github.com/cortexlabs/cortex/pull/1172 ([deliahu](https://github.com/deliahu))
- Move image version verification from `serve.py` to `run.sh` https://github.com/cortexlabs/cortex/pull/1180 https://github.com/cortexlabs/cortex/pull/1183 ([vishalbollu](https://github.com/vishalbollu))
- Add retries for resource tagging during `cluster up` https://github.com/cortexlabs/cortex/pull/1188 ([deliahu](https://github.com/deliahu))
