* Drop _created metrics for broker and proxy
* Enable all metrics by default for broker
* Change default dashboard
* Remove messy dashboards
* Enable default dashboards in Grafana
* Add testing values with more aggressive disk cleanup
* Add VictoriaMetrics debugging instructions
* Set honorLabels to true
* Document disabling monitoring
* Set password in testing values
* Fix linting issue detected by kubeconform
* Upgrade to kube-prometheus-stack 67.x
* Prometheus operator is upgraded to 0.80.0
* Prometheus is upgraded from 2.55.0 to 3.2.1
* Enable pod monitors to test them
* Run linting with kube-prometheus-stack enabled
* Validate all CI configs
* Use Pulsar 4.0.0 image, bump chart version to 3.7.0
* Bump kube-prometheus-stack to 65.x.x
* Remove testing with latest and test with previous LTS version
- run kube-prometheus-stack test with previous LTS version since
the older chart version doesn't support Pulsar 4.0.0 image
* Fix passing "--values" to helm command
* Move ci runner config to a script
* Attempt to fix pulsar-manager-cluster-initialize
* Upgrade kind, chart releaser and helm versions
* Disable podMonitor for values-broker-tls.yaml file
- was missing from #317
* Use k8s 1.18.20
* Use ubuntu-20.04 runtime
- k8s < 1.19 doesn't support cgroup v2
* Upgrade to k8s 1.19 as baseline
* Baseline to k8s 1.20
* Set ip family to ipv4
* Add more logging to kind cluster creation
* Simplify duplicate job deletion
* Use verbosity flag
* Upgrade to k8s 1.24
* Replace removed tolerate-unready-endpoints annotation with publishNotReadyAddresses
(cherry picked from commit e90926053a2b01bb95529fbaddc8d2ce2cdeec63)
* Use k8s 1.21 as baseline
* Run on ubuntu-22.04
* Use Pulsar 2.10.4
* Copy release process doc from Apache Airflow
Source: fb741fd872/dev/README_RELEASE_HELM_CHART.md
* Adapt to Apache Pulsar
* Remove old release process notes
* Fix typo
* Apply suggestions from code review
Co-authored-by: tison <wander4096@gmail.com>
* Add sign.sh script for release artifacts
Script is copied from 395ad7110e/dev/sign.sh
* Add some updates (more might follow)
* Add some more updates to the rest of the release plan
* Fix rat check command
Co-authored-by: tison <wander4096@gmail.com>
Relates to #290
### Motivation
Make the Apache Pulsar Helm Chart release follow ASF rules for voting, and make the helm binary available via dist.apache.org. By following the information in https://issues.apache.org/jira/browse/LEGAL-573 and in the Apache Airflow project https://github.com/apache/airflow/blob/main/dev/README_RELEASE_HELM_CHART.md, I built this new release process. It will likely need some iterative improvement.
### Modifications
* Add a release process that is based on the Apache Airflow release process
### Verifying this change
- [ ] Make sure that the change passes the CI checks.
Fixes #287
### Motivation
The current steps to install the Apache Pulsar Helm Chart include an unnecessary script `scripts/pulsar/prepare_helm_release.sh`. It relies on tooling that has not been maintained and is not a part of the Apache Pulsar project. As such, I propose we remove these references.
Note that one of the reasons we used these scripts historically is to simplify deployment. Without these scripts, we should document what is necessary. I am tracking that work here https://github.com/apache/pulsar-helm-chart/issues/323.
* Improve documentation and testing for PodMonitors
* Fix missed references and a typo
### Motivation
Before upgrading to 3.0.0, we want to make sure the kube-prometheus-stack is well documented.
### Modifications
* Update tests and examples to fully disable `PodMonitors` and the installation of the kube-prometheus-stack CRDs.
### Verifying this change
The current tests will cover these changes.
### Motivation
Some of the links in the README are out of date. This PR fixes the ones that I found. Note that the ones with `/en` were not technically broken.
Master Issue: https://github.com/apache/pulsar/issues/11269
### Motivation
Apache Pulsar's docker images for 2.10.0 and above are non-root by default. In order to ensure there is a safe upgrade path, we need to expose the `securityContext` for the Bookkeeper and Zookeeper StatefulSets. Here is the relevant k8s documentation on this k8s feature: https://kubernetes.io/docs/tasks/configure-pod-container/security-context.
Once released, all deployments using the default `values.yaml` configuration for the `securityContext` will pay a one-time penalty on upgrade, where the kubelet recursively chowns files to be root group writable. It's possible to temporarily avoid this penalty by setting `securityContext: {}`.
### Modifications
* Add config blocks for the `bookkeeper.securityContext` and `zookeeper.securityContext`.
* Default to `fsGroup: 0`. This is already the default group id in the docker image, and the docker image assumes the user has root group permission.
* Default to `fsGroupChangePolicy: "OnRootMismatch"`. This configuration will work for all deployments where the user id is stable. If the user id switches between restarts, like it does in OpenShift, please set to `Always`.
* Remove the GC logging configuration that writes to a directory the user lacks permission to write to. (Perhaps we want to write to `/pulsar/log/bookie-gc.log`?)
* Add documentation to the README.
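As a rough illustration of the `Always` override mentioned above, the sketch below writes a small values file and prints the upgrade command instead of running it. The file name `openshift-security-context.yaml` is hypothetical, and the exact nesting of keys under `bookkeeper.securityContext` and `zookeeper.securityContext` is an assumption based on the description in this PR, so check it against `values.yaml` before use.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Hypothetical override file name; it is not shipped with the chart.
override_file="openshift-security-context.yaml"

# Assumed layout: fsGroup and fsGroupChangePolicy sit directly under each
# component's securityContext block, as described in the list above.
cat > "$override_file" <<'EOF'
bookkeeper:
  securityContext:
    fsGroup: 0
    fsGroupChangePolicy: "Always"
zookeeper:
  securityContext:
    fsGroup: 0
    fsGroupChangePolicy: "Always"
EOF

# Print the command rather than executing it, so the sketch has no side effects.
echo "helm upgrade test -f charts/pulsar/values.yaml -f ${override_file} charts/pulsar/"
```

Passing the override as a second `-f` file keeps the chart defaults intact and changes only the policy for clusters where the user id is not stable.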
### Verifying this change
I first attempted verification of this change with minikube. It did not work because minikube uses hostPath volumes by default. I then tested on EKS v1.21.9-eks-0d102a7. I tested by deploying the current, latest version of the helm chart (2.9.3) and then upgrading to this PR's version of the helm chart along with using the 2.10.0 docker image. I also tested upgrading from a default version
Test 1 is a plain upgrade using the default 2.9.3 version of the chart, then upgrading to this PR's version of the chart with the modification to use the 2.10.0 docker images. It worked as expected.
```bash
$ helm install test apache/pulsar
$ # Wait for chart to deploy, then run the following, which uses Pulsar version 2.10.0:
$ helm upgrade test -f charts/pulsar/values.yaml charts/pulsar/
```
Test 2 is a plain upgrade using the default 2.9.3 version of the chart, then an upgrade to this PR's version of the chart, then an upgrade to this PR's version of the chart using 2.10.0 docker images. There is a minor error described in the `README.md`. The solution is to chown the bookie's data directory.
```bash
$ helm install test apache/pulsar
$ # Wait for chart to deploy, then run the following, which uses Pulsar version 2.9.2:
$ helm upgrade test -f charts/pulsar/values.yaml charts/pulsar/
$ # Upgrade using Pulsar version 2.10.0
$ helm upgrade test -f charts/pulsar/values.yaml charts/pulsar/
```
### GC Logging
In my testing, I ran into the following errors when using `-Xlog:gc:/var/log/bookie-gc.log`:
```
pulsar-bookkeeper-verify-clusterid [0.008s] Error opening log file '/var/log/bookie-gc.log': Permission denied
pulsar-bookkeeper-verify-clusterid [0.008s] Initialization of output 'file=/var/log/bookie-gc.log' using options '(null)' failed.
pulsar-bookkeeper-verify-clusterid [0.005s] Error opening log file '/var/log/bookie-gc.log': Permission denied
pulsar-bookkeeper-verify-clusterid [0.006s] Initialization of output 'file=/var/log/bookie-gc.log' using options '(null)' failed.
pulsar-bookkeeper-verify-clusterid Invalid -Xlog option '-Xlog:gc:/var/log/bookie-gc.log', see error log for details.
pulsar-bookkeeper-verify-clusterid Error: Could not create the Java Virtual Machine.
pulsar-bookkeeper-verify-clusterid Error: A fatal exception has occurred. Program will exit.
pulsar-bookkeeper-verify-clusterid Invalid -Xlog option '-Xlog:gc:/var/log/bookie-gc.log', see error log for details.
pulsar-bookkeeper-verify-clusterid Error: Could not create the Java Virtual Machine.
pulsar-bookkeeper-verify-clusterid Error: A fatal exception has occurred. Program will exit.
```
I resolved the error by removing the setting.
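If GC logging to a file is wanted again later, one possible guard is to add the `-Xlog` option only when the target directory is writable. This is a minimal sketch, not what this PR does: the helper name `gc_log_opts` and the variable `gc_opts` are hypothetical, and `/pulsar/log/bookie-gc.log` is only the candidate path floated in the modifications list.

```bash
#!/usr/bin/env bash

# Hypothetical helper: print a -Xlog:gc option only if the directory that
# would hold the log file exists and is writable by the current user.
gc_log_opts() {
  local log_file="$1"
  local log_dir
  log_dir="$(dirname "$log_file")"
  if [ -d "$log_dir" ] && [ -w "$log_dir" ]; then
    printf '%s' "-Xlog:gc:${log_file}"
  fi
}

# Candidate path from the modifications list; prints an empty value when
# the directory is missing or not writable, so the JVM still starts.
gc_opts="$(gc_log_opts /pulsar/log/bookie-gc.log)"
echo "GC options: '${gc_opts}'"
```

With an empty result, the JVM starts without file-based GC logging instead of failing with `Could not create the Java Virtual Machine`.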
### OpenShift Observations
I wanted to seamlessly support OpenShift, so I investigated configuring the bookkeeper and zookeeper processes with `umask 002` so that they would create files and directories that are group writable (OpenShift has a stable group id, but gives the process a random user id). That worked for most tools when switching the user id, but not for RocksDB, which creates a lock file at `/pulsar/data/bookkeeper/ledgers/current/ledgers/LOCK` with the permission `0644`, ignoring the umask. Here is the relevant error:
```
2022-05-14T03:45:06,903+0000 ERROR org.apache.bookkeeper.server.Main - Failed to build bookie server
java.io.IOException: Error open RocksDB database
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:199) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:88) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.lambda$static$0(KeyValueStorageRocksDB.java:62) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.bookie.storage.ldb.LedgerMetadataIndex.<init>(LedgerMetadataIndex.java:68) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.bookie.storage.ldb.SingleDirectoryDbLedgerStorage.<init>(SingleDirectoryDbLedgerStorage.java:169) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.newSingleDirectoryDbLedgerStorage(DbLedgerStorage.java:150) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.bookie.storage.ldb.DbLedgerStorage.initialize(DbLedgerStorage.java:129) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.bookie.Bookie.<init>(Bookie.java:818) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.proto.BookieServer.newBookie(BookieServer.java:152) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.proto.BookieServer.<init>(BookieServer.java:120) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.server.service.BookieService.<init>(BookieService.java:52) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.server.Main.buildBookieServer(Main.java:304) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.server.Main.doMain(Main.java:226) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
at org.apache.bookkeeper.server.Main.main(Main.java:208) [org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
Caused by: org.rocksdb.RocksDBException: while open a file for lock: /pulsar/data/bookkeeper/ledgers/current/ledgers/LOCK: Permission denied
at org.rocksdb.RocksDB.open(Native Method) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
at org.rocksdb.RocksDB.open(RocksDB.java:239) ~[org.rocksdb-rocksdbjni-6.10.2.jar:?]
at org.apache.bookkeeper.bookie.storage.ldb.KeyValueStorageRocksDB.<init>(KeyValueStorageRocksDB.java:196) ~[org.apache.bookkeeper-bookkeeper-server-4.14.4.jar:4.14.4]
... 13 more
```
As such, in order to support OpenShift, I exposed the `fsGroupChangePolicy`, which allows for OpenShift support, but not necessarily _seamless_ support.
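The reason `umask 002` cannot help with the RocksDB lock file comes down to mode arithmetic: a umask only clears bits from the mode a program requests, it never adds any. The sketch below works through the two cases. The helper name `effective_mode` is hypothetical, and `0666` is the mode most tools conventionally request for new files, which is an assumption about those tools rather than something measured here.

```bash
#!/usr/bin/env bash

# Effective mode = requested mode with the umask bits cleared.
effective_mode() {
  local requested="$1" mask="$2"
  # Values with a leading 0 are read as octal in shell arithmetic.
  printf '%04o\n' $(( requested & ~mask ))
}

effective_mode 0666 0002   # prints 0664: group write survives
effective_mode 0644 0002   # prints 0644: group write was never requested
```

Because the lock file is requested as `0644`, no umask value can make it group writable, which is why a changed user id with the same group id still gets `Permission denied`.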
*Motivation*
The current helm chart is lacking documentation. This pull request aims to add documentation.
*Changes*
- Update Helm chart documentation
- Add a get-started section with Helm chart
- Remove the documentation of using yaml files.
Signed-off-by: xiaolong.ran <rxl@apache.org>
Fixes #5787
### Motivation
When creating a Kubernetes cluster on Minikube, the installation can fail with `--kubernetes-version=v1.10.5`, depending on the version of Minikube installed in the local environment.
### Modifications
- Remove the `--kubernetes-version=v1.10.5` in docs.
* [documentation][deploy] Update deployment instructions for deploying to Minikube
* Enable functions workers
* [documentation][deploy] Improve helm deployment script to deploy Pulsar to minikube
### Changes
- update the helm scripts: bookie/autorecovery/broker pods should wait until metadata is initialized
- disable `autoRecovery` on bookies since we start `AutoRecovery` in separate pods
- enable function worker on brokers
- provide a values file for minikube
- update documentation for using helm chart to deploy a cluster to minikube
* move the service type definition to values file
* Helm charts for deployment on GKE
* Repackaging helm charts under deployment/kubernetes/helm
* Formatting licences
* Removing cloud specific values to enable more generic deployments