1
0
Commit Graph

64 Commits

Author SHA1 Message Date
Josh Black
63ba1cd645
CE changes for https://github.com/hashicorp/vault-enterprise/pull/5695 (#26449) (#26455) 2024-04-16 14:36:25 -07:00
hc-github-team-es-release-engineering
48ab1eae08
[DO NOT MERGE UNTIL EOY] EOY license fixes 1.14.x (#24390) 2024-01-02 10:36:20 -08:00
hc-github-team-secure-vault-core
44fdf3b98d
backport of commit c329ed8d3b02b92dfded30065317c82648d3cae3 (#24260)
Co-authored-by: Steven Clark <steven.clark@hashicorp.com>
2023-11-27 16:21:51 -05:00
Hamid Ghaf
96f5e64b83
Revert "Automatically track subloggers in allLoggers (#22038)" (#24005)
This reverts commit 4c8cc87794ed2d989f515cd30c1c1b953d092ef3.
2023-11-03 14:40:17 -07:00
hc-github-team-secure-vault-core
1ed2747f79
backport of commit 1f1ead0dc72e24ecaf5abe3784aac79cfbd5124b (#23615)
Co-authored-by: Josh Black <raskchanky@gmail.com>
2023-10-11 18:14:21 +00:00
hc-github-team-secure-vault-core
4c0edc73b2
backport of commit 4c8cc87794ed2d989f515cd30c1c1b953d092ef3 (#22247)
Co-authored-by: Mike Palmiotto <mike.palmiotto@hashicorp.com>
2023-09-01 13:02:28 -04:00
Victor Rodriguez
2d7efaef97
Convert seal.Access struct into a interface (OSS) (#20510)
* Move seal barrier type field from Access to autoSeal struct.

Remove method Access.SetType(), which was only being used by a single test, and
which can use the name option of NewTestSeal() to specify the type.

* Change method signatures of Access to match those of Wrapper.

* Turn seal.Access struct into an interface.

* Tweak Access implementation.

Change `access` struct to have a field of type wrapping.Wrapper, rather than
extending it.

* Add method Seal.GetShamirWrapper().

Add method Seal.GetShamirWrapper() for use by code that need to perform
Shamir-specific operations.
2023-05-04 14:22:30 -04:00
Peter Wilson
2054ffcbfa
VAULT-14048: raft-autopilot appears to refuse to remove a node which has left and wouldn't impact stability (#19472)
* ensure we supply the node type when it's for a voter
* bumped autopilot version back to v0.2.0 and ran go mod tidy
* changed condition in knownservers and added some comments
* Export GetRaftBackend
* Updated tests for autopilot (related to dead server cleanup)
* Export Raft NewDelegate

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2023-04-03 11:58:57 -04:00
Hamid Ghaf
e55c18ed12
adding copyright header (#19555)
* adding copyright header

* fix fmt and a test
2023-03-15 09:00:52 -07:00
miagilepner
13caa0842e
VAULT-8436 remove <-time.After statements in for loops (#18818)
* replace time.After with ticker in loops

* add semgrep rule

* update to use timers

* remove stop
2023-02-06 17:49:01 +01:00
Josh Black
5b083266ef
Enable undo logs by default (#18692)
* Enable undo logs by default

* add consul test

* update go.mod/sum

* add a better non-existent key
2023-01-17 13:38:18 -08:00
Tom Proctor
60f92bbeef
storage/raft: Add retry_join_as_non_voter config option (#18030) 2022-11-18 17:58:16 +00:00
Josh Black
42a8cc1189
disable undo logs by default for 1.12.0 (#17453) 2022-10-07 08:47:40 -07:00
Josh Black
db71fdb087
only enable undo logs if all cluster members support it (#17378) 2022-10-06 11:24:16 -07:00
Nick Cabatoff
cbbf1a5593
Break grabLockOrStop into two pieces to facilitate investigating deadlocks (#17187)
Break grabLockOrStop into two pieces to facilitate investigating deadlocks.  Without this change, the "grab" goroutine looks the same regardless of who was calling grabLockOrStop, so there's no way to identify one of the deadlock parties.
2022-09-20 11:03:16 -04:00
Nick Cabatoff
24c9b42f8c
Do not attempt to write a new TLS keyring at startup if raft is already setup (#17079) 2022-09-09 12:19:57 -04:00
Nick Cabatoff
b082f16663
autopilot: assume nodes we haven't received heartbeats from are running the same version as we are (#17019)
OSS parts of ent PR #3172: assume nodes we haven't received heartbeats from are running the same version as we are.  Failing to provide a version/upgrade_version will result in Autopilot (on ent) demoting those unversioned nodes to non-voters until we receive a heartbeat from them.
2022-09-06 14:49:04 -04:00
Scott Miller
0d6a42c79e
OSS portion of wrapper-v2 (#16811)
* OSS portion of wrapper-v2

* Prefetch barrier type to avoid encountering an error in the simple BarrierType() getter

* Rename the OveriddenType to WrapperType and use it for the barrier type prefetch

* Fix unit test
2022-08-23 15:37:16 -04:00
Mike Palmiotto
dc5b396b14
Vault 7338/fix retry join (#16550)
* storage/raft: Fix cluster init with retry_join

Commit 8db66f4853abce3f432adcf1724b1f237b275415 introduced an error
wherein a join() would return nil (no error) with no information on its
channel if a joining node had been initialized. This was not handled
properly by the caller and resulted in a canceled `retry_join`.

Fix this by handling the `nil` channel respone by treating it as an
error and allowing the existing mechanics to work as intended.

* storage/raft: Improve retry_join go test

* storage/raft: Make VerifyRaftPeers pollable

* storage/raft: Add changelog entry for retry_join fix

* storage/raft: Add description to VerifyRaftPeers
2022-08-03 20:44:57 -05:00
Mike Palmiotto
3f02dfb856
storage/raft: Make raftInfo atomic (#16565)
* storage/raft: Make raftInfo atomic

This fixes some racy behavior discovered in parallel testing. Change the
core struct member to an atomic and update references throughout.
2022-08-03 18:40:49 -04:00
Mike Palmiotto
cc6409222c
Vault 6773/raft rejoin nonvoter (#16324)
* raft: Ensure init before setting suffrage

As reported in https://hashicorp.atlassian.net/browse/VAULT-6773:

	The /sys/storage/raft/join endpoint is intended to be unauthenticated. We rely
	on the seal to manage trust.

	It’s possible to use multiple join requests to switch nodes from voter to
	non-voter. The screenshot shows a 3 node cluster where vault_2 is the leader,
	and vault_3 and vault_4 are followers with non-voters set to false.  sent two
	requests to the raft join endpoint to have vault_3 and vault_4 join the cluster
	with non_voters:true.

This commit fixes the issue by delaying the call to SetDesiredSuffrage until after
the initialization check, preventing unauthenticated mangling of voter status.

Tested locally using
https://github.com/hashicorp/vault-tools/blob/main/users/ncabatoff/cluster/raft.sh
and the reproducer outlined in VAULT-6773.

* raft: Return join err on failure

This is necessary to correctly distinguish errors returned from the Join
workflow. Previously, errors were being masked as timeouts.

* raft: Default autopilot parameters in teststorage

Change some defaults so we don't have to pass in parameters or set them
in the originating tests. These storage types are only used in two
places:

1) Raft HA testing
2) Seal migration testing

Both consumers have been tested and pass with this change.

* changelog: Unauthn voter status change bugfix
2022-07-18 14:37:12 -04:00
Chris Capurso
3f9dbabfc1
Add endpoints to provide ability to modify logging verbosity (#16111)
* add func to set level for specific logger

* add endpoints to modify log level

* initialize base logger with IndependentLevels

* test to ensure other loggers remain unchanged

* add DELETE loggers endpoints to revert back to config

* add API docs page

* add changelog entry

* remove extraneous line

* add log level field to Core struct

* add godoc for getLogLevel

* add some loggers to c.allLoggers
2022-06-27 11:39:53 -04:00
Josh Black
de99f93820
Add autopilot automated upgrades and redundancy zones (#15521) 2022-05-20 16:49:11 -04:00
Violet Hynes
caa10a58a3
VAULT-4306 Ensure /raft/bootstrap/challenge call ignores erroneous namespaces set (#15519)
* VAULT-4306 Ensure /raft/bootstrap/challenge call ignores erroneous namespaces set

* VAULT-4306 Add changelog

* VAULT-4306 Update changelog/15519.txt

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2022-05-19 16:27:51 -04:00
Chris Capurso
b2b2ead938
fix raft tls key rotation panic when rotation time in past (#15156)
* fix raft tls key rotation panic when rotation time in past

* add changelog entry

* push out next raft TLS rotation time in case close to elapsing

* consolidate tls key rotation duration calculation

* reduce raft getNextRotationTime padding to 10 seconds

* move tls rotation ticker reset to where its duration is calculated
2022-04-25 21:48:34 -04:00
hghaf099
b1012f2e36
VAULT-4240 time.After() in a select statement can lead to memory leak (#14814)
* VAULT-4240 time.After() in a select statement can lead to memory leak

* CL
2022-04-01 10:17:11 -04:00
Pratyoy Mukhopadhyay
130ef5574c
Fix raft paralle retry bug (#14303) 2022-02-28 10:38:34 -08:00
Nick Cabatoff
7e74bebea8
Parallel retry join (#13606) 2022-01-17 10:33:03 -05:00
Jim Kalafut
a72a5ff754
Rename master key to root key (#13324)
* See what it looks like to replace "master key" with "root key".  There are two places that would require more challenging code changes: the storage path `core/master`, and its contents (the JSON-serialized EncodedKeyringtructure.)

* Restore accidentally deleted line

* Add changelog

* Update root->recovery

* Fix test

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2021-12-06 17:12:20 -08:00
Daniel Kimsey
e79b35287b
Auto-join support for IPv6 discovery (#12366)
* Auto-join support for IPv6 discovery

The go-discover library returns IP addresses and not URLs. It just so
happens net.URL parses "127.0.0.1", which isn't a valid URL.

Instead, we construct the URL ourselves. Being careful to check if it's
an ipv6 address and making sure it's in explicit form if so.

Fixes #12323

* feedback: addrs & ipv6 test

Rename addrs to clusterIPs to improve clarity and intent

Tighten up our IPv6 address detection to be more correct and to ensure
it's actually in implicit form
2021-09-07 11:55:07 -07:00
Jeff Mitchell
861454e0ed
Migrate to sdk/internalshared libs in go-secure-stdlib (#12090)
* Swap sdk/helper libs to go-secure-stdlib

* Migrate to go-secure-stdlib reloadutil

* Migrate to go-secure-stdlib kv-builder

* Migrate to go-secure-stdlib gatedwriter
2021-07-15 20:17:31 -04:00
Vishal Nayak
31775b63ca
OSS parts of Autopilot in DR secondaries (#12014) 2021-07-08 12:30:01 -04:00
Nick Cabatoff
885320f60c
VAULT-2439: OSS parts of #1889 (raft licensing init) (#11665) 2021-05-19 16:07:58 -04:00
Brian Kassouf
4f6f32642a
Reload raft TLS keys on active startup (#11660) 2021-05-19 10:03:32 -07:00
Lars Lehtonen
d10e912ec3
vault: deprecate errwrap.Wrapf() (#11577) 2021-05-11 13:12:54 -04:00
Josh Black
795ce10c6a
Add HTTP response headers for hostname and raft node ID (if applicable) (#11289) 2021-04-20 15:25:04 -07:00
Vishal Nayak
c8870314c1
Support autopilot when raft is for HA only (#11260) 2021-04-12 09:33:21 -04:00
Nick Cabatoff
c7b4ccf8f3
Fix: leader_tls_servername raft option only worked when used with mTLS and/or an explicit CA cert. (#11252) 2021-04-06 09:16:54 -04:00
Nick Cabatoff
e31ef153ca
Disable autopilot in raft-ha mode. (#11181)
* Disable autopilot in raft-ha mode.

* Also don't run autopilot on DR secondaries.
2021-03-23 14:13:44 -07:00
Vishal Nayak
415890e79c
Autopilot: Server Stabilization, State and Dead Server Cleanup (#10856)
* k8s doc: update for 0.9.1 and 0.8.0 releases (#10825)

* k8s doc: update for 0.9.1 and 0.8.0 releases

* Update website/content/docs/platform/k8s/helm/configuration.mdx

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>

* Autopilot initial commit

* Move autopilot related backend implementations to its own file

* Abstract promoter creation

* Add nil check for health

* Add server state oss no-ops

* Config ext stub for oss

* Make way for non-voters

* s/health/state

* s/ReadReplica/NonVoter

* Add synopsis and description

* Remove struct tags from AutopilotConfig

* Use var for config storage path

* Handle nin-config when reading

* Enable testing autopilot by using inmem cluster

* First passing test

* Only report the server as known if it is present in raft config

* Autopilot defaults to on for all existing and new clusters

* Add locking to some functions

* Persist initial config

* Clarify the command usage doc

* Add health metric for each node

* Fix audit logging issue

* Don't set DisablePerformanceStandby to true in test

* Use node id label for health metric

* Log updates to autopilot config

* Less aggressively consume config loading failures

* Return a mutable config

* Return early from known servers if raft config is unable to be pulled

* Update metrics name

* Reduce log level for potentially noisy log

* Add knob to disable autopilot

* Don't persist if default config is in use

* Autopilot: Dead server cleanup (#10857)

* Dead server cleanup

* Initialize channel in any case

* Fix a bunch of tests

* Fix panic

* Add follower locking in heartbeat tracker

* Add LastContactFailureThreshold to config

* Add log when marking node as dead

* Update follower state locking in heartbeat tracker

* Avoid follower states being nil

* Pull test to its own file

* Add execution status to state response

* Optionally enable autopilot in some tests

* Updates

* Added API function to fetch autopilot configuration

* Add test for default autopilot configuration

* Configuration tests

* Add State API test

* Update test

* Added TestClusterOptions.PhysicalFactoryConfig

* Update locking

* Adjust locking in heartbeat tracker

* s/last_contact_failure_threshold/left_server_last_contact_threshold

* Add disabling autopilot as a core config option

* Disable autopilot in some tests

* s/left_server_last_contact_threshold/dead_server_last_contact_threshold

* Set the lastheartbeat of followers to now when setting up active node

* Don't use config defaults from CLI command

* Remove config file support

* Remove HCL test as well

* Persist only supplied config; merge supplied config with default to operate

* Use pointer to structs for storing follower information

* Test update

* Retrieve non voter status from configbucket and set it up when a node comes up

* Manage desired suffrage

* Consider bucket being created already

* Move desired suffrage to its own entry

* s/DesiredSuffrageKey/LocalNodeConfigKey

* s/witnessSuffrage/recordSuffrage

* Fix test compilation

* Handle local node config post a snapshot install

* Commit to storage first; then record suffrage in fsm

* No need of local node config being nili case, post snapshot restore

* Reconcile autopilot config when a new leader takes over duty

* Grab fsm lock when recording suffrage

* s/Suffrage/DesiredSuffrage in FollowerState

* Instantiate autopilot only in leader

* Default to old ways in more scenarios

* Make API gracefully handle 404

* Address some feedback

* Make IsDead an atomic.Value

* Simplify follower hearbeat tracking

* Use uber.atomic

* Don't have multiple causes for having autopilot disabled

* Don't remove node from follower states if we fail to remove the dead server

* Autopilot server removals map (#11019)

* Don't remove node from follower states if we fail to remove the dead server

* Use map to track dead server removals

* Use lock and map

* Use delegate lock

* Adjust when to remove entry from map

* Only hold the lock while accessing map

* Fix race

* Don't set default min_quorum

* Fix test

* Ensure follower states is not nil before starting autopilot

* Fix race

Co-authored-by: Jason O'Donnell <2160810+jasonodonnell@users.noreply.github.com>
Co-authored-by: Theron Voran <tvoran@users.noreply.github.com>
2021-03-03 13:59:50 -05:00
Vishal Nayak
405eced084
Revert "Read-replica instead of non-voter (#10875)" (#10890)
This reverts commit fc745670cf.
2021-02-10 16:41:58 -05:00
Vishal Nayak
fc745670cf
Read-replica instead of non-voter (#10875) 2021-02-10 09:58:18 -05:00
Nick Cabatoff
cac2e00f2f
Add configuration to specify a TLS ServerName to use in the TLS handshake when performing a raft join. (#10698) 2021-01-19 17:54:28 -05:00
Nick Cabatoff
9459932e06
Be consistent with how we report init status. (#10498)
Also make half-joined raft peers consider storage to be initialized, whether or not they're sealed.
2020-12-08 13:55:34 -05:00
Aleksandr Bezobchuk
e4421691da
Merge PR #10192: Auto-Join: Configurable Scheme & Port (and add k8s provider) 2020-10-23 16:13:09 -04:00
Aleksandr Bezobchuk
b67da26279
Merge PR #10095: Integrated Storage Cloud Auto-Join 2020-10-13 16:26:39 -04:00
Brian Kassouf
7549070cfa
raft: Fix some snapshot restore issues (#9533)
* raft: Remove double read lock

* Reload TLS keyring after reloading the barrier keys
2020-07-21 10:59:07 -07:00
ncabatoff
c7b0a40273
Make standbyStopCh atomic to avoid data races (#9539) 2020-07-21 08:34:07 -04:00
Calvin Leung Huang
045836da71
raft: add support for using backend for ha_storage (#9193)
* raft: initial work on raft ha storage support

* add note on join

* add todo note

* raft: add support for bootstrapping and joining existing nodes

* raft: gate bootstrap join by reading leader api address from storage

* raft: properly check for raft-only for certain conditionals

* raft: add bootstrap to api and cli

* raft: fix bootstrap cli command

* raft: add test for setting up new cluster with raft HA

* raft: extend TestRaft_HA_NewCluster to include inmem and consul backends

* raft: add test for updating an existing cluster to use raft HA

* raft: remove debug log lines, clean up verifyRaftPeers

* raft: minor cleanup

* raft: minor cleanup

* Update physical/raft/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/ha.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/ha.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/logical_system_raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* address feedback comments

* address feedback comments

* raft: refactor tls keyring logic

* address feedback comments

* Update vault/raft.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* address feedback comments

* testing: fix import ordering

* raft: rename var, cleanup comment line

* docs: remove ha_storage restriction note on raft

* docs: more raft HA interaction updates with migration and recovery mode

* docs: update the raft join command

* raft: update comments

* raft: add missing isRaftHAOnly check for clearing out state set earlier

* raft: update a few ha_storage config checks

* Update command/operator_raft_bootstrap.go

Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>

* raft: address feedback comments

* raft: fix panic when checking for config.HAStorage.Type

* Update vault/raft.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* Update website/pages/docs/commands/operator/raft.mdx

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* raft: remove bootstrap cli command

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* raft: address review feedback

* raft: revert vendored sdk

* raft: don't send applied index and node ID info if we're HA-only

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
2020-06-23 12:04:13 -07:00
Brian Kassouf
66d6fa210d
storage/raft: Advertise the configured cluster address (#9008)
* storage/raft: Advertise the configured cluster address

* Don't allow raft to start with unspecified IP

* Fix concurrent map write panic

* Add test file

* changelog++

* changelog++

* changelog++

* Update tcp_layer.go

* Update tcp_layer.go

* Only set the adverise addr if set
2020-05-18 18:22:25 -07:00