77 Commits

Author SHA1 Message Date
Josh Black
63ba1cd645
CE changes for https://github.com/hashicorp/vault-enterprise/pull/5695 (#26449) (#26455) 2024-04-16 14:36:25 -07:00
hc-github-team-es-release-engineering
48ab1eae08
[DO NOT MERGE UNTIL EOY] EOY license fixes 1.14.x (#24390) 2024-01-02 10:36:20 -08:00
hc-github-team-secure-vault-core
44fdf3b98d
backport of commit c329ed8d3b02b92dfded30065317c82648d3cae3 (#24260)
Co-authored-by: Steven Clark <steven.clark@hashicorp.com>
2023-11-27 16:21:51 -05:00
hc-github-team-secure-vault-core
f8a29da29d
backport of commit 0fa36a36ae1b4842d96623eef0d20af5dea557c0 (#23443)
Co-authored-by: Paul Banks <pbanks@hashicorp.com>
2023-10-02 09:49:05 -07:00
hc-github-team-secure-vault-core
855d9fa153
backport of commit 5d97159f05e581c0e5f14be9e2e3f8ac3b733091 (#21886)
Co-authored-by: Tom Proctor <tomhjp@users.noreply.github.com>
2023-07-17 13:01:25 +00:00
Peter Wilson
ba33101192
updated Leader godoc comment to give a warning on possible deadlock (#20773) 2023-05-25 12:02:39 +00:00
Victor Rodriguez
2d7efaef97
Convert seal.Access struct into an interface (OSS) (#20510)
* Move seal barrier type field from Access to autoSeal struct.

Remove method Access.SetType(), which was only being used by a single test, and
which can use the name option of NewTestSeal() to specify the type.

* Change method signatures of Access to match those of Wrapper.

* Turn seal.Access struct into an interface.

* Tweak Access implementation.

Change `access` struct to have a field of type wrapping.Wrapper, rather than
extending it.

* Add method Seal.GetShamirWrapper().

Add method Seal.GetShamirWrapper() for use by code that needs to perform
Shamir-specific operations.
2023-05-04 14:22:30 -04:00
Nick Cabatoff
e439289be5
Address regression introduced by #15493 for non-raft storage backends. (#19721) 2023-03-24 10:15:25 -04:00
Hamid Ghaf
e55c18ed12
adding copyright header (#19555)
* adding copyright header

* fix fmt and a test
2023-03-15 09:00:52 -07:00
miagilepner
13caa0842e
VAULT-8436 remove <-time.After statements in for loops (#18818)
* replace time.After with ticker in loops

* add semgrep rule

* update to use timers

* remove stop
2023-02-06 17:49:01 +01:00
Nick Cabatoff
ce74f4f1de
Add more raft metrics, emit more metrics on non-perf standbys (#12166)
Add some metrics helpful for monitoring raft cluster state.

Furthermore, we weren't emitting bolt metrics on regular (non-perf) standbys, and there were other metrics
in metricsLoop that would make sense to include in OSS but weren't.  We now have an active-node-only func,
emitMetricsActiveNode.  This runs metricsLoop on the active node.  Standbys and perf-standbys run metricsLoop
from a goroutine managed by the runStandby rungroup.
2022-10-07 09:09:08 -07:00
Josh Black
db71fdb087
only enable undo logs if all cluster members support it (#17378) 2022-10-06 11:24:16 -07:00
Nick Cabatoff
cbbf1a5593
Break grabLockOrStop into two pieces to facilitate investigating deadlocks (#17187)
Break grabLockOrStop into two pieces to facilitate investigating deadlocks.  Without this change, the "grab" goroutine looks the same regardless of who was calling grabLockOrStop, so there's no way to identify one of the deadlock parties.
2022-09-20 11:03:16 -04:00
Scott Miller
0d6a42c79e
OSS portion of wrapper-v2 (#16811)
* OSS portion of wrapper-v2

* Prefetch barrier type to avoid encountering an error in the simple BarrierType() getter

* Rename the OveriddenType to WrapperType and use it for the barrier type prefetch

* Fix unit test
2022-08-23 15:37:16 -04:00
Josh Black
073527549b
Correct drift between ENT and OSS (#15966) 2022-06-14 17:53:19 -07:00
Nick Cabatoff
106f548a41
Forward autopilot state reqs, avoid self-dialing (#15493)
Make sure that autopilot is disabled when we step down from active node state.  Forward autopilot state requests to the active node.  Avoid self-dialing due to stale advertisement.
2022-05-18 14:50:18 -04:00
Hridoy Roy
27f15edd9f
SSCT Tokens Feature [OSS] (#14109)
* port SSCT OSS

* port header hmac key to ent and generate token proto without make command

* remove extra nil check in request handling

* add changelog

* add comment to router.go

* change test var to use length constants

* remove local index is 0 check and extra defer which can be removed after use of ExternalID
2022-02-17 11:43:07 -08:00
Jim Kalafut
a72a5ff754
Rename master key to root key (#13324)
* See what it looks like to replace "master key" with "root key".  There are two places that would require more challenging code changes: the storage path `core/master`, and its contents (the JSON-serialized EncodedKeyring structure.)

* Restore accidentally deleted line

* Add changelog

* Update root->recovery

* Fix test

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2021-12-06 17:12:20 -08:00
Nick Cabatoff
7679223da6
Reorganize request handling code so that we don't touch storage until we have the stateLock. (#11835) 2021-06-11 13:18:16 -04:00
Lars Lehtonen
d10e912ec3
vault: deprecate errwrap.Wrapf() (#11577) 2021-05-11 13:12:54 -04:00
Brian Kassouf
a24653cc5c
Run a more strict formatter over the code (#11312)
* Update tooling

* Run gofumpt

* go mod vendor
2021-04-08 09:43:39 -07:00
Brian Kassouf
9f5babf584
core: Record the time a node became active (#10489)
* core: Record the time a node became active

* Update vault/core.go

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>

* Add omitempty field

* Update vendor

* Added CL entry and fixed test

* Fix test

* Fix command package tests

Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2020-12-11 16:50:19 -08:00
Calvin Leung Huang
30eeac6501
ha: update godoc on grabLockOrStop (#10547) 2020-12-11 16:04:00 -08:00
Seth Bunce
0998d51950
fix deadlock on core state lock (#10456)
* fix race that can cause deadlock on core state lock

The bug is in the grabLockOrStop function. For specific concurrent
executions the grabLockOrStop function can return stopped=true when
the lock is still held. A comment in grabLockOrStop indicates that the
function is only used when the stateLock is held, but grabLockOrStop is
being used to acquire the stateLock. If there are concurrent goroutines
using grabLockOrStop then some concurrent executions result in
stopped=true being returned when the lock is acquired.

The fix is to add a lock and some state around which the parent and
child goroutine in the grabLockOrStop function can coordinate so that
the different concurrent executions can be handled.

This change includes a non-deterministic unit test which reliably
reproduces the problem before the fix.

* use rand instead of time for random test stopCh close

Using time.Now().UnixNano()%2 ends up being system dependent because
different operating systems and hardware have different clock
resolution. A lower resolution will return the same unix time for a
longer period of time.

It is better to avoid this issue by using a random number generator.
This change uses the rand package default random number generator. It's
generally good to avoid using the default random number generator,
because it creates extra lock contention. For a test it should be fine.
2020-12-10 06:50:11 -05:00
Nick Cabatoff
bbdb137a60
Fix race with test that mutates KeyRotateGracePeriod: make the global be a Core field instead. (#10512) 2020-12-08 13:57:44 -05:00
Hridoy Roy
8172b1d410
Port: change leader status metric name to active (#10245)
* change active node metric name

* comment to see if commit is fine

Co-authored-by: Hridoy Roy <hridoyroy@Hridoys-MacBook-Pro.local>
2020-10-29 10:30:45 -07:00
Jeff Mitchell
ced73ab7bf
Consolidate locking for sys/health (#9876)
* Consolidate locking for sys/health

This avoids a second state lock read-lock on every sys/health hit

* Address review feedback

Co-authored-by: Vishal Nayak <vishalnayakv@gmail.com>
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
2020-10-26 16:47:54 -04:00
Hridoy Roy
a92e410ca9
Backport leader status telemetry [VAULT-672] (#10147)
* backport VAULT-672

* backport VAULT-672

* go mod tidy

* go mod tidy

* add back indirect import

* replace go mod and go sum with master version

* go mod vendor

* more go mod vendor

Co-authored-by: Hridoy Roy <hridoyroy@Hridoys-MBP.hitronhub.home>
Co-authored-by: Hridoy Roy <hridoyroy@Hridoys-MacBook-Pro.local>
2020-10-15 14:15:58 -07:00
Calvin Leung Huang
045836da71
raft: add support for using backend for ha_storage (#9193)
* raft: initial work on raft ha storage support

* add note on join

* add todo note

* raft: add support for bootstrapping and joining existing nodes

* raft: gate bootstrap join by reading leader api address from storage

* raft: properly check for raft-only for certain conditionals

* raft: add bootstrap to api and cli

* raft: fix bootstrap cli command

* raft: add test for setting up new cluster with raft HA

* raft: extend TestRaft_HA_NewCluster to include inmem and consul backends

* raft: add test for updating an existing cluster to use raft HA

* raft: remove debug log lines, clean up verifyRaftPeers

* raft: minor cleanup

* raft: minor cleanup

* Update physical/raft/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/ha.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/ha.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/logical_system_raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* address feedback comments

* address feedback comments

* raft: refactor tls keyring logic

* address feedback comments

* Update vault/raft.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* address feedback comments

* testing: fix import ordering

* raft: rename var, cleanup comment line

* docs: remove ha_storage restriction note on raft

* docs: more raft HA interaction updates with migration and recovery mode

* docs: update the raft join command

* raft: update comments

* raft: add missing isRaftHAOnly check for clearing out state set earlier

* raft: update a few ha_storage config checks

* Update command/operator_raft_bootstrap.go

Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>

* raft: address feedback comments

* raft: fix panic when checking for config.HAStorage.Type

* Update vault/raft.go

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* Update website/pages/docs/commands/operator/raft.mdx

Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>

* raft: remove bootstrap cli command

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* Update vault/raft.go

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>

* raft: address review feedback

* raft: revert vendored sdk

* raft: don't send applied index and node ID info if we're HA-only

Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
Co-authored-by: Alexander Bezobchuk <alexanderbez@users.noreply.github.com>
Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>
2020-06-23 12:04:13 -07:00
Jeff Mitchell
91b09c09b5
Create configutil and move some common config and setup functions there (#8362) 2020-05-14 09:19:27 -04:00
ncabatoff
923387fbed
Unless we've been asked to stop, most failures should not result in waitForLeadership returning. (#7732)
2020-02-14 18:28:37 -08:00
Vishal Nayak
9f980ade31
Seal migration with Raft (#8103)
* Seal migration after unsealing

* Refactor migration fields migrationInformation in core

* Perform seal migration as part of postUnseal

* Remove the sleep logic

* Use proper seal in the unseal function

* Fix migration from Auto to Shamir

* Fix the recovery config missing issue

* Address the non-ha migration case

* Fix the multi cluster case

* Avoid re-running seal migration

* Run the post migration code in new leaders

* Fix the issue of wrong recovery being set

* Address review feedback

* Add more complete testing coverage for seal migrations.   (#8247)

* Add more complete testing coverage for seal migrations.  Also remove VAULT_ACC gate from some tests that just depend on docker, cleanup dangling recovery config in storage after migration, and fix a call in adjustCoreForSealMigration that seems broken.

* Fix the issue of wrong recovery key being set

* Adapt tests to work with multiple cores.

* Add missing line to disable raft join.

Co-authored-by: Vishal Nayak <vishalnayak@users.noreply.github.com>

* Fix all known issues

* Remove warning

* Review feedback.

* Revert my previous change that broke raft tests.  We'll need to come back and at least comment
this once we better understand why it's needed.

* Don't allow migration between same types for now

* Disable auto to auto tests for now since it uses migration between same types which is not allowed

* Update vault/core.go

Co-Authored-By: Brian Kassouf <briankassouf@users.noreply.github.com>

* Add migration logs

* Address review comments

* Add the recovery config check back

* Skip a few steps if migration is already done

* Return from waitForLeadership if migration fails

Co-authored-by: ncabatoff <nick.cabatoff@gmail.com>
Co-authored-by: Brian Kassouf <briankassouf@users.noreply.github.com>
2020-02-13 16:27:31 -05:00
Becca Petrin
d7d4084c86
Observer pattern for service registration interface (#8123)
* use observer pattern for service discovery

* update perf standby method

* fix test

* revert usersTags to being called serviceTags

* use previous consul code

* vault isn't a performance standby before starting

* log err

* changes from feedback

* add Run method to interface

* changes from feedback

* fix core test

* update example
2020-01-24 09:42:03 -08:00
Jeff Mitchell
157e805b97
Migrate built in auto seal to go-kms-wrapping (#8118) 2020-01-10 20:39:52 -05:00
Mike Jarmy
df01a4307d
Introduce optional service_registration stanza (#7887)
* move ServiceDiscovery into methods

* add ServiceDiscoveryFactory

* add serviceDiscovery field to vault.Core

* refactor ConsulServiceDiscovery into separate struct

* cleanup

* revert accidental change to go.mod

* cleanup

* get rid of un-needed struct tags in vault.CoreConfig

* add service_discovery parser

* add ServiceDiscovery to config

* cleanup

* cleanup

* add test for ConfigServiceDiscovery to Core

* unit testing for config service_discovery stanza

* cleanup

* get rid of un-needed redirect_addr stuff in service_discovery stanza

* improve test suite

* cleanup

* clean up test a bit

* create docs for service_discovery

* check if service_discovery is configured, but storage does not support HA

* tinker with test

* tinker with test

* tweak docs

* move ServiceDiscovery into its own package

* tweak a variable name

* fix comment

* rename service_discovery to service_registration

* tweak service_registration config

* Revert "tweak service_registration config"

This reverts commit 5509920a8ab4c5a216468f262fc07c98121dce35.

* simplify naming

* refactor into ./serviceregistration/consul
2019-12-06 09:46:39 -05:00
Jim Kalafut
cb178b7e4f
Run go fmt (#7823) 2019-11-07 08:54:34 -08:00
ncabatoff
afcba41190
Shamir seals now come in two varieties: legacy and new-style. (#7694)
Shamir seals now come in two varieties: legacy and new-style. Legacy
Shamir is automatically converted to new-style when a rekey operation
is performed. All new Vault initializations using Shamir are new-style.

New-style Shamir writes an encrypted master key to storage, just like
AutoUnseal. The stored master key is encrypted using the shared key that
is split via Shamir's algorithm. Thus when unsealing, we take the key
fragments given, combine them into a Key-Encryption-Key, and use that
to decrypt the master key on disk. Then the master key is used to read
the keyring that decrypts the barrier.
2019-10-18 14:46:00 -04:00
Brian Kassouf
8e93f59035
core: Don't shutdown if key upgrades fail due to canceled context (#7070)
* core: Don't shutdown if key upgrades fail due to canceled context

* Continue if we are not shutting down
2019-07-05 14:19:15 -07:00
Brian Kassouf
9681121b34
storage/raft: fix races in tests (#6996)
* storage/raft: fix races in tests

* Fix another test race
2019-06-27 10:00:03 -07:00
Jeff Mitchell
a342dcbb29
Sync 2019-06-20 20:55:10 -04:00
Brian Kassouf
b435028f3f
Raft Storage Backend (#6888)
* Work on raft backend

* Add logstore locally

* Add encryptor and unsealable interfaces

* Add clustering support to raft

* Remove client and handler

* Bootstrap raft on init

* Cleanup raft logic a bit

* More raft work

* Work on TLS config

* More work on bootstrapping

* Fix build

* More work on bootstrapping

* More bootstrapping work

* fix build

* Remove consul dep

* Fix build

* merged oss/master into raft-storage

* Work on bootstrapping

* Get bootstrapping to work

* Clean up FSM and node-id

* Update local node ID logic

* Cleanup node-id change

* Work on snapshotting

* Raft: Add remove peer API (#906)

* Add remove peer API

* Add some comments

* Fix existing snapshotting (#909)

* Raft get peers API (#912)

* Read raft configuration

* address review feedback

* Use the Leadership Transfer API to step-down the active node (#918)

* Raft join and unseal using Shamir keys (#917)

* Raft join using shamir

* Store AEAD instead of master key

* Split the raft join process to answer the challenge after a successful unseal

* get the follower to standby state

* Make unseal work

* minor changes

* Some input checks

* reuse the shamir seal access instead of new default seal access

* refactor joinRaftSendAnswer function

* Synchronously send answer in auto-unseal case

* Address review feedback

* Raft snapshots (#910)

* Fix existing snapshotting

* implement the noop snapshotting

* Add comments and switch log libraries

* add some snapshot tests

* add snapshot test file

* add TODO

* More work on raft snapshotting

* progress on the ConfigStore strategy

* Don't use two buckets

* Update the snapshot store logic to hide the file logic

* Add more backend tests

* Cleanup code a bit

* [WIP] Raft recovery (#938)

* Add recovery functionality

* remove fmt.Printfs

* Fix a few fsm bugs

* Add max size value for raft backend (#942)

* Add max size value for raft backend

* Include physical.ErrValueTooLarge in the message

* Raft snapshot Take/Restore API  (#926)

* Initial work on raft snapshot APIs

* Always redirect snapshot install/download requests

* More work on the snapshot APIs

* Cleanup code a bit

* On restore handle special cases

* Use the seal to encrypt the sha sum file

* Add sealer mechanism and fix some bugs

* Call restore while state lock is held

* Send restore cb trigger through raft log

* Make error messages nicer

* Add test helpers

* Add snapshot test

* Add shamir unseal test

* Add more raft snapshot API tests

* Fix locking

* Change working to initialize

* Add underlying raw object to test cluster core

* Move leaderUUID to core

* Add raft TLS rotation logic (#950)

* Add TLS rotation logic

* Cleanup logic a bit

* Add/Remove from follower state on add/remove peer

* add comments

* Update more comments

* Update request_forwarding_service.proto

* Make sure we populate all nodes in the followerstate obj

* Update times

* Apply review feedback

* Add more raft config setting (#947)

* Add performance config setting

* Add more config options and fix tests

* Test Raft Recovery (#944)

* Test raft recovery

* Leave out a node during recovery

* remove unused struct

* Update physical/raft/snapshot_test.go

* Update physical/raft/snapshot_test.go

* fix vendoring

* Switch to new raft interface

* Remove unused files

* Switch a gogo -> proto instance

* Remove unneeded vault dep in go.sum

* Update helper/testhelpers/testhelpers.go

Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>

* Update vault/cluster/cluster.go

* track active key within the keyring itself (#6915)

* track active key within the keyring itself

* lookup and store using the active key ID

* update docstring

* minor refactor

* Small text fixes (#6912)

* Update physical/raft/raft.go

Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>

* review feedback

* Move raft logical system into separate file

* Update help text a bit

* Enforce cluster addr is set and use it for raft bootstrapping

* Fix tests

* fix http test panic

* Pull in latest raft-snapshot library

* Add comment
2019-06-20 12:14:58 -07:00
Jeff Mitchell
4184e72025
Fix a deadlock if a panic happens during request handling (#6920)
* Fix a deadlock if a panic happens during request handling

During request handling, if a panic is created, deferred functions are
run but otherwise execution stops. #5889 changed some locks to
non-defers but had the side effect of causing the read lock to not be
released if the request panicked. This fixes that and addresses a few
other potential places where things could go wrong:

1) In sealInitCommon we always now defer a function that unlocks the
read lock if it hasn't been unlocked already
2) In StepDown we defer the RUnlock but we also had two error cases that
were calling it manually. These are unlikely to be hit, but if they were,
I believe they would cause a panic.

* Add panic recovery test
2019-06-19 09:40:57 -04:00
ncabatoff
6c836bcd9b
Allow plugins to submit audit requests/responses via extended SystemView (#6777)
Move audit.LogInput to sdk/logical.  Allow the Data values in audited
logical.Request and Response to implement OptMarshaler, in which case
we delegate hashing/serializing responsibility to them.  Add new
ClientCertificateSerialNumber audit request field.

SystemView can now be cast to ExtendedSystemView to expose the Auditor
interface, which allows submitting requests and responses to the audit
broker.
2019-05-22 18:52:53 -04:00
EdwinRobbins
fdcf68db67
DynamoDB: Make Unlock key delete conditional on being old leader's (#6637) 2019-04-30 17:44:47 -07:00
Jeff Mitchell
278bdd1f4e
Switch to go modules (#6585)
* Switch to go modules

* Make fmt
2019-04-13 03:44:06 -04:00
Jeff Mitchell
170521481d
Create sdk/ and api/ submodules (#6583) 2019-04-12 17:54:35 -04:00
Brian Kassouf
3cc5d05b6a
Fix perf standby elections when the new active node was also the previous active node (#6561) 2019-04-10 10:09:36 -07:00
Jeff Mitchell
7fa72f3321
A few more syncs 2019-03-04 13:53:15 -05:00
Brian Kassouf
9fc2d79812
Refactor the cluster listener (#6232)
* Port over OSS cluster port refactor components

* Start forwarding

* Cleanup a bit

* Fix copy error

* Return error from perf standby creation

* Add some more comments

* Fix copy/paste error
2019-02-14 18:14:56 -08:00
Jeff Mitchell
9a980d34c7
Fix leader info repopulation (#6167)
* Two things:

* Change how we populate and clear leader UUID. This fixes a case where
if a standby disconnects from an active node and reconnects, without the
active node restarting, the UUID doesn't change so triggers on a new
active node don't get run.

* Add a bunch of test helpers and minor updates to things.
2019-02-05 21:01:18 -05:00