1
0
Commit Graph

73 Commits

Author SHA1 Message Date
hc-github-team-es-release-engineering
48ab1eae08
[DO NOT MERGE UNTIL EOY] EOY license fixes 1.14.x (#24390) 2024-01-02 10:36:20 -08:00
hc-github-team-secure-vault-core
0ca00475cd
backport of commit c67242463c239215a1dbf3b9979787a5f8359bbf (#20830)
Co-authored-by: Nick Cabatoff <ncabatoff@hashicorp.com>
2023-05-29 15:02:27 +00:00
Nick Cabatoff
0be6924452
Make sure that we specify Backoff in conjunction with MinConnectTimeout, else we get a zero value. (#19701) 2023-03-23 10:21:28 -04:00
Nick Cabatoff
0de436bda4
Allow overriding gRPC's connection timeout with VAULT_GRPC_MIN_CONNECT_TIMEOUT (#19676) 2023-03-22 18:51:37 +00:00
Hamid Ghaf
e55c18ed12
adding copyright header (#19555)
* adding copyright header

* fix fmt and a test
2023-03-15 09:00:52 -07:00
Josh Black
de99f93820
Add autopilot automated upgrades and redundancy zones (#15521) 2022-05-20 16:49:11 -04:00
hghaf099
b435f09f8b
Fix a Deadlock on HA leadership transfer (#12691)
* Fix a Deadlock on HA leadership transfer when standby
was actively forwarding a request
fixes GH #12601

* adding the changelog
2021-10-04 13:55:15 -04:00
Nick Cabatoff
4157ed4730
Add metrics for requests forwarded by standbys. (#11366) 2021-04-16 14:02:20 -04:00
Nick Cabatoff
ad169e2b84
When a standby does a ForwardRequest, it's not using the request context, and thus not getting timed out properly when it takes too long. (#11322)
The rpcClientConnContext is still used to terminate gRPC internal/dialer-related goroutines, but the actual RPC is now timed out when the request times out, e.g. due to the default max request duration.  This mirrors what we do with the parallel forwarding code in ENT.
2021-04-15 10:23:26 -04:00
ncabatoff
4f109368a6
Eliminate global that caused race tests to fail in ent with an internal config setting. (#9604) 2020-07-27 16:10:26 -04:00
ncabatoff
b95822b713
Use the accessor method so state lock is used to check perf standby status. (#9496) 2020-07-20 10:34:16 -04:00
ncabatoff
58bb7a1578
Fix a test failure I observed on ent re cluster listener (#8647)
Panics when the cluster listener changes while we're setting up request forwarding.
2020-03-31 13:47:39 -04:00
Brian Kassouf
23c83c6d11
Create network layer abstraction to allow in-memory cluster traffic (#8173) 2020-01-16 23:03:02 -08:00
Brian Kassouf
aca342ff04
Port OSS changes from perf standby fix (#7818)
* Port OSS changes from perf standby fix

* Fix build
2019-11-06 14:36:47 -08:00
ncabatoff
a9a174d9b1
Make clusterListener an atomic.Value to avoid races with getGRPCDialer. (#7408) 2019-09-03 11:59:56 -04:00
ncabatoff
e03ea41f01
Close and flush perf standby conns/cache when sealing. (#7183) 2019-07-24 16:32:57 -04:00
Brian Kassouf
9681121b34
storage/raft: fix races in tests (#6996)
* storage/raft: fix races in tests

* Fix another test race
2019-06-27 10:00:03 -07:00
Brian Kassouf
b435028f3f
Raft Storage Backend (#6888)
* Work on raft backend

* Add logstore locally

* Add encryptor and unsealable interfaces

* Add clustering support to raft

* Remove client and handler

* Bootstrap raft on init

* Cleanup raft logic a bit

* More raft work

* Work on TLS config

* More work on bootstrapping

* Fix build

* More work on bootstrapping

* More bootstrapping work

* fix build

* Remove consul dep

* Fix build

* merged oss/master into raft-storage

* Work on bootstrapping

* Get bootstrapping to work

* Clean up FMS and node-id

* Update local node ID logic

* Cleanup node-id change

* Work on snapshotting

* Raft: Add remove peer API (#906)

* Add remove peer API

* Add some comments

* Fix existing snapshotting (#909)

* Raft get peers API (#912)

* Read raft configuration

* address review feedback

* Use the Leadership Transfer API to step-down the active node (#918)

* Raft join and unseal using Shamir keys (#917)

* Raft join using shamir

* Store AEAD instead of master key

* Split the raft join process to answer the challenge after a successful unseal

* get the follower to standby state

* Make unseal work

* minor changes

* Some input checks

* reuse the shamir seal access instead of new default seal access

* refactor joinRaftSendAnswer function

* Synchronously send answer in auto-unseal case

* Address review feedback

* Raft snapshots (#910)

* Fix existing snapshotting

* implement the noop snapshotting

* Add comments and switch log libraries

* add some snapshot tests

* add snapshot test file

* add TODO

* More work on raft snapshotting

* progress on the ConfigStore strategy

* Don't use two buckets

* Update the snapshot store logic to hide the file logic

* Add more backend tests

* Cleanup code a bit

* [WIP] Raft recovery (#938)

* Add recovery functionality

* remove fmt.Printfs

* Fix a few fsm bugs

* Add max size value for raft backend (#942)

* Add max size value for raft backend

* Include physical.ErrValueTooLarge in the message

* Raft snapshot Take/Restore API  (#926)

* Inital work on raft snapshot APIs

* Always redirect snapshot install/download requests

* More work on the snapshot APIs

* Cleanup code a bit

* On restore handle special cases

* Use the seal to encrypt the sha sum file

* Add sealer mechanism and fix some bugs

* Call restore while state lock is held

* Send restore cb trigger through raft log

* Make error messages nicer

* Add test helpers

* Add snapshot test

* Add shamir unseal test

* Add more raft snapshot API tests

* Fix locking

* Change working to initalize

* Add underlying raw object to test cluster core

* Move leaderUUID to core

* Add raft TLS rotation logic (#950)

* Add TLS rotation logic

* Cleanup logic a bit

* Add/Remove from follower state on add/remove peer

* add comments

* Update more comments

* Update request_forwarding_service.proto

* Make sure we populate all nodes in the followerstate obj

* Update times

* Apply review feedback

* Add more raft config setting (#947)

* Add performance config setting

* Add more config options and fix tests

* Test Raft Recovery (#944)

* Test raft recovery

* Leave out a node during recovery

* remove unused struct

* Update physical/raft/snapshot_test.go

* Update physical/raft/snapshot_test.go

* fix vendoring

* Switch to new raft interface

* Remove unused files

* Switch a gogo -> proto instance

* Remove unneeded vault dep in go.sum

* Update helper/testhelpers/testhelpers.go

Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>

* Update vault/cluster/cluster.go

* track active key within the keyring itself (#6915)

* track active key within the keyring itself

* lookup and store using the active key ID

* update docstring

* minor refactor

* Small text fixes (#6912)

* Update physical/raft/raft.go

Co-Authored-By: Calvin Leung Huang <cleung2010@gmail.com>

* review feedback

* Move raft logical system into separate file

* Update help text a bit

* Enforce cluster addr is set and use it for raft bootstrapping

* Fix tests

* fix http test panic

* Pull in latest raft-snapshot library

* Add comment
2019-06-20 12:14:58 -07:00
Brian Kassouf
7ec1fe75a9
Move cluster logic out of vault package (#6601)
* Move cluster logic out of vault package

* Dedup heartbeat and fix tests

* Fix test
2019-04-17 13:50:31 -07:00
Brian Kassouf
dd6b2a47dc
Port over some test fixes (#6261) 2019-02-19 12:03:02 -08:00
Brian Kassouf
9fc2d79812
Refactor the cluster listener (#6232)
* Port over OSS cluster port refactor components

* Start forwarding

* Cleanup a bit

* Fix copy error

* Return error from perf standby creation

* Add some more comments

* Fix copy/paste error
2019-02-14 18:14:56 -08:00
Jeff Mitchell
9a980d34c7
Fix leader info repopulation (#6167)
* Two things:

* Change how we populate and clear leader UUID. This fixes a case where
if a standby disconnects from an active node and reconnects, without the
active node restarting, the UUID doesn't change so triggers on a new
active node don't get run.

* Add a bunch of test helpers and minor updates to things.
2019-02-05 21:01:18 -05:00
Calvin Leung Huang
0b2350bc15
Logger cleanup (#5480) 2018-10-09 09:43:17 -07:00
Jeff Mitchell
b7d6d55ac1
The big one (#5346) 2018-09-17 23:03:00 -04:00
Becca Petrin
13887f0d33
undo make fmt (#5265) 2018-09-04 09:29:18 -07:00
Becca Petrin
6537b0a536
run make fmt (#5261) 2018-09-04 09:12:59 -07:00
Calvin Leung Huang
0a8be8f74d gofmt files (#5233) 2018-08-31 09:15:40 -07:00
Jeff Mitchell
1820110443
Pass in an ErrorLog to http.Server (#5135)
Fixes #5108
2018-08-21 11:23:18 -04:00
Brian Kassouf
431d8ad615
HA: Bump the max send/recv size for the gRPC server (#4844) 2018-06-29 09:52:23 -07:00
Jeff Mitchell
f493d2436e
Add an idle timeout for the server (#4760)
* Add an idle timeout for the server

Because tidy operations can be long-running, this also changes all tidy
operations to behave the same operationally (kick off the process, get a
warning back, log errors to server log) and makes them all run in a
goroutine.

This could mean a sort of hard stop if Vault gets sealed because the
function won't have the read lock. This should generally be okay
(running tidy again should pick back up where it left off), but future
work could use cleanup funcs to trigger the functions to stop.

* Fix up tidy test

* Add deadline to cluster connections and an idle timeout to the cluster server, plus add readheader/read timeout to api server
2018-06-16 18:21:33 -04:00
Jeff Mitchell
e1a89e0d55
Some atomic cleanup (#4732)
Taking inspiration from
https://github.com/golang/go/issues/17604#issuecomment-256384471
suggests that taking the address of a stack variable for use in atomics
works (at least, the race detector doesn't complain) but is doing it
wrong.

The only other change is a change in Leader() detecting if HA is enabled
to fast-path out. This value never changes after NewCore, so we don't
need to grab the read lock to check it.
2018-06-09 15:35:22 -04:00
Becca Petrin
792d219aa9 Move to "github.com/hashicorp/go-hclog" (#4227)
* logbridge with hclog and identical output

* Initial search & replace

This compiles, but there is a fair amount of TODO
and commented out code, especially around the
plugin logclient/logserver code.

* strip logbridge

* fix majority of tests

* update logxi aliases

* WIP fixing tests

* more test fixes

* Update test to hclog

* Fix format

* Rename hclog -> log

* WIP making hclog and logxi love each other

* update logger_test.go

* clean up merged comments

* Replace RawLogger interface with a Logger

* Add some logger names

* Replace Trace with Debug

* update builtin logical logging patterns

* Fix build errors

* More log updates

* update log approach in command and builtin

* More log updates

* update helper, http, and logical directories

* Update loggers

* Log updates

* Update logging

* Update logging

* Update logging

* Update logging

* update logging in physical

* prefixing and lowercase

* Update logging

* Move phyisical logging name to server command

* Fix som tests

* address jims feedback so far

* incorporate brians feedback so far

* strip comments

* move vault.go to logging package

* update Debug to Trace

* Update go-plugin deps

* Update logging based on review comments

* Updates from review

* Unvendor logxi

* Remove null_logger.go
2018-04-02 17:46:59 -07:00
Brian Kassouf
a46b996496 Port some replicated cluster changes from ent (#4037) 2018-02-23 14:01:15 -05:00
Jeff Mitchell
85c7b528e2
Change grpc's max sent/recv size to a very large value. (#3912) 2018-02-06 13:52:35 -05:00
Jeff Mitchell
bcaa6e3444
Fix intermittent panic by storing a reference to the grpc server (#3842)
* Fix intermittent panic by storing a reference to the grpc server and
using that to ensure it will never be nil.

* Just get rid of c.rpcServer
2018-01-24 20:23:08 -05:00
Jeff Mitchell
a109e2a11e Sync some bits over 2018-01-22 21:44:49 -05:00
Jeff Mitchell
e9d5863a2e
Use a separate var for active node replication state (#3819) 2018-01-19 19:24:04 -05:00
Jeff Mitchell
1e78dfb8ac Embed derived contexts into replication clients 2018-01-19 07:22:31 -05:00
Jeff Mitchell
8b0f8de8ff A bit more context plumbing 2018-01-19 04:11:59 -05:00
Jeff Mitchell
f0acb3a995 Actually print out forwarded stacktrace 2018-01-18 11:40:59 -05:00
Jeff Mitchell
e7480ae4c8 Rename var from last commit 2018-01-17 23:08:35 -05:00
Jeff Mitchell
11ac5fb484 Make heartbeatInterval a package var to allow tests to modify it 2018-01-17 23:05:11 -05:00
Jeff Mitchell
ba219f4917
Add replication state to EchoReply (#3810) 2018-01-17 22:17:47 -05:00
Brian Kassouf
2b7c90310e
Fix leaking connections on cluster port (#3680) 2017-12-12 17:18:04 -08:00
Jeff Mitchell
276a230bde
Move location of quit channel closing in exp manager (#3638)
* Move location of quit channel closing in exp manager

If it happens after stopping timers any timers firing before all timers
are stopped will still run the revocation function. With plugin
auto-crash-recovery this could end up instantiating a plugin that could
then try to unwrap a token from a nil token store.

This also plumbs in core so that we can grab a read lock during the
operation and check standby/sealed status before running it (after
grabbing the lock).

* Use context instead of checking core values directly

* Use official Go context in a few key places
2017-12-01 17:08:38 -05:00
Jeff Mitchell
91ef8ad02e Fix regression involving cluster listener 2017-11-07 17:27:13 -05:00
Jeff Mitchell
3e7a3acb22
Change some instances of adding headers to setting headers, since really (#3501)
we want to replace anything that might be there (e.g. for request
forwarding and content-type).

Hopefully fixes #3485
2017-11-02 07:31:50 -05:00
Jeff Mitchell
83f77d5a5f
Fix memory leak when a connection would hit the cluster port and go away (#3513) 2017-10-31 20:58:45 -05:00
Jeff Mitchell
07b97c5744 Sync 2017-10-23 14:59:37 -04:00
Jeff Mitchell
3f31ed733f Add option to set cluster TLS cipher suites. (#3228)
* Add option to set cluster TLS cipher suites.

Fixes #3227
2017-08-30 16:28:23 -04:00