카테고리 없음
WebClient 사용하면서 발생했던 에러 분석 및 해결 방안
ChrisJang
2024. 1. 19. 15:34
Spring Webflux의 WebClient를 사용하면서 발견한 에러 모음 및 해당 에러에 따른 fallback 입니다.
Error List
io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received fatal alert: internal_error
What is Error?
- io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received fatal alert: internal_errorio.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received fatal alert: internal_error at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)
Why happened?
-
curl -X GET --tlsv1.2 -Ikv \\ 'https://{mall_id}.cafe24api.com/api/v2/admin/customersprivacy?end_date=2024-01-15T07:15&date_type=login&limit=1000&fields=shop_no,member_id,name,cellphone,group_no,created_date,last_login_date,member_authentication,use_blacklist,blacklist_type,member_authority,gender,member_type,authentication_method,nationality,country_code&start_date=2024-01-15T06:15&shop_no=1' \\ -H 'Authorization: Bearer veUVPnB0ekJnq3365Mg2Pm8O' \\ -H 'Content-Type: application/json' \\ -H 'X-Cafe24-Api-Version: 2023-12-01' * Trying 211.42.136.107:443... * Connected to artifashion.cafe24api.com (211.42.136.107) port 443 (#0) * ALPN: offers h2 * ALPN: offers http/1.1 * (304) (OUT), TLS handshake, Client hello (1): * error:1404B438:SSL routines:ST_CONNECT:tlsv1 alert internal error * Closing connection 0 curl: (35) error:1404B438:SSL routines:ST_CONNECT:tlsv1 alert internal error
- 해당 에러는 cafe24 사이트에서 mall을 비활성화 혹은 삭제할 경우, 해당 Mall의 api 요청 도메인인https://{mall_id}.cafe24api.com가 비활성화되어, 요청시에 handshake가 불가능하여 4xx(요청 에러) 에러가 발생하는 경우이다.
How to solve
- E.S 에러 로그 확인하여, 해당 에러가 발생하는 쇼핑몰 수집 중단 처리
MismatchedInputException: Cannot deserialize value of type
What is Error?
- MismatchedInputException: Cannot deserialize value of typeCannot deserialize value of type java.util.ArrayList<com.luna.cpt.remindme.collector.domain.dto.cafe24.Cafe24Dto$OrdersCouponResponseInfo> from Object value (token JsonToken.START_OBJECT); nested exception is com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize value of type java.util.ArrayList<com.luna.cpt.remindme.collector.domain.dto.cafe24.Cafe24Dto$OrdersCouponResponseInfo> from Object value (token JsonToken.START_OBJECT)
Why happened?
- 수집 서버에 선언한 response 타입(Cafe24 Api Document에 기재된 타입)과 실제 반환 타입이 달라 converting 할 때 MismatchedInputException가 발생한다.
- Example
- Expected result
- response body { "coupons":[ { {...}, {...}, {...}, {...} } ] }
- Actual Result
- response body { "coupons":{ "0":{...}, "5":{...}, "10":{...}, "15":{...} } }
- Expected result
How to solve
- 기대하는 응답 타입이 아닌 다른 타입으로 응답이 올 경우, 해당 응답값은 버린다.
java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
What is Error?
- java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $
Why happened?
- 수집 서버에 선언한 response 타입(Cafe24 Api Document에 기재된 타입)과 실제 반환 타입이 달라 converting 할 때 MismatchedInputException가 발생한다.
- Example
- Expected result
- response body { "coupons":[ { {...}, {...}, {...}, {...} } ] }
- Actual Result
- response body { "coupons":{ "0":{...}, "5":{...}, "10":{...}, "15":{...} } }
- Expected result
How to solve
- 기대하는 응답 타입이 아닌 다른 타입으로 응답이 올 경우, 해당 응답값은 버린다.
io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms
What is Error?
- io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000msio.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms at io.netty.handler.ssl.SslHandler$7.run(SslHandler.java:2115) at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:406) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Unknown Source)
Why happened?
- 요청 URL에 문제가 있나 확인해보니, 응답 서버 혹은 URL에 대한 문제는 아니였고, 수집시에 API 요청량이 많다보니, 요청서버와 응답서버간(Cafe24) handshake에 있어서 지연이 생기는 것으로 확인됨.
How to solve
private HttpClient makeHttpClient() {
return HttpClient.create(makeConnectionProvider())
.secure(sslContextSpec -> {
try {
sslContextSpec.sslContext(SslContextBuilder
.forClient()
// SSL 인증서 확인을 비활성화 => 즉, 클라이언트는 유효하지 않거나 신뢰할 수 없는 인증서라도 모든 SSL 인증서를 수락
.trustManager(InsecureTrustManagerFactory.INSTANCE)
.build())
.handshakeTimeout(Duration.ofMillis(webClientProperties.getHTTPCLIENT_SECURE_SSL_HANDSHAKE_TIMEOUT_MILLIS()));
} catch (SSLException e) {
throw new RuntimeException(e);
}
})
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, webClientProperties.getCONNECTION_TIMEOUT_MILLISECOND())
.doOnConnected(connection ->
connection.addHandlerLast("READ_TIME_OUT_HANDLER", new ReadTimeoutHandler(webClientProperties.getREAD_TIMEOUT_SECOND()))
.addHandlerLast("WRITE_TIME_OUT_HANDLER", new WriteTimeoutHandler(webClientProperties.getWRITE_TIMEOUT_SECOND()))
);
}
- 대용량 API 요청시 충분한 handshake 대기시간을 위해서, HttpClient ssl의 handshakeTimeout를 Default 10000ms에서 30000ms로 증가시켰음.
org.springframework.dao.QueryTimeoutException: Redis command timed out
What is Error?
- org.springframework.dao.QueryTimeoutException: Redis command timed outorg.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 2 second(s) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) at org.springframework.data.redis.connection.lettuce.LettuceReactiveRedisConnection.lambda$translateException$0(LettuceReactiveRedisConnection.java:293) at reactor.core.publisher.Flux.lambda$onErrorMap$28(Flux.java:7070) at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94) at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onError(MonoFlatMapMany.java:255) at reactor.core.publisher.Operators$MonoSubscriber.onError(Operators.java:1886) at reactor.core.publisher.FluxMap$MapSubscriber.onError(FluxMap.java:134) at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93) at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93) at io.lettuce.core.RedisPublisher$ImmediateSubscriber.onError(RedisPublisher.java:891) at io.lettuce.core.RedisPublisher$State.onError(RedisPublisher.java:712) at io.lettuce.core.RedisPublisher$RedisSubscription.onError(RedisPublisher.java:357) at io.lettuce.core.RedisPublisher$SubscriptionCommand.onError(RedisPublisher.java:797) at io.lettuce.core.RedisPublisher$SubscriptionCommand.doOnError(RedisPublisher.java:793) at io.lettuce.core.protocol.CommandWrapper.completeExceptionally(CommandWrapper.java:128) at io.lettuce.core.protocol.CommandExpiryWriter.lambda$potentiallyExpire$0(CommandExpiryWriter.java:175) at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Unknown Source) Suppressed: java.lang.Exception: #block terminated with an error at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:99) at reactor.core.publisher.Mono.block(Mono.java:1742) at com.luna.cpt.remindme.collector.akka.actors.ApiWorkActor.lambda$createReceive$1(ApiWorkActor.java:68) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) at scala.PartialFunction.applyOrElse(PartialFunction.scala:214) at scala.PartialFunction.applyOrElse$(PartialFunction.scala:213) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:269) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:270) at akka.actor.Actor.aroundReceive(Actor.scala:537) at akka.actor.Actor.aroundReceive$(Actor.scala:535) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:579) at akka.actor.ActorCell.invoke(ActorCell.scala:547) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) at akka.dispatch.Mailbox.run(Mailbox.scala:231) at akka.dispatch.Mailbox.exec(Mailbox.scala:243) at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 2 second(s) at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:59) at io.lettuce.core.protocol.CommandExpiryWriter.lambda$potentiallyExpire$0(CommandExpiryWriter.java:176) ... 8 common frames omitted
Why happened
- Cause:
- Command timeouts are caused by the fact that a command was not completed within the configured timeout. Timeouts may be caused for various reasons:
- Redis server has crashed/network partition happened and your Redis service didn’t recover within the configured timeout
- Command was not finished in time. This can happen if your Redis server is overloaded or if the connection is blocked by a command (e.g. BLPOP 0, long-running Lua script). See also blpop(Duration.ZERO, …) gives RedisCommandTimeoutException.
- Configured timeout does not match Redis’s performance.
- If you block the EventLoop (e.g. calling blocking methods in a RedisFuture callback or in a Reactive pipeline). That can easily happen when calling Redis commands in a Pub/Sub listener or a RedisConnectionStateListener.
- Conclusion:
- 1번 상황이 우려 되어, 인프라팀에게 redis 모니터링 정보 요청함.
- QueryTimeoutException의 ****발생 시각은 16:31:19초경인데, 전혀 문제가 발생한 것 같지 않다.
- AWS 관리형 서비스에서는 거의 문제가 발생하지 않기 때문에 아닌 것 같았지만, 역시 아니다.
- 4번의 경우이다.
- QueryTimeoutException 에러는 수집 서버에서 heap이 초과되면서, 연이어서 발생했던 에러이다.
- 위 지표만 보아도 redis 서버는 정상적으로 작동하고 있었다.
- Reactive pipeline을 통해 레디스에 값을 요청하는 구조이다.
- heap space full 상태가 되면서 Reactive system이 제대로 동작하지 않아 해당 에러가 발생한것으로 예상된다.
- 현재 수집 서버는 Webflux를 사용하고 springframework.data.redis.core의 ReactiveValueOperations를 사용하고 있다.
nested exception is io.netty.handler.timeout.ReadTimeoutException
What is Error?
- nested exception is io.netty.handler.timeout.ReadTimeoutExceptionnested exception is io.netty.handler.timeout.ReadTimeoutException
Why happened?
- Here are a few scenarios where a read operation might take longer than the timeout value: (출처)
- The server takes too long to process the request and send a response.
- A network issue causes delays in the transmission of data.
- The client takes too long to process data received from the server.
- Conclusion:
- 특정 Cafe24 API URL의 경우, API 요청 후 응답 데이터가 너무 길어 제한된 읽기 시간(Read TimeOut) 안에 데이터 처리를 끝내지 못해 발생함.
How to solve
private HttpClient makeHttpClient() {
return HttpClient.create(makeConnectionProvider())
.secure(sslContextSpec -> {
try {
sslContextSpec.sslContext(SslContextBuilder
.forClient()
// SSL 인증서 확인을 비활성화 => 즉, 클라이언트는 유효하지 않거나 신뢰할 수 없는 인증서라도 모든 SSL 인증서를 수락
.trustManager(InsecureTrustManagerFactory.INSTANCE)
.build())
.handshakeTimeout(Duration.ofMillis(webClientProperties.getHTTPCLIENT_SECURE_SSL_HANDSHAKE_TIMEOUT_MILLIS()));
} catch (SSLException e) {
throw new RuntimeException(e);
}
})
.option(ChannelOption.CONNECT_TIMEOUT_MILLIS, webClientProperties.getCONNECTION_TIMEOUT_MILLISECOND())
.doOnConnected(connection ->
connection.addHandlerLast("READ_TIME_OUT_HANDLER", new ReadTimeoutHandler(webClientProperties.getREAD_TIMEOUT_SECOND()))
.addHandlerLast("WRITE_TIME_OUT_HANDLER", new WriteTimeoutHandler(webClientProperties.getWRITE_TIMEOUT_SECOND()))
);
}
- HttpClient가 연결되었을때 Read하는 시간을 증가시켰음.
- Before : 15 sec
- After : 60 sec