카테고리 없음

WebClient 사용하면서 발생했던 에러 분석 및 해결 방안

ChrisJang 2024. 1. 19. 15:34
Spring Webflux의 WebClient를 사용하면서 발견한 에러 모음 및 해당 에러에 따른 fallback 입니다.

Error List

io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received fatal alert: internal_error

 

What is Error?

  • io.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received fatal alert: internal_errorio.netty.handler.codec.DecoderException: javax.net.ssl.SSLException: Received fatal alert: internal_error at io.netty.handler.codec.ByteToMessageDecoder.callDecode(ByteToMessageDecoder.java:499) at io.netty.handler.codec.ByteToMessageDecoder.channelRead(ByteToMessageDecoder.java:290) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:444) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.AbstractChannelHandlerContext.fireChannelRead(AbstractChannelHandlerContext.java:412) at io.netty.channel.DefaultChannelPipeline$HeadContext.channelRead(DefaultChannelPipeline.java:1410) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:440) at io.netty.channel.AbstractChannelHandlerContext.invokeChannelRead(AbstractChannelHandlerContext.java:420) at io.netty.channel.DefaultChannelPipeline.fireChannelRead(DefaultChannelPipeline.java:919) at io.netty.channel.epoll.AbstractEpollStreamChannel$EpollStreamUnsafe.epollInReady(AbstractEpollStreamChannel.java:800) at io.netty.channel.epoll.EpollEventLoop.processReady(EpollEventLoop.java:499) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:397) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30)

Why happened?

  • curl -X GET --tlsv1.2 -Ikv \\
    'https://{mall_id}.cafe24api.com/api/v2/admin/customersprivacy?end_date=2024-01-15T07:15&date_type=login&limit=1000&fields=shop_no,member_id,name,cellphone,group_no,created_date,last_login_date,member_authentication,use_blacklist,blacklist_type,member_authority,gender,member_type,authentication_method,nationality,country_code&start_date=2024-01-15T06:15&shop_no=1'  \\
    -H 'Authorization: Bearer veUVPnB0ekJnq3365Mg2Pm8O' \\
    -H 'Content-Type: application/json' \\
    -H 'X-Cafe24-Api-Version: 2023-12-01'
    
    *   Trying 211.42.136.107:443...
    * Connected to artifashion.cafe24api.com (211.42.136.107) port 443 (#0)
    * ALPN: offers h2
    * ALPN: offers http/1.1
    * (304) (OUT), TLS handshake, Client hello (1):
    * error:1404B438:SSL routines:ST_CONNECT:tlsv1 alert internal error
    * Closing connection 0
    curl: (35) error:1404B438:SSL routines:ST_CONNECT:tlsv1 alert internal error
    
    • 해당 에러는 cafe24 사이트에서 mall을 비활성화 혹은 삭제할 경우, 해당 Mall의 api 요청 도메인인https://{mall_id}.cafe24api.com가 비활성화되어, 요청시에 handshake가 불가능하여 4xx(요청 에러) 에러가 발생하는 경우이다.

How to solve

  •  E.S 에러 로그 확인하여, 해당 에러가 발생하는 쇼핑몰 수집 중단 처리

MismatchedInputException: Cannot deserialize value of type

What is Error?

  • MismatchedInputException: Cannot deserialize value of typeCannot deserialize value of type java.util.ArrayList<com.luna.cpt.remindme.collector.domain.dto.cafe24.Cafe24Dto$OrdersCouponResponseInfo> from Object value (token JsonToken.START_OBJECT); nested exception is com.fasterxml.jackson.databind.exc.MismatchedInputException: Cannot deserialize value of type java.util.ArrayList<com.luna.cpt.remindme.collector.domain.dto.cafe24.Cafe24Dto$OrdersCouponResponseInfo> from Object value (token JsonToken.START_OBJECT)

Why happened?

  • 수집 서버에 선언한 response 타입(Cafe24 Api Document에 기재된 타입)과 실제 반환 타입이 달라 converting 할 때 MismatchedInputException가 발생한다.
  • Example
    • Expected result
      • response body { "coupons":[ { {...}, {...}, {...}, {...} } ] }
    • Actual Result
      • response body { "coupons":{ "0":{...}, "5":{...}, "10":{...}, "15":{...} } }

How to solve

  • 기대하는 응답 타입이 아닌 다른 타입으로 응답이 올 경우, 해당 응답값은 버린다.

java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $

What is Error?

  • java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $java.lang.IllegalStateException: Expected BEGIN_OBJECT but was STRING at line 1 column 1 path $

Why happened? 

  • 수집 서버에 선언한 response 타입(Cafe24 Api Document에 기재된 타입)과 실제 반환 타입이 달라 converting 할 때 MismatchedInputException가 발생한다.
  • Example
    • Expected result
      • response body { "coupons":[ { {...}, {...}, {...}, {...} } ] }
    • Actual Result
      • response body { "coupons":{ "0":{...}, "5":{...}, "10":{...}, "15":{...} } }

How to solve

  • 기대하는 응답 타입이 아닌 다른 타입으로 응답이 올 경우, 해당 응답값은 버린다.
io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms

What is Error?

  • io.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000msio.netty.handler.ssl.SslHandshakeTimeoutException: handshake timed out after 10000ms at io.netty.handler.ssl.SslHandler$7.run(SslHandler.java:2115) at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) at io.netty.util.concurrent.AbstractEventExecutor.safeExecute(AbstractEventExecutor.java:167) at io.netty.util.concurrent.SingleThreadEventExecutor.runAllTasks(SingleThreadEventExecutor.java:470) at io.netty.channel.epoll.EpollEventLoop.run(EpollEventLoop.java:406) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Unknown Source)

Why happened?

  • 요청 URL에 문제가 있나 확인해보니, 응답 서버 혹은 URL에 대한 문제는 아니였고, 수집시에 API 요청량이 많다보니, 요청서버와 응답서버간(Cafe24) handshake에 있어서 지연이 생기는 것으로 확인됨. 

How to solve

private HttpClient makeHttpClient() {
        return HttpClient.create(makeConnectionProvider())
                .secure(sslContextSpec -> {
                    try {
                        sslContextSpec.sslContext(SslContextBuilder
                                .forClient()
                                // SSL 인증서 확인을 비활성화 => 즉, 클라이언트는 유효하지 않거나 신뢰할 수 없는 인증서라도 모든 SSL 인증서를 수락
                                .trustManager(InsecureTrustManagerFactory.INSTANCE)
                                .build())
                                .handshakeTimeout(Duration.ofMillis(webClientProperties.getHTTPCLIENT_SECURE_SSL_HANDSHAKE_TIMEOUT_MILLIS()));
                    } catch (SSLException e) {
                        throw new RuntimeException(e);
                    }
                })
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, webClientProperties.getCONNECTION_TIMEOUT_MILLISECOND())
                .doOnConnected(connection ->
                        connection.addHandlerLast("READ_TIME_OUT_HANDLER", new ReadTimeoutHandler(webClientProperties.getREAD_TIMEOUT_SECOND()))
                                .addHandlerLast("WRITE_TIME_OUT_HANDLER", new WriteTimeoutHandler(webClientProperties.getWRITE_TIMEOUT_SECOND()))
                );
    }
  • 대용량 API 요청시 충분한 handshake 대기시간을 위해서, HttpClient ssl의 handshakeTimeout를 Default 10000ms에서 30000ms로 증가시켰음.


org.springframework.dao.QueryTimeoutException: Redis command timed out

What is Error?

  • org.springframework.dao.QueryTimeoutException: Redis command timed outorg.springframework.dao.QueryTimeoutException: Redis command timed out; nested exception is io.lettuce.core.RedisCommandTimeoutException: Command timed out after 2 second(s) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:70) at org.springframework.data.redis.connection.lettuce.LettuceExceptionConverter.convert(LettuceExceptionConverter.java:41) at org.springframework.data.redis.connection.lettuce.LettuceReactiveRedisConnection.lambda$translateException$0(LettuceReactiveRedisConnection.java:293) at reactor.core.publisher.Flux.lambda$onErrorMap$28(Flux.java:7070) at reactor.core.publisher.FluxOnErrorResume$ResumeSubscriber.onError(FluxOnErrorResume.java:94) at reactor.core.publisher.MonoFlatMapMany$FlatMapManyInner.onError(MonoFlatMapMany.java:255) at reactor.core.publisher.Operators$MonoSubscriber.onError(Operators.java:1886) at reactor.core.publisher.FluxMap$MapSubscriber.onError(FluxMap.java:134) at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93) at reactor.core.publisher.MonoNext$NextSubscriber.onError(MonoNext.java:93) at io.lettuce.core.RedisPublisher$ImmediateSubscriber.onError(RedisPublisher.java:891) at io.lettuce.core.RedisPublisher$State.onError(RedisPublisher.java:712) at io.lettuce.core.RedisPublisher$RedisSubscription.onError(RedisPublisher.java:357) at io.lettuce.core.RedisPublisher$SubscriptionCommand.onError(RedisPublisher.java:797) at io.lettuce.core.RedisPublisher$SubscriptionCommand.doOnError(RedisPublisher.java:793) at io.lettuce.core.protocol.CommandWrapper.completeExceptionally(CommandWrapper.java:128) at io.lettuce.core.protocol.CommandExpiryWriter.lambda$potentiallyExpire$0(CommandExpiryWriter.java:175) at io.netty.util.concurrent.PromiseTask.runTask(PromiseTask.java:98) at io.netty.util.concurrent.ScheduledFutureTask.run(ScheduledFutureTask.java:153) at io.netty.util.concurrent.AbstractEventExecutor.runTask(AbstractEventExecutor.java:174) at io.netty.util.concurrent.DefaultEventExecutor.run(DefaultEventExecutor.java:66) at io.netty.util.concurrent.SingleThreadEventExecutor$4.run(SingleThreadEventExecutor.java:997) at io.netty.util.internal.ThreadExecutorMap$2.run(ThreadExecutorMap.java:74) at io.netty.util.concurrent.FastThreadLocalRunnable.run(FastThreadLocalRunnable.java:30) at java.base/java.lang.Thread.run(Unknown Source) Suppressed: java.lang.Exception: #block terminated with an error at reactor.core.publisher.BlockingSingleSubscriber.blockingGet(BlockingSingleSubscriber.java:99) at reactor.core.publisher.Mono.block(Mono.java:1742) at com.luna.cpt.remindme.collector.akka.actors.ApiWorkActor.lambda$createReceive$1(ApiWorkActor.java:68) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:24) at akka.japi.pf.UnitCaseStatement.apply(CaseStatements.scala:20) at scala.PartialFunction.applyOrElse(PartialFunction.scala:214) at scala.PartialFunction.applyOrElse$(PartialFunction.scala:213) at akka.japi.pf.UnitCaseStatement.applyOrElse(CaseStatements.scala:20) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:269) at scala.PartialFunction$OrElse.applyOrElse(PartialFunction.scala:270) at akka.actor.Actor.aroundReceive(Actor.scala:537) at akka.actor.Actor.aroundReceive$(Actor.scala:535) at akka.actor.AbstractActor.aroundReceive(AbstractActor.scala:220) at akka.actor.ActorCell.receiveMessage(ActorCell.scala:579) at akka.actor.ActorCell.invoke(ActorCell.scala:547) at akka.dispatch.Mailbox.processMailbox(Mailbox.scala:270) at akka.dispatch.Mailbox.run(Mailbox.scala:231) at akka.dispatch.Mailbox.exec(Mailbox.scala:243) at java.base/java.util.concurrent.ForkJoinTask.doExec(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool$WorkQueue.topLevelExec(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool.scan(Unknown Source) at java.base/java.util.concurrent.ForkJoinPool.runWorker(Unknown Source) at java.base/java.util.concurrent.ForkJoinWorkerThread.run(Unknown Source) Caused by: io.lettuce.core.RedisCommandTimeoutException: Command timed out after 2 second(s) at io.lettuce.core.internal.ExceptionFactory.createTimeoutException(ExceptionFactory.java:59) at io.lettuce.core.protocol.CommandExpiryWriter.lambda$potentiallyExpire$0(CommandExpiryWriter.java:176) ... 8 common frames omitted

Why happened

  • Cause:
    • Command timeouts are caused by the fact that a command was not completed within the configured timeout. Timeouts may be caused for various reasons:
      1. Redis server has crashed/network partition happened and your Redis service didn’t recover within the configured timeout
      2. Command was not finished in time. This can happen if your Redis server is overloaded or if the connection is blocked by a command (e.g. BLPOP 0, long-running Lua script). See also blpop(Duration.ZERO, …) gives RedisCommandTimeoutException.
      3. Configured timeout does not match Redis’s performance.
      4. If you block the EventLoop (e.g. calling blocking methods in a RedisFuture callback or in a Reactive pipeline). That can easily happen when calling Redis commands in a Pub/Sub listener or a RedisConnectionStateListener.
  • Conclusion:
    • 1번 상황이 우려 되어, 인프라팀에게 redis 모니터링 정보 요청함. 

  • QueryTimeoutException의 ****발생 시각은 16:31:19초경인데, 전혀 문제가 발생한 것 같지 않다.
    • AWS 관리형 서비스에서는 거의 문제가 발생하지 않기 때문에 아닌 것 같았지만, 역시 아니다.
  • 4번의 경우이다.
    • QueryTimeoutException 에러는 수집 서버에서 heap이 초과되면서, 연이어서 발생했던 에러이다.
    • 위 지표만 보아도 redis 서버는 정상적으로 작동하고 있었다.
    • Reactive pipeline을 통해 레디스에 값을 요청하는 구조이다.
      • heap space full 상태가 되면서 Reactive system이 제대로 동작하지 않아 해당 에러가 발생한것으로 예상된다.
  • 현재 수집 서버는 Webflux를 사용하고 springframework.data.redis.core의 ReactiveValueOperations를 사용하고 있다.

nested exception is io.netty.handler.timeout.ReadTimeoutException

What is Error?

  • nested exception is io.netty.handler.timeout.ReadTimeoutExceptionnested exception is io.netty.handler.timeout.ReadTimeoutException

Why happened?

  • Here are a few scenarios where a read operation might take longer than the timeout value: (출처)
    1. The server takes too long to process the request and send a response.
    2. A network issue causes delays in the transmission of data.
    3. The client takes too long to process data received from the server.
    • Conclusion:
      • 특정 Cafe24 API URL의 경우, API 요청 후 응답 데이터가 너무 길어 제한된 읽기 시간(Read TimeOut) 안에 데이터 처리를 끝내지 못해 발생함.

How to solve

private HttpClient makeHttpClient() {
        return HttpClient.create(makeConnectionProvider())
                .secure(sslContextSpec -> {
                    try {
                        sslContextSpec.sslContext(SslContextBuilder
                                .forClient()
                                // SSL 인증서 확인을 비활성화 => 즉, 클라이언트는 유효하지 않거나 신뢰할 수 없는 인증서라도 모든 SSL 인증서를 수락
                                .trustManager(InsecureTrustManagerFactory.INSTANCE)
                                .build())
                                .handshakeTimeout(Duration.ofMillis(webClientProperties.getHTTPCLIENT_SECURE_SSL_HANDSHAKE_TIMEOUT_MILLIS()));
                    } catch (SSLException e) {
                        throw new RuntimeException(e);
                    }
                })
                .option(ChannelOption.CONNECT_TIMEOUT_MILLIS, webClientProperties.getCONNECTION_TIMEOUT_MILLISECOND())
                .doOnConnected(connection ->
                        connection.addHandlerLast("READ_TIME_OUT_HANDLER", new ReadTimeoutHandler(webClientProperties.getREAD_TIMEOUT_SECOND()))
                                .addHandlerLast("WRITE_TIME_OUT_HANDLER", new WriteTimeoutHandler(webClientProperties.getWRITE_TIMEOUT_SECOND()))
                );
    }
  • HttpClient가 연결되었을때 Read하는 시간을 증가시켰음.
    • Before : 15 sec
    • After : 60 sec