陈文管的博客 (Chen Wenguan's Blog)

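Android ANR Explained

September 24, 2019 | Last updated at 12:51 AM

ANR (Application Not Responding) means an application has stopped responding. Android requires certain events to complete within a set time; if an event gets no effective response within that time, or the response takes too long, the system raises an ANR. This article covers the types of ANR, how ANR is triggered, four ANR detection approaches, and how to analyze and fix common ANR problems.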

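Contents

  • I. Types of ANR
  • II. How ANR Works
    • 1. Service Timeout ANR trigger mechanism
    • 2. InputDispatching Timeout ANR trigger mechanism
  • III. ANR Detection Approaches
    • 1. BlockCanary
    • 2. ANR-WatchDog
    • 3. SafeLooper
    • 4. FileObserver
  • IV. Analyzing and Fixing ANR Problems
    • 1. Time-consuming work on the main thread
    • 2. High CPU usage
    • 3. Blocked on I/O
    • 4. Deadlock or lock wait
    • 5. Main-thread Binder call timing out
    • 6. Binder thread pool exhausted
    • 7. ANR caused by a JE or NE
  • V. References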

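I. Types of ANR

1. InputDispatching Timeout

Timeout: 5s by default on the Google platform, 8s on the MTK platform

Cause: no response to an input event (such as a key press or a screen touch)

2. Broadcast Timeout

Timeout: 10s for foreground broadcasts, 60s for background broadcasts

Cause: the broadcast is not handled to completion within the allotted time

3. Service Timeout

Timeout: 20s for foreground services, 200s for background services (SERVICE_BACKGROUND_TIMEOUT is defined as SERVICE_TIMEOUT * 10)

Cause: a rare type; the Service does not finish its work within the allotted time

4. ContentProvider Timeout

Timeout: 10s

Cause: publishing (starting) the provider takes longer than 10s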

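II. How ANR Works

[Figure: Android ANR mechanism flowchart]

The ANR for an InputEvent differs slightly from the figure above: it is monitored in native code, but it too involves a blocked main-thread message queue.

Triggering an ANR can be split into two steps:

  1. Plant the bomb
  2. Defuse the bomb or detonate it

Broadcast, ContentProvider, and Service share a similar ANR timeout mechanism. The Service case is examined below: a Service Timeout is triggered when ActivityManagerService.MainHandler, which runs on the ActivityManager thread, receives the SERVICE_TIMEOUT_MSG message.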
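
The plant / defuse / detonate pattern can be modeled outside Android in a few lines. The sketch below is a plain-Java stand-in, not framework code: the class `TimeoutBombDemo` and its `deliverDue` method are hypothetical names, and a simple list of pending messages replaces the real Handler and MessageQueue. It only illustrates the idea that a timeout message is queued up front and is either removed in time or delivered.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;

/** Plain-Java model of the "plant / defuse / detonate" timeout pattern (not Android code). */
class TimeoutBombDemo {
    /** One pending delayed message: what to deliver, for whom, and when. */
    private static final class Pending {
        final String what;
        final Object owner;
        final long whenMillis;

        Pending(String what, Object owner, long whenMillis) {
            this.what = what;
            this.owner = owner;
            this.whenMillis = whenMillis;
        }
    }

    private final List<Pending> queue = new ArrayList<>();

    /** Plant the bomb: queue a timeout message that becomes due at whenMillis. */
    void sendMessageAtTime(String what, Object owner, long whenMillis) {
        queue.add(new Pending(what, owner, whenMillis));
    }

    /** Defuse the bomb: drop every queued message with this what/owner pair. */
    void removeMessages(String what, Object owner) {
        queue.removeIf(p -> p.what.equals(what) && p.owner == owner);
    }

    /** Detonate: return (and remove) all messages that are due at nowMillis. */
    List<String> deliverDue(long nowMillis) {
        List<String> fired = new ArrayList<>();
        Iterator<Pending> it = queue.iterator();
        while (it.hasNext()) {
            Pending p = it.next();
            if (p.whenMillis <= nowMillis) {
                fired.add(p.what + " for " + p.owner);
                it.remove();
            }
        }
        return fired;
    }
}
```

In this model a process that finishes its work calls `removeMessages` before the deadline and nothing fires; a process that does not is returned by `deliverDue`, which corresponds to the point where the real system would start its ANR handling.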

1. Service Timeout ANR trigger mechanism

1) Planting the bomb

While the Service's process attaches to the system_server process, realStartServiceLocked() in ActiveServices.java is called, and this is where the bomb is planted.

/**
 * The simplified code below is based on Android 6.0
 */
private final void realStartServiceLocked(ServiceRecord r,
                                          ProcessRecord app, boolean execInFg) throws RemoteException {
    // Send the delayed SERVICE_TIMEOUT_MSG message, i.e. plant the bomb
    bumpServiceExecutingLocked(r, execInFg, "create");
    try {
        // Eventually runs the service's onCreate() method
        app.thread.scheduleCreateService(r, r.serviceInfo,
                mAm.compatibilityInfoForPackageLocked(r.serviceInfo.applicationInfo),
                app.repProcState);
    } catch (DeadObjectException e) {
    } finally {
    }
}
private final void bumpServiceExecutingLocked(ServiceRecord r, boolean fg, String why) {
    if (r.executeNesting == 0) {
        if (r.app != null) {
            r.app.executingServices.add(r);
            r.app.execServicesFg |= fg;
            if (r.app.executingServices.size() == 1) {
                scheduleServiceTimeoutLocked(r.app);
            }
        }
    } else if (r.app != null && fg && !r.app.execServicesFg) {
        r.app.execServicesFg = true;
        scheduleServiceTimeoutLocked(r.app);
    }
}
void scheduleServiceTimeoutLocked(ProcessRecord proc) {
    if (proc.executingServices.size() == 0 || proc.thread == null) {
        return;
    }
    long now = SystemClock.uptimeMillis();
    Message msg = mAm.mHandler.obtainMessage(
            ActivityManagerService.SERVICE_TIMEOUT_MSG);
    msg.obj = proc;
    // If SERVICE_TIMEOUT_MSG has not been removed by the time it is due, the service timeout flow runs, i.e. the bomb detonates
    mAm.mHandler.sendMessageAtTime(msg,
            proc.execServicesFg ? (now+SERVICE_TIMEOUT) : (now+ SERVICE_BACKGROUND_TIMEOUT));
}

2) Defusing the bomb

The bomb is defused in ActivityThread's handleCreateService method, once the service has been created.

private void handleCreateService(ActivityThread.CreateServiceData data) {
    try {
        ContextImpl context = ContextImpl.createAppContext(this, packageInfo);
        context.setOuterContext(service);
        Application app = packageInfo.makeApplication(false, mInstrumentation);
        service.attach(context, this, data.info.name, data.token, app,
                ActivityManagerNative.getDefault());
        service.onCreate();
        mServices.put(data.token, service);
        try {
            // Remove the timeout message, i.e. defuse the bomb; this ends up in serviceDoneExecutingLocked in ActiveServices
            ActivityManagerNative.getDefault().serviceDoneExecuting(
                    data.token, SERVICE_DONE_EXECUTING_ANON, 0, 0);
        } catch (RemoteException e) {
            // nothing to do.
        }
    } catch (Exception e) {
    }
}
/**
 * The serviceDoneExecutingLocked method in ActiveServices
 */
private void serviceDoneExecutingLocked(ServiceRecord r, boolean inDestroying,
                                        boolean finishing) {
    if (r.executeNesting <= 0) {
        if (r.app != null) {
            r.app.execServicesFg = false;
            r.app.executingServices.remove(r);
            if (r.app.executingServices.size() == 0) {
                // No service is executing in this service's process any more: remove the timeout message, i.e. defuse the bomb
                mAm.mHandler.removeMessages(ActivityManagerService.SERVICE_TIMEOUT_MSG, r.app);
            }
        }
    }
}

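3) How the ANR timeout bomb detonates

The system_server process has a Handler thread named ActivityManager. When the countdown ends, the SERVICE_TIMEOUT_MSG message is delivered to that Handler thread; this is the message queued by scheduleServiceTimeoutLocked in the ActiveServices class shown above.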

final class MainHandler extends Handler {
    public void handleMessage(Message msg) {
        switch (msg.what) {
            case SERVICE_TIMEOUT_MSG: {
                mServices.serviceTimeout((ProcessRecord)msg.obj);
            } break;
        }
    }
}

The serviceTimeout call on mServices above is the serviceTimeout method of ActiveServices.

void serviceTimeout(ProcessRecord proc) {
    String anrMessage = null;
    synchronized(mAm) {
        if (proc.executingServices.size() == 0 || proc.thread == null) {
            return;
        }
        final long now = SystemClock.uptimeMillis();
        final long maxTime =  now -
                (proc.execServicesFg ? SERVICE_TIMEOUT : SERVICE_BACKGROUND_TIMEOUT);
        ServiceRecord timeout = null;
        long nextTime = 0;
        for (int i=proc.executingServices.size()-1; i>=0; i--) {
            ServiceRecord sr = proc.executingServices.valueAt(i);
            if (sr.executingStart < maxTime) {
                timeout = sr;
                break;
            }
            if (sr.executingStart > nextTime) {
                nextTime = sr.executingStart;
            }
        }
        if (timeout != null && mAm.mLruProcesses.contains(proc)) {
            Slog.w(TAG, "Timeout executing service: " + timeout);
            StringWriter sw = new StringWriter();
            PrintWriter pw = new FastPrintWriter(sw, false, 1024);
            pw.println(timeout);
            timeout.dump(pw, "    ");
            pw.close();
            mLastAnrDump = sw.toString();
            mAm.mHandler.removeCallbacks(mLastAnrDumpClearer);
            mAm.mHandler.postDelayed(mLastAnrDumpClearer, LAST_ANR_LIFETIME_DURATION_MSECS);
            anrMessage = "executing service " + timeout.shortName;
        } else {
            Message msg = mAm.mHandler.obtainMessage(
                    ActivityManagerService.SERVICE_TIMEOUT_MSG);
            msg.obj = proc;
            mAm.mHandler.sendMessageAtTime(msg, proc.execServicesFg
                    ? (nextTime+SERVICE_TIMEOUT) : (nextTime + SERVICE_BACKGROUND_TIMEOUT));
        }
    }
    if (anrMessage != null) {
        // A timed-out service exists, so call appNotResponding
        mAm.appNotResponding(proc, null, null, false, anrMessage);
    }
}

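The ANR mechanisms for Broadcast and ContentProvider are not covered here. For details, see: Understanding How Android ANR Is Triggered

2. InputDispatching Timeout ANR trigger mechanism

1) Planting the bomb

InputReader.cpp reads events and InputDispatcher.cpp dispatches them. The simplified core logic is shown below; see InputDispatcher.cpp for the full code.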

void InputDispatcher::dispatchOnceInnerLocked(nsecs_t* nextWakeupTime) {
    nsecs_t currentTime = now();
    // Ready to start a new event.
    // If we don't already have a pending event, go grab one.
    if (! mPendingEvent) {
        // Get ready to dispatch the event.
        // Plant the bomb
        resetANRTimeoutsLocked();
    }
    switch (mPendingEvent->type) {
        case EventEntry::TYPE_KEY: {
            done = dispatchKeyLocked(currentTime, typedEntry, &dropReason, nextWakeupTime);
            break;
        }
        case EventEntry::TYPE_MOTION: {
            done = dispatchMotionLocked(currentTime, typedEntry,
                    &dropReason, nextWakeupTime);
            break;
        }
    }
    // The event has been dispatched and handled: defuse the bomb
    if (done) {
        if (dropReason != DROP_REASON_NOT_DROPPED) {
            dropInboundEventLocked(mPendingEvent, dropReason);
        }
        mLastDropReason = dropReason;
        releasePendingEventLocked();
        *nextWakeupTime = LONG_LONG_MIN;  // force next poll to wake up immediately
    }
}

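2) The dispatchKeyLocked branch

First look at the dispatchKeyLocked branch. dispatchKeyLocked goes on to call findFocusedWindowTargetsLocked, whose logic is as follows:

  • If there is no focused window but there is a focused application, wait for the application to finish starting; if startup times out, an ANR occurs.
  • If the window is paused, its connection is not registered, or its connection has died, keep waiting until the window becomes ready; if the wait times out, an ANR occurs.
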
int32_t InputDispatcher::findFocusedWindowTargetsLocked(nsecs_t currentTime,
    const EventEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime) {
    int32_t injectionResult;
    String8 reason;
    // If there is no currently focused window and no focused application
    // then drop the event.
    if (mFocusedWindowHandle == NULL) {
        if (mFocusedApplicationHandle != NULL) {
            injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                    mFocusedApplicationHandle, NULL, nextWakeupTime,
                    "Waiting because no window has focus but there is a "
                    "focused application that may eventually add a window "
                    "when it finishes starting up.");
            goto Unresponsive;
        }
        ALOGI("Dropping event because there is no focused window or focused application.");
        injectionResult = INPUT_EVENT_INJECTION_FAILED;
        goto Failed;
    }
    // Check permissions.
    if (! checkInjectionPermission(mFocusedWindowHandle, entry->injectionState)) {
        injectionResult = INPUT_EVENT_INJECTION_PERMISSION_DENIED;
        goto Failed;
    }
    // Check whether the window is ready for more input.
    reason = checkWindowReadyForMoreInputLocked(currentTime,
            mFocusedWindowHandle, entry, "focused");
    if (!reason.isEmpty()) {
        injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                mFocusedApplicationHandle, mFocusedWindowHandle, nextWakeupTime, reason.string());
        goto Unresponsive;
    }
    // Success!  Output targets.
    injectionResult = INPUT_EVENT_INJECTION_SUCCEEDED;
    addWindowTargetLocked(mFocusedWindowHandle,
            InputTarget::FLAG_FOREGROUND | InputTarget::FLAG_DISPATCH_AS_IS, BitSet32(0),
            inputTargets);
    // Done.
Failed:
Unresponsive:
    nsecs_t timeSpentWaitingForApplication = getTimeSpentWaitingForApplicationLocked(currentTime);
    updateDispatchStatisticsLocked(currentTime, entry,
            injectionResult, timeSpentWaitingForApplication);
#if DEBUG_FOCUS
    ALOGD("findFocusedWindow finished: injectionResult=%d, "
            "timeSpentWaitingForApplication=%0.1fms",
            injectionResult, timeSpentWaitingForApplication / 1000000.0);
#endif
    return injectionResult;
}

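3) The dispatchMotionLocked branch

The simplified function is shown below; see InputDispatcher.cpp for the full code. Only one path reaches Unresponsive: the check for whether the window is ready for more input. If it is not, the dispatcher waits, and an ANR is triggered when the wait times out.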

int32_t InputDispatcher::findTouchedWindowTargetsLocked(nsecs_t currentTime,
    const MotionEntry* entry, Vector<InputTarget>& inputTargets, nsecs_t* nextWakeupTime,
                                                        bool* outConflictingPointerActions) {
    // Ensure all touched foreground windows are ready for new input.
    for (size_t i = 0; i < mTempTouchState.windows.size(); i++) {
        const TouchedWindow& touchedWindow = mTempTouchState.windows[i];
        if (touchedWindow.targetFlags & InputTarget::FLAG_FOREGROUND) {
            // Check whether the window is ready for more input.
            String8 reason = checkWindowReadyForMoreInputLocked(currentTime,
                    touchedWindow.windowHandle, entry, "touched");
            if (!reason.isEmpty()) {
                injectionResult = handleTargetsNotReadyLocked(currentTime, entry,
                        NULL, touchedWindow.windowHandle, nextWakeupTime, reason.string());
                goto Unresponsive;
            }
        }
    }
    Unresponsive:
    // Reset temporary touch state to ensure we release unnecessary references to input channels.
    mTempTouchState.reset();
    nsecs_t timeSpentWaitingForApplication = getTimeSpentWaitingForApplicationLocked(currentTime);
    updateDispatchStatisticsLocked(currentTime, entry,
            injectionResult, timeSpentWaitingForApplication);
    return injectionResult;
}
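
The waiting logic shared by both branches can be summarized with a small model. The following plain-Java sketch is a heavily simplified illustration of what handleTargetsNotReadyLocked does, not a port of the C++ code: the class `InputWaitModel` and its method names are hypothetical, and it assumes the 5s default timeout listed in the type overview. The countdown starts the first time a target is found not ready, and an ANR is reported once the deadline has passed.

```java
/** Simplified plain-Java model of the input-dispatch wait timer (not the real C++ code). */
class InputWaitModel {
    static final long DEFAULT_TIMEOUT_MILLIS = 5_000L; // assumed 5s default

    private boolean waiting;          // true while a target window is not ready
    private long waitTimeoutAtMillis; // deadline of the current wait

    /** Called when a new event starts dispatching: clears any earlier wait. */
    void resetAnrTimeouts() {
        waiting = false;
    }

    /**
     * Called each time the target window is found not ready.
     * Returns true when the wait has reached its deadline, i.e. an ANR should be reported.
     */
    boolean onTargetsNotReady(long nowMillis) {
        if (!waiting) {
            // First failure for this event: start the countdown.
            waiting = true;
            waitTimeoutAtMillis = nowMillis + DEFAULT_TIMEOUT_MILLIS;
        }
        return nowMillis >= waitTimeoutAtMillis;
    }
}
```

Repeated calls while the window stays busy do not restart the countdown; only `resetAnrTimeouts` does, which is why a window that never becomes ready eventually hits the deadline.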

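III. ANR Detection Approaches

Android applications are message-driven, and in a sense Android itself is a message-driven system: UI, events, and lifecycle are all tied to the message-handling mechanism. ANR monitoring is no different; most approaches build on Android's message mechanism.

Popular ANR detection approaches today include the open-source BlockCanary, ANR-WatchDog, and SafeLooper, plus FileObserver, which monitors through a native Android system interface. The four approaches are analyzed and compared by scenario below.

1. BlockCanary

BlockCanary is a lightweight, non-intrusive performance monitoring component developed by the Chinese developer markzhai. It is now part of the AndroidPerformanceMonitor project.

It cleverly exploits a logging hook in Android's own Looper.loop: loop calls logging.println() immediately before and after dispatching each message. If the main thread is stuck, it is stuck inside dispatchMessage, between those two calls.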

public static void loop() {
    final Looper me = myLooper();
    if (me == null) {
        throw new RuntimeException("No Looper; Looper.prepare() wasn't called on this thread.");
    }
    final MessageQueue queue = me.mQueue;
    for (;;) {
        Message msg = queue.next(); // might block
        if (msg == null) {
            // No message indicates that the message queue is quitting.
            return;
        }
        // This must be in a local variable, in case a UI event sets the logger
        final Printer logging = me.mLogging;
        if (logging != null) {
            logging.println(">>>>> Dispatching to " + msg.target + " " +
                    msg.callback + ": " + msg.what);
        }
        try {
            msg.target.dispatchMessage(msg);
        } catch (Exception exception) {
        } finally {
        }
        if (logging != null) {
            logging.println("<<<<< Finished to " + msg.target + " " + msg.callback);
        }
    }
}

The implementation is in BlockCanary.java. Once a logging object is set, its println is called around every message dispatch.

/**
 * Start monitoring.
 */
public void start() {
    if (!mMonitorStarted) {
        mMonitorStarted = true;
        Looper.getMainLooper().setMessageLogging(mBlockCanaryCore.monitor);
    }
}
/**
 * Stop monitoring.
 */
public void stop() {
    if (mMonitorStarted) {
        mMonitorStarted = false;
        Looper.getMainLooper().setMessageLogging(null);
        mBlockCanaryCore.stackSampler.stop();
        mBlockCanaryCore.cpuSampler.stop();
    }
}

Whether the main thread is blocked is then decided from the time between the two log calls; see LooperMonitor.java for details.

@Override
public void println(String x) {
    if (mStopWhenDebugging && Debug.isDebuggerConnected()) {
        return;
    }
    if (!mPrintingStarted) {
        mStartTimestamp = System.currentTimeMillis();
        mStartThreadTimestamp = SystemClock.currentThreadTimeMillis();
        mPrintingStarted = true;
        startDump();
    } else {
        final long endTime = System.currentTimeMillis();
        mPrintingStarted = false;
        if (isBlock(endTime)) {
            notifyBlockEvent(endTime);
        }
        stopDump();
    }
}
private boolean isBlock(long endTime) {
    return endTime - mStartTimestamp > mBlockThresholdMillis;
}
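
The paired-println idea can be tried without Android. The sketch below is a minimal plain-Java illustration, not BlockCanary's implementation: `DispatchTimer` is a hypothetical class, and the clock is injected as a `LongSupplier` (an assumption made here so the timing logic can be exercised deterministically). The first call records the start time, the second call measures the elapsed time and records the message if it exceeded the threshold.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.LongSupplier;

/** Pairs the "before dispatch" and "after dispatch" log calls and flags slow messages. */
class DispatchTimer {
    private final LongSupplier clock; // injected so the logic can be tested
    private final long blockThresholdMillis;
    private final List<String> slowMessages = new ArrayList<>();
    private boolean dispatching;
    private long startMillis;
    private String currentMessage;

    DispatchTimer(LongSupplier clock, long blockThresholdMillis) {
        this.clock = clock;
        this.blockThresholdMillis = blockThresholdMillis;
    }

    /** Called once before and once after each message, like a Looper message logger. */
    void println(String line) {
        if (!dispatching) {
            // First call of the pair: remember when dispatch started.
            dispatching = true;
            startMillis = clock.getAsLong();
            currentMessage = line;
        } else {
            // Second call of the pair: dispatch returned, measure how long it took.
            dispatching = false;
            long elapsed = clock.getAsLong() - startMillis;
            if (elapsed > blockThresholdMillis) {
                slowMessages.add(currentMessage + " took " + elapsed + "ms");
            }
        }
    }

    List<String> slowMessages() {
        return slowMessages;
    }
}
```

On a device, an object like this could presumably be passed to `Looper.getMainLooper().setMessageLogging(...)` with a clock such as `SystemClock::uptimeMillis`. Note that the check only runs in the second call, so a dispatch that never returns is never reported.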

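Pros:

  • Flexible configuration: it can monitor general app performance and also serve as ANR monitoring in some scenarios, and it pinpoints the call stacks behind ANRs and slow calls.

Cons:

  • Google's comment in the source says "This must be in a local variable, in case a UI event sets the logger", so the logger object can be replaced. Developers have found that WebView sets the logger to null, which disables BlockCanary; the workaround is to call BlockCanary's start only after WebView has been initialized.
  • If dispatchMessage runs for a very long time or never returns, the closing println is not reached, so BlockCanary's check is never triggered.
  • Another Google comment in Looper notes that queue.next might block. This is the InputEvent scenario mentioned earlier: blocking here also triggers an ANR, and BlockCanary cannot detect it either.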

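Further reading: "BlockCanary: Easily Find the Culprit Behind Android App UI Jank"

2. ANR-WatchDog

ANR-WatchDog is modeled on Android's WatchDog mechanism. A separate thread posts a task to the main thread that updates a variable, then sleeps for the configured ANR threshold. After waking, it checks whether the update has run; if not, it raises an alert.

[Figure: ANR-WatchDog flowchart]

The core logic is shown below; see ANRWatchDog.java for the full code.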

while (!isInterrupted()) {
    boolean needPost = _tick == 0;
    _tick += interval;
    if (needPost) {
        _uiHandler.post(_ticker);
    }
    try {
        Thread.sleep(interval);
    } catch (InterruptedException e) {
        _interruptionListener.onInterrupted(e);
        return ;
    }
    // If the main thread has not handled _ticker, it is blocked. ANR.
    if (_tick != 0 && !_reported) {
        //noinspection ConstantConditions
        if (!_ignoreDebugger && (Debug.isDebuggerConnected() || Debug.waitingForDebugger())) {
            Log.w("ANRWatchdog", "An ANR was detected but ignored because the debugger is connected (you can prevent this with setIgnoreDebugger(true))");
            _reported = true;
            continue ;
        }
        interval = _anrInterceptor.intercept(_tick);
        if (interval > 0) {
            continue;
        }
        final ANRError error;
        if (_namePrefix != null) {
            error = ANRError.New(_tick, _namePrefix, _logThreadsWithoutStackTrace);
        } else {
            error = ANRError.NewMainOnly(_tick);
        }
        _anrListener.onAppNotResponding(error);
        interval = _timeoutInterval;
        _reported = true;
    }
}

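Pros:

  • Good compatibility: works across device models and Android versions
  • Non-intrusive: no changes to the app's logic are required
  • Simple logic with little performance impact

Cons:

  • It cannot guarantee catching every ANR; the threshold setting directly affects the chance of catching one.

With a 5s monitoring threshold, any main-thread block longer than 10s is always caught. A block lasting between 5s and 10s may be missed, depending on when it starts relative to the watchdog's polling cycle.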
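
The 5s/10s window follows from simple arithmetic, which the sketch below works through. It is a simplified model written for this explanation, not code from ANR-WatchDog: `WatchdogWindow.isDetected` is a hypothetical helper that assumes the watchdog posts a probe at times 0, interval, 2 × interval, and so on, checks each probe one interval later, and that all times are non-negative milliseconds.

```java
/** Models which main-thread stalls a polling watchdog can observe (plain arithmetic). */
class WatchdogWindow {
    /**
     * A stall covers [blockStart, blockStart + blockDuration). It is reported only if
     * the first probe posted during the stall is still unhandled at its check time.
     */
    static boolean isDetected(long blockStart, long blockDuration, long interval) {
        long blockEnd = blockStart + blockDuration;
        // First probe posted at or after the start of the stall (round up to a multiple).
        long firstProbe = ((blockStart + interval - 1) / interval) * interval;
        if (firstProbe >= blockEnd) {
            return false; // the stall ended before any probe was posted
        }
        // The probe runs when the stall ends; the watchdog checks at firstProbe + interval.
        return blockEnd > firstProbe + interval;
    }
}
```

With a 5000ms interval, a 6s stall starting at 0ms is caught, while the same stall starting at 100ms is missed: the next probe is posted at 5000ms and has already run by the check at 10000ms. Any stall longer than two intervals is caught, because the first probe inside it is posted less than one interval after the stall begins.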

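3. SafeLooper

SafeLooper takes a more unusual approach: it is itself a blocking message that processes messages inside its own loop, taking over the main thread Looper's job through reflection.

[Figure: SafeLooper flowchart]

The core logic is shown below; see SafeLooper.java for the full code.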

Method next;
Field target;
try {
    Method m = MessageQueue.class.getDeclaredMethod("next");
    m.setAccessible(true);
    next = m;
    Field f = Message.class.getDeclaredField("target");
    f.setAccessible(true);
    target = f;
} catch (Exception e) {
    return;
}
RUNNINGS.set(this);
MessageQueue queue = Looper.myQueue();
Binder.clearCallingIdentity();
final long ident = Binder.clearCallingIdentity();
while (true) {
    try {
        Message msg = (Message) next.invoke(queue);
        if (msg == null || msg.obj == EXIT){
            break;    
        }
        Handler h = (Handler) target.get(msg);
        h.dispatchMessage(msg);
        final long newIdent = Binder.clearCallingIdentity();
        if (newIdent != ident) {
        }
        msg.recycle();
    } catch (Exception e) {
        Thread.UncaughtExceptionHandler h = uncaughtExceptionHandler;
        Throwable ex = e;
        if (e instanceof InvocationTargetException) {
            ex = ((InvocationTargetException) e).getCause();
            if (ex == null) {
                ex = e;
            }
        }
        // e.printStackTrace(System.err);
        if (h != null) {
            h.uncaughtException(Thread.currentThread(), ex);
        }
        new Handler().post(this);
        break;
    }
}
RUNNINGS.set(null);

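Managing messages through reflection carries a considerable performance cost, but the approach is freely customizable, and its AOP-style idea is worth borrowing.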

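4. FileObserver

From the ANR flow we know that a change in the /data/anr directory signals that an ANR has occurred; the dumpStackTrace method in AMS gives some hints here.

Following this idea, an app can detect an ANR by watching for writes to the ANR trace file. Note that the callback fires whenever any application has an ANR, so some filtering is needed, for example by package name or process ID.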
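
The filtering step can be sketched independently of the file-watching part. The plain-Java example below is illustrative only: `TraceHeaderFilter` is a hypothetical helper, and the header layout it parses (a `----- pid N at ... -----` line followed by a `Cmd line:` line) is an assumption about the trace format that should be checked against real files on the target devices. It reads the first process header in the trace and compares it with the app's own pid and process name.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Decides whether a freshly written ANR trace starts with our own process. */
class TraceHeaderFilter {
    // Assumed header layout: "----- pid 1234 at 2019-09-06 11:05:48 -----"
    private static final Pattern PID_LINE = Pattern.compile("^----- pid (\\d+) at .* -----$");
    private static final String CMD_PREFIX = "Cmd line: ";

    /** Returns true if the first header in traceHead matches both pid and process name. */
    static boolean isOwnTrace(String traceHead, int ownPid, String ownProcessName) {
        Integer pid = null;
        String cmdLine = null;
        for (String line : traceHead.split("\n")) {
            String trimmed = line.trim();
            Matcher m = PID_LINE.matcher(trimmed);
            if (pid == null && m.matches()) {
                pid = Integer.valueOf(m.group(1));
            } else if (cmdLine == null && trimmed.startsWith(CMD_PREFIX)) {
                cmdLine = trimmed.substring(CMD_PREFIX.length());
            }
            if (pid != null && cmdLine != null) {
                break; // only the first process entry is of interest
            }
        }
        return pid != null && pid == ownPid && ownProcessName.equals(cmdLine);
    }
}
```

In an app, a check like this would be called from the file observer's event callback after reading the first lines of the changed file, so that ANRs of other applications are ignored.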

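Pros:

  • Based on a native system interface, so timing and content are accurate
  • No performance cost, and simple to implement

Cons:

  • The biggest difficulty is compatibility. The approach is restricted by Android's SELinux policy: since 5.0, low-privilege apps can essentially no longer observe the trace files. During development and internal testing, however, you can root the device and modify the app's .te policy file to grant it access.

Few approaches are known so far. Even the ACRA library, used by 2.68% of apps on Google Play, only recommends the WatchDog approach. Combining FileObserver with WatchDog is recommended; together they cover the vast majority of device models and ANRs.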

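IV. Analyzing and Fixing ANR Problems

For an ANR, first use the system's ANR log to pin down the exact time it occurred, then check whether CPU and I/O usage in that log are unusually high. Next, find the trace file that matches the ANR time, analyze the ANR stack traces, and filter by the application's package name to find the call paths that belong to the ANR application.

Common ANR types are listed below:

1. Time-consuming work on the main thread

Network access, database access, file reads and writes, and frequent heavy computation are all common causes of ANR. They show up directly in the call stacks of the ANR trace file, so they are not discussed further here.

These ANRs are usually fixed by making the work asynchronous. That does not mean simply creating a new thread each time; choose the mechanism according to the business scenario and call frequency. Common asynchronous options on Android are AsyncTask, IntentService, and thread pools (the four standard ones or a custom pool). Prefer a single thread pool to manage worker threads over creating a new thread for every task. Related reading: Android: Updating the UI from a Worker Thread, Explained.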
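
As a minimal illustration of the shared-pool advice, the plain-Java sketch below moves a slow call onto one fixed thread pool. Everything named here is hypothetical: `BackgroundWork`, the pool size of 4, and `slowChecksum`, which merely stands in for disk, database, or network access. It is written against the standard JDK rather than Android APIs.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Runs slow work on a shared pool so the calling (UI) thread returns immediately. */
class BackgroundWork {
    // One shared pool for the whole app instead of a new Thread per request.
    private static final ExecutorService POOL = Executors.newFixedThreadPool(4, runnable -> {
        Thread thread = new Thread(runnable, "bg-worker");
        thread.setDaemon(true); // do not keep the JVM alive in this demo
        return thread;
    });

    /** Hypothetical slow operation standing in for disk, database, or network access. */
    static int slowChecksum(byte[] data) {
        int sum = 0;
        for (byte b : data) {
            sum = 31 * sum + b;
        }
        return sum;
    }

    /** Submits the slow call to the pool; the caller gets a future and is not blocked. */
    static CompletableFuture<Integer> checksumAsync(byte[] data) {
        return CompletableFuture.supplyAsync(() -> slowChecksum(data), POOL);
    }
}
```

A caller would attach a continuation such as `checksumAsync(bytes).thenAccept(...)` rather than blocking on the result; in an Android app the continuation would typically hand the value back to the main thread before touching any views.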

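2. High CPU usage

In the ANR log below, for example, CPU usage is abnormally high at 186%. Possible causes of high CPU usage include frequent I/O and frequent calls to slow JNI interfaces.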

ActivityManager: ANR in com.autonavi.amapauto (com.autonavi.amapauto/.MainMapActivity)
ActivityManager: PID: 7321
ActivityManager: Reason: Input dispatching timed out (Waiting to send non-key event because the touched window has not finished processing certain input events that were delivered to it over 500.0ms ago. Wait queue length: 5. Wait queue head age: 5765.8ms.)
ActivityManager: Load: 16.72 / 11.58 / 8.91
ActivityManager: CPU usage from 0ms to 13484ms later:
ActivityManager: 186% 7321/com.autonavi.amapauto: 105% user + 80% kernel / faults: 2474 minor 3 major
ActivityManager: 49% 620/system_server: 22% user + 26% kernel / faults: 2787 minor
ActivityManager: 30% 1172/com.tencent.wecarspeech: 23% user + 7.2% kernel / faults: 3955 minor 2 major
ActivityManager: 11% 1536/com.android.bluetooth: 7.1% user + 4.2% kernel / faults: 2405 minor 2 major
ActivityManager: 4.7% 215/logd: 4.2% user + 0.4% kernel / faults: 13 minor
ActivityManager: 4% 239/debuggerd: 0.6% user + 3.4% kernel / faults: 4584 minor 1 major
ActivityManager: 6.8% 1296/com.android.phone: 4.1% user + 2.6% kernel / faults: 1230 minor
ActivityManager: 6.4% 925/com.nforetek.bt: 2.6% user + 3.7% kernel / faults: 1097 minor 1 major
ActivityManager: 5.8% 7537/com.tencent.wecarnavi:wecarbase: 2.9% user + 2.8% kernel / faults: 1150 minor
ActivityManager: 5.7% 7509/com.tencent.wecarnavi: 3% user + 2.6% kernel / faults: 951 minor
ActivityManager: 4.6% 866/sdcard: 0% user + 4.5% kernel / faults: 1 minor
ActivityManager: 4.3% 242/mediaserver: 3.2% user + 1.1% kernel
ActivityManager: 3.7% 858/com.android.launcher: 2.7% user + 0.9% kernel / faults: 741 minor 1 major
ActivityManager: 3.3% 792/gaei.cluster.service: 1.9% user + 1.4% kernel / faults: 908 minor
ActivityManager: 2.8% 1125/com.tencent.wecarnews: 1.4% user + 1.4% kernel / faults: 477 minor
ActivityManager: 2.7% 881/com.android.systemui: 1.5% user + 1.1% kernel / faults: 963 minor 2 major
ActivityManager: 2.6% 8825/top: 0.8% user + 1.8% kernel
ActivityManager: 2.3% 798/com.gaei.bt: 0.9% user + 1.4% kernel / faults: 846 minor
ActivityManager: 2.1% 1101/com.tencent.wecarmusicp: 0.7% user + 1.4% kernel / faults: 42 minor
ActivityManager: 2.1% 7744/com.autonavi.amapauto:push: 1.1% user + 1% kernel / faults: 30916 minor
ActivityManager: 1.1% 1269/com.gaei.settings: 0.5% user + 0.5% kernel / faults: 955 minor
ActivityManager: 1.5% 110/mmcqd/3: 0% user + 1.5% kernel
ActivityManager: 0.8% 833/gaei.reverse: 0.4% user + 0.3% kernel / faults: 826 minor
ActivityManager: 0.7% 5949/logcat: 0.4% user + 0.3% kernel
ActivityManager: 1.2% 803/gaei.thirdparty.media.adapter: 0.7% user + 0.5% kernel / faults: 654 minor
ActivityManager: 0.6% 998/gaei.cluster: 0.3% user + 0.3% kernel / faults: 759 minor
ActivityManager: 0.6% 1279/com.gaei.gaeihvsmsettings: 0.3% user + 0.3% kernel / faults: 868 minor
ActivityManager: 1.1% 233/surfaceflinger: 0.5% user + 0.5% kernel
ActivityManager: 0.5% 838/gaei.lockscreen: 0.3% user + 0.2% kernel / faults: 750 minor
ActivityManager: 0.5% 964/gaei.ecallbcall: 0.2% user + 0.2% kernel / faults: 719 minor
ActivityManager: 0.5% 1256/com.thundersoft.update: 0.3% user + 0.1% kernel / faults: 745 minor
ActivityManager: 0.5% 1274/gaei.bluetooth: 0.2% user + 0.2% kernel / faults: 732 minor
ActivityManager: 0.5% 1285/com.gaei.vehichesetting: 0.3% user + 0.1% kernel / faults: 786 minor
ActivityManager: 0.5% 8916/kworker/0:2: 0% user + 0.5% kernel
ActivityManager: 0.5% 2439/cn.gaei.appstore: 0.4% user + 0% kernel / faults: 60 minor
ActivityManager: 0.3% 232/servicemanager: 0.1% user + 0.2% kernel
ActivityManager: 0.2% 252/rild: 0% user + 0.2% kernel
ActivityManager: 0.2% 2131/com.thundersoft.connectivity: 0.2% user + 0% kernel / faults: 133 minor
ActivityManager: 0.2% 2153/com.excelfore.hmiagent: 0% user + 0.2% kernel / faults: 1779 minor
ActivityManager: 0.1% 7383/com.autonavi.amapauto:locationservice: 0.1% user + 0% kernel / faults: 266 minor
ActivityManager: 0.2% 7/rcu_preempt: 0% user + 0.2% kernel
ActivityManager: 0% 7360/com.autonavi.amapauto:adiu: 0% user + 0% kernel / faults: 223 minor
ActivityManager: 0.1% 8892/kworker/u8:0: 0% user + 0.1% kernel
ActivityManager: 0% 1//init: 0% user + 0% kernel
ActivityManager: 0% 3/ksoftirqd/0: 0% user + 0% kernel
ActivityManager: 0% 107/debounce_task: 0% user + 0% kernel
ActivityManager: 0% 108/irq/43-mm-irq-t: 0% user + 0% kernel
ActivityManager: 0% 210/jbd2/mmcblk3p4-: 0% user + 0% kernel
ActivityManager: 0% 237/netd: 0% user + 0% kernel / faults: 15 minor
ActivityManager: 0% 244/smcd: 0% user + 0% kernel
ActivityManager: 0% 248/system_setting_service: 0% user + 0% kernel
ActivityManager: 0% 1087/com.trumpchi.assistant.app: 0% user + 0% kernel / faults: 2 minor
ActivityManager: 0% 2009/com.thunderst.update: 0% user + 0% kernel / faults: 17 minor
ActivityManager: 93% TOTAL: 51% user + 41% kernel + 0.1% iowait + 0.3% softirq
ActivityManager: CPU usage from 11778ms to 12377ms later:
ActivityManager: 46% 620/system_server: 23% user + 23% kernel
ActivityManager: rename tracefile/data/anr/traces.txt to /data/anr/traces_com.autonavi.amapauto_20190906_110548.txt
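
Scanning a log like this by eye is tedious, so a small parser can rank the processes by CPU share. The sketch below is a hypothetical helper (`CpuUsageParser`) written for this article's log format only; the regular expression is an assumption based on the lines shown above and may need adjusting for other Android versions or vendors.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Extracts per-process CPU usage from the "CPU usage" part of an ANR log. */
class CpuUsageParser {
    /** One parsed process entry. */
    static final class Entry {
        final double totalPercent;
        final int pid;
        final String process;

        Entry(double totalPercent, int pid, String process) {
            this.totalPercent = totalPercent;
            this.pid = pid;
            this.process = process;
        }
    }

    // Matches e.g. "186% 7321/com.autonavi.amapauto: 105% user + 80% kernel"
    private static final Pattern LINE =
            Pattern.compile("(\\d+(?:\\.\\d+)?)% (\\d+)/(.+?): \\d+(?:\\.\\d+)?% user");

    /** Returns the process entries sorted by total CPU, highest first. */
    static List<Entry> parse(List<String> logLines) {
        List<Entry> entries = new ArrayList<>();
        for (String line : logLines) {
            Matcher m = LINE.matcher(line);
            if (m.find()) { // lines without "pid/name", such as TOTAL, are skipped
                entries.add(new Entry(Double.parseDouble(m.group(1)),
                        Integer.parseInt(m.group(2)), m.group(3)));
            }
        }
        entries.sort(Comparator.comparingDouble((Entry e) -> e.totalPercent).reversed());
        return entries;
    }
}
```

Applied to the log above, the first entry returned is pid 7321 (com.autonavi.amapauto) at 186%, which is the process that raised the ANR.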

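When CPU usage is abnormally high, the ANR trace file recorded by the system does not contain the call stack of the ANR application's own logic; for example, it may show only the following "main" thread information.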

"main" prio=5 tid=1 Native
 | group="main" sCount=1 dsCount=0 obj=0x74030258 self=0xb4df6500
 | sysTid=1336 nice=0 cgrp=default sched=0/0 handle=0xb6f90b34
 | state=S schedstat=( 14858997090 10346952287 41860 ) utm=903 stm=582 core=1 HZ=100
 | stack=0xbe7b2000-0xbe7b4000 stackSize=8MB
 | held mutexes=
 kernel: (couldn't read /proc/self/task/1336/stack)
 native: #00 pc 00040d04 /system/lib/libc.so (__epoll_pwait+20)
 native: #01 pc 0001a17f /system/lib/libc.so (epoll_pwait+26)
 native: #02 pc 0001a18d /system/lib/libc.so (epoll_wait+6)
 native: #03 pc 00012cfb /system/lib/libutils.so (android::Looper::pollInner(int)+102)
 native: #04 pc 00012f77 /system/lib/libutils.so (android::Looper::pollOnce(int, int*, int*, void**)+130)
 native: #05 pc 000807ad /system/lib/libandroid_runtime.so (android::NativeMessageQueue::pollOnce(_JNIEnv*, _jobject*, int)+22)
 native: #06 pc 00008ebd /data/dalvik-cache/arm/system@framework@boot.oat (Java_android_os_MessageQueue_nativePollOnce__JI+96)
 at android.os.MessageQueue.nativePollOnce(Native method)
 at android.os.MessageQueue.next(MessageQueue.java:323)
 at android.os.Looper.loop(Looper.java:135)
 at android.app.ActivityThread.main(ActivityThread.java:5487)
 at java.lang.reflect.Method.invoke!(Native method)
 at com.android.internal.os.ZygoteInit$MethodAndArgsCaller.run(ZygoteInit.java:726)
 at com.android.internal.os.ZygoteInit.main(ZygoteInit.java:616)

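In this case, analyze the application's own log around the ANR time: look at the 10s before it to see what the application was doing and which module was logging too frequently. There is one special case: if the problem is that the application writes log output to file too frequently, trim the logging and remove unnecessary messages. Overly frequent logging consumes I/O bandwidth and competes for CPU, and on devices with low I/O bandwidth it can easily degrade the application's responsiveness.

If the ANR is caused by frequent calls to slow JNI interfaces, reduce or avoid the JNI calls where the scenario allows. On one low-performance device, even calling setInt in the JNI layer to assign a single parameter took 50ms.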

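3. Blocked on I/O

This is usually caused by file operations, as in the following log line:

ANRManager: 100% TOTAL: 2% user + 2.1% kernel + 95% iowait + 0.1% softirq

iowait accounts for 95%. Analyze the ANR trace file and filter the call stacks by the application's package name, or look at the application log for the 10s before the ANR to see which file operations were running. This too is usually fixed by making the work asynchronous.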
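
The iowait share can be pulled out of the TOTAL line automatically when triaging many logs. The sketch below is a hypothetical helper, `IoWaitParser`; the pattern is an assumption based on the single line quoted above, and the threshold passed to `isIoBound` is a rule of thumb chosen by the caller, not a value defined by Android.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Reads the iowait share from the TOTAL line of an ANR CPU summary. */
class IoWaitParser {
    // Matches e.g. "TOTAL: 2% user + 2.1% kernel + 95% iowait"
    private static final Pattern IOWAIT =
            Pattern.compile("TOTAL:.*?(\\d+(?:\\.\\d+)?)% iowait");

    /** Returns the iowait percentage, or -1 if the line has no TOTAL iowait figure. */
    static double iowaitPercent(String line) {
        Matcher m = IOWAIT.matcher(line);
        return m.find() ? Double.parseDouble(m.group(1)) : -1.0;
    }

    /** Treats the ANR as I/O bound when iowait exceeds the caller's threshold. */
    static boolean isIoBound(String line, double thresholdPercent) {
        return iowaitPercent(line) > thresholdPercent;
    }
}
```

For the line quoted above, `iowaitPercent` yields 95.0, so any reasonable threshold flags it as I/O bound; the TOTAL line of the earlier high-CPU log yields only 0.1.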

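4. Deadlock or lock wait

For this kind of problem, a common fix is to acquire the lock with a timeout, for example with Lock's tryLock. When the timeout expires the thread gives up waiting for the lock instead of blocking indefinitely.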
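
A bounded lock wait might look like the following plain-Java sketch. `TimedLockDemo` and its methods are hypothetical names; `ReentrantLock.tryLock(long, TimeUnit)` is the standard JDK call. The helper `callWhileHeldByOtherThread` exists only to demonstrate the contended case and would not appear in production code.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Bounded lock wait: give up after a timeout instead of blocking forever. */
class TimedLockDemo {
    private final ReentrantLock lock = new ReentrantLock();
    private int counter;

    /** Increments the counter, waiting at most timeoutMillis for the lock. */
    boolean tryIncrement(long timeoutMillis) {
        boolean acquired = false;
        try {
            acquired = lock.tryLock(timeoutMillis, TimeUnit.MILLISECONDS);
            if (!acquired) {
                return false; // caller can retry later or report the contention
            }
            counter++;
            return true;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt(); // preserve the interrupt for the caller
            return false;
        } finally {
            if (acquired) {
                lock.unlock();
            }
        }
    }

    int counter() {
        return counter;
    }

    /** Demo helper: calls tryIncrement while a second thread is holding the lock. */
    boolean callWhileHeldByOtherThread(long timeoutMillis) {
        CountDownLatch held = new CountDownLatch(1);
        CountDownLatch release = new CountDownLatch(1);
        Thread holder = new Thread(() -> {
            lock.lock();
            try {
                held.countDown();
                release.await(); // keep the lock until the demo call has finished
            } catch (InterruptedException ignored) {
                Thread.currentThread().interrupt();
            } finally {
                lock.unlock();
            }
        });
        holder.start();
        try {
            held.await();
            boolean result = tryIncrement(timeoutMillis);
            release.countDown();
            holder.join();
            return result;
        } catch (InterruptedException e) {
            release.countDown();
            Thread.currentThread().interrupt();
            return false;
        }
    }
}
```

The important design point is that a `false` return has to be handled: the caller skips or reschedules the work, which keeps the main thread responsive even when another thread holds the lock for a long time.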

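5. Main-thread Binder call timing out

This happens easily when the main thread makes a Binder request and the remote side is slow to return. It is usually fixed by making the call asynchronous.

6. Binder thread pool exhausted

The system allocates at most 15 Binder threads per process. If another process sends too many repeated Binder requests, the receiver's Binder threads are all occupied and it cannot handle any other Binder request.

To check whether the Binder threads are used up, search the trace for the keyword "binder_f". If it is found, the pool is exhausted; then analyze the log to see who keeps consuming Binder threads, or whether a deadlock has occurred.

The fix is to reduce bursts of Binder requests within a very short time, for example by filtering on elapsed time in the function that sends the Binder request so that it runs at most once every 500ms.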
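
The time-difference filter can be sketched as a small throttle. The class below is a hypothetical illustration (`CallThrottle` is not an Android API), with the clock injected as a `LongSupplier` so the behavior can be verified without sleeping. Calls that arrive within the minimum interval of the last accepted call are dropped.

```java
import java.util.function.LongSupplier;

/** Lets an action through at most once per interval; extra calls are dropped. */
class CallThrottle {
    private final LongSupplier clock;
    private final long minIntervalMillis;
    private long lastRunMillis;
    private boolean hasRun;

    CallThrottle(LongSupplier clock, long minIntervalMillis) {
        this.clock = clock;
        this.minIntervalMillis = minIntervalMillis;
    }

    /** Runs action and returns true unless another call ran within the interval. */
    synchronized boolean tryRun(Runnable action) {
        long now = clock.getAsLong();
        if (hasRun && now - lastRunMillis < minIntervalMillis) {
            return false; // too soon after the previous request: drop this one
        }
        hasRun = true;
        lastRunMillis = now;
        action.run();
        return true;
    }
}
```

With an interval of 500 and a monotonic millisecond clock, wrapping the sending function in `tryRun` limits it to one Binder request per 500ms. Dropping calls is only appropriate when the requests are repeats; if every request carries distinct data, coalescing or queuing would be needed instead.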

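7. ANR caused by a JE or NE

In one case, NEs (native exceptions) occurred frequently before the ANR, and the process with the NEs interacted with the ANR process. Once the NE was fixed, the ANR disappeared as well.

When a JE (Java exception) or NE precedes an ANR, fix the JE or NE first. When a JE or NE occurs, the system dumps a large amount of exception information, which itself adds CPU load. After fixing the exception, check whether the ANR still occurs. If it does, examine the trace stacks; if it does not, the JE or NE was most likely the cause.

V. References

A General Approach to Solving ANR Problems

Analysis of Android ANR Monitoring Approaches

Understanding How Android ANR Is Triggered

BlockCanary: Easily Find the Culprit Behind Android App UI Jank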

AndroidPerformanceMonitor

ANR-WatchDog

Android WatchDog

SafeLooper

FileObserver

BlockCanary.java

LooperMonitor.java

Looper.loop

InputReader.cpp

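InputDispatcher.cpp

Further reading:

Android: Updating the UI from a Worker Thread, Explained

Android: 5-Second Delay When Starting an Activity from the Background After Pressing Home

Analysis of the Exception from Re-adding a Toast Custom Layout

Please credit the source when reposting: 陈文管的博客 (Chen Wenguan's Blog), "Android ANR Explained"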