
HBase cross-cluster snapshot copy failures: root-cause analysis and fixes #1

@cfbber

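Background

When copying an HBase snapshot across clusters, the job frequently fails with a FileNotFoundException on /hbase/.tmp/data/xxx.
This issue reproduces the failure, analyzes the root cause, and lists some common workarounds.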

Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
        at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:119)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:419)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:107)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:595)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.connect(WebHdfsFileSystem.java:1855)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:673)
        ... 23 more

18/08/13 20:14:14 INFO mapreduce.Job:  map 100% reduce 0%
18/08/13 20:14:14 INFO mapreduce.Job: Job job_1533546266978_0038 failed with state FAILED due to: Task failed task_1533546266978_0038_m_000000
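  • Main cause
    Between creating the snapshot and copying it across clusters, some StoreFiles were moved, so they could no longer be located (a bug that appears when using webhdfs)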

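Reproducing the scenario

Preparation

  • Environment:
    Source cluster: HBase 1.2.0-cdh5.10.0
    Target cluster: HBase 1.2.0-cdh5.12.1
1. Create table mytable with two regions split at key 03 and one column family, info

Put 6 rows

put 'mytable','01','info:age','1'
put 'mytable','02','info:age','2'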
put 'mytable','03','info:age','3'
put 'mytable','04','info:age','1'
put 'mytable','05','info:age','1'
put 'mytable','06','info:age','1'
2. Create the snapshot mysnapshot, which produces the following files
[root@test108 ~]# hdfs dfs -ls /datafs/.hbase-snapshot/mysnapshot/
Found 2 items
-rw-r--r--   2 hbase hbase         32 2018-08-13 18:48 /datafs/.hbase-snapshot/mysnapshot/.snapshotinfo
-rw-r--r--   2 hbase hbase        466 2018-08-13 18:48 /datafs/.hbase-snapshot/mysnapshot/data.manifest
  • .snapshotinfo holds the snapshot description, i.e. an HBaseProtos.SnapshotDescription object
name: "mysnapshot"
table: "mytable"
creation_time: 1533774121010
type: FLUSH
version: 2
  • data.manifest
    holds the HBase table schema, attributes and column_families, i.e. a SnapshotProtos.SnapshotDataManifest object; the important part is the store_files information:
  region_info {
    region_id: 1533784567273
    table_name {
      namespace: "default"
      qualifier: "mytable"
    }
    start_key: "03"
    end_key: ""
    offline: false
    split: false
    replica_id: 0
  }
  family_files {
    family_name: "info"
    store_files {
      name: "3c5e9ec890f04560a396040fa8b592a3"
      file_size: 1115
    }
  }
3. Modify the data

Use put to modify the data of one region

put 'mytable','04','info:age','4'
put 'mytable','05','info:age','5'
put 'mytable','06','info:age','6'
4. Run flush and major_compact

This simulates the major/minor compactions that can happen while a cross-cluster copy is in progress

hbase(main):001:0> flush 'mytable'
0 row(s) in 0.8200 seconds

hbase(main):002:0> major_compact 'mytable'
0 row(s) in 0.1730 seconds

At this point StoreFile 3c5e9ec890f04560a396040fa8b592a3 has been moved under archive

[root@test108 ~]# hdfs dfs -ls -R /datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667
drwxr-xr-x   - hbase hbase          0 2018-08-15 08:30 /datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info
-rw-r--r--   2 hbase hbase       1115 2018-08-13 18:48 /datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3

Reproducing the error

[root@a2502f06 ~]# hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot    \
>  -Dipc.client.fallback-to-simple-auth-allowed=true  \
>  -Dmapreduce.job.queuename=root.default  \
>  -snapshot mysnapshot    \
>  -copy-from webhdfs://archive.cloudera.com/datafs     \
>  -copy-to webhdfs://nameservice1/hbase/     \
>  -chuser hbase -chgroup hbase -chmod 755 -overwrite

The console reports a FileNotFoundException and the job fails

18/08/13 20:59:34 INFO mapreduce.Job: Task Id : attempt_1533546266978_0037_m_000000_0, Status : FAILED
Error: java.io.FileNotFoundException: File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
        at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:450)

Source code walkthrough

  1. Before copying, ExportSnapshot first copies .snapshotinfo and data.manifest to .hbase-snapshot/.tmp/mysnapshot on the target
[root@a2502f06 ~]# hdfs dfs -ls /hbase/.hbase-snapshot/.tmp/mysnapshot
Found 2 items
-rwxr-xr-x   2 hbase hbase         32 2018-08-13 20:28 /hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
-rwxr-xr-x   2 hbase hbase        466 2018-08-13 20:28 /hbase/.hbase-snapshot/.tmp/mysnapshot/data.manifest
  2. It parses data.manifest and creates logical splits per StoreFile. Each map call reads one SnapshotFileInfo, which carries only the HFileLink information, not a concrete path
            String region = regionInfo.getEncodedName();
            String hfile = storeFile.getName();
            Path path = HFileLink.createPath(table, region, family, hfile); 
            SnapshotFileInfo fileInfo = SnapshotFileInfo.newBuilder()
                    .setType(SnapshotFileInfo.Type.HFILE)
                    .setHfile(path.toString())
                    .build();
  3. Map phase
    For each SnapshotFileInfo it reads, the mapper builds the four paths where the StoreFile may live and searches them in this order when reading
/datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
/datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
/datafs/mobdir/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
/datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
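
The four candidate locations listed above can be derived from the HBase root directory and the link components. The following is a minimal sketch of that path construction using plain strings; the class and method names (CandidateLocations, candidates) are hypothetical and are not part of HBase, and the real HFileLink works with Hadoop Path objects.

```java
import java.util.Arrays;
import java.util.List;

public class CandidateLocations {

    /**
     * Builds the four candidate locations of a StoreFile in the lookup
     * order described above: data, .tmp, mobdir, archive.
     * (Hypothetical helper for illustration; not an HBase API.)
     */
    static List<String> candidates(String rootDir, String namespace, String table,
                                   String region, String family, String hfile) {
        String suffix = "data/" + namespace + "/" + table + "/"
                + region + "/" + family + "/" + hfile;
        return Arrays.asList(
                rootDir + "/" + suffix,
                rootDir + "/.tmp/" + suffix,
                rootDir + "/mobdir/" + suffix,
                rootDir + "/archive/" + suffix);
    }

    public static void main(String[] args) {
        // Values taken from the example in this issue.
        for (String path : candidates("/datafs", "default", "mytable",
                "c48642fecae3913e0d09ba236b014667", "info",
                "3c5e9ec890f04560a396040fa8b592a3")) {
            System.out.println(path);
        }
    }
}
```

Running main prints the same four paths as the listing above, in the same order.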

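When the mapper reads data, it calls ExportSnapshot.ExportMapper#openSourceFile to initialize the InputStream.
During that initialization, FileLink.tryOpen() determines the real path of the StoreFile by iterating over the four paths above.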

    private FSDataInputStream tryOpen() throws IOException {
      for (Path path: fileLink.getLocations()) {
        if (path.equals(currentPath)) continue;
        try {
          in = fs.open(path, bufferSize);
          if (pos != 0) in.seek(pos);
          assert(in.getPos() == pos) : "Link unable to seek to the right position=" + pos;
          currentPath = path;
          return(in);
        } catch (FileNotFoundException e) {
          // Try another file location
        }
      }
      throw new FileNotFoundException("Unable to open link: " + fileLink);
    }

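Normally, when the StoreFile does not exist at a location, a FileNotFoundException is thrown inside the try block, caught, and the loop moves on to the next path.
With webhdfs, however, fs is an org.apache.hadoop.hdfs.web.WebHdfsFileSystem object.
Unfortunately, neither opening the stream through WebHdfsFileSystem nor calling getPos() on it throws an exception for a missing file, so the first path is accepted, as shown below (the file actually lives under archive)
/datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
That path is also stored as currentPath (it is used on the next call to avoid trying the same location again)

When the StoreFile is read, FileLink.read() is called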

    @Override
    public int read() throws IOException {
      int res;
      try {
        res = in.read();
      } catch (FileNotFoundException e) {
        res = tryOpen().read();
      } catch (NullPointerException e) { // HDFS 1.x - DFSInputStream.getBlockAt()
        res = tryOpen().read();
      } catch (AssertionError e) { // assert in HDFS 1.x - DFSInputStream.getBlockAt()
        res = tryOpen().read();
      }
      if (res > 0) pos += 1;
      return res;
    }

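Because in was not initialized with the correct path, in.read() throws a FileNotFoundException (the first one).
The catch block calls tryOpen().read(), which iterates over the four paths again. currentPath is now the data path, so it is skipped and the next path is used (the file is still under archive)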
/datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3

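Reading the .tmp path throws a FileNotFoundException (the second one). This exception propagates upward and the task fails.
The error in the log blames a missing file under .tmp, but .tmp has little to do with the problem; it merely happens to be the location being tried at that point.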

2018-08-13 20:13:59,738 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
2018-08-13 20:13:59,740 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
2018-08-13 20:13:59,741 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /datafs/mobdir/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
--------------------------------------------------------------------------------------------------------------------------------------------
2018-08-13 20:13:59,830 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File /datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
2018-08-13 20:13:59,833 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
--------------------------------------------------------------------------------------------------------------------------------------------
2018-08-13 20:13:59,833 ERROR [main] org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper: Error copying webhdfs://archive.cloudera.com/datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 to webhdfs://nameservice1/hbase/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
java.io.FileNotFoundException: File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
	at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)

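The FileNotFoundExceptions between the separator lines are the two thrown by the read() calls.
The "File does not exist" messages above the first separator are produced when ExportSnapshot calls getSourceFileStatus; they show that after trying data, .tmp and mobdir it found the correct path under archive (which is not logged).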
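
The two-step failure described above can be reproduced without a cluster. The following sketch is a simulation under stated assumptions: FakeFs and Link are hypothetical stand-ins (not Hadoop or HBase classes), a "stream" is represented simply by its path, and the lazyOpen flag models the webhdfs behavior in which open() succeeds even for a missing file. Only the control flow of tryOpen() and read() is mirrored.

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

public class LazyOpenDemo {

    /** Hypothetical file system stand-in; not a Hadoop class. */
    static class FakeFs {
        final Set<String> existing;
        final boolean lazyOpen; // true: open() never fails (webhdfs-like behavior)

        FakeFs(Set<String> existing, boolean lazyOpen) {
            this.existing = existing;
            this.lazyOpen = lazyOpen;
        }

        String open(String path) throws FileNotFoundException {
            if (!lazyOpen && !existing.contains(path)) {
                throw new FileNotFoundException("File does not exist: " + path);
            }
            return path; // the "stream" is just the path it points to
        }

        int read(String stream) throws FileNotFoundException {
            if (!existing.contains(stream)) {
                throw new FileNotFoundException("File " + stream + " not found.");
            }
            return 42;
        }
    }

    /** Mirrors the control flow of FileLink's tryOpen() and read(). */
    static class Link {
        final FakeFs fs;
        final List<String> locations;
        String currentPath;
        String in;

        Link(FakeFs fs, List<String> locations) {
            this.fs = fs;
            this.locations = locations;
        }

        String tryOpen() throws FileNotFoundException {
            for (String path : locations) {
                if (path.equals(currentPath)) continue; // skip the location already tried
                try {
                    in = fs.open(path);
                    currentPath = path;
                    return in;
                } catch (FileNotFoundException e) {
                    // try another file location
                }
            }
            throw new FileNotFoundException("Unable to open link");
        }

        int read() throws FileNotFoundException {
            try {
                return fs.read(in);
            } catch (FileNotFoundException e) {
                // a second failure in here is not caught and propagates
                return fs.read(tryOpen());
            }
        }
    }

    /** Returns "OK <path>" on success or "FAILED <message>" on failure. */
    static String run(boolean lazyOpen) {
        List<String> locations = Arrays.asList(
                "/datafs/data/f", "/datafs/.tmp/data/f",
                "/datafs/mobdir/data/f", "/datafs/archive/data/f");
        // The file exists only under archive, as after the compaction.
        FakeFs fs = new FakeFs(new HashSet<>(Arrays.asList("/datafs/archive/data/f")), lazyOpen);
        Link link = new Link(fs, locations);
        try {
            link.tryOpen();
            link.read();
            return "OK " + link.currentPath;
        } catch (FileNotFoundException e) {
            return "FAILED " + e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(run(false)); // OK /datafs/archive/data/f
        System.out.println(run(true));  // FAILED File /datafs/.tmp/data/f not found.
    }
}
```

With an eager open() the loop reaches archive; with a lazy open() the first read fails on data, the retry lands on .tmp, and the exception that surfaces names .tmp, which matches the log.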

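Approaches to a fix

In summary: with webhdfs, only the data and .tmp directories are searched for the StoreFile; archive is never reached.
So there are two ways out: keep StoreFiles from ending up under archive, or make the archive path reachable.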

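Keep StoreFiles out of archive

In our production experience, heavy writes keep creating StoreFiles under a region, and once the number of StoreFiles reaches the threshold a major/minor compaction is triggered.
The compacted StoreFiles are moved under archive. The following measures help avoid compactions during the copy

  1. Run major_compact on the table before taking the snapshot
  2. If the table can tolerate being unavailable for a while (a few minutes to tens of minutes), disable it before the operation
  3. Or raise hbase.hstore.compactionThreshold moderately (when the table is not written frequently)
  4. Depending on the workload, leave as much time as possible between writing data and copying (so that pending major/minor compactions finish on their own)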

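Avoid webhdfs

With hdfs the exception is thrown as expected (not actually tried)

Fix the bug in the source

Make it possible to read StoreFiles under archive.
Following getSourceFileStatus(), add one line, fs.getFileStatus(), inside the for loop so that a FileNotFoundException is thrown as expected when the file does not exist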

        private FSDataInputStream tryOpen() throws IOException {

            for (Path path : fileLink.getLocations()) {
                if (path.equals(currentPath)) continue;
                try {
                    fs.getFileStatus(path); // added line: throws FileNotFoundException if the path does not exist
                    in = fs.open(path, bufferSize);
                    if (pos != 0) in.seek(pos);
                    assert(in.getPos() == pos) : "Link unable to seek to the right position=" + pos;
                    if (LOG.isTraceEnabled()) {
                        if (currentPath == null) {
                            LOG.debug("link open path=" + path);
                        } else {
                            LOG.trace("link switch from path=" + currentPath + " to path=" + path);
                        }
                    }
                    currentPath = path;
                    return(in);
                } catch (FileNotFoundException e) {
                    // Try another file location
                }
            }
            throw new FileNotFoundException("Unable to open link: " + fileLink);
        }
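
The effect of the added fs.getFileStatus() call can be illustrated without Hadoop. In the sketch below the existence probe is modeled as a hypothetical exists predicate, and ProbeBeforeOpen and firstExisting are illustrative names rather than HBase APIs. The point is only that a location is accepted after an explicit existence check, so a lazily opened stream can no longer hide a missing file.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Optional;
import java.util.function.Predicate;

public class ProbeBeforeOpen {

    /**
     * Returns the first location, other than currentPath, for which the
     * existence probe succeeds. The probe plays the role of
     * fs.getFileStatus(path) in the patched tryOpen().
     */
    static Optional<String> firstExisting(List<String> locations, String currentPath,
                                          Predicate<String> exists) {
        for (String path : locations) {
            if (path.equals(currentPath)) continue; // already tried
            if (exists.test(path)) return Optional.of(path);
        }
        return Optional.empty();
    }

    public static void main(String[] args) {
        List<String> locations = Arrays.asList(
                "/datafs/data/f", "/datafs/.tmp/data/f",
                "/datafs/mobdir/data/f", "/datafs/archive/data/f");
        // Assumption for the demo: only the archive copy exists.
        Predicate<String> exists = p -> p.startsWith("/datafs/archive/");
        System.out.println(firstExisting(locations, null, exists).orElse("not found"));
    }
}
```

The extra probe costs one additional metadata request per candidate location, which is the trade-off for not relying on open() to fail.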

Extract ExportSnapshot, reorganize its dependencies on HFileLink, FileLink and WALLink,
and package it as a hadoop jar so that other functionality is not affected

To do

At work I occasionally run into the strange situation below, which needs further investigation
[image]

Reference: https://blog.csdn.net/t894690230/article/details/52121613
