Analysis and Fixes for HBase Cross-Cluster Snapshot Export Failures

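### Background
When exporting an HBase snapshot to another cluster, the job often fails with a FileNotFoundException on /hbase/.tmp/data/xxx.
This write-up reproduces the failure, analyzes its cause, and lists some common workarounds.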
```
Caused by: org.apache.hadoop.ipc.RemoteException(java.io.FileNotFoundException): File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
        at org.apache.hadoop.hdfs.web.JsonUtil.toRemoteException(JsonUtil.java:119)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.validateResponse(WebHdfsFileSystem.java:419)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.access$200(WebHdfsFileSystem.java:107)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.connect(WebHdfsFileSystem.java:595)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$ReadRunner.connect(WebHdfsFileSystem.java:1855)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem$AbstractRunner.runWithRetry(WebHdfsFileSystem.java:673)
        ... 23 more

18/08/13 20:14:14 INFO mapreduce.Job:  map 100% reduce 0%
18/08/13 20:14:14 INFO mapreduce.Job: Job job_1533546266978_0038 failed with state FAILED due to: Task failed task_1533546266978_0038_m_000000
```
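* Main cause
Between creating the snapshot and exporting it to the other cluster, some StoreFiles were moved, so they could no longer be located at the expected path (a bug that shows up when webhdfs is used).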

### Reproducing the scenario
#### Preparation
* Environment:
Source cluster: HBase 1.2.0-cdh5.10.0
Target cluster: HBase 1.2.0-cdh5.12.1

##### 1. Create table mytable with 2 regions (split at 03) and one column family, info
Put 6 rows:
```
put 'mytable','01','info:age','1'
put 'mytable','02','info:age','2'
put 'mytable','03','info:age','3'
put 'mytable','04','info:age','1'
put 'mytable','05','info:age','1'
put 'mytable','06','info:age','1'
```

##### 2. Create snapshot mysnapshot, which generates the following files
```
[root@test108 ~]# hdfs dfs -ls /datafs/.hbase-snapshot/mysnapshot/
Found 2 items
-rw-r--r--   2 hbase hbase         32 2018-08-13 18:48 /datafs/.hbase-snapshot/mysnapshot/.snapshotinfo
-rw-r--r--   2 hbase hbase        466 2018-08-13 18:48 /datafs/.hbase-snapshot/mysnapshot/data.manifest
```
* .snapshotinfo holds the snapshot metadata, i.e. an HBaseProtos.SnapshotDescription object
```
name: "mysnapshot"
table: "mytable"
creation_time: 1533774121010
type: FLUSH
version: 2
```

* data.manifest
Holds the HBase table schema, attributes, and column_families, i.e. a SnapshotProtos.SnapshotDataManifest object. The important part is the store_files information:
```
  region_info {
    region_id: 1533784567273
    table_name {
      namespace: "default"
      qualifier: "mytable"
    }
    start_key: "03"
    end_key: ""
    offline: false
    split: false
    replica_id: 0
  }
  family_files {
    family_name: "info"
    store_files {
      name: "3c5e9ec890f04560a396040fa8b592a3"
      file_size: 1115
    }
  }
```

##### 3. Modify the data
Use put to modify the data in one region:
```
put 'mytable','04','info:age','4'
put 'mytable','05','info:age','5'
put 'mytable','06','info:age','6'
```

##### 4. Run flush and major_compact
**Simulates a major/minor compaction happening during the cross-cluster export**
```
hbase(main):001:0> flush 'mytable'
0 row(s) in 0.8200 seconds

hbase(main):002:0> major_compact 'mytable'
0 row(s) in 0.1730 seconds
```

After this, StoreFile 3c5e9ec890f04560a396040fa8b592a3 shows up under archive:
```
[root@test108 ~]# hdfs dfs -ls -R /datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667
drwxr-xr-x   - hbase hbase          0 2018-08-15 08:30 /datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info
-rw-r--r--   2 hbase hbase       1115 2018-08-13 18:48 /datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
```

#### Reproducing the error
```
[root@a2502f06 ~]# hbase org.apache.hadoop.hbase.snapshot.ExportSnapshot    \
>  -Dipc.client.fallback-to-simple-auth-allowed=true  \
>  -Dmapreduce.job.queuename=root.default  \
>  -snapshot mysnapshot    \
>  -copy-from webhdfs://archive.cloudera.com/datafs     \
>  -copy-to webhdfs://nameservice1/hbase/     \
>  -chuser hbase -chgroup hbase -chmod 755 -overwrite
```

The console reports a FileNotFoundException and the job fails:
```
18/08/13 20:59:34 INFO mapreduce.Job: Task Id : attempt_1533546266978_0037_m_000000_0, Status : FAILED
Error: java.io.FileNotFoundException: File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
        at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
        at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
        at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
        at org.apache.hadoop.ipc.RemoteException.instantiateException(RemoteException.java:106)
        at org.apache.hadoop.ipc.RemoteException.unwrapRemoteException(RemoteException.java:95)
        at org.apache.hadoop.hdfs.web.WebHdfsFileSystem.toIOException(WebHdfsFileSystem.java:450)
```

### Source code walkthrough
1. Before copying any data, ExportSnapshot first copies .snapshotinfo and data.manifest to .hbase-snapshot/.tmp/mysnapshot on the target cluster
```
[root@a2502f06 ~]# hdfs dfs -ls /hbase/.hbase-snapshot/.tmp/mysnapshot
Found 2 items
-rwxr-xr-x   2 hbase hbase         32 2018-08-13 20:28 /hbase/.hbase-snapshot/.tmp/mysnapshot/.snapshotinfo
-rwxr-xr-x   2 hbase hbase        466 2018-08-13 20:28 /hbase/.hbase-snapshot/.tmp/mysnapshot/data.manifest
```
2. It parses data.manifest and creates one logical split entry per StoreFile. Each map call reads one SnapshotFileInfo, which carries only the HFileLink information and **not the concrete path**
```
            String region = regionInfo.getEncodedName();
            String hfile = storeFile.getName();
            Path path = HFileLink.createPath(table, region, family, hfile); 
            SnapshotFileInfo fileInfo = SnapshotFileInfo.newBuilder()
                    .setType(SnapshotFileInfo.Type.HFILE)
                    .setHfile(path.toString())
                    .build();
```

3. Map phase
For each SnapshotFileInfo it reads, the mapper builds the 4 paths where the StoreFile may live and, when reading, tries them in this **order**:
```
/datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
/datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
/datafs/mobdir/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
/datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
```
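As a rough illustration of how those four candidates relate to each other, here is a minimal plain-Java sketch. It does not use the HBase API: the class `CandidateLocations` and its `candidates` method are hypothetical helpers, and the directory order is simply copied from the listing above.

```java
import java.util.ArrayList;
import java.util.List;

class CandidateLocations {
    // Sub-directories of the HBase root dir, in the lookup order listed above.
    static final String[] BASE_DIRS = {"data", ".tmp/data", "mobdir/data", "archive/data"};

    // Builds the candidate paths for one StoreFile (hypothetical helper, not HBase code).
    static List<String> candidates(String rootDir, String namespace, String table,
                                   String region, String family, String hfile) {
        String relative = namespace + "/" + table + "/" + region + "/" + family + "/" + hfile;
        List<String> result = new ArrayList<>();
        for (String base : BASE_DIRS) {
            result.add(rootDir + "/" + base + "/" + relative);
        }
        return result;
    }

    public static void main(String[] args) {
        for (String path : candidates("/datafs", "default", "mytable",
                "c48642fecae3913e0d09ba236b014667", "info",
                "3c5e9ec890f04560a396040fa8b592a3")) {
            System.out.println(path);
        }
    }
}
```

Only the base directory changes between candidates, which is why a StoreFile that was archived by a compaction can only be found by the last entry.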
When the mapper reads data, it calls ExportSnapshot.ExportMapper#openSourceFile, and while initializing the InputStream,
FileLink.tryOpen() determines the StoreFile's real path by iterating over the 4 paths above:
```
    private FSDataInputStream tryOpen() throws IOException {
      for (Path path: fileLink.getLocations()) {
        if (path.equals(currentPath)) continue;
        try {
          in = fs.open(path, bufferSize);
          if (pos != 0) in.seek(pos);
          assert(in.getPos() == pos) : "Link unable to seek to the right position=" + pos;
          currentPath = path;
          return(in);
        } catch (FileNotFoundException e) {
          // Try another file location
        }
      }
      throw new FileNotFoundException("Unable to open link: " + fileLink);
    }
 ```
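Normally, when the StoreFile does not exist at a path, the open/getPos() sequence throws FileNotFoundException; it is caught and the next path is tried.
With webhdfs, however, fs is an org.apache.hadoop.hdfs.web.WebHdfsFileSystem object.
Unfortunately, calling getPos() on the stream opened by WebHdfsFileSystem does not throw, so the first path is accepted even though the file actually lives under archive:
`/datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3`
That path is also stored in currentPath (it is used on the next call to avoid checking the same path twice).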

When the StoreFile is read, FileLink.read() is called:
```
    @Override
    public int read() throws IOException {
      int res;
      try {
        res = in.read();
      } catch (FileNotFoundException e) {
        res = tryOpen().read();
      } catch (NullPointerException e) { // HDFS 1.x - DFSInputStream.getBlockAt()
        res = tryOpen().read();
      } catch (AssertionError e) { // assert in HDFS 1.x - DFSInputStream.getBlockAt()
        res = tryOpen().read();
      }
      if (res > 0) pos += 1;
      return res;
    }
```
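Because in was not initialized with the correct path, in.read() throws FileNotFoundException (the first one).
The catch block then calls tryOpen().read(), which iterates over the 4 paths again. currentPath is the data path, so it is skipped and the next path is used (the file is still under archive):
`/datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3`

Reading the .tmp path throws FileNotFoundException (the second one). Since it is raised inside the catch block, nothing catches it again: it propagates upward and the task fails.
**The Error in the log about a missing file under .tmp actually has little to do with .tmp; that path just happened to be the one being tried.**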
```
2018-08-13 20:13:59,738 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
2018-08-13 20:13:59,740 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
2018-08-13 20:13:59,741 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File does not exist: /datafs/mobdir/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
--------------------------------------------------------------------------------------------------------------------------------------------
2018-08-13 20:13:59,830 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File /datafs/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
2018-08-13 20:13:59,833 WARN [main] org.apache.hadoop.security.UserGroupInformation: PriviledgedActionException as:hbase (auth:SIMPLE) cause:java.io.FileNotFoundException: File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
--------------------------------------------------------------------------------------------------------------------------------------------
2018-08-13 20:13:59,833 ERROR [main] org.apache.hadoop.hbase.snapshot.ExportSnapshot$ExportMapper: Error copying webhdfs://archive.cloudera.com/datafs/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 to webhdfs://nameservice1/hbase/archive/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3
java.io.FileNotFoundException: File /datafs/.tmp/data/default/mytable/c48642fecae3913e0d09ba236b014667/info/3c5e9ec890f04560a396040fa8b592a3 not found.
	at sun.reflect.GeneratedConstructorAccessor14.newInstance(Unknown Source)
```

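The FileNotFoundExceptions between the separator lines are the ones thrown by the two read() calls.
The "File does not exist" messages above the first separator come from ExportSnapshot calling getSourceFileStatus first. They show that after trying data, .tmp, and mobdir it found the correct path under archive (which is not logged).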
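The two-step failure described in this section can be replayed without a cluster. The following is a minimal simulation in plain Java, not HBase or Hadoop code: `LazyFs`, `LazyStream`, and `Link` are hypothetical stand-ins, the paths are shortened placeholders, and the only assumption carried over is the behavior described above, namely that opening a stream never fails and a missing file is only noticed on the first read.

```java
import java.io.FileNotFoundException;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

class LazyOpenDemo {
    // Stand-in for a file system whose open() never checks that the file exists.
    static class LazyFs {
        private final Set<String> existing;
        LazyFs(String... existingPaths) {
            this.existing = new HashSet<>(Arrays.asList(existingPaths));
        }
        LazyStream open(String path) {
            return new LazyStream(path, existing.contains(path));
        }
    }

    // A missing file is only reported when the stream is actually read.
    static class LazyStream {
        private final String path;
        private final boolean exists;
        LazyStream(String path, boolean exists) {
            this.path = path;
            this.exists = exists;
        }
        int read() throws FileNotFoundException {
            if (!exists) throw new FileNotFoundException("File " + path + " not found.");
            return 0; // dummy byte
        }
    }

    // Mirrors the control flow of tryOpen() and read() quoted above.
    static class Link {
        private final LazyFs fs;
        private final List<String> locations;
        private String currentPath;
        private LazyStream in;
        Link(LazyFs fs, List<String> locations) {
            this.fs = fs;
            this.locations = locations;
        }
        LazyStream tryOpen() throws FileNotFoundException {
            for (String path : locations) {
                if (path.equals(currentPath)) continue;
                in = fs.open(path); // never throws, so the first unvisited path always wins
                currentPath = path;
                return in;
            }
            throw new FileNotFoundException("Unable to open link");
        }
        int read() throws FileNotFoundException {
            try {
                return in.read();
            } catch (FileNotFoundException e) {
                return tryOpen().read(); // a failure here is not caught again
            }
        }
    }

    // Returns the message of the exception that finally escapes.
    static String demo() {
        List<String> locations = Arrays.asList(
                "/datafs/data/hfile", "/datafs/.tmp/data/hfile",
                "/datafs/mobdir/data/hfile", "/datafs/archive/data/hfile");
        Link link = new Link(new LazyFs("/datafs/archive/data/hfile"), locations);
        try {
            link.tryOpen();
            link.read();
            return "read succeeded";
        } catch (FileNotFoundException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

In this simulation the escaping message names the .tmp placeholder path even though the only existing copy is under archive, which matches the shape of the task error in the log.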

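### Solutions
In summary: with webhdfs, only the data and .tmp directories are tried for a StoreFile; the archive directory is never reached.
So there are two ways out: keep StoreFiles from ending up under archive, or make the archive path reachable.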

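#### Keep StoreFiles out of archive
From production experience: during heavy writes a region keeps producing StoreFiles, and once their number reaches the threshold, a major/minor compaction is triggered.
The compacted StoreFiles are moved under archive. The following approaches help avoid a compaction during the export:
1. Run major_compact on the table before taking the snapshot
2. If the table can tolerate being unavailable for a while (from a few minutes to tens of minutes), disable it before the operation
3. Or raise hbase.hstore.compactionThreshold moderately (when the table is not written frequently)
4. Depending on the workload, leave as long a gap as possible between data writes and the export (so that compactions finish on their own)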

#### Avoid webhdfs
With hdfs the exception is thrown as expected (not tried in practice)

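#### Fix the bug in the source
The goal is to make StoreFiles under archive readable.
Borrowing from getSourceFileStatus(), add a call to fs.getFileStatus() inside the for loop so that a missing file throws FileNotFoundException as expected: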
```
        private FSDataInputStream tryOpen() throws IOException {

            for (Path path : fileLink.getLocations()) {
                if (path.equals(currentPath)) continue;
                try {
                    fs.getFileStatus(path); // added line: throws FileNotFoundException if the path does not exist
                    in = fs.open(path, bufferSize);
                    if (pos != 0) in.seek(pos);
                    assert(in.getPos() == pos) : "Link unable to seek to the right position=" + pos;
                    if (LOG.isTraceEnabled()) {
                        if (currentPath == null) {
                            LOG.debug("link open path=" + path);
                        } else {
                            LOG.trace("link switch from path=" + currentPath + " to path=" + path);
                        }
                    }
                    currentPath = path;
                    return(in);
                } catch (FileNotFoundException e) {
                    // Try another file location
                }
            }
            throw new FileNotFoundException("Unable to open link: " + fileLink);
        }
```
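The idea behind the patch, probing each candidate for existence before handing out a stream, can be tried on a local directory tree. This is only a sketch under stated assumptions: it uses java.nio.file on the local file system as a stand-in for HDFS, and `ProbeBeforeOpen`, `firstExisting`, and the shortened `hfile` paths are hypothetical names, not part of HBase.

```java
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Arrays;
import java.util.List;

class ProbeBeforeOpen {
    // Returns the first candidate that exists; fails only if none does.
    static Path firstExisting(List<Path> candidates) throws FileNotFoundException {
        for (Path candidate : candidates) {
            if (Files.isRegularFile(candidate)) {
                return candidate; // existence is checked before any stream is created
            }
        }
        throw new FileNotFoundException("Unable to open link: " + candidates);
    }

    // Creates a temp tree where only the archive copy exists, then resolves and reads it.
    static String demo() {
        try {
            Path root = Files.createTempDirectory("hbase-root");
            List<Path> candidates = Arrays.asList(
                    root.resolve("data/hfile"),
                    root.resolve(".tmp/data/hfile"),
                    root.resolve("mobdir/data/hfile"),
                    root.resolve("archive/data/hfile"));
            Path archived = candidates.get(3);
            Files.createDirectories(archived.getParent());
            Files.write(archived, new byte[] {1, 2, 3});

            Path found = firstExisting(candidates);
            try (InputStream in = Files.newInputStream(found)) {
                String relative = root.relativize(found).toString().replace('\\', '/');
                return relative + ":" + in.read();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(demo());
    }
}
```

With the probe in place the loop skips the three missing locations and reads from the archive copy, which is the effect the extra fs.getFileStatus() call is meant to have.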

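Extract ExportSnapshot into its own project and re-wire its HFileLink, FileLink, and WALLink dependencies.
Package it as a jar to run with hadoop jar, so that other functionality is not affected.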

### To do
At work we occasionally run into the odd situation below; it needs further investigation.
![image](https://user-images.githubusercontent.com/26105924/44083748-70d26734-9fe7-11e8-81ae-b6cf90c46816.png)

Reference: https://blog.csdn.net/t894690230/article/details/52121613

