dfs.ls.limit controls directory listing: it is the maximum number of child entries returned in a single listing call. The default value is 1000.
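The limit is set on the NameNode side in hdfs-site.xml; a minimal sketch (the value 5000 here is only an illustrative choice, not a value from the source):

```xml
<!-- hdfs-site.xml on the NameNode -->
<property>
  <name>dfs.ls.limit</name>
  <!-- max child entries returned per listing RPC; values <= 0 fall back to the default of 1000 -->
  <value>5000</value>
</property>
```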
NameNodeRpcServer.java
@Override // ClientProtocol
public DirectoryListing getListing(String src, byte[] startAfter,
boolean needLocation) throws IOException {
checkNNStartup();
DirectoryListing files = namesystem.getListing(
src, startAfter, needLocation);
if (files != null) {
metrics.incrGetListingOps();
metrics.incrFilesInGetListingOps(files.getPartialListing().length);
}
return files;
}
FSDirStatAndListingOp.java
private static DirectoryListing getListing(FSDirectory fsd, INodesInPath iip,
byte[] startAfter, boolean needLocation, boolean includeStoragePolicy)
throws IOException {
if (FSDirectory.isExactReservedName(iip.getPathComponents())) {
return getReservedListing(fsd);
}
fsd.readLock();
try {
if (iip.isDotSnapshotDir()) {
return getSnapshotsListing(fsd, iip, startAfter);
}
final int snapshot = iip.getPathSnapshotId();
final INode targetNode = iip.getLastINode();
if (targetNode == null) {
return null;
}
byte parentStoragePolicy = includeStoragePolicy
? targetNode.getStoragePolicyID()
: HdfsConstants.BLOCK_STORAGE_POLICY_ID_UNSPECIFIED;
if (!targetNode.isDirectory()) {
// return the file's status. note that the iip already includes the
// target INode
return new DirectoryListing(
new HdfsFileStatus[]{ createFileStatus(
fsd, iip, null, parentStoragePolicy, needLocation, false)
}, 0);
}
final INodeDirectory dirInode = targetNode.asDirectory();
final ReadOnlyList<INode> contents = dirInode.getChildrenList(snapshot);
int startChild = INodeDirectory.nextChild(contents, startAfter);
int totalNumChildren = contents.size();
int numOfListing = Math.min(totalNumChildren - startChild,
fsd.getLsLimit());
int locationBudget = fsd.getLsLimit();
int listingCnt = 0;
HdfsFileStatus listing[] = new HdfsFileStatus[numOfListing];
for (int i = 0; i < numOfListing && locationBudget > 0; i++) {
INode child = contents.get(startChild+i);
byte childStoragePolicy = (includeStoragePolicy && !child.isSymlink())
? getStoragePolicyID(child.getLocalStoragePolicyID(),
parentStoragePolicy)
: parentStoragePolicy;
listing[i] = createFileStatus(fsd, iip, child, childStoragePolicy,
needLocation, false);
listingCnt++;
if (listing[i] instanceof HdfsLocatedFileStatus) {
// Once we hit lsLimit locations, stop.
// This helps to prevent excessively large response payloads.
// Approximate #locations with locatedBlockCount() * repl_factor
LocatedBlocks blks =
((HdfsLocatedFileStatus)listing[i]).getLocatedBlocks();
locationBudget -= (blks == null) ? 0 :
blks.locatedBlockCount() * listing[i].getReplication();
}
}
// truncate return array if necessary
if (listingCnt < numOfListing) {
listing = Arrays.copyOf(listing, listingCnt);
}
return new DirectoryListing(
listing, totalNumChildren-startChild-listingCnt);
} finally {
fsd.readUnlock();
}
}
Looking at the code, fsd.getLsLimit() actually shows up twice: it caps numOfListing, the number of children returned by this call, and it also initializes locationBudget, a separate budget that bounds how many block locations can be sent back when needLocation is true. Either limit can end the loop.
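The core of the loop above can be modeled with a small self-contained sketch. ListingModel, listChunk, and the int[] approximation of per-file location counts are all hypothetical simplifications for illustration, not Hadoop types:

```java
import java.util.Arrays;

// Simplified model of the loop in FSDirStatAndListingOp#getListing.
// names is the directory's children in sorted order; blockLocations[i]
// stands in for locatedBlockCount() * replication of child i
// (0 when no locations were requested).
public class ListingModel {
  static String[] listChunk(String[] names, int startChild, int lsLimit,
                            int[] blockLocations) {
    int numOfListing = Math.min(names.length - startChild, lsLimit);
    int locationBudget = lsLimit;
    int listingCnt = 0;
    String[] listing = new String[numOfListing];
    for (int i = 0; i < numOfListing && locationBudget > 0; i++) {
      listing[i] = names[startChild + i];
      listingCnt++;
      // like the real code, the budget is charged after adding the entry,
      // so at least one entry is always returned per call
      locationBudget -= blockLocations[startChild + i];
    }
    // truncate the array if the location budget stopped us early
    return listingCnt < numOfListing
        ? Arrays.copyOf(listing, listingCnt) : listing;
  }

  public static void main(String[] args) {
    String[] names = {"a", "b", "c", "d", "e"};
    // no locations requested: a full lsLimit-sized page comes back
    System.out.println(Arrays.toString(
        listChunk(names, 0, 3, new int[5])));               // [a, b, c]
    // each file contributes 6 locations; budget 3 is spent after one entry
    System.out.println(Arrays.toString(
        listChunk(names, 0, 3, new int[]{6, 6, 6, 6, 6}))); // [a]
  }
}
```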
FSDirectory.java
int configuredLimit = conf.getInt(
DFSConfigKeys.DFS_LIST_LIMIT, DFSConfigKeys.DFS_LIST_LIMIT_DEFAULT);
this.lsLimit = configuredLimit>0 ?
configuredLimit : DFSConfigKeys.DFS_LIST_LIMIT_DEFAULT;
As the code shows, this is exactly dfs.ls.limit with a default of 1000; if the configured value is not greater than 0, it automatically falls back to the default as well.
So where does the next query start? Notice that getListing has another parameter, startAfter, which tells the NameNode where to resume; it is essentially the name of the last child returned by the previous call. This raises a question: how is the position corresponding to startAfter located? HDFS uses binary search.
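The client-side paging contract this implies can be sketched in plain Java. PagingSketch, getListing, and listAll below are hypothetical stand-ins (the real client goes through DFSClient over RPC); the point is only the cursor protocol: keep calling with startAfter set to the last name of the previous chunk until a partial chunk comes back.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical model of paging a directory listing with a startAfter cursor.
public class PagingSketch {
  // Model of one getListing RPC over a sorted child-name list.
  static String[] getListing(String[] sortedNames, String startAfter,
                             int lsLimit) {
    int start;
    if (startAfter.isEmpty()) {
      start = 0; // first call: begin at the first child
    } else {
      int pos = Arrays.binarySearch(sortedNames, startAfter);
      start = pos >= 0 ? pos + 1 : -pos - 1; // insertion point when missing
    }
    int n = Math.min(sortedNames.length - start, lsLimit);
    return Arrays.copyOfRange(sortedNames, start, start + n);
  }

  static List<String> listAll(String[] sortedNames, int lsLimit) {
    List<String> all = new ArrayList<>();
    String cursor = "";
    while (true) {
      String[] chunk = getListing(sortedNames, cursor, lsLimit);
      all.addAll(Arrays.asList(chunk));
      if (chunk.length < lsLimit) break; // short chunk means last page
      cursor = chunk[chunk.length - 1];  // becomes startAfter next time
    }
    return all;
  }

  public static void main(String[] args) {
    String[] names = {"a", "b", "c", "d", "e", "f", "g"};
    System.out.println(listAll(names, 3)); // [a, b, c, d, e, f, g]
  }
}
```

A directory with n children therefore costs roughly n / lsLimit RPCs to list in full.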
INodeDirectory.java
static int nextChild(ReadOnlyList<INode> children, byte[] name) {
if (name.length == 0) { // empty name
return 0;
}
int nextPos = ReadOnlyList.Util.binarySearch(children, name) + 1;
if (nextPos >= 0) {
return nextPos;
}
return -nextPos;
}
ReadOnlyList.Util.java
public static <K, E extends Comparable<K>> int binarySearch(
final ReadOnlyList<E> list, final K key) {
int lower = 0;
for(int upper = list.size() - 1; lower <= upper; ) {
final int mid = (upper + lower) >>> 1;
final int d = list.get(mid).compareTo(key);
if (d == 0) {
return mid;
} else if (d > 0) {
upper = mid - 1;
} else {
lower = mid + 1;
}
}
return -(lower + 1);
}
Note a detail here: if the name is found, nextChild returns the position right after the matching child, so the listing resumes at the next entry; if it is not found, binarySearch returns -(insertionPoint + 1), and nextChild therefore returns the insertion point, i.e. the position of the first child whose name is greater than startAfter.
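That contract can be demonstrated with a tiny self-contained reimplementation (NextChildDemo is a hypothetical stand-in over String[] instead of ReadOnlyList&lt;INode&gt;; java.util.Arrays.binarySearch has the same found / -(insertionPoint + 1) contract as ReadOnlyList.Util.binarySearch):

```java
import java.util.Arrays;

// Demo of the nextChild contract:
//   found     -> position of the match + 1 (resume at the next child)
//   not found -> the insertion point, i.e. the first child > name
public class NextChildDemo {
  static int nextChild(String[] children, String name) {
    if (name.isEmpty()) { // empty name: start from the beginning
      return 0;
    }
    int nextPos = Arrays.binarySearch(children, name) + 1;
    return nextPos >= 0 ? nextPos : -nextPos;
  }

  public static void main(String[] args) {
    String[] children = {"a", "c", "e"};
    System.out.println(nextChild(children, ""));  // 0
    System.out.println(nextChild(children, "c")); // 2 (resume after "c")
    // "b" was deleted between two calls: listing resumes at its
    // insertion point (index 1, where "c" sits), not back at 0
    System.out.println(nextChild(children, "b")); // 1
  }
}
```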
From this walkthrough, ls performance is not great when a directory has very many children: each page costs a separate RPC, and each RPC re-runs the binary search while holding the FSDirectory read lock. In addition, because paging is keyed by child name rather than by a stable identifier, the combined result of multiple calls is not an atomic snapshot: children created, deleted, or renamed between calls can be missed, or in the rename case returned twice. So why not use the inode id as the cursor instead? I suspect this is for historical reasons. A practical suggestion is to set dfs.ls.limit somewhat higher to reduce the number of RPCs per listing.