FL (Federated Learning)
A technique in which multiple *local clients and a single central server collaborate to train a global model while the data stays decentralized
*Local clients: smartphones, IoT devices, etc.
Advantage 1. Better data privacy
In settings where patient personal information must be protected, such as hospital clinical data, the model can be trained without the data ever leaving its source
Advantage 2. Communication efficiency
Sending the data of every local device to a central server would increase network traffic and storage costs,
but with FL only the local models' update information has to be exchanged, so communication costs drop substantially
Cross-silo FL
"silo"란 큰 규모의 저장소를 의미
즉, Cross-silo FL은 각 디바이스의 크기가 비교적 큰 경우에 적용하며 보통 학습에 참여하는 클라이언트 수가 비교적 적음
(정확한 정의는 없으나 대체적으로 100개 ~ 10000개 까지를 silo로 보는 경향이 있음)
명확하게 어떠한 클라이언트가 학습에 참여하는지 확인 가능 (addressibility)
특수한 사건이 발생하지 않는 이상 각 디바이스가 정상 작동한다고 간주
예시: 병원 임상, COVID-19 관련 데이터 분석에 FL을 적용한 사례
Cross-device FL
A setting where far more devices participate than in Cross-silo FL, each of them relatively small
Because an unspecified crowd of participants joins training, there are more issues to consider than in the cross-silo case
Thus, there can always be clients that approach with malicious intent and try to train the model on corrupted data,
or clients that cannot participate properly because of low battery, insufficient storage, an unstable internet connection, and so on
Such devices are called "Byzantine", and when applying Cross-device FL one has to design algorithms that keep the model update running without failure even in the presence of Byzantine clients (Byzantine Fault Tolerance algorithms); see the sketch after the example below
Example: Google's Gboard
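A common family of Byzantine Fault Tolerance techniques replaces the plain mean in the aggregation step with a robust statistic such as the coordinate-wise median. The sketch below is only a minimal illustration of that idea, not a method from the paper discussed later; the function names and toy numbers are made up.

```python
import numpy as np

def aggregate_mean(client_updates):
    # FedAvg-style mean: a single Byzantine client can shift the result arbitrarily far.
    return np.mean(client_updates, axis=0)

def aggregate_median(client_updates):
    # Coordinate-wise median: robust as long as fewer than half of the clients are Byzantine.
    return np.median(client_updates, axis=0)

# Toy example: 4 honest clients send similar updates, 1 Byzantine client sends garbage.
honest = [np.array([0.9, 1.1]), np.array([1.0, 1.0]),
          np.array([1.1, 0.9]), np.array([1.0, 1.2])]
byzantine = [np.array([100.0, -100.0])]
updates = np.stack(honest + byzantine)

print(aggregate_mean(updates))    # badly skewed by the malicious update
print(aggregate_median(updates))  # stays close to the honest updates
```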
Towards the Practical Utility of Federated Learning in the Medical Domain
FL is a distributed machine learning framework in which each client does not share its data, in order to preserve data privacy, but instead shares model parameters.
One of the most suitable areas for adopting FL is the medical domain
→ patient privacy must be respected
The main challenge of FL in the medical domain is the *non-i.i.d. problem
*i.i.d.: independent and identically distributed
Different protocols, medical devices, and local demographics cause the non-i.i.d. problem
Used three representative medical datasets:
1. longitudinal electronic health records (EHR)
2. skin cancer images
3. electrocardiogram signals (ECG)
Evaluate
- 6 FL algorithms designed to address data heterogeneity among clients
- 1 hybrid algorithm (FedPxN) combining the strengths of two representative FL algorithms
→ simple FL algorithms tend to outperform more sophisticated ones, and the hybrid algorithm shows good performance
→ frequent global model updates lead to better performance under a fixed training iteration budget
FL Algorithms
FedAvg is a widely known framework in FL but does not ensure training convergence when data are heterogeneous across local clients. Therefore, a large number of methods focus on addressing the non-i.i.d. problem.
FedAvg
- the de facto standard algorithm in FL
- first, the central server sends the global model to clients in each communication round
- then, each client sends the updated model back to the server after local training
- next, the central server averages all clients' models, weighting by the data size ratio
- FedAvg reduces the number of communication rounds required for model convergence by updating the model after multiple epochs of local training
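A minimal sketch of the weighted averaging step described above, with each client model represented as a plain dict of numpy arrays (like a state_dict); the parameter names and toy numbers are illustrative.

```python
import numpy as np

def fedavg_aggregate(client_states, client_sizes):
    # Weighted average of client parameters, weights proportional to local data size.
    total = sum(client_sizes)
    return {
        name: sum((n / total) * state[name] for state, n in zip(client_states, client_sizes))
        for name in client_states[0]
    }

# Toy example: two clients with different amounts of local data.
client_a = {"w": np.array([1.0, 2.0]), "b": np.array([0.5])}
client_b = {"w": np.array([3.0, 4.0]), "b": np.array([1.5])}
global_state = fedavg_aggregate([client_a, client_b], client_sizes=[100, 300])
print(global_state)  # {'w': array([2.5, 3.5]), 'b': array([1.25])}
```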
FedProx
- adds a proximal term to the local objective of the FedAvg framework to address the heterogeneity problem
- to mitigate the non-i.i.d. problem, it introduces an L2 regularization term into the local objective function of FedAvg
- the local model updates are restricted by the regularization term so that they stay closer to the global model
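A minimal PyTorch-style sketch of the proximal term added to the local objective, assuming the client keeps a snapshot of the global parameters it received; the helper name and the value of mu are illustrative.

```python
import torch

def fedprox_local_loss(task_loss, local_model, global_params, mu=0.01):
    # Local objective = task loss + (mu / 2) * ||w_local - w_global||^2 (proximal term).
    prox = 0.0
    for p_local, p_global in zip(local_model.parameters(), global_params):
        prox = prox + torch.sum((p_local - p_global.detach()) ** 2)
    return task_loss + 0.5 * mu * prox

# Inside a client's training loop (illustrative):
#   global_params = [p.detach().clone() for p in model.parameters()]  # snapshot from the server
#   loss = fedprox_local_loss(criterion(model(x), y), model, global_params, mu=0.01)
#   loss.backward(); optimizer.step()
```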
FedOpt
- applies adaptive optimization in the global aggregation step to stabilize convergence on heterogeneous data
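A rough sketch of the server-side adaptive update idea (in the spirit of the FedAdam variant): the gap between the current global weights and the averaged client weights is treated as a pseudo-gradient and fed into an Adam-like rule, rather than simply overwriting the global model. The hyperparameters and the flattened-vector representation are assumptions for illustration.

```python
import numpy as np

def server_adaptive_step(global_w, avg_client_w, m, v, lr=0.1, b1=0.9, b2=0.99, eps=1e-3):
    # Pseudo-gradient: how far the averaged client result moved away from the global model.
    pseudo_grad = global_w - avg_client_w
    # Adam-like moment estimates maintained on the server across communication rounds.
    m = b1 * m + (1 - b1) * pseudo_grad
    v = b2 * v + (1 - b2) * pseudo_grad ** 2
    new_global_w = global_w - lr * m / (np.sqrt(v) + eps)
    return new_global_w, m, v

# Per round (weights flattened into one vector for simplicity):
#   global_w, m, v = server_adaptive_step(global_w, avg_client_w, m, v)
```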
FedDyn
- uses a dynamic regularizer to resolve the inconsistency between local and global objectives caused by the data heterogeneity of each local client, whereas Scaffold computes and aggregates control variates
FedBN
- Most FL methods are validated in a label-heterogeneous experimental setting by partitioning a dataset from the same source into multiple clients. In contrast, FedBN proposed aggregating local models without the batch normalization layers to handle the non-i.i.d. problem that can occur due to feature shifts across different data sources
- similarly, SiloBN aggregates local models without the local batch normalization statistics
- batch normalization, group normalization, and layer normalization are tested for each task to maximize the performance of the model
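A minimal sketch of the FedBN-style aggregation described above: parameters belonging to normalization layers are skipped during averaging and stay local to each client. Matching normalization layers by a substring in the parameter name is an assumption made for illustration; a real implementation would identify the actual normalization modules.

```python
def aggregate_without_norm(client_states, client_sizes, norm_keyword="bn"):
    # Weighted average of all parameters except those belonging to normalization layers.
    total = sum(client_sizes)
    agg = {}
    for name in client_states[0]:
        if norm_keyword in name:
            continue  # normalization parameters and statistics are kept local, not averaged
        agg[name] = sum((n / total) * s[name] for s, n in zip(client_states, client_sizes))
    return agg

# Each client then loads only the aggregated shared layers and keeps its own
# normalization layers, e.g.  client_state.update(aggregated)
```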
FedDAR
- decouples domain-specific prediction heads from a shared encoder in order to tackle a non-i.i.d. setting where there is similarity between domains
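A rough sketch of the decoupling idea: the shared encoder is averaged over all clients, while each prediction head is averaged only over clients belonging to the same domain. The `encoder.`/`head.` name prefixes and the grouping logic are assumptions for illustration and omit the rest of the FedDAR method.

```python
from collections import defaultdict

def weighted_avg(states, sizes, keys):
    # Data-size-weighted average restricted to the given parameter keys.
    total = sum(sizes)
    return {k: sum((n / total) * s[k] for s, n in zip(states, sizes)) for k in keys}

def feddar_style_aggregate(client_states, client_sizes, client_domains):
    encoder_keys = [k for k in client_states[0] if k.startswith("encoder.")]
    head_keys = [k for k in client_states[0] if k.startswith("head.")]

    # Shared encoder: averaged over every client, as in FedAvg.
    shared_encoder = weighted_avg(client_states, client_sizes, encoder_keys)

    # Domain-specific heads: averaged only within each domain's group of clients.
    by_domain = defaultdict(list)
    for state, size, dom in zip(client_states, client_sizes, client_domains):
        by_domain[dom].append((state, size))
    heads = {
        dom: weighted_avg([s for s, _ in members], [n for _, n in members], head_keys)
        for dom, members in by_domain.items()
    }
    return shared_encoder, heads
```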
The statistics used in the normalization layers differ across local clients, which can make it hard for the aggregated model to capture the distribution of data collected from different sources.
As in FedBN, a simple solution is to aggregate the local client models without the normalization layers.
The normalization layers in FedBN naturally move towards their own local optimum, depending on the data distribution of each client.
However, the other non-normalization layers might also drift towards a local optimum that is inconsistent with the global optimum, because of the effect of the normalization layers in each client's model.
As can be seen in image (a), each client's model has evolved in a different direction from the global model.
The authors hypothesize that a proximal term inspired by FedProx will help the local models in FedBN progress towards the global optimum, as image (b) shows.
Hybrid FL Algorithm: FedPxN
the proposed algorithm, which encourages the normalization layers to adapt to each client's unique feature distribution and the other layers to follow the global optimum, in order to improve overall performance
→ during local training, local models are updated with a local objective whose proximal term covers only the non-normalization layers of the global model, and they are then aggregated without the normalization layers
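Putting the two ingredients together, a minimal PyTorch-style sketch of the FedPxN local objective described above: the FedProx-like proximal term is applied only to the non-normalization parameters, and aggregation then proceeds as in FedBN, i.e. without the normalization layers. Matching normalization layers by a substring in the parameter name is an assumption for illustration.

```python
import torch

def fedpxn_local_loss(task_loss, local_model, global_state, mu=0.01, norm_keyword="bn"):
    # Proximal term over the non-normalization parameters only; normalization layers
    # are left free to fit each client's local feature distribution.
    prox = 0.0
    for name, p in local_model.named_parameters():
        if norm_keyword in name:
            continue  # no proximal pull on normalization layers
        prox = prox + torch.sum((p - global_state[name].detach()) ** 2)
    return task_loss + 0.5 * mu * prox

# Server side: aggregate as in FedBN, i.e. average every layer except the normalization
# layers, so each client keeps its own normalization parameters and statistics.
```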
Number of local training epochs
Performance of most FL methods consistently decreases as the number of local epochs increases.
→ FL methods perform well with fewer local epochs and more global updates
because the total number of training epochs is fixed, fewer local epochs mean more communication rounds