Let us assume a 2x2 image, 3 channels (one per possible label), and a batch of 2. The predicted scores (logits) are given below, with each 2x2 array flattened into a single row. Each row is a class, and each column holds the scores of every class for one pixel.

For N = 1 (of batch 2):

    [ 9 2 5 8 ]
    [ 7 8 4 6 ]
    [ 8 4 6 7 ]

For N = 2 (of batch 2):

    [ 7 3 0 3 ]
    [ 4 6 1 1 ]
    [ 2 4 4 2 ]

Now convert these to softmax scores (Caffe's Softmax does the same). Say a column is:

    [ x ]
    [ y ]
    [ z ]

Then the softmax score, which is computed across the classes/channels of that column and gives one value per row, is:

    [ e^x / ( e^x + e^y + e^z ) ]
    [ e^y / ( e^x + e^y + e^z ) ]
    [ e^z / ( e^x + e^y + e^z ) ]

So the softmax output will be:

    [ 0.6652409  0.00242826 0.24472848 0.6652409  ]
    [ 0.09003057 0.9796292  0.09003057 0.09003057 ]
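
The arithmetic above can be checked with a short NumPy sketch. NumPy is used here only for illustration and is not part of the original Caffe setup; the function name `softmax_over_channels` and the (N, C, H*W) array layout are assumptions made for this sketch. Subtracting the per-pixel maximum before exponentiating is a common numerical-stability step that leaves the result unchanged.

```python
import numpy as np

# Logits from the example above, laid out as (N, C, H*W) = (2, 3, 4).
# Axis 1 is the class/channel axis; each column is one pixel.
logits = np.array([
    [[9, 2, 5, 8],
     [7, 8, 4, 6],
     [8, 4, 6, 7]],
    [[7, 3, 0, 3],
     [4, 6, 1, 1],
     [2, 4, 4, 2]],
], dtype=np.float64)


def softmax_over_channels(scores, axis=1):
    """Softmax across the class axis, independently for every pixel.

    Hypothetical helper for illustration only.
    """
    # Shift by the per-pixel maximum so np.exp cannot overflow.
    shifted = scores - scores.max(axis=axis, keepdims=True)
    exp = np.exp(shifted)
    # Normalise so the class probabilities of each pixel sum to 1.
    return exp / exp.sum(axis=axis, keepdims=True)


probs = softmax_over_channels(logits)

# Probabilities for the first batch item, one row per class.
print(np.round(probs[0], 8))
```

The first two rows printed for the first batch item should agree with the values quoted above to about six decimal places, and every column of `probs` sums to 1 because the normalisation runs over the class axis rather than over the pixels.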