Hi G. As I mentioned and written and said many times before, normalization is tricky business and I generally recommend avoiding worrying about it unless necessary. That said, of course I’ll answer your question 
The “correct” normalization depends on your goal. For Parsecal’s theorem, you divide by nfft. That post on math-stackexchange that you linked is actually doing the same thing. If you write out the math from the code, the response suggests to scale the time domain signal by 1/fs and the frequency domain signal by 1/(fs*nfft). You can simply pull out the 1/fs from both terms and only have the frequency domain divided by nfft. That solution is also correct because it follows from the calculus (dt, df), but the algebra and discretization on computers allows us to ignore that common factor.
This normalization, however, is “incorrect” at recovering the energy at the frequency in the signal, because the zero-padding distributes more energy around the spectrum. So to recover the units of the signal at a particular frequency, the normalization is dividing by the length of the signal, which is not the same thing as dividing by nfft if you are zero-padding.
So the conclusion here is that normalization is tricky
I’d be hesitant to say that one is right or wrong; different normalizations serve different visual/statistical/mathematical/practical goals and have different implications.
The scaling factor of 2 for amplitudes is to sum the positive and negative frequency sides of the spectrum together. For a real-valued signal, they mirror and so you can double the positive frequencies and ignore the negative frequencies. A complex-valued Morlet wavelet is single-sided (zero energy in the negative frequencies), so that one actually probably shouldn’t be doubled. If I did it in a video, it might have been a mistake, unless there was some specific reason why I did it.
Finally, some starter code to help you explore these concepts:
% parameters
signaldur = 2.4; % seconds
signalAmp = .4;
nfft = 29383; % fft zero-padding
srate = 1000;
% create the signal
time = (0:signaldur*srate-1) / srate;
signal = signalAmp*sin(2*pi*10*time);
% Parseval's theorem
energyTime = sum(signal.^2);
energyFreq1 = sum(abs(fft(signal,nfft)).^2)/nfft;
energyFreq2 = sum(abs(fft(signal,nfft)).^2)/srate;
[energyTime energyFreq1 energyFreq2]
% amplitude spectrum (peak should match signalAmp)
plot(linspace(0,srate,nfft),2*abs(fft(signal,nfft))/length(signal))
set(gca,'xlim',[8 12])