(Sorry for the long post)
Yesterday I tried to dig into the Audio Driven script in an attempt to create something different for myself, and I am a bit overwhelmed.
There are 2 crucial functions which I need from the script:
1. audio_to_spectrogram
2. spectrogram_to_value
I am OK with the first one (well, more or less), though I got lost at the FFT signal processing.
A comment there says: "# Derive frequency domain from a segment of 160 signals"
What does that mean? And why 160? Where does that number come from?
But suppose I do not need to know HOW audio_spectrogram is built, because it is probably standard for ANY further value assignments.
What I really need is to understand the structure of the audio_spectrogram array so I can create a simple routine:
Filter 2 frequency ranges: a "Hard beat" (probably between 20 and 200 Hz) and a "Smooth Wave" (around 16 - 20 kHz). I do not need the mid-range at all.
Then translate the amplitudes of those 2 ranges separately to some scale (0-100, for instance).
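To make that concrete, here is a rough sketch of the routine I have in mind. It is only a guess at the layout: it assumes each frame is a one-sided FFT magnitude array whose bins are spaced linearly from 0 Hz up to half the sample rate. The names band_level and to_scale, the 44100 Hz sample rate and the reference level are my own placeholders, not anything taken from the Audio Driven script.

```python
import numpy as np

def band_level(frame, sample_rate, f_lo, f_hi):
    """Mean magnitude of the bins whose frequency lies in [f_lo, f_hi] Hz."""
    frame = np.asarray(frame, dtype=float)
    # ASSUMPTION: bins run linearly from 0 Hz to sample_rate / 2.
    freqs = np.linspace(0.0, sample_rate / 2.0, len(frame))
    mask = (freqs >= f_lo) & (freqs <= f_hi)
    if not mask.any():
        return 0.0  # band is narrower than one bin, nothing to average
    return float(frame[mask].mean())

def to_scale(level, reference, top=100.0):
    """Map a level onto 0..top, clipping anything above the reference."""
    if reference <= 0:
        return 0.0
    return float(np.clip(level / reference, 0.0, 1.0) * top)

if __name__ == "__main__":
    sample_rate = 44100            # placeholder, should come from the WAV header
    frame = np.zeros(366)          # stand-in for one printed fft_abs group
    frame[2] = 4.0                 # pretend there is energy near 120 Hz
    beat = band_level(frame, sample_rate, 20, 200)
    wave = band_level(frame, sample_rate, 16000, 20000)
    reference = 2.0                # placeholder, e.g. loudest level in the clip
    print(to_scale(beat, reference), to_scale(wave, reference))
```

The reference level would probably be the loudest band level found across all frames of the clip, so that 100 means "as loud as this clip gets" in that band.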
In an attempt to better understand the structure, I dumped the following to the Console from inside the loop (a sort of reverse engineering):
print(fft_abs)
The result is 106 groups of numbers (for a 1.773 sec WAV, that is about the right number of iClone frames).
Here is the content of a single sampled frame:
--------------------------------------------
[0.11150956 0.25723062 0.08865806 0.14261947 0.35034164 0.95271861
0.77180765 1.18902709 4.77695489 3.19351255 2.43299415 1.42882515
2.89056719 3.23675144 0.46677106 0.76576112 0.50855 1.04769554
3.24744908 0.82136079 0.52342382 0.8870013 0.77720837 1.27814819
1.75291115 1.5230229 2.40762168 2.12467929 2.04437209 1.19796097
0.14390374 0.48898827 0.77696653 0.65896997 0.84872364 2.77168353
0.62182953 0.0840715 0.59363135 0.05420146 1.06631399 0.56239089
1.85929132 2.31630288 1.03320651 0.39651085 0.24630279 0.28349139
0.2120239 0.13484599 0.11559678 0.10468502 0.0954504 0.06083779
0.15067458 0.12987788 0.1691395 0.33406639 0.16874957 0.45563993
0.33092178 0.38230812 0.17556105 0.70744437 0.2467003 1.03477113
0.63736083 0.71204789 0.52903696 0.23563739 0.11254837 0.24727394
0.37085328 0.49383915 0.21563008 0.28507491 0.54850175 2.77919332
1.56778496 1.34671397 0.70773272 0.21522238 0.59130217 1.07373824
0.24710162 0.53101785 1.00014957 0.38657898 0.67077908 0.52113286
1.75167786 0.80068758 0.9005741 0.276603 1.7392084 1.82330498
0.77034732 1.06487673 0.38866614 0.26912587 0.31042891 1.77936532
0.64409742 0.52655199 0.49242897 1.08934718 0.50919374 3.1518447
1.11932983 1.08887643 0.15766056 0.22869077 0.29967342 1.77338149
0.56163879 1.39084035 1.80432576 1.46165834 0.83674582 0.27736511
0.63598491 1.47150863 0.5986844 0.19184856 0.4096564 0.37384439
0.63514997 0.18030894 0.40751641 0.12279016 0.21885917 0.24109634
0.68227292 0.16547404 0.6008949 0.29243988 0.15307243 0.35350271
0.73112761 0.08970516 0.21179237 0.13587826 0.10938051 0.13771398
0.58163572 0.34767222 0.24000406 0.16353447 0.18655791 0.45121205
0.13052829 0.12671041 0.31687109 0.57070711 0.15643487 0.44449939
0.24323215 0.28844307 0.11622413 0.04674734 0.22112062 0.30112983
0.23626635 0.09739269 0.22203083 0.10051502 0.19820335 0.1683452
0.33237137 0.33166428 0.08851533 0.43111514 0.20603584 0.0973033
0.09174958 0.08245126 0.02493859 0.29208872 0.25614368 0.15274507
0.1494195 0.02537298 0.03503722 0.04417663 0.01018429 0.02329054
0.08388038 0.24216378 0.07371142 0.21777574 0.08832298 0.04154081
0.10838715 0.17016504 0.08851597 0.08117208 0.18640594 0.19730835
0.22404092 0.13889349 0.36618105 0.2982956 0.27594938 0.14094833
0.25682853 0.10210291 0.06375692 0.08733063 0.01776444 0.0504968
0.14075256 0.02556287 0.25705788 0.01771428 0.01066642 0.03975817
0.09502241 0.22844293 0.08399003 0.02156999 0.03714909 0.04872499
0.23168574 0.17439931 0.12584799 0.00548699 0.02107726 0.01491974
0.08920536 0.11376578 0.06185278 0.04880594 0.0498625 0.01610776
0.04961902 0.03622142 0.03234377 0.02982717 0.08341837 0.06429129
0.04581403 0.06817268 0.03275356 0.02889231 0.04015762 0.03917378
0.07482373 0.07681092 0.04184848 0.00596598 0.05676727 0.09020718
0.06368601 0.21919096 0.0733701 0.04725462 0.06236765 0.04885768
0.0373921 0.01421165 0.04765336 0.03694735 0.03303088 0.03086083
0.030988 0.02921685 0.03095408 0.03055253 0.02982576 0.02991376
0.02973645 0.02922301 0.02922083 0.02879407 0.029383 0.02931702
0.02803401 0.02849017 0.02858922 0.0292728 0.02858342 0.02882736
0.02896087 0.02853854 0.02829262 0.02868207 0.02861265 0.02874227
0.02907272 0.02851239 0.02835951 0.02847763 0.02803881 0.02801584
0.02831999 0.02839959 0.02799038 0.0285909 0.02875205 0.02800037
0.02794852 0.02811474 0.02756481 0.02795735 0.02820274 0.02811454
0.02817428 0.02792826 0.0276992 0.0283763 0.02781939 0.02810712
0.02851885 0.02817769 0.02776258 0.02811209 0.02806366 0.02765051
0.02751835 0.02729862 0.02701069 0.02749094 0.02812452 0.02819064
0.02712007 0.02700188 0.02714349 0.02754664 0.02724389 0.0274406
0.02806957 0.0276 0.02738082 0.02737823 0.02726042 0.0268806
0.02707177 0.02732006 0.02734533 0.02731668 0.02724605 0.02762236
0.02808626 0.02649266 0.02660802 0.02743208 0.02786796 0.02820454
0.02777062 0.02713335 0.0272746 0.02720932 0.02732232 0.02755141
0.02735785 0.0268071 0.02775384 0.02786173 0.02847259 0.02768545
0.02680469 0.02740702 0.02695009 0.02701072 0.02783906 0.02660854]
--------------------------------------------
There are 366 numbers in a single frame group. Each number represents an amplitude. Right?
So the question is: how is the frequency range distributed across the group? What is the total range?
Does it cover the standard 20 Hz - 20 kHz (given a high-quality audio clip)?
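For comparison, this is how the bins would map to frequencies if the script used NumPy's real FFT on one frame of samples. I do not know that it does, so treat it as a sketch: the 44100 Hz sample rate is an assumption, and the frame length of 730 samples is hypothetical, picked only because np.fft.rfft then returns 366 values like my dump. A synthetic 1 kHz tone is used so the expected peak is known in advance.

```python
import numpy as np

sample_rate = 44100   # ASSUMPTION: typical WAV sample rate
n_samples = 730       # HYPOTHETICAL frame length; rfft of 730 samples gives 366 bins

# Synthetic test tone, so we know where the peak should land.
tone_hz = 1000.0
t = np.arange(n_samples) / sample_rate
signal = np.sin(2 * np.pi * tone_hz * t)

# Magnitudes of the one-sided spectrum and the frequency of each bin.
fft_abs = np.abs(np.fft.rfft(signal))
freqs = np.fft.rfftfreq(n_samples, d=1.0 / sample_rate)

bin_width = sample_rate / n_samples      # spacing between neighbouring bins in Hz
peak_bin = int(np.argmax(fft_abs))

print("bins:", len(fft_abs))
print("range: %.1f Hz to %.1f Hz" % (freqs[0], freqs[-1]))
print("bin width: %.2f Hz" % bin_width)
print("peak at bin %d, about %.1f Hz" % (peak_bin, freqs[peak_bin]))
```

If that layout holds, bin k sits at k * sample_rate / n_samples, the last bin is at half the sample rate, and a 20 - 200 Hz band would cover only a handful of bins near the start of each group.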
Feel free to elaborate. Thanks.
Edited 5 Years Ago by 4u2ges